The Role of Environmental Monitoring in Data Centers
The role of data center environmental monitoring has evolved over time, as has the data center being monitored. As data centers have migrated from low-density cost sinks to high-density sources of competitive operational advantage, our need to understand the environmental conditions in them has changed in both degree and substance, and the current proliferation of monitoring systems, sensor packages, and integrated controls reflects that increased importance. Despite the relatively recent explosion of new products, new companies and promotional rhetoric, I believe we are still well in front of the tipping point on fully realizing the benefits of these monitoring and associated solutions.
In the beginning, the role of environmental monitoring was merely to alert us to the presence of a hot spot that required some kind of corrective action. That corrective action could range from lowering a set point, to adding a fan somewhere, to turning in a requisition for a new CRAC, to finding and fixing a broken CRAC. (Back in those days I saw about as many “corrective action” free-standing fans deployed in hot aisles as in cold aisles, and it wasn’t THAT long ago!) The actual monitoring was rather rudimentary, often relying on walk-throughs or alerts sent by overheated computers.
As airflow management became a subject of conversation at industry conferences, re-circulation and bypass airflow started to work their way into the IT lexicon, and we started to see temperature sensors in the return air path to CRACs and air handlers to alert us to short-circuited airflow. Our concern with short-circuiting was not yet driven by an understanding of energy saving opportunities so much as by a continued concern about keeping all of our equipment running cool enough. While it remained mostly a mystery to the IT community, at this time our facilities community was starting to understand how short-circuiting could result in return air temperature below set point: a cooling unit controlled on return air sees that cooler air, throttles back, and thereby raises the supply temperature. As lowered set points and additional cooling capacity were instituted across the industry, frequently resulting in higher data center temperatures and more hot spots, the need to monitor for short-circuiting airflow became more apparent. The seriousness of the short-circuiting problems revealed through some nascent monitoring attempts opened the door for expansion of the airflow management industry, leading to floor grommets, CFD tools, blanking panels, chimney cabinets, containment architectures and all manner of duct-fan-baffle accessories.
As the tools for managing airflow and monitoring temperatures began appearing, industry standards began to address how to perform the monitoring and to define thresholds for monitored conditions, primarily temperature and humidity. Telcordia (then Bellcore) GR-63-CORE told us to locate sensors 4.9 feet off the floor and 15” in front of our equipment, and TIA-942 told us to locate sensors 5 feet off the floor in the center of cold aisles. ASHRAE somewhat fine-tuned these recommendations by following the Telcordia standard for normal monitoring but recommending a more granular approach for initial set-up and subsequent trouble-shooting: sensors 2” in front of equipment in each rack at the bottom, middle and top. At this time, we also had some recommendations for server inlet temperatures from ASHRAE. These let us keep our relatively low set points with an understanding of the safety margin afforded by the new environmental envelopes, and our monitoring could now alert us to unexpected changes before we had actual problems.
As our industry grew in sophistication (and complexity and density) and our environmental monitoring practices became more proactive, the increased awareness of our industry’s energy use was accompanied by a shift in the focus of our airflow management experts from preventing hot spots to conserving energy. The environmental monitoring systems, deployed more or less in conformance with ASHRAE-TIA-BICSI recommendations, began to take a more direct role in providing feedback to mechanical plants. At first, this role was mediated by some form of manual intervention between the two systems, but over time opportunities have grown for more direct integration between the data floor monitoring and whatever automated or semi-automated systems may be controlling the mechanical plant.
We have progressed from identifying hot spots, to preventing hot spots, to controlling the mechanical plant to eliminate gross waste, to today, where the mission should be to optimize the performance of the IT stack simultaneously with optimizing the efficiency of the mechanical plant. The starting point for data center optimization is to control the mechanical plant based on server inlet temperatures. That is a major departure from legacy practices of set points similar to comfort cooling thermostats, and still a departure from setting supply temperatures based on periodic readings of cold aisle temperatures per the prevailing industry standards.
Relatively granular environmental monitoring is critical to effectively optimizing the entire data center. I am sure there are many ways to get this done, but one strategy that I have seen work well is to take the ASHRAE recommendation for start-up and trouble-shooting and use it as your basic management monitoring scheme: sensors 2” in front of equipment at the bottom, middle and top of the cabinet. I have seen this done in a few large data centers with sensors in every single cabinet, but I think some strategic sampling would be adequate: for example, at the ends of rows, wherever there are higher-than-normal density cabinets, and where high and low density cabinets are adjacent to each other. Some thoughtful consideration could avoid 100% coverage, but since the driver for the whole machine is server inlet temperature, the monitoring needs to be granular enough to allow you to push the envelope as close as possible to the upper temperature threshold.
The way I have seen this done, the mechanical plant looks at a maximum temperature and a differential between the top and bottom sensors. The maximum sensor reading controls temperature, typically through valve openings or pump flows, and the temperature differential controls air volume through fan speed.
I have seen this differential set at 2°F, with airflow increased or decreased to maintain the target differential. In actual practice, that target differential is going to depend on the quality of separation between data center supply air and return air, but obviously the smaller the differential, the more efficient the data center can be.
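To make the two control signals concrete, here is a minimal sketch of one control step in Python. It is an illustration only, not any vendor's algorithm: the names `CabinetReading` and `control_step`, the simple proportional gains, and the 80°F maximum inlet target are all assumptions I have made up for the example. Only the two-signal idea (maximum reading drives cooling, top-to-bottom differential drives airflow) and the 2°F differential come from the discussion above.

```python
from dataclasses import dataclass


@dataclass
class CabinetReading:
    """Inlet temperatures (deg F) from sensors at the bottom, middle and top of one cabinet."""
    bottom_f: float
    middle_f: float
    top_f: float


def clamp(value: float, low: float = 0.0, high: float = 100.0) -> float:
    """Keep an actuator command within 0-100 percent."""
    return max(low, min(high, value))


def control_step(
    readings: list[CabinetReading],
    valve_pct: float,
    fan_pct: float,
    max_inlet_target_f: float = 80.0,  # assumed placeholder, not a recommendation
    delta_target_f: float = 2.0,       # the differential mentioned in the text
    valve_gain: float = 2.0,           # hypothetical tuning value
    fan_gain: float = 5.0,             # hypothetical tuning value
) -> tuple[float, float]:
    """Return new (valve_pct, fan_pct) commands from the monitored cabinets."""
    # Signal 1: the hottest inlet reading anywhere drives cooling (valve or pump).
    hottest = max(max(r.bottom_f, r.middle_f, r.top_f) for r in readings)
    # Signal 2: the worst top-to-bottom differential drives air volume (fan speed).
    worst_delta = max(r.top_f - r.bottom_f for r in readings)

    new_valve = clamp(valve_pct + valve_gain * (hottest - max_inlet_target_f))
    new_fan = clamp(fan_pct + fan_gain * (worst_delta - delta_target_f))
    return new_valve, new_fan


if __name__ == "__main__":
    sampled = [
        CabinetReading(bottom_f=70.0, middle_f=72.0, top_f=75.0),
        CabinetReading(bottom_f=71.0, middle_f=72.0, top_f=72.0),
    ]
    print(control_step(sampled, valve_pct=50.0, fan_pct=50.0))
```

In this made-up sample the hottest inlet is 5°F under the assumed target, so the valve command eases back, while the 5°F differential in the first cabinet exceeds the 2°F target, so the fan command rises. A real plant would use properly tuned control loops with rate limits and alarms; the point here is only how the two sensor-derived signals map to two different actuators.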
The tools available to our industry for environmental monitoring today are far advanced from those of just a few years ago; however, I am afraid that too many applications of this monitoring technology are still focused on the simple work of finding or preventing hot spots and generally helping manage the effectiveness of the data center. That’s why I started off this discussion by observing we are still on the front end of the tipping point on exploiting the benefits of environmental monitoring. The key to taking advantage of all the recognized best practices for airflow management is effective environmental monitoring that will allow us to take advantage of fan law energy savings, chiller set point efficiencies, maximized access to free cooling and the sweet spot for whatever server class we are deploying. There is still plenty of work to do, but we have better tools with which to do that work.
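For readers who have not worked with the fan laws, the savings come from the cube relationship: for a given fan and system, power draw scales roughly with the cube of fan speed. The short Python sketch below shows the idealized arithmetic. The function name `fan_power_fraction` is mine, and real installations will deviate from the ideal curve because of motor and drive losses.

```python
def fan_power_fraction(speed_fraction: float) -> float:
    """Idealized fan affinity law: power fraction = (speed fraction) cubed."""
    if speed_fraction < 0:
        raise ValueError("speed_fraction must be non-negative")
    return speed_fraction ** 3


if __name__ == "__main__":
    for pct in (100, 90, 80, 70):
        power = fan_power_fraction(pct / 100) * 100
        print(f"{pct}% speed -> {power:.1f}% power")
    # 100% speed -> 100.0% power
    # 90% speed -> 72.9% power
    # 80% speed -> 51.2% power
    # 70% speed -> 34.3% power
```

In other words, if monitoring gives you the confidence to trim fan speed by 20%, the idealized fan energy drops by nearly half.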
Data Center Consultant