Six Steps to Harvest Airflow Management Returns
After nearly twenty years of promoting the benefits of data center airflow management, I am afraid I need to confess that I sometimes leap directly into the minutiae of some operational algorithm, or into a debate on the relative merits of particular variations of mechanical plant implementation, and thereby egotistically assume all my readers have attended one of my courses or are well read in my library of publications. Today, by way of apology, I will take a step back and walk through a simple plan for sneaking up on the harvest of returns from upgrading a data center with good airflow management practices. I might also confess that when I manage a data center airflow management upgrade project, I tend to sneak up on my end state like a bull in a china shop and go straight to my calculated end condition. I realize that some managers may be a little more cautious in executing their responsibilities for the health of thousands or hundreds of thousands of dollars of IT equipment and millions of dollars of associated data. Therefore, let’s walk through the six simple steps for executing an airflow management upgrade project and taking advantage of it:
- Hold educated expectations
- Ascertain baseline
- Switch to supply set point
- Adjust supply fan speeds
- Adjust supply temperature
- Monitor and manage
First, by effective airflow management, I am referring to implementing all those practices that separate the supply air mass on the IT equipment intake side from the return air mass on the IT equipment exhaust side, minimizing, as it were, what we call bypass airflow and hot air re-circulation. Upsite Technologies’ 4 R’s of Airflow Management™ provide a useful roadmap of these practices:
- Raised floor – plug all the holes
- Rack – blanking panels and means to plug all other holes between front and back
- Row – maximize separation between cold aisle and hot aisle, i.e., some degree of containment
- Room – essentially do what I am recommending here
The beauty of the 4 R’s is that they provide a path to a degree of flexibility in implementing airflow management. While I would always strongly encourage my clients to implement everything and then start climbing the ladder of the six steps, the 4 R’s do represent a reasonable hierarchy that allows us to clean up the raised floor, begin the six steps, then clean up the racks and continue fine-tuning the six steps, and so on through the rows and room.
The reasonable expectation from implementing an airflow management improvement initiative is, of course, a lower PUE, energy savings, and more consistent temperatures in the data center. The more immediate effect, however, can be a bit of a surprise. Without having done anything more than plugging all the holes in the floor, racks and rows of racks, we will experience two general conditions:
- The data center will be dramatically colder
- The cooling units will be supplying a significant surplus of chilled air volume into the data center
A low temperature set point and excessive fan speeds will be contributing to this over-cooling. Progress through the six steps will correct the over-cooling and help us understand how much the different sources have contributed to that over-cooling.
Before implementing any airflow management changes and before modifying CRAH fan speeds and temperature settings, it would be useful to understand the current mechanical operating efficiency. Category 3 PUE (12 month energy at the server plug connection) would be a reasonable baseline for calculating project return on investment, though specific data points on chiller and CRAH energy would be more useful for ongoing management of the facility after the airflow management implementation.
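As a rough illustration of the baseline calculation, PUE is total facility energy divided by IT equipment energy over the same period. The sketch below assumes twelve monthly meter readings for each; the function name and the kWh figures are hypothetical and only show the arithmetic.

```python
def annual_pue(total_facility_kwh, it_equipment_kwh):
    """Return PUE: total facility energy / IT equipment energy for the same months."""
    if len(total_facility_kwh) != len(it_equipment_kwh):
        raise ValueError("facility and IT readings must cover the same months")
    it_total = sum(it_equipment_kwh)
    if it_total <= 0:
        raise ValueError("IT equipment energy must be positive")
    return sum(total_facility_kwh) / it_total


# Hypothetical monthly readings in kWh (illustrative numbers, not measured data).
facility_kwh = [180_000] * 12   # everything the site draws
it_kwh = [100_000] * 12         # energy measured at the IT equipment

baseline = annual_pue(facility_kwh, it_kwh)
print(f"Baseline PUE: {baseline:.2f}")
```

Keeping the chiller and CRAH readings as separate series alongside this single number would make it easier to see later which part of the mechanical plant produced the savings.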
Prior to deploying airflow management improvements, I suspect CRAHs will typically be set up with a return air temperature set point somewhere around 68-72˚F. Upon completion of airflow management improvements, some server inlet temperatures in the data center may actually increase, because some return air pockets may fall below set point, resulting in cooling being turned off and return air merely being recycled into the data center. Therefore, the first change after completion of airflow management improvements should be to change from a return air set point to a supply temperature set point. The supply set point should mimic as closely as possible what the supply temperature was with the current return air set point. If the set point was 70˚F, then the supply temperature was likely 58-62˚F. If that number is not readily available, and if there is some discomfort with taking that leap of faith with a calculated set point, then the measured temperature of supply air coming through perforated floor tiles serves as a reasonable proxy. Depending on how badly the airflow management improvements were needed, that floor tile temperature could be 1-5˚F or more above the actual supply temperature at the cooling units. Nevertheless, if this temperature becomes the CRAH set point, with the airflow management improvements at the first R (raised floor), the effective supply temperature into the room remains the same and it’s probably safe to raise the chiller a degree or two and start to see some of those promised economies.
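The choice of an initial supply set point can be sketched as follows. The 8-12˚F offset below the return set point is inferred from the 70˚F to 58-62˚F example above and is an assumption, as are the function names, the use of the midpoint of that range, and the averaging of tile readings when they are available.

```python
def estimate_supply_range_f(return_set_point_f, min_delta_f=8.0, max_delta_f=12.0):
    """Likely prior supply temperature range for a given return air set point.

    The 8-12 F offsets are an assumption inferred from the 70 F -> 58-62 F example.
    """
    return (return_set_point_f - max_delta_f, return_set_point_f - min_delta_f)


def initial_supply_set_point_f(return_set_point_f, tile_temps_f=None):
    """Pick a starting supply set point.

    Prefer measured perforated-tile temperatures (averaged, an assumption);
    otherwise fall back to the midpoint of the estimated range.
    """
    if tile_temps_f:
        return sum(tile_temps_f) / len(tile_temps_f)
    low, high = estimate_supply_range_f(return_set_point_f)
    return (low + high) / 2.0


print(estimate_supply_range_f(70.0))                          # (58.0, 62.0)
print(initial_supply_set_point_f(70.0))                       # 60.0
print(initial_supply_set_point_f(70.0, [61.0, 63.0, 62.0]))   # 62.0
```

Because tile readings can run a few degrees above the actual supply at the cooling units, a set point taken from the tiles errs on the warm side of what the units were previously delivering.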
With airflow management improvements in place and supply temperature the same as what had previously been produced as a result of a return air set point, the data center will be cold.
To determine how much extra chilled air is being produced we will want to look for bypass air in the form of cold air leakage outside of the cold aisles. If a full containment system has been installed, then measuring temperature around joints in the containment structure for cold air on the hot side of the containment will give us a target to reduce. Another source of bypass airflow is actually through the servers themselves. Ideally, we would want to know the ΔT through our IT equipment prior to containment so we could look for excess airflow in this metric as well. If partial containment has been implemented, measuring temperatures toward the ceiling above the partial containment boundaries will identify bypass airflow. (See “Data center Temperature Sensor Location”, Oct. 24, 2018 and “The Shifting Conversation on Managing Airflow Management: a Mini Case Study,” Aug. 22, 2018 on the Upsite Technologies blog page for more information on temperature sensor placement.) If the temperature above the partial containment boundary is the same as the air entering the cold aisle through perforated floor tiles, we are over-producing bypass airflow and can turn down CRAH fan speeds. These adjustments took approximately four hours to stabilize in my data center lab and could take up to a day in a larger space. When the space has stabilized, we merely repeat the measure and adjustment process until we are not finding cold air leakage through containment structural joints or until we have some warmer temperatures above the partial containment but not within the cold aisle delivery space. At this point we will already be experiencing energy savings from the CRAH fans and our data center will still be too cold.
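The partial containment check described above can be written as a small decision function. This is only a sketch: the function name, the 1˚F tolerance for calling two temperatures "the same," and the 5˚F margin for deciding the cold aisle is still cold are all hypothetical values, not figures from any standard.

```python
def should_reduce_fan_speed(tile_supply_f, above_containment_f, cold_aisle_top_f,
                            bypass_tolerance_f=1.0, cold_aisle_margin_f=5.0):
    """Return True if one round of readings suggests turning CRAH fans down.

    tile_supply_f:        air temperature at the perforated floor tiles
    above_containment_f:  air temperature above the partial containment boundary
    cold_aisle_top_f:     air temperature near the top of the cold aisle
    Both tolerance values are illustrative assumptions.
    """
    # Air above the boundary about as cold as tile air means surplus supply is spilling over.
    cold_air_spilling = (above_containment_f - tile_supply_f) <= bypass_tolerance_f
    # Only keep reducing while the cold aisle itself has not started to warm up.
    cold_aisle_still_cold = (cold_aisle_top_f - tile_supply_f) <= cold_aisle_margin_f
    return cold_air_spilling and cold_aisle_still_cold


# Hypothetical readings taken after each stabilization period (hours to a day).
rounds = [
    (60.0, 60.5, 61.0),   # cold air above containment: still over-supplying
    (60.0, 72.0, 62.0),   # warm above containment, cold aisle still cold: stop
]
for tile, above, aisle in rounds:
    print(tile, above, aisle, should_reduce_fan_speed(tile, above, aisle))
```

Each call represents one pass of the measure-and-adjust loop; the wait for the space to stabilize between passes happens outside the function.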
In order to dial in our supply temperature, ideally we would want to monitor inlet temperature to the servers at the bottom and top of each rack. If that is not practical, then we can at least monitor at the top server and bottom server of each rack at the ends of rows and in the center of each row. If our airflow management implementation has been effective and we have set supply temperatures at 60˚F, then all our temperature sensors should be recording readings around 61-65˚F. If all our temperature sensors read 60˚F, then we still have an opportunity to reduce CRAH fan speeds a little more. At this point, our data center cold aisles will be much colder than necessary and we can begin the process of increasing supply temperatures. For every degree that we raise our supply set point we can increase the chiller temperature by a like amount. I have seen my colleagues do this one degree at a time and up to five degrees at a time. My personal inclination is to just go for the gusto, but I understand caution in such a high stakes enterprise. I recall some twelve years ago a friend of mine was proud of having convinced management to allow him to raise his supply 1˚F and then operate for a month and then make another increase. He was most proud of the great restraint he showed when they finally hit their ongoing operating point and he kept to himself that the company had spent over $500,000 for cooling energy that they did not need to use during the cautious change process.
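To put rough numbers on the cost of caution, the sketch below compares stepping to a target supply temperature against going there at once. It assumes, purely for illustration, that savings scale linearly with each degree of supply increase and that the first step is taken immediately; the function name and the dollar figure per degree are hypothetical and are not derived from my friend's project.

```python
def forgone_savings(total_increase_f, step_f, months_per_step, savings_per_degree_month):
    """Savings given up by stepping to the target instead of going there at once.

    Assumes savings are linear in degrees of supply increase (an illustrative
    simplification) and that the first step happens at time zero.
    """
    forgone = 0.0
    current = 0.0
    while current < total_increase_f:
        current = min(current + step_f, total_increase_f)
        # During this dwell period we still sit (total - current) degrees below target.
        forgone += (total_increase_f - current) * savings_per_degree_month * months_per_step
    return forgone


# Hypothetical case: an 18 F total increase, worth $1,000 per degree per month.
print(forgone_savings(18, 1, 1, 1000))    # one degree per month
print(forgone_savings(18, 5, 1, 1000))    # five degrees per month
print(forgone_savings(18, 18, 1, 1000))   # straight to the end state
```

Under these assumptions the forgone savings grow roughly with the square of the number of steps, which is why small increments with long dwell times become expensive.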
We monitor temperatures at the top and bottom servers of our selected racks at each change of condition. If the difference between the bottom sensor and top sensor is more than 3˚F, then we will want to slightly increase our CRAH airflow volume (fan speed). If that difference drops to zero, then we can reduce our fan speed. Once these measurements are stabilized, then we can increase temperature settings again. We can repeat these steps until the sensor readings at the top of the racks record our maximum desired temperature. For example, if we want to keep all temperatures within the ASHRAE 80.6˚F upper envelope, then I would suggest we stop at 78-79˚F just so we have a little safety margin to cover extended breakdowns in the airflow management system during maintenance or moves-adds-changes activity. When we are able to maintain a top server inlet temperature of 78˚F and a bottom server inlet temperature of 75˚F, then we can do our final fine-tuning for ongoing operation of the space. We should be able to manipulate airflow and temperature to maintain a bottom of rack inlet temperature of 76˚F and top of rack inlet temperature of 79˚F.
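The adjustment rule in this step reduces to a short decision function. This is a minimal sketch for a single rack, using the 3˚F spread and the 78˚F stopping point from the text; the function name, the action labels, and the treatment of a negative spread as zero are my own choices for illustration.

```python
def next_action(bottom_inlet_f, top_inlet_f, target_top_f=78.0, max_spread_f=3.0):
    """Suggest the next tuning move from a rack's bottom and top inlet readings."""
    spread = top_inlet_f - bottom_inlet_f
    if spread > max_spread_f:
        # Top of rack is starved for supply air: add airflow.
        return "increase_fan_speed"
    if spread <= 0:
        # No gradient at all: more air is being delivered than the rack needs.
        return "decrease_fan_speed"
    if top_inlet_f < target_top_f:
        # Airflow is balanced and there is still headroom: raise the supply set point.
        return "raise_supply_temperature"
    return "hold"


# Hypothetical readings (bottom, top) in degrees F.
for bottom, top in [(62.0, 66.0), (60.0, 60.0), (62.0, 64.0), (75.0, 78.0)]:
    print(bottom, top, next_action(bottom, top))
```

With several monitored racks, one cautious option is to evaluate each rack and act on the most demanding result, so that a single starved rack triggers more airflow before any further temperature increase.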
Once we have established these operating parameters, temperature will typically not require further manipulation. The one exception to that might be if the data center saw significant density increases resulting in some leakage of the airflow management barriers. Otherwise, any normal adjustments in IT utilization, equipment moves-adds-changes or workflow shedding should be handled by airflow volume (CRAH fan speed) adjustments.
My astute readers may have determined that this was not really a six step process. That cautious data center manager who moved the needle at one degree increments may be looking at a thirty-four step process. But let’s not quibble. To walk to the mail box, I 1.) put one foot in front of the other, and 2.) repeat as necessary. I will not call that a 300 step process; it is just a two-step process in which I take 300 or so steps. Regardless, this simple process is a roadmap to reaping the benefits of implementing airflow management best practices and can guide the most cautious manager as well as the most aggressive manager and works equally well for staged implementation of the 4 R’s or full implementation.