The Shifting Conversation on Managing Airflow Management: A Mini Case Study
Is managing airflow management redundant or superfluous or perhaps overly scrupulous? Nervously risking a loose biblical association, I would merely point out that … in the beginning … airflow management was a means for reducing data center equipment hot spots. The relationship between airflow management and meeting the objective was relatively simple and straightforward: Suffering from hot spots? Implement some airflow management, that is to say, start plugging holes between the areas that should be cooler and the areas that should be warmer and, ta-da, no more hot spots. Some time passed and the conversation shifted a bit. Could I have stranded capacity? Implement some airflow management and, ta-da, I have more cooling capacity and I’m a hero: we can add some more cabinets full of servers, and they said it couldn’t be done. Shortly thereafter, the conversation morphed again, and we were talking about PUE and an EPA report to Congress and the possibility that over half our data center energy budget was not transacting 1s and 0s. Can I reduce my data center energy expense for non-computing activity? Implement some airflow management and … uh … maybe we didn’t notice any major change.
Whereas airflow management can directly mitigate hot spots and directly release stranded cooling capacity, it does not directly raise efficiency and lower energy costs so much, but rather enables various tactics and technologies that then increase efficiency and decrease energy costs. It is no longer news that airflow management provides the foundation for harvesting savings from reducing cooling unit fan speeds, raising chiller temperatures and accessing more free cooling hours. However, the paths for connecting airflow management to these results are not always so clear; hence, we can cover some basic concepts for managing airflow management. Today, I will cover some aspects of how we did this in my data center experimental lab, and next time I will cover some more general principles, useful tips and examples of how I have seen other people do it.
In the Lab: Managing Airflow Management
While my space was a lab, it functioned mechanically like a data center, which was the point after all. It differed from a production data center in that we had a little more flexibility: we could change from a raised floor to a slab floor, and we could switch our cooling units from drawing return air from the room to being coupled to a suspended ceiling for a closed return air path. Since we were conducting experiments, it behooved us to be able to make quick changes to key variables such as load density, delivered temperature, degree and elements of containment, and character of load (e.g., ΔT). This capability meant we cheated with a few shortcuts in some changeovers, but actual measured states were accurately representative of production data centers. With that, the biggest control difference in managing the mechanical plant without airflow management versus with airflow management is set point management. Without airflow management, data centers have historically managed set point much like home and office comfort cooling: when the thermostat sees a temperature above the set point, the cooling system goes to work. With airflow management, our initial point of control is a supply set point. The other foundational element is that temperature and air volume are managed separately. [1] Practically speaking then, we controlled airflow with pressure sensors and we controlled temperature with temperature sensors. While that may seem painfully obvious, there are millions of square feet of data center space in the world with only temperature controlling all aspects of the mechanical plant.
Since we were a lab, data ruled and we therefore had a few more data acquisition points than we might normally expect in a production data center. For example, we gathered temperature readings from all the following sensors every ten seconds:
Air handler exhaust, 4 per unit
Air handler intake, 4 per unit
Ambient, 4 outside locations
Ceiling, 9 within boundaries of each contained aisle
Floor tiles, 1 per perforated tile or grate
Server inlet, 3 per cabinet, horizontally centered and vertically arrayed top, center, bottom
Server exhaust, 3 per cabinet, corresponding to inlet deployment
Utilizing the Data
While all these data points are useful in assessing results of experimental tests and in applying the management strategy of the four ΔT’s, as discussed in previous blog postings and white papers, the sensors recording server inlet temperatures at the top of each cabinet were all we used for automated temperature control. The objective was to get those sensors to read as close as possible to whatever we set as our maximum allowable temperature without exceeding that target. For the purposes of most tests, we “cheated” and controlled that number with valves controlling the flow to our cooling coils, unless we were making a very dramatic change from one test to the next. In cases where we were collecting actual energy use numbers, however, we would manipulate leaving water temperature (LWT) at our chiller. Our default settings were 65˚F water temperature at the chiller and 75˚F supply air temperature, which would result in those top server inlet temperatures ranging from 77-78˚F, before we did any manipulations of load, ΔT, or containment set-ups. Obviously, in tests with no containment or partial containment, those acceptable inlet temperatures could only be achieved with much lower supply temperatures. Since most of our testing revolved around the ASHRAE recommended inlet temperature envelope, we maintained maximum chiller efficiency with an LWT typically of 65-67˚F.
The fine-tuning adjustments and associated energy savings came from managing our cooling unit airflow delivery volume. We monitored pressure differentials against a baseline barometric pressure measured outside the data center and, since we were collecting experimental data, we used a greater number of more precise (i.e., more expensive) static pressure sensors than we might typically see in a production data center. We used omni-directional probes with features that kept curvature-induced errors below 1% of velocity and angle of attack errors below 0.5%. The probes themselves came with +/- 0.5% accuracy and maximum 0.1% hysteresis. Given that our original operating pressure differential targets for the room with chimney cabinets were -0.015” +/- 0.005” H2O column for the ceiling return plenum and +0.15” +/- 0.05” H2O column for the supply plenum, such sensor precision would seem to have been a little overkill. However, once we were up and operating with optimized hot and cold separation, we decided to find the minimal ΔP between hot and cold sides of containment that would still avoid any hot air recirculation. We found we could routinely operate at +0.002” on the supply side and -0.001” on the return side, and our lab technician would periodically show off by holding that total ΔP down around 0.0015”. We arrayed our pressure sensors to reflect the conditions of a particular experiment, but a representative deployment would resemble the one described in Table 1.
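To make the pressure targets concrete, here is a minimal Python sketch that checks a reading against a target band and computes the total ΔP across containment. The function and constant names are hypothetical and were not part of the lab's tooling; only the numeric targets come from the text above. Readings are assumed to be in inches of water column relative to the outside reference.

```python
def within_band(reading, target, tolerance):
    """Return True when a differential pressure reading (in. H2O) is inside target +/- tolerance."""
    # Small epsilon so a reading exactly on the band edge is not rejected by float rounding
    return abs(reading - target) <= tolerance + 1e-12


def total_delta_p(supply, ret):
    """Total pressure difference across containment: supply side minus return side."""
    return supply - ret


# Original chimney-cabinet targets quoted in the text (in. H2O vs. outside reference)
RETURN_TARGET, RETURN_TOL = -0.015, 0.005
SUPPLY_TARGET, SUPPLY_TOL = 0.15, 0.05

print(within_band(-0.012, RETURN_TARGET, RETURN_TOL))  # True
print(within_band(0.22, SUPPLY_TARGET, SUPPLY_TOL))    # False
print(round(total_delta_p(0.002, -0.001), 4))          # 0.003
```

The last line reproduces the routine operating point from the text: +0.002” on the supply side and -0.001” on the return side is a total ΔP of 0.003”.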
Data Center Test Lab Pressure Sensor Placement
| System | Type of Sensor | Location | Function |
| --- | --- | --- | --- |
| Room | Omni-Directional | Dead zone under raised floor | Bulk static pressure under floor |
| Room | Omni-Directional | Dead zone in room | Bulk static pressure in room |
| Room | Omni-Directional | Dead zone above ceiling | Bulk static pressure in ceiling |
| Room | Omni-Directional | Dead zone in reference location | Reference barometric pressure outside data center |
| Test Cabinet | Omni-Directional | Bottom rear server 1 | Pressure into which server fans blow |
| Test Cabinet | Omni-Directional | Bottom rear server 2 | Pressure into which server fans blow |
| Test Cabinet | Omni-Directional | Bottom rear server 3 | Pressure into which server fans blow |
| Test Cabinet | Omni-Directional | Bottom rear server 4 | Pressure into which server fans blow |
| Test Cabinet | Omni-Directional | Dead zone top rear server 4 | N/A with aisle containment |
| Test Cabinet | Omni-Directional | Dead zone chimney at ceiling | Pressure into which chimney exhausts |
| Test Cabinet | Omni-Directional | Front intake server 1 | Pressure from which server fans pull |
| Test Cabinet | Omni-Directional | Front intake server 2 | Pressure from which server fans pull |
| Test Cabinet | Omni-Directional | Front intake server 3 | Pressure from which server fans pull |
| Test Cabinet | Omni-Directional | Front intake server 4 | Pressure from which server fans pull |
Table 1: Pressure Sensor Placement with Chimney Cabinets
Fan Speed Adjustments
Integrating these various sensors into CRAH fan speed control algorithms might be a bit of a challenge for some; fortunately, this data is mostly informational, used for verifying room calibration and making finer-tuned manual adjustments. Actual airflow volume, i.e., CRAH fan speed, was controlled by a simple algorithm to maintain a 2˚F ΔT between the bottom server inlet temperature and the top server inlet temperature. The actual functionality was slightly more complicated than that, since we had multiple rows of multiple cabinets at varying load densities, so we ran simple averages and got warning alerts for any statistical outliers. In practice, when that ΔT rose above 2˚F, our CRAH fans ramped up, and when that ΔT approached zero, our CRAH fans slowed down. While the many temperature and pressure sensors not used in these algorithms played important roles in reporting on the effectiveness of the different hardware and design configurations tested, they also provided a way to double-check how the automated controls were functioning. For example, we did not understand at first how quickly and accurately the room would self-calibrate after significant changes, such as swapping out a couple of 5 kW cabinets for a couple of 30 kW cabinets. Since we knew the load data and we measured ΔT through the loads, we could calculate the required airflow and then compare that to the CRAH units’ actual output. While our lab technician was uncannily good at making macro adjustments that came out close enough for the automated controls to tie it all together, it was still reassuring to know there was a simple reality check. Obviously, if this had been a production data center, some form of redundancy and alerts would have been in order, but it sufficed for a lab.
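For readers who think better in code, the fan logic and the airflow reality check described above can be sketched roughly as follows. This is an illustration, not the lab's actual controller: the step size, fan speed limits, "approaching zero" threshold, and outlier rule are all assumptions, as are the function names. The airflow estimate uses the common sensible heat rule of thumb Q [BTU/h] = 1.08 × CFM × ΔT [˚F], which assumes standard air density near sea level.

```python
from statistics import mean, pstdev

DELTA_T_MAX_F = 2.0             # from the text: ramp up when inlet delta-T exceeds 2 F
STEP_PCT = 5.0                  # assumed fan speed change per control cycle
MIN_PCT, MAX_PCT = 30.0, 100.0  # assumed fan speed limits
LOW_BAND_F = 0.5                # assumed threshold for "approaching zero"


def inlet_delta_t(cabinets):
    """Average top-minus-bottom inlet delta-T (F) over (bottom_F, top_F) pairs."""
    return mean(top - bottom for bottom, top in cabinets)


def outliers(cabinets, z=2.0):
    """Indexes of cabinets whose delta-T is more than z standard deviations from the mean."""
    deltas = [top - bottom for bottom, top in cabinets]
    mu, sigma = mean(deltas), pstdev(deltas)
    if sigma == 0:
        return []
    return [i for i, d in enumerate(deltas) if abs(d - mu) > z * sigma]


def next_fan_speed(current_pct, delta_t):
    """One control step: speed up above the limit, slow down near zero, otherwise hold."""
    if delta_t > DELTA_T_MAX_F:
        current_pct += STEP_PCT
    elif delta_t < LOW_BAND_F:
        current_pct -= STEP_PCT
    return max(MIN_PCT, min(MAX_PCT, current_pct))


def required_cfm(load_watts, delta_t_f):
    """Airflow needed to carry load_watts at delta_t_f (1 W = 3.412 BTU/h)."""
    return load_watts * 3.412 / (1.08 * delta_t_f)


# Hypothetical readings: (bottom inlet, top inlet) in F for three cabinets
cabinets = [(73.0, 75.5), (73.5, 76.0), (73.0, 75.8)]
dt = inlet_delta_t(cabinets)              # about 2.6 F, above the 2 F limit
print(next_fan_speed(60.0, dt))           # 65.0
print(round(required_cfm(30_000, 20.0)))  # 4739
```

Comparing `required_cfm` for the known load against what the CRAH units report delivering is the kind of simple cross-check the paragraph describes; a large gap in either direction suggests the automated controls have not yet settled.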
Trust the Process
Unless we were specifically testing some obviously sloppy data center discipline, our airflow management at the floor, cabinet, row, and room was good enough that we would keep 2˚F as the default value for the maximum temperature variation in the cold aisle and at the server air intakes. We had a very robust LabVIEW data acquisition tool that we ended up using to manage our CRAH controllers, after a, uh, slight firmware incursion (euphemism for “hacking”). Most cooling equipment and building management systems come fully enabled to do the same thing without jeopardizing the warranty. Regardless, the model for data center temperature control from my lab experience is quite simple:
1. Set your maximum allowable server inlet temperature.
2. Determine the maximum temperature variance you can maintain on the supply/server inlet side with your best effort at airflow management.
3. Set your supply temperature based on subtracting #2 from #1.
4. Control your cooling unit fans to maintain inlet ΔT between #2 and zero.
5. Develop plans and projects to further improve airflow management to reduce #2.
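The arithmetic behind the third and fourth steps is small enough to sketch in a few lines of Python. The function names are hypothetical, and the example values (77˚F maximum inlet, 2˚F variance) are simply chosen to match the lab defaults mentioned earlier, where a 75˚F supply produced top inlet readings of 77-78˚F.

```python
def supply_setpoint(max_inlet_f, max_variance_f):
    """Supply temperature = maximum allowable inlet minus the achievable inlet variance."""
    if max_variance_f < 0:
        raise ValueError("variance must be non-negative")
    return max_inlet_f - max_variance_f


def fan_action(inlet_delta_t_f, max_variance_f):
    """Keep the measured inlet delta-T between zero and the achievable variance."""
    if inlet_delta_t_f > max_variance_f:
        return "increase"
    if inlet_delta_t_f <= 0:
        return "decrease"
    return "hold"


print(supply_setpoint(77.0, 2.0))  # 75.0
print(fan_action(2.6, 2.0))        # increase
print(fan_action(1.2, 2.0))        # hold
```

Seen this way, the payoff of the final step is direct: every degree shaved off the variance is a degree that can be added to the supply set point without changing the maximum inlet temperature.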
[1] It is actually a pretty gross over-simplification to say that temperature and air volume are controlled separately. The data center is, in effect, an ecosystem in which everything is connected. However, prior to the science of airflow management, cooling capacity was generally a single-term factor, such as tons or BTU/h. In reality, the actual cooling capacity of any piece of equipment can vary greatly based on the effectiveness of airflow management and the type of IT load. For example, the same cooling unit with a 37 ton capacity for cooling servers with a 20˚F ΔT would have a 55 ton capacity for cooling servers with a 35˚F ΔT, assuming the airflow management system captured the full ΔT.