Predicting Thermal Issues In Data Centers
Note: This post originally appeared as “Expecting the Unexpected: Analyzing a Cooling Failure” on the Future Facilities blog, authored by Matt Green, an Applications Engineer at Future Facilities. It is republished here with permission.
Thermal issues in data centers are common, and they are usually easy to discover: servers send out warning messages, monitoring alarms go off, or employees complain about warm temperatures. Once an issue is known, the necessary steps can be taken to resolve it. But what happens when a potential thermal issue only appears during critical operations, such as a cooling failure? How does one discover such issues before they happen?
Data centers are normally built with redundancy in their cooling systems, but when that redundancy is lost, the distribution of cooling inside the data center can change drastically. Servers that run fine during normal operations may overheat under these critical conditions. To find such potential issues, the cooling system must be tested under various cooling failure scenarios. Because physically testing cooling failures in a live data center is impractical, we turn to computational fluid dynamics (CFD) simulations to model these scenarios.
Typical CFD simulations are steady-state models: they average out small fluctuations over time to produce a snapshot of the data center's normal operation. To model a cooling failure accurately, time must be taken into account, so every time-varying phenomenon that occurs during the failure needs to be considered.
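In lumped terms, the difference between the two kinds of model comes down to a single storage term. The balance below is a simplified single-node sketch, not the full set of equations a CFD solver uses: C = m c_p is the heat capacity of a control volume (a room, a coil, a server), T its temperature, and the two Q terms the heat flowing in and out. A steady-state solution assumes the storage term is zero; a transient solution keeps it, and that term is where thermal inertia enters the model.

```latex
% Transient (time-dependent) lumped energy balance
C \frac{dT}{dt} = \dot{Q}_{\mathrm{in}} - \dot{Q}_{\mathrm{out}}, \qquad C = m\,c_p

% Steady state: the storage term is assumed to be zero
0 = \dot{Q}_{\mathrm{in}} - \dot{Q}_{\mathrm{out}}
```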
Two time-dependent effects need to be considered for an Air Handling Unit (AHU): the loss of power to the fans and the loss of power to the chilled water pumps. When fan power is lost, the fans still carry considerable rotational inertia, so they take time to coast to a complete stop. When the chilled water pumps fail, some water remains inside the AHU cooling coils. At first, the thermal inertia of that water resists changes in temperature, but over time the water warms until it matches the temperature of the air passing over the coils. Depending on the type of power redundancy installed in the data center, one or both failure scenarios may apply.
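Both effects can be approximated with simple lumped models before committing to a full transient CFD study. The Python sketch below is illustrative only: the function names, the constant air temperature over the coil, and the fan load model (aerodynamic drag torque proportional to the square of speed, plus a constant friction torque) are assumptions, not Future Facilities' method. By the fan affinity laws, airflow is roughly proportional to fan speed, so the speed curve also approximates how delivered airflow decays.

```python
import math


def coil_water_temp_c(t_s, t_water0_c, t_air_c, water_mass_kg, ua_w_per_k, cp_j_per_kg_k=4186.0):
    """Temperature of water stranded in an AHU coil t_s seconds after the pumps stop.

    Hypothetical lumped model: m * cp * dT/dt = UA * (T_air - T), with the air
    temperature held constant, so the water relaxes exponentially toward T_air.
    """
    tau_s = water_mass_kg * cp_j_per_kg_k / ua_w_per_k  # thermal time constant
    return t_air_c + (t_water0_c - t_air_c) * math.exp(-t_s / tau_s)


def fan_speed_rad_s(t_s, omega0, inertia, k_aero, friction_torque):
    """Fan speed t_s seconds after power loss while it coasts down.

    Hypothetical model: J * domega/dt = -k * omega**2 - T_f, i.e. aerodynamic
    drag proportional to speed squared plus a constant friction torque.
    Closed form: omega = a * tan(atan(omega0 / a) - sqrt(k * T_f) * t / J),
    with a = sqrt(T_f / k), valid until the fan stops.
    """
    a = math.sqrt(friction_torque / k_aero)
    rate = math.sqrt(k_aero * friction_torque) / inertia
    phase0 = math.atan(omega0 / a)
    t_stop_s = phase0 / rate  # time at which the speed reaches zero
    if t_s >= t_stop_s:
        return 0.0
    return a * math.tan(phase0 - rate * t_s)


# Hypothetical coil: 50 kg of water at 7 C, UA = 2000 W/K, 30 C air over the coil.
print(round(coil_water_temp_c(120.0, 7.0, 30.0, 50.0, 2000.0), 1))
# Hypothetical fan: 150 rad/s, J = 0.05 kg*m^2, k = 1e-5 N*m*s^2, T_f = 0.2 N*m.
print(round(fan_speed_rad_s(10.0, 150.0, 0.05, 1e-5, 0.2), 1))
```

With the hypothetical coil values above, the time constant m c_p / UA is about 105 seconds, so the stranded water gives up most of its remaining cooling capacity within a few minutes.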
Servers, like AHUs, also have thermal inertia, which resides mainly in their chassis and components. A server's thermal inertia is much smaller than that of the water in an AHU coil, but it is still enough to resist temperature changes for a few seconds, and those few seconds may be all that is needed for the cooling systems to come back online.
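A rough estimate of how long a server component can ride through a cooling interruption comes from a first-order lumped model: a heat capacity C warmed by the component's power P while still shedding some heat through a conductance G to the surrounding air. This is a sketch under those assumptions; the function name and all parameters are hypothetical, and a real server has many coupled components rather than a single thermal node.

```python
import math


def time_to_limit_s(c_j_per_k, power_w, g_w_per_k, t_start_c, t_ambient_c, t_limit_c):
    """Seconds for a lumped component to warm from t_start_c to t_limit_c.

    Hypothetical first-order model: C * dT/dt = P - G * (T - T_ambient).
    Returns 0.0 if already at or above the limit, and math.inf if the
    component would settle below the limit.
    """
    if t_start_c >= t_limit_c:
        return 0.0
    t_steady_c = t_ambient_c + power_w / g_w_per_k  # temperature it settles at
    if t_steady_c <= t_limit_c:
        return math.inf
    tau_s = c_j_per_k / g_w_per_k  # thermal time constant
    # Solve T(t) = T_ss + (T_start - T_ss) * exp(-t / tau) for T(t) = T_limit;
    # log1p keeps the result accurate when G is very small.
    return tau_s * math.log1p((t_limit_c - t_start_c) / (t_steady_c - t_limit_c))
```

As G approaches zero (no heat removal at all), the result approaches the adiabatic limit C (T_limit - T_start) / P, which is the most conservative estimate because any remaining heat removal only lengthens the ride-through time.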
Finally, thermal inertia is also stored in the chilled water loop. If a chiller fails, the water in the loop gradually warms, and any data center served by that loop sees its AHU supply temperatures gradually rise. Depending on the volume of water in the loop, overheating may begin within minutes or only after hours.
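The ride-through time of the loop can be estimated with a simple energy balance: the heat capacity of the water, multiplied by the allowable temperature rise, divided by the heat load. The sketch below assumes a single well-mixed volume of water, pumps that keep circulating, and a heat load that goes entirely into warming the water; the function name, constants, and example values are illustrative, not measured data.

```python
WATER_DENSITY_KG_M3 = 1000.0  # approximate density of chilled water
WATER_CP_J_KG_K = 4186.0      # approximate specific heat of water


def loop_ride_through_s(loop_volume_m3, heat_load_w, t_supply_c, t_supply_max_c):
    """Seconds until a chilled water loop warms from t_supply_c to t_supply_max_c
    after the chiller stops.

    Hypothetical model: one well-mixed volume of water, pumps still running,
    and the full heat load going into the water (no other cooling).
    """
    heat_capacity_j_per_k = loop_volume_m3 * WATER_DENSITY_KG_M3 * WATER_CP_J_KG_K
    return heat_capacity_j_per_k * (t_supply_max_c - t_supply_c) / heat_load_w


# Hypothetical loop: 20 m^3 of water, 500 kW load, 7 C supply allowed to reach 15 C.
minutes = loop_ride_through_s(20.0, 500_000.0, 7.0, 15.0) / 60.0
print(f"Estimated ride-through: {minutes:.1f} min")
```

With these hypothetical values the estimate is about 1,340 seconds, or roughly 22 minutes. Because the result scales linearly with volume, doubling the water in the loop doubles the ride-through time.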
The design of a data center directly affects which types of cooling failure will produce the most severe thermal issues. For example, a data center cooled by In-Row Coolers (IRCs) is at greater risk from AHU fan failures than one cooled by perimeter units.
Once the IRC fans stop, the air in the cold aisles becomes stagnant. If the servers remain powered, they still need airflow, and the only thing left to cool them is the stagnant air in the cold aisles. In this situation, the data center has only seconds before servers overheat. Simulating these failure scenarios with CFD lets facility designers determine how long the AHU fans can remain off before they must be restarted, and then design a power redundancy system that meets that time requirement.
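The "seconds" figure follows from how little heat air carries per unit volume. A rough estimate treats the cold aisle as a reservoir of chilled air that the servers draw down at their normal airflow rate, after which they begin ingesting warmer exhaust. This sketch assumes the cold aisle behaves as an enclosed volume with no airflow from the stopped coolers, no mixing, and no leakage; the function names and example values are hypothetical.

```python
AIR_DENSITY_KG_M3 = 1.2  # approximate, near sea level at about 20 C
AIR_CP_J_KG_K = 1005.0   # approximate specific heat of air


def server_airflow_m3_s(it_load_w, server_delta_t_k):
    """Volumetric airflow the servers need to carry it_load_w with a given
    inlet-to-exhaust temperature rise (energy balance: P = rho * cp * V_dot * dT)."""
    return it_load_w / (AIR_DENSITY_KG_M3 * AIR_CP_J_KG_K * server_delta_t_k)


def cold_aisle_depletion_s(aisle_volume_m3, it_load_w, server_delta_t_k):
    """Seconds until the servers have drawn in all the chilled air in the cold
    aisle, after which they start ingesting warmer exhaust air.

    Hypothetical plug-flow estimate: no cooler airflow, no mixing, no leakage.
    """
    return aisle_volume_m3 / server_airflow_m3_s(it_load_w, server_delta_t_k)


# Hypothetical aisle: 1.2 m wide, 6 m long, 2.4 m high; 50 kW of IT load; 12 K rise.
aisle_volume = 1.2 * 6.0 * 2.4
print(f"{cold_aisle_depletion_s(aisle_volume, 50_000.0, 12.0):.1f} s")
```

For this hypothetical aisle (about 17.3 m³), the servers draw roughly 3.5 m³/s, and the estimate comes out to about 5 seconds, which is consistent with the short window described above.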