How to Break the CRAC Addiction: Making the Transition to Air Economizers
Contrary to all those motivational posters and social media clichés, a journey doesn’t actually start with the first step; it starts with finding someone to make the trip with you. On the journey from data center CRAC addiction to free cooling, Mr. IT and Mr. Facility need to sign on to make the trip together, without either one carrying, pushing or pulling the other. As partners, they will still encounter their fair share of obstacles, but at least those obstacles won’t be self-inflicted. Invariably, when this is a solo flight, a significant investment of effort or expense is followed by a pronouncement from the other party that, “I can’t live with that because…” Once this partnership has been established, the best starting point is a realistic assessment of ICT equipment reliability tolerance, followed by an assessment of whether the transition can be made by flipping a switch or needs to be phased in over time. Following these major decisions, the team can work out details such as the type of free cooling, airflow management implementation, and ROI analysis.
Establish ICT Equipment Reliability Thresholds
We all want maximum uptime from our ICT equipment, so nobody is going to sign up for free cooling if the price tag is loss of availability on some of our servers. But should we be so quick to serve up that knee-jerk reaction? For example, is it worth the $1,000 for a replacement server if I’ve saved $100,000 on my data center cooling bill? Does my virtualization or redundancy protect my data and applications from outages or equipment failures? What really is at stake here? For perspective, think about your current server reliability track record. Do you see 1% failures per year? 0.5% failures? In a data center with 500 servers and a 1% annual failure rate, a 10% increase in failures would raise 5 failures per year to 5.5, or one additional server every other year. A 25% increase in failures, which on the surface seems astronomical, raises 5 failures to 6.25, or roughly one additional failure per year. Armed with that perspective, we can move on to using the ASHRAE TC9.9 “X” factor to translate reliability thresholds into free cooling parameters. I have explained this process in more detail in a previous blog from November 2014 titled, “Understanding the Relationship between Uptime and Server Inlet Temperature,” and the methodology is thoroughly explained in the ASHRAE handbook titled, Thermal Guidelines for Data Processing Environments, 3rd edition, particularly in Appendix H and Appendix I. Suffice it to say that the major server manufacturers have agreed on a scale of failure rate variation for different hours of operation at different temperatures, measured against a baseline of running at 68˚F, 24/7, all year, so that hours at lower temperatures reduce the failure rate and hours at higher temperatures increase it.
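The failure arithmetic above is easy to reproduce. The following Python sketch is only an illustration: the function name `expected_annual_failures` and its `relative_failure_factor` parameter are my own labels, and the multipliers 1.10 and 1.25 simply stand in for the 10% and 25% increases discussed above. They are not values taken from the ASHRAE tables.

```python
def expected_annual_failures(servers, base_failure_rate, relative_failure_factor=1.0):
    """Servers expected to fail per year.

    relative_failure_factor is a multiplier on the baseline rate
    (1.0 = no change, 1.10 = 10% more failures). Hypothetical helper.
    """
    return servers * base_failure_rate * relative_failure_factor


SERVERS = 500
BASE_RATE = 0.01  # the 1% annual failure rate used in the text

baseline = expected_annual_failures(SERVERS, BASE_RATE)

for increase in (0.10, 0.25):
    failures = expected_annual_failures(SERVERS, BASE_RATE, 1.0 + increase)
    extra = failures - baseline
    print(f"{increase:.0%} increase: {failures:.2f} failures/year "
          f"({extra:.2f} more than the baseline of {baseline:.2f})")
```

Plugging in your own fleet size and track record is usually enough to turn an alarming-sounding percentage into a number of servers you can put a price on.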
Align IT and Facility Objectives
With an ICT equipment reliability target in hand, an analysis of the weather data for the data center location will reveal how many hours fall into each of the plus or minus buckets, and a calculation can be made for the impact on equipment reliability of not having any cooling other than the economizer. In Appendix H of the ASHRAE guideline, an example is developed for Chicago for Class A2 servers with no refrigerant mechanical cooling; it showed an actual 1% improvement in reliability, running the data center from a minimum of 59˚F up to whatever the hottest day of the year had to offer. So the first level of consideration is whether or not to build the data center with no water or refrigerant mechanical cooling at all. Today, a large proportion of new servers fall into Class A3, so the upper allowable temperature threshold extends from 95˚F to 104˚F; the boundary for a chillerless data center is therefore stretched further, though, for the time being, the “X” factors don’t change. On the other hand, if the analysis reveals that the temperature profile of a chillerless data center results in an unacceptable increase in equipment failure rates, the next level of analysis is to consider how much cooling needs to be added. Very likely it would not make sense to build a cooling capability to match or even exceed (i.e., N+1, N+2, etc.) the data center thermal load. After all, partial free cooling is available whenever the data center return air temperature exceeds the ambient outside air temperature. For example, if we have too many hours in the 100-104˚F range, we do not need our cooling units to drop our 120˚F return air to something below 100˚F; rather, if we drop the 100-104˚F ambient air a mere 5˚F by mixing it with some volume of much cooler air, we hit our temperature target for reliability while having provisioned the data center with half or less of the chiller capacity otherwise required.
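Two pieces of arithmetic in this section can be sketched in a few lines of Python. The first function time-weights the relative failure factors across the hour buckets from the weather analysis; the second compares the sensible cooling duty of trimming hot ambient air against cooling 120˚F return air to the same target. Both function names are my own, the hour bins and their factors are placeholders rather than real climate data or ASHRAE values, the 99˚F target is just one reading of "something below 100˚," and the duty comparison assumes equal airflow in both cases.

```python
def net_x_factor(bins):
    """Time-weighted average of relative failure factors.

    bins: list of (hours_per_year, x_factor) pairs covering the whole year.
    """
    total_hours = sum(hours for hours, _ in bins)
    return sum(hours * x for hours, x in bins) / total_hours


def trim_cooling_ratio(ambient_f, return_f, target_f):
    """Sensible cooling duty of trimming ambient air to the target,
    relative to cooling return air to the same target at equal airflow."""
    return (ambient_f - target_f) / (return_f - target_f)


# Placeholder weather bins: NOT real climate data and NOT ASHRAE X-factors.
example_bins = [
    (4000, 0.80),  # hours cooler than the 68F baseline
    (3000, 1.00),  # hours near the baseline
    (1760, 1.30),  # hours warmer than the baseline
]
print(f"net X-factor: {net_x_factor(example_bins):.3f}")

# Worst case from the text: 104F ambient, 120F return air, assumed 99F target.
ratio = trim_cooling_ratio(ambient_f=104.0, return_f=120.0, target_f=99.0)
print(f"trim cooling duty vs. full recirculation: {ratio:.0%}")
```

A net factor below 1.0 means fewer expected failures than the 68˚F baseline, and a trim ratio of 5/21 (roughly a quarter) is consistent with the "half or less" chiller capacity estimate above.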
Cold Turkey or Baby Steps
If we are adding data center capacity with a new building and all new ICT equipment, this can be a rather straightforward process. The capital savings from either eliminating or dramatically downsizing a water or refrigerant-based mechanical cooling plant, combined with the associated savings from not having to run any of that power-intensive equipment, make for a slam-dunk return-on-investment case for the economizer. However, if capacity is being added through expansion of an existing facility, the process gets a little more involved. There will not likely be a day when you flip a switch and go from 100% chiller to 100% free cooling. There will likely be a large proportion of Class A2 servers with an allowable maximum temperature of 95˚F, and likely even some Class A1 servers with an upper threshold of 90˚F (89.6˚F, to be exact). This equipment will reduce the number of available free cooling hours in most geographies. These restrictions mean we need to ascertain the benefits of a phased transition from CRACs to free cooling.
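To see how legacy equipment eats into free cooling hours, here is a minimal Python sketch. It assumes a direct air economizer with no approach temperature and treats an hour as a free cooling hour whenever ambient is at or below the most restrictive allowable inlet limit in the room; it ignores the reliability weighting discussed earlier. The class limits are the ones quoted in this blog, while the hourly readings and both function names are hypothetical.

```python
# Upper allowable inlet temperatures (deg F) as quoted in the text.
CLASS_MAX_INLET_F = {"A1": 89.6, "A2": 95.0, "A3": 104.0}


def fleet_limit_f(classes_present):
    """The most restrictive class sets the limit for a shared room."""
    return min(CLASS_MAX_INLET_F[c] for c in classes_present)


def free_cooling_hours(hourly_ambient_f, classes_present):
    """Count hours whose ambient temperature is at or below the fleet limit."""
    limit = fleet_limit_f(classes_present)
    return sum(1 for t in hourly_ambient_f if t <= limit)


# Hypothetical hourly readings for one hot day, not real weather data.
sample_day_f = [78, 76, 75, 74, 73, 74, 77, 82, 86, 90, 93, 96,
                98, 100, 101, 100, 98, 95, 92, 88, 85, 82, 80, 79]

for mix in (["A3"], ["A2", "A3"], ["A1", "A2", "A3"]):
    hours = free_cooling_hours(sample_day_f, mix)
    print(f"{'+'.join(mix)}: {hours} of {len(sample_day_f)} hours")
```

Run against a full year of hourly weather data for your site, the same comparison shows how many hours each legacy class is costing you until it is refreshed.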
The elements we consider for a phased transition are the same ones we consider for a new design; we just need to consider their interdependence a lot more closely. The economizer capacity should be designed for the end cooling capacity goal. We should determine whether we can start adding computing capacity without increasing the existing mechanical plant. The methodology here is the same as the partial free cooling methodology discussed earlier in this blog. If the ambient temperature is always lower than the data center return temperature, can we cool it enough by mixing with data center supply air to meet the cooling requirements of our legacy equipment? Another part of this assessment is determining how long it will take to complete a full technology refresh that replaces legacy servers with Class A3 or Class A4 servers. That timetable, in conjunction with the capital outlay for the economizer and the operating cost of still running the legacy mechanical plant for some portion of the year, will produce a delayed payback versus a new build with new ICT equipment, but the ROI results should be acceptable to most business models. The only possible exception would be a harsh environment in which capacity expansion would need to include additional chiller capacity during the phase-out of legacy ICT equipment, possibly accompanied by an extraordinarily long technology refresh cycle.
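The delayed payback of a phased transition can be illustrated with a simple, undiscounted calculation. In the Python sketch below every dollar figure is made up, the `payback_year` helper is hypothetical, and the phased case simply assumes that annual savings ramp up as legacy servers are refreshed and the legacy mechanical plant runs fewer hours.

```python
def payback_year(capital_outlay, savings_by_year):
    """First year in which cumulative savings cover the capital outlay.

    Returns None if the outlay is not recovered within the years given.
    """
    cumulative = 0.0
    for year, savings in enumerate(savings_by_year, start=1):
        cumulative += savings
        if cumulative >= capital_outlay:
            return year
    return None


# All dollar figures are made up for illustration.
CAPITAL = 300_000

# New build: full savings from year one.
new_build = [100_000] * 10

# Phased retrofit: savings ramp up over a four-year technology refresh.
phased = [25_000, 50_000, 75_000, 100_000] + [100_000] * 6

print("new build payback year:", payback_year(CAPITAL, new_build))
print("phased payback year:", payback_year(CAPITAL, phased))
```

With these assumed numbers the new build pays back in year 3 and the phased retrofit in year 5; a real analysis would substitute actual capital and energy costs and would likely discount future savings.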
Don’t Forget the Basics
Finally, the actual implementation of such a project would need to be accompanied by best practices of airflow management, resulting in the absolute minimum variation of server inlet temperature throughout the space. The difference between the minimum and maximum server inlet temperature at any particular moment effectively creates an approach temperature for the economizer, lowering the ambient temperature required for free cooling. In addition, variations in server inlet temperatures could also reflect bypass, which reduces return air temperature and thereby reduces the partial free cooling capacity, resulting in higher operating expenses as well as a possible need for more chiller capacity and associated precision cooler capacity, all of which reduces total ROI and extends the payback horizon. One last consideration for data centers that, for any number of reasons, just cannot be opened up to the immediate outside environment: there are air-to-air heat exchanger options that can be evaluated through this same process. The most significant difference is an approach temperature ranging from 3˚ to 7˚, depending on technology and vendor, which just means recalibrating all the benchmark points.
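The effect of inlet temperature variation and heat exchanger approach on the usable ambient threshold can be sketched as a simple subtraction. This Python example is a simplification under stated assumptions: the coolest server inlet sees air at ambient temperature (plus any heat exchanger approach) and the warmest inlet sees that temperature plus the room's inlet spread. The function name, the 10˚ and 2˚ spreads, and the 5˚ approach (picked from within the 3-7˚ range above) are illustrative choices, not measured values.

```python
def max_ambient_for_free_cooling(max_inlet_f, inlet_spread_f, hx_approach_f=0.0):
    """Highest ambient temperature (deg F) that still keeps the warmest
    server inlet at or below max_inlet_f.

    inlet_spread_f: max minus min server inlet temperature in the room.
    hx_approach_f: approach of an air-to-air heat exchanger (0 for direct air).
    """
    return max_inlet_f - inlet_spread_f - hx_approach_f


A2_MAX_INLET_F = 95.0  # Class A2 allowable maximum quoted earlier

# Hypothetical inlet spreads for poor vs. good airflow management.
for label, spread in (("poor airflow management", 10.0),
                      ("good airflow management", 2.0)):
    direct = max_ambient_for_free_cooling(A2_MAX_INLET_F, spread)
    indirect = max_ambient_for_free_cooling(A2_MAX_INLET_F, spread,
                                            hx_approach_f=5.0)
    print(f"{label}: direct {direct:.1f}F, air-to-air {indirect:.1f}F")
```

Every degree of inlet spread removed by better airflow management raises the ambient threshold by a degree, which translates directly into more free cooling hours in the weather analysis.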