What Is the Difference Between ASHRAE’s Recommended and Allowable Data Center Environmental Limits? – Part 1

by Ian Seaton | Sep 18, 2019 | Blog

Despite the ubiquity of references to ASHRAE TC9.9’s guidelines on data center temperatures, there remain questions about the difference between the recommended limits and the allowable limits. The easy answer is that the recommended envelope for server inlet temperatures is 18 – 27˚C (64.4 – 80.6˚F) and the allowable envelope is wider, depending on the server class. But in actual application, what is the difference between a recommendation and something that is allowable? How do you know which to apply? For example, it may be allowable to cross the street in the crosswalk, but it is still recommended that you look both ways. More seriously, it may be allowable to play poker with your buddies, but it may not be recommended to do so on an anniversary or your spouse’s birthday. The stakes get even higher when we consider that it is allowable to drive 85 mph on the toll road between Austin and San Antonio, but that speed is not recommended at night when the wild hogs are moving around, or during heavy rain. So in the data center, what is at stake, and why would we consider operating within the recommended limits rather than the allowable limits?

My careful reader will have noticed my bias shining through in the phrasing of that last question. All too often (as in, always) the question being debated or explored is why, when, and how much we should allow our data center to encroach into the allowable limits. That is no longer the relevant question: of course I am going to save a truckload of cash by operating outside the recommended and inside the allowable limits. The real question should be: why would I want to waste thousands or even millions of dollars of both capital and operating budget by keeping my data center within the recommended limits?

The first, and probably least understood, difference between the recommended limits and the allowable limits is that the recommended envelope is essentially an arbitrary range arrived at through a consensus-reaching process among the TC9.9 committee members. The allowable range is much less arbitrary, closely approximating the user documentation specifications from the IT equipment OEMs. Adding to the confusion is the history of IT equipment salespeople telling their customers they should operate their equipment between 64˚F and 80˚F, while handing those same customers a specification brochure indicating a warranty temperature range of 50˚F – 95˚F. Imagine a car salesman telling you the car needs an oil change every 3,000 miles, while the owner’s manual indicates an oil change is required only every 10,000 miles! Or consider a home furnace salesman saying you need to have a service technician come out every month for a check-up/tune-up, while the user manual tells us we should have a check-up once a year. It almost makes sense for the car salesman and the furnace salesman to tout the need for more service, which would ostensibly be supplied by their companies, but IT equipment OEMs are not in the business of selling either air conditioning units or electricity, so why in the world would they present their products in such an unfavorable light? Dell, for one, some 5-6 years ago, finally saw the silliness of this and started touting their new servers as not requiring any air conditioning in over 90% of the data centers in the United States, assuming implementation of decent airflow management.

ASHRAE Guidelines for Server Inlet Temperatures

Class | Recommended ˚C | Recommended ˚F | Allowable ˚C | Allowable ˚F
A1    | 18 – 27        | 64.4 – 80.6    | 15 – 32      | 59 – 89.6
A2    | 18 – 27        | 64.4 – 80.6    | 10 – 35      | 50 – 95
A3    | 18 – 27        | 64.4 – 80.6    | 5 – 40       | 41 – 104
A4    | 18 – 27        | 64.4 – 80.6    | 5 – 45       | 41 – 113

Table 1: Temperature Ranges from ASHRAE’s Thermal Guidelines for Data Processing Environments, 4th edition

The recommended limits and allowable limits are similar to the degree that both are presented as “guidelines,” without the enforceability of code or the stature of standards, regardless of the industry’s acceptance of them as default standards. Furthermore, they are referenced (albeit as “should” parameters rather than “shall” parameters) in other data center design and operations standards. On the other hand, the allowable limits have an enforceability element lacking in the recommended limits: IT equipment OEM warranty limits. The allowable limits therefore carry a de facto enforceability that the recommended limits do not. With only a handful of notable exceptions, server manufacturers are still not labeling their equipment as Class A1, Class A2 or Class A3. Instead, they publish minimum and maximum temperatures such as 50 – 95˚F or 10 – 40˚C, and we determine whether that makes the server a Class A1, Class A2, or whatever, based on the widest allowable range that fits within the user manual specifications. For simplicity’s sake, we can disregard Class A1 servers, unless we have IT equipment supply chain elements located in any of the areas illustrated in Figure 1 below.

Figure 1: Procurement Sources for Class A1 Servers
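That class-determination step is mechanical enough to sketch in code. The snippet below is a minimal illustration, not a vendor tool: given a published operating range in ˚C, it returns the widest ASHRAE class (per the allowable envelopes in Table 1) that fits inside that spec.

```python
# Allowable envelopes from Table 1, in degrees C (low, high).
ASHRAE_ALLOWABLE = {
    "A1": (15, 32),
    "A2": (10, 35),
    "A3": (5, 40),
    "A4": (5, 45),
}

def infer_class(spec_min_c, spec_max_c):
    """Return the widest class whose allowable envelope fits inside the
    published spec range, or None if not even Class A1 fits."""
    # Walk from the widest class (A4) down to the narrowest (A1).
    for cls in ("A4", "A3", "A2", "A1"):
        lo, hi = ASHRAE_ALLOWABLE[cls]
        if spec_min_c <= lo and hi <= spec_max_c:
            return cls
    return None

print(infer_class(10, 35))  # a 50 - 95 F (10 - 35 C) spec maps to "A2"
```

So a brochure quoting 10 – 35˚C (50 – 95˚F) comes back as Class A2, while a server rated only for the 18 – 27˚C recommended range would not qualify for any allowable class at all.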

Another difference between the recommended envelope and allowable envelope has to do with server reliability. Superficially, IT equipment operating within the recommended envelope is going to last longer and have fewer failures than IT equipment operating in the allowable envelope. Unfortunately, the reality is not quite so simple. ASHRAE’s first attempt to clarify the difference between recommended and allowable came with the 2008 update, in which they said it was allowable to operate in the allowable range for short periods of time. Since we had just about as many definitions of “short period of time” as we had readers of the guidelines, they tried to further clarify that definition with the 2011 update. The answer was basically that each data center operator/manager had to figure it out for themselves by using ASHRAE’s X-factor to determine an estimated forecast of server failures when operating within the recommended limits versus the allowable limits. This clarification is actually a big help, if you have the discipline to go through the procedure correctly. First, the X-factor is based on a 20˚C (68˚F) server inlet temperature, so the calculation output needs to be adjusted to your actual set point within the recommended 18 – 27˚C range before making a comparison to the hours running within the allowable range, for example 5 – 40˚C for Class A3 servers. The calculation then requires tabulating annual forecasted hours between 18 – 27˚C and between 5 – 40˚C, calculating the reduced (below 20˚C) and increased (above 20˚C) failure rates, and then adding or subtracting the result from your existing failure rate. In their concept launch white paper, ASHRAE demonstrated that a data center in Chicago would reduce failures by operating within the allowable range versus holding steady at 68˚F all year.
Even if you were to see a 10% increase in failures and you normally lost five servers a year, that 10% increase comes to only an additional 0.5 failures per year. With a three year technology refresh cycle, you will likely swap out your equipment before it fails. In Part 2 of this series, I will explore various example scenarios in much greater detail.
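For readers who want to try the bookkeeping themselves, here is a minimal Python sketch of the procedure just described. The X-factor values and the hour profile below are placeholders invented for illustration, not ASHRAE’s published numbers; substitute the published X-factor table and your own weather-driven temperature-bin forecast.

```python
# PLACEHOLDER X-factors (assumed, not ASHRAE's published values):
# relative failure rate at each inlet temperature bin (degrees C),
# normalized to 1.0 at the 20 C (68 F) baseline.
X_FACTOR = {15: 0.90, 20: 1.00, 25: 1.15, 30: 1.30}

def annual_relative_failure_rate(hours_per_bin):
    """Time-weighted average X-factor over a year's inlet-temperature bins."""
    total_hours = sum(hours_per_bin.values())
    weighted = sum(X_FACTOR[t] * h for t, h in hours_per_bin.items())
    return weighted / total_hours

# Hypothetical free-cooling profile: many hours below the 20 C baseline.
profile = {15: 4000, 20: 3000, 25: 1500, 30: 260}  # sums to 8760 h/yr
rate = annual_relative_failure_rate(profile)

baseline_failures = 5                      # failures/yr at a steady 20 C
extra = baseline_failures * (rate - 1.0)   # negative = fewer failures
```

A time-weighted rate below 1.0 means the varying profile forecasts fewer failures than holding 68˚F all year round, which is exactly the Chicago result ASHRAE reported: the cool hours buy back more reliability than the warm hours cost.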

Microprocessor transactional performance is influenced by temperature, so there should be a difference between data centers running within the recommended envelope and those running within the allowable envelope in terms of how much work they can get done. While this is true, the temperatures that diminish performance are extreme, and, as with the failure predictions, they can be offset to some extent by hours spent at the lower end of the allowable limits in free cooling applications. I will explore these relationships in more detail in a subsequent part of this series.

As temperatures rise, server and cooling unit fan speeds accelerate resulting in more fan noise. Therefore, we would expect a data center operating in the recommended envelope to be quieter than a data center operating within one of the allowable envelopes. Theoretically, that makes good sense, unless of course, you totally eliminate the CRAC/CRAH fans in a data center operating within the allowable limits with no air conditioning. I will explore some of these implications in more detail later in this series as well.

The biggest difference between recommended temperature limits and allowable temperature limits is the cost of designing, building, and operating the data center. I have covered various aspects of these differences in many previous papers, but will wrap this series up with a quick summary. Beyond cost, there are differences in terms of IT equipment failure rates, IT equipment performance and data center noise, all of which will be covered in subsequent releases in this space. As a preview, and as an open invitation for you to come back and check it out, I will say that many readers will have their preconceptions challenged: have you considered that the primary value of the recommended environmental envelope for data centers may be to encourage you to buy equipment you do not need and to consume electricity that you do not need?

This is part 1 of a two-part series on ASHRAE’s recommended vs allowable data center environmental limits.


Ian Seaton

Data Center Consultant


2 Comments

  1. Anonymous

Well, maybe you have the juices flowing of the owners of DCs. But you probably have the juices flowing of the local fire department as well when one of those antique dust buckets heats up and flares over. How much $$$ would you have to save to pay for one dead hall or site? Lost revenue, loss of confidence, loss of clients. Very risky game to play.

    Reply
  2. Anonymous

    I love all the math and formulas you provide in your blogs, but “10% increase in normal of 5 failures = 0.2 additional failures”? I’m still trying to work out that math!
    My boss loves car analogies, so I counter with this. The car maker can recommend oil changes every 10,000 miles because the manufacturer only warranties the car for 3 years/36,000 miles. Less frequent oil changes won’t likely cause failure before the warranty runs out. Once out of warranty, the manufacturer is highly motivated to get you back in the showroom and buy another car. So if you are the type that buys a new car as soon as your current warranty expires, then your analogy is spot on. But I haven’t found too many personal financial planners that endorse this strategy as a sustainable path to early retirement. So why should I believe that server manufacturers don’t have the same motivation? They only need to skate through their 3 year warranty period and then every day after day 1096 is simply lost revenue for them. Just like our personal cars, how many of us in the real world live the lifestyle that can afford new servers every 3 years? I’m not suggesting a run to fail strategy is wise, but in my 30 year career I have seen that approach more than a strict 3 year refresh. Sure, I’d love the new car smell of a 2020 model, but I bet I can get some really productive, COST EFFECTIVE, use out of my 2017 model still. So maybe I’ll pony up the extra $ and change my oil a little more frequently if it might keep it runnin’ a little longer. Not everyone has the IT and DC infrastructure of Google, Facebook, etc where server failures are like light bulb failures (sorry to change up the analogy). We all know major companies still don’t have application resiliency and that 1 extra failure a year could be thousands of man-hours in lost productivity, lost sales or revenue, lost data, and extra IT costs for new hardware and recovery services. 
I think a lot of resistance by IT to raising temps is simply “old school thinking”, but I also think that savvy IT is cautiously evaluating “guidelines” that are so heavily influenced and endorsed by those with their hand in IT’s wallet while flashing the ASHRAE banner. It’s not a simple answer, and the savings on the electric bill doesn’t always cover the IT shortfall. Keep the blogs coming, always an interesting and thought provoking read!

    Reply



