Dealing with Data Center Resiliency Challenges
Today’s data center is an intricate machine that helps businesses move forward in an increasingly digital economy. In fact, organizations are directly enabled by the capabilities of their IT and data center environments. So what happens when it all fails? How much would an outage cost you? Most of all, just how resilient is your data center ecosystem? The Ponemon Institute recently released the results of its latest Cost of Data Center Outages study. Following earlier editions published in 2010 and 2013, this third study continues to analyze the cost behavior of unplanned data center outages. According to the new study, the average cost of a data center outage has steadily increased from $505,502 in 2010 to $740,357 today, which the study reports as a 38 percent net change.
Across the 63 data center environments it examined, the study found that:
- The cost of downtime has increased 38 percent since the first study in 2010.
- Downtime costs for the most data center-dependent businesses are rising faster than average.
- Maximum downtime costs increased 32 percent since 2013 and 81 percent since 2010.
- Maximum downtime costs for 2016 are $2,409,991.
With that in mind, let’s look at how your organization can address data center resiliency challenges and improve uptime:
- Data center outages. The data center is not a perfect entity. Administrators must be aware that anything can happen at any time, for almost any reason. This holds true for current cloud technologies as well as those just around the corner. Many organizations still view the cloud as a truly distributed model with multiple redundancies built in to maintain the highest possible uptime. These organizations aren’t quite correct. No entity is 100% safe from some type of disaster or emergency. In June 2012, for example, a powerful storm knocked out an entire data center owned by Amazon. What was hosted there? Amazon Web Services. Every AWS-dependent business in that data center was effectively down, and cloud-centric companies such as Instagram, Netflix, and Pinterest were unable to serve production traffic for more than six hours. To paint a clearer picture, a recent report by the International Working Group on Cloud Computing Resiliency showed that since 2007, 13 major cloud providers had logged more than 568 hours of downtime. So far, that downtime has cost customers more than $72 million.
- Forgetfulness. Another massive data center and cloud outage happened when a few administrators forgot to renew a simple SSL certificate. The expired certificate did more than cause an initial failure of the cloud platform. It also set off a global, cascading event that took down numerous other cloud-reliant systems. The provider? Microsoft Azure, the same platform that had $15 billion invested in its design and build-out. Full availability wasn’t restored for 12 hours, and for many customers it took up to 24 hours. About 52 other Microsoft services that relied on the Azure platform experienced issues, including the Xbox Live network.
- Working with data center management to improve uptime. A bottom-up approach to data center design starts at the infrastructure level and extends through the entire logical and physical data center environment. With that level of visibility and system integration, data center operators can focus on proactive work and on maintaining optimal performance. Advanced management systems let you manage key resources such as critical environmental variables. That can include controlling multiple data centers, optimizing connections, using virtual sensors, and even providing mobile access. This type of next-generation management delivers the four components necessary to implement a modern data center:
  - Reliability. A good management system gives you granular detail about your entire data center model and helps keep your environment far more reliable. This means monitoring and tuning cooling, power, and airflow.
  - Performance. Data center management gives you complete visibility into how well your most critical resources are performing, and it helps you identify issues before they become major problems. This means spotting airflow problems, finding places where you can improve density, and even optimizing sensor placement.
  - Sustainability. The better your management, the more sustainable your data center, which also means lowering the cost of actually running it. When you evaluate data center management, make sure everything is accounted for. Lost efficiency gains add up in the long run.
  - Security. Good management not only gives you details about your data center environment, it also helps with security challenges. You can detect open cabinets, equipment that was turned off, improper access, and much more. A big part of keeping a data center resilient revolves around security.
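The reliability and security checks in the list above could be expressed as simple rules over per-rack readings. The Python sketch below is a minimal illustration of that idea. `RackReading`, `evaluate`, and the reading fields are hypothetical and are not the API of any particular data center management product. The default 18 to 27 °C inlet range follows ASHRAE's commonly cited recommended envelope, and you should tune it to your own equipment.

```python
from dataclasses import dataclass
from typing import Iterable, List


# Hypothetical data model for illustration only; real management
# systems expose their own sensor schemas and APIs.
@dataclass(frozen=True)
class RackReading:
    rack: str
    inlet_temp_c: float
    power_on: bool
    door_open: bool


def evaluate(readings: Iterable[RackReading],
             low_c: float = 18.0,
             high_c: float = 27.0) -> List[str]:
    """Return human-readable alerts for readings that break simple rules."""
    alerts: List[str] = []
    for r in readings:
        # Reliability: inlet temperature outside the configured envelope.
        if r.inlet_temp_c > high_c:
            alerts.append(f"{r.rack}: inlet {r.inlet_temp_c:.1f} C above {high_c:.1f} C")
        elif r.inlet_temp_c < low_c:
            alerts.append(f"{r.rack}: inlet {r.inlet_temp_c:.1f} C below {low_c:.1f} C")
        # Availability/security: equipment that was turned off.
        if not r.power_on:
            alerts.append(f"{r.rack}: power off")
        # Security: cabinet left open.
        if r.door_open:
            alerts.append(f"{r.rack}: cabinet door open")
    return alerts


if __name__ == "__main__":
    sample = [
        RackReading("A1", 22.0, power_on=True, door_open=False),
        RackReading("A2", 29.5, power_on=True, door_open=False),
        RackReading("B1", 20.0, power_on=False, door_open=True),
    ]
    for alert in evaluate(sample):
        print(alert)
```

Keeping the rules as a plain function over readings makes them easy to test and to extend with other variables, such as humidity or airflow, without tying the logic to a specific vendor's data feed.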
The only way data center providers can continue to deliver the high level of service that today’s “on-demand” businesses require is to adopt improved data center management techniques. To do so, data center managers must look at the data center from the bottom up. This means designing around power, cooling, airflow, rack placement, and even security. When you account for these factors, you begin to chip away at real-world threats to data center resiliency.
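The Azure outage described earlier came down to a certificate nobody renewed, and that real-world threat is one of the easier ones to automate away. The Python sketch below is a minimal, hypothetical monitor. It reads a server certificate's `notAfter` date with the standard-library `ssl` module and flags anything expiring within a threshold. The function names and the 30-day default are illustrative assumptions, not tooling used by Microsoft or any other provider.

```python
import socket
import ssl
import sys
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the server certificate's notAfter string, e.g. 'Jan 31 00:00:00 2030 GMT'.

    If the certificate has already expired, the TLS handshake fails
    verification and ssl.SSLCertVerificationError (an OSError subclass)
    is raised; a monitor should treat that as an alert too.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return cert["notAfter"]


def days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days until the certificate expires (negative if already expired)."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY


def needs_renewal(not_after: str, threshold_days: float = 30.0,
                  now: Optional[float] = None) -> bool:
    """True when the certificate expires within threshold_days (hypothetical policy)."""
    return days_remaining(not_after, now) <= threshold_days


if __name__ == "__main__":
    for host in sys.argv[1:]:
        try:
            not_after = fetch_not_after(host)
        except OSError as exc:  # covers timeouts, DNS errors, and TLS failures
            print(f"{host}: check failed ({exc})")
            continue
        status = "RENEW SOON" if needs_renewal(not_after) else "ok"
        print(f"{host}: expires {not_after} ({days_remaining(not_after):.0f} days) {status}")
```

You could run it as `python check_certs.py example.com` (the script name is hypothetical). In practice, a check like this would run on a schedule and feed whatever alerting your team already uses, so that renewal no longer depends on someone remembering.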
CTO, MTM Technologies