Needed: a Six Sigma Datacenter

As usual, there was a lot of discussion about cooling and energy efficiency at the yearly DatacenterDynamics conference in Amsterdam last week. Much of it was about finding point solutions to improve efficiency and/or creating redundancy to circumvent possible technical risks. But is this the way to optimise a complex IT supply chain?

In many industries, statistical quality management methods are used to improve the quality of process outputs by identifying and removing the causes of defects (errors) and minimising variability in manufacturing and business processes. One of the more popular methods is Six Sigma, which uses the DMAIC phases (Define, Measure, Analyse, Improve and Control) to improve processes.
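To make the Measure phase a little more concrete, here is a minimal sketch of the standard Six Sigma arithmetic: defects per million opportunities (DPMO) and the sigma level derived from it. The function names and the sample numbers are hypothetical, chosen only for illustration, and the sketch assumes the conventional 1.5-sigma shift.

```python
from statistics import NormalDist


def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000


def sigma_level(dpmo_value: float, shift: float = 1.5) -> float:
    """Sigma level for a given DPMO, using the conventional 1.5-sigma shift.

    Requires 0 < dpmo_value < 1,000,000, because the inverse normal CDF
    is undefined at probabilities of exactly 0 or 1.
    """
    process_yield = 1 - dpmo_value / 1_000_000
    return NormalDist().inv_cdf(process_yield) + shift


# Hypothetical example: 12 failed changes out of 4,000,
# counting one defect opportunity per change.
example_dpmo = dpmo(defects=12, units=4000, opportunities_per_unit=1)
example_sigma = sigma_level(example_dpmo)
print(f"DPMO: {example_dpmo:.0f}, sigma level: {example_sigma:.2f}")
```

By this convention a "six sigma" process corresponds to about 3.4 defects per million opportunities.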

But when Eddie Desouza of Enlogic asked the audience (of one of the tracks at DatacenterDynamics) who was using the Six Sigma method to improve their datacenters, only three people out of a hundred raised their hand. Eddie Desouza was advocating the use of Six Sigma to improve the efficiency and the quality of a datacenter. He made the observation that datacenters do apply substantial upfront reliability analysis and invest in costly redundant systems, but rarely commit to data-driven continuous improvement philosophies. In other words, they focus on fixing errors instead of optimising the chain by reducing unwanted variability and the associated costs of poor quality.

He also, rightly, emphasised that datacenter operators should use a system approach instead of a component approach when optimising the datacenter. The internal datacenter supply chain is only as strong as its weakest link, and optimising components in isolation carries the risk of sub-optimisation.

An example of why a system approach and industry methods like Six Sigma are needed can be found in a blog post by Alex Benik about “the sorry state of server utilization”. He refers to some reports from the past five years:

• A McKinsey study in 2008 pegging data-center utilization at roughly 6 percent.

• A Gartner report from 2012 putting the industry-wide utilization rate at 12 percent.

• An Accenture paper sampling a small number of Amazon EC2 machines, finding 7 percent utilization over the course of a week.

• Charts and a quote from Google, which show three-month average utilization rates for 20,000 server clusters. A typical cluster spent most of its time running at between 20 and 40 percent of capacity, and the highest-utilization cluster reaches such heights (about 75 percent) only because it’s doing batch work.
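To put these figures in perspective, the following rough sketch turns the three single-figure utilization rates cited above into idle capacity. It rests on a deliberately simplified assumption that is not in the reports themselves: that workload could be consolidated perfectly onto fully used servers. The function names are hypothetical.

```python
# Utilization percentages as cited in the list above.
cited_utilization_percent = {
    "McKinsey (2008)": 6,
    "Gartner (2012)": 12,
    "Accenture, EC2 sample": 7,
}


def idle_share_percent(utilization_percent: int) -> int:
    """Share of server capacity that is not doing useful work."""
    return 100 - utilization_percent


def servers_per_full_load(utilization_percent: int) -> float:
    """Servers in use per server's worth of actual work.

    Simplifying assumption: work could be consolidated perfectly,
    which real workloads (peaks, isolation, failover) do not allow.
    """
    return 100 / utilization_percent


for source, percent in cited_utilization_percent.items():
    print(
        f"{source}: {idle_share_percent(percent)}% idle, "
        f"about {servers_per_full_load(percent):.1f} servers "
        f"per fully used server's worth of work"
    )
```

Even allowing generous headroom for peaks and failover, the gap between these numbers and full use is large.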

Or look at another source, the diagram below from The Green Grid:

[Diagram: UnusedServers]

Why is this overlooked? Why isn’t there a debate about this weak link, this huge under-utilisation of servers and the huge waste of energy that results from it? Why focus on cooling, UPS, etc. if we have this weak link in the datacenter?

As shown in another blog post, saving 1 unit of power consumption in information processing saves us about 98 units upstream in the power supply chain (that is, all the way up to the power plant).

So it is very nice to have a discussion about the energy efficiency of datacenter facility components, but what is it worth if this “sorry state of server utilisation” goes unnoticed and/or no action is taken on it? Eddie Desouza of Enlogic is right: datacenters need Six Sigma. It would help if datacenter operators embraced a system approach, focussing on the complete internal datacenter supply chain instead of individual components, and used statistical quality management methods to improve efficiency and quality, as other industries do.
