Data Center 2.0 – The Sustainable Data Center

We are currently busy with the final steps to get the forthcoming book ‘Data Center 2.0 – The Sustainable Data Center’ (ISBN 978-1499224689) published at the beginning of the summer.

Some quotes from the book:

“A data center is a very peculiar and special place. It is the place where different worlds meet each other. A place where organizational (and individual) information needs and demands are translated into bits and bytes that are subsequently translated into electrons that are moved around the world. It is the place where the business, IT and energy worlds come together. Jointly they form a jigsaw puzzle of stakeholders with different and sometimes conflicting interests and objectives that are hard to manage and to control.

Given the great potential of Information Technology to transform today’s society into one characterised by sustainability, what is the position of data centers?

……..

The data center is the place where it all comes together: energy, IT and societal demands and needs.

…….

A sustainable data center should be environmentally viable, economically equitable, and socially bearable. To become sustainable, the data center industry must free itself from the shackles of 19th-century ideas and concepts of production. They are too simple for our 21st-century world.

The combination of service-dominant logic and cradle-to-cradle makes it possible to create a sustainable data center industry.

Creating sustainable data centers is not a technical problem but an economic problem to be solved.”

The book takes a conceptual approach to the subject of data centers and sustainability. It offers multiple views on and aspects of sustainable data centers, to give readers a better understanding and to provoke thought on how to create sustainable data centers.

The book has already received endorsements from Paul-Francois Cattier, Global Senior Vice President Data Center at Schneider Electric, and John Post, Managing Director of the Green IT Amsterdam Region foundation.

Table of contents

1 Prologue
2 Signs Of The Time
3 Data Centers, 21st Century Factories
4 Data Centers A Critical Infrastructure
5 Data Centers And The IT Supply Chain
6 The Core Processes Of A Data Center
7 Externalities
8 A Look At Data Center Management
9 Data Center Analysis
10 Data Center Monitoring and Control
11 The Willingness To Change
12 On The Move: Data Center 1.5
13 IT Is Transforming Now!
14 Dominant Logic Under Pressure
15 Away From The Dominant Logic
16 A New Industrial Model
17 Data Center 2.0

Data centers and the vulnerable Texas power grid

As we all know, the quality and availability of a data center stands or falls with the quality and availability of its power supply. But in Texas, a state hosting data centers of several notable IT companies, including WordPress.Com, Cisco, Rackspace and Host Gator, the power grid is on the edge of collapse.

Just like last year, the Texas power grid has struggled to keep up with demand during sweltering summer days (see Texas escaped rolling blackouts: Data centers and the power grid interdependency).

In Texas, it has been said that demand is growing faster than new power supply, while at the same time the wholesale power pricing environment doesn’t support expansion. The peak price for wholesale electricity is state-regulated, and although the Texas Public Utility Commission approved raising the cap on the peak-demand price from $3,000 per megawatt-hour to $4,500/MWh starting August 1, this gives little incentive to build new power plants (see Not enough even higher price for electricity urged for Texas).

New power plants are very much needed. Texas has a goal of having about 13 percent more power available than consumers need (also known as the “reserve margin”). By setting a proper reserve margin you reduce the chance of a strained grid that leads to blackouts. Besides being the only state that isn’t meeting its reserve margin goal, Texas also has a shrinking margin: the forecast is that by 2014 it could drop below 10 percent, and below 7 percent in 2015.
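The reserve margin itself is simple arithmetic; here is a minimal sketch, with illustrative capacity and peak demand figures (not ERCOT data), of how such a margin is computed and checked against the target:

```python
# Reserve margin = (available capacity - peak demand) / peak demand.
# Figures below are illustrative only, not ERCOT data.

def reserve_margin(capacity_mw: float, peak_demand_mw: float) -> float:
    """Return the reserve margin as a fraction of peak demand."""
    return (capacity_mw - peak_demand_mw) / peak_demand_mw

TARGET = 0.13  # Texas goal: roughly 13 percent more capacity than peak demand

capacity_mw = 73_000.0     # installed capacity (illustrative)
peak_demand_mw = 68_000.0  # forecast summer peak (illustrative)

margin = reserve_margin(capacity_mw, peak_demand_mw)
print(f"Reserve margin: {margin:.1%} (target {TARGET:.0%})")
if margin < TARGET:
    print("Below target: higher risk of a strained grid and rolling blackouts.")
```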

Figure: USA reserve margins (source: EIA)

The current power grid issues in Texas show that, as a customer of cloud computing and/or data center services, but also as a data center provider, you must have a good understanding of the power grid to appreciate the risks at stake in terms of resiliency and business continuity.

Power grid crisis: Japanese data centers hit again?

It looks like the feared scenario of last year, that all of Japan’s nuclear power plants would be offline within about one year, is coming true.

Currently only two of 54 reactors are still operating. Tokyo Electric’s last active reactor is set to go into maintenance on March 26, while Hokkaido Electric’s Tomari No. 3 unit is scheduled to be closed for routine checks in late April or early May. The reactors must meet new safety requirements: no reactor can restart until it passes computer-simulated “stress tests” to confirm it can withstand earthquakes and tsunamis on the scale that wrecked Tepco’s Fukushima Daiichi plant. Then cabinet ministers need to sign off, and local governments, by custom although not by law, need to agree.

“A tight supply-demand balance (of electricity) does not affect our judgment on nuclear safety and we are in the process of making that judgment,” Trade Minister Yukio Edano, who holds the energy portfolio, told parliament.

The prospect of being without nuclear power has raised fears of another year with forced power rationing and temporary blackouts in the summer, when air conditioning puts extra strains on supply. It is estimated that the nationwide power supply could fall 7 percent below demand if no reactors are online. Since it is difficult to raise overall supply capacity drastically in the coming months, the government is asking power utilities to come up with a variety of pricing incentives to encourage large users to reduce peak-hour demand, a trade ministry official said.

Sources: Infrarati, Asahi Shimbun, Der Spiegel

Figure: Nuclear power plants in Japan

Japan is shifting its energy mix. Nuclear power previously accounted for about 30% of Japan’s power output. Japanese utilities have now ramped up imports and use of coal, oil, and natural gas, while industrial consumers have become more reliant on diesel generators to guarantee a reliable energy supply. The shutdown of nearly all nuclear reactors has forced energy providers to reopen decommissioned oil and gas power plants. Some companies are running their plants partly with their own generators. Overall it is still difficult to meet demand.

Kansai Electric Power Co., which has shut down all the nuclear reactors under its jurisdiction, had difficulties supplying enough power to meet demand. The utility was forced to crank up its remaining power plants, including obsolete thermal power stations, to maximum output to meet winter demand.

Malfunctions due to aging were also frequently reported at these power stations. More than 10 plants across Japan had to be shut down quickly due to emergencies this winter. The trouble at the Shin-Oita thermal power plant of Kyushu Electric Power Co. had a particularly serious impact. The utility barely managed to avoid implementing blackouts in its service area by hooking up to power supplies from five electric utilities in western Japan and Tepco.

Imports of liquefied natural gas (LNG) by Japan’s 10 regional utilities increased 39 percent in January, relative to the year prior, while imports of fuel oil and crude oil both nearly tripled. This raises growing concern over whether Japan can ensure a stable supply of fuel. While the country has a 200-day crude oil reserve, it has only enough LNG for two to three weeks of power generation. As a result of soaring power generation costs, Tepco will raise its electricity charges from April. If the dependence on thermal power generation continues, other electric utilities could very likely follow suit.

Sources: The Breakthrough (link 1), The Breakthrough (link 2), Daily Yomiuri

The current energy crisis also has a profound effect on carbon emissions. Japan, the world’s fifth-largest carbon emitter, is backtracking on the introduction of a carbon price. A move by the Japanese government to put a price on carbon would be economically and politically impossible in Japan today, according to senior Japanese foreign affairs officials in Tokyo. The commitment made in 2009, with the election of the prime minister, to reduce emissions by 25 per cent is no longer realistic.

Source: The Australian

Last summer all facilities that draw more than 500 kW were ordered to reduce their peak power usage by 15 percent from a year earlier. Data center operators fought the cap and the idea of rolling blackouts with some success. It is still uncertain whether the Japan Data Center Council (JDCC) will be as successful as last year. What is certain is that data centers will be under much pressure to find ways to cope with long-term power shortages and the resulting price hikes.

Source: Infrarati

Amazon post mortem on Dublin DC outage reconfirms domino theory

As promised, Amazon released a post mortem report on the data center outage in Dublin. It reconfirmed the domino theory: if one availability zone comes under the influence of errors and outages, other availability zones can follow in a domino effect.

Amazon stated, as mentioned in an earlier blog entry about the Dublin outage, that the utility provider now believes it was not a lightning strike that brought down the 110kV 10 megawatt transformer. This outage was the start of a series of incidents that would finally bring the Dublin data center to its knees.

“With no utility power, and backup generators for a large portion of this Availability Zone disabled, there was insufficient power for all of the servers in the Availability Zone to continue operating. Uninterruptable Power Supplies (UPSs) that provide a short period of battery power quickly drained and we lost power to almost all of the EC2 instances and 58% of the EBS volumes in that Availability Zone. We also lost power to the EC2 networking gear that connects this Availability Zone to the Internet and connects this Availability Zone to the other Availability Zones in the Region.”

Within 24 minutes Amazon “were seeing launch delays and API errors in all (emphasis by Infrarati) EU West Availability Zones.” The reason for this was: “The management servers which receive requests continued to route requests to management servers in the affected Availability Zone. Because the management servers in the affected Availability Zone were inaccessible, requests routed to those servers failed. Second, the EC2 management servers receiving requests were continuing to accept RunInstances requests targeted at the impacted Availability Zone. Rather than failing these requests immediately, they were queued and our management servers attempted to process them.”

“Fairly quickly, a large number of these requests began to queue up and we overloaded the management servers receiving requests, which were waiting for these queued requests to complete. The combination of these two factors caused long delays in launching instances and higher error rates for the EU West EC2 APIs.” Later on, Amazon was able to restore power to enough of the network services that they were able to reconnect. However, the problem they found was that their database cluster was in an unstable condition. The last blow was that “Separately, and independent from issues emanating from the power disruption, we discovered an error in the EBS software that cleans up unused storage for snapshots after customers have deleted an EBS snapshot.”
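Amazon has not published its control plane code, but the failure mode it describes is a generic one: requests aimed at an unreachable zone were queued instead of being failed immediately, and the backlog exhausted the servers that accepted them. A minimal, hypothetical sketch of the fail-fast alternative (all names and structure invented for illustration, not Amazon’s actual implementation):

```python
# Hypothetical illustration of the failure mode Amazon describes: requests
# aimed at an impaired zone should fail fast rather than pile up in a queue
# and exhaust the servers that accepted them. Names and structure are
# invented for illustration; this is not Amazon's actual control plane.
from collections import deque

UNHEALTHY_ZONES = {"eu-west-1a"}   # zones currently known to be impaired
pending = deque(maxlen=1000)       # bounded backlog instead of unbounded growth

class ZoneUnavailable(Exception):
    pass

def submit_run_instances(request):
    zone = request["availability_zone"]
    if zone in UNHEALTHY_ZONES:
        # Fail fast: reject immediately instead of queueing work that
        # cannot complete, which would tie up the management servers.
        raise ZoneUnavailable(f"{zone} is impaired, try another zone")
    if len(pending) == pending.maxlen:
        raise RuntimeError("backlog full, shed load instead of queueing further")
    pending.append(request)
    return "accepted"
```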

In their description of actions to prevent recurrence Amazon stated that “Over the last few months, we have been developing further isolation of EC2 control plane components (i.e. the APIs) to eliminate possible latency or failure in one Availability Zone from impacting our ability to process calls to other Availability Zones.” (emphasis by Infrarati).

The Dublin incident shows that Amazon is still developing and improving the isolation between availability zones. Services in one zone are not yet safeguarded from incidents in other availability zones. That is a ‘must know’ for the customer instead of a ‘nice to know’.

Texas escaped rolling blackouts: Data centers and the power grid interdependency

In Texas, a state hosting data centers of several notable IT companies, including WordPress.Com, Cisco, Rackspace and Host Gator, the power grid operator ERCOT has been working around the clock to keep the electricity flowing and to avoid rolling blackouts as power demand reaches record levels.

According to The Wall Street Journal, for the second year in a row ERCOT underestimated summer demand in its forecasts. ERCOT’s forecasts are based on an average of the past 10 summers, but the past two years have been unusually hot, and this is pushing up energy use. With almost 40 consecutive days of temperatures above 37 °C (100 °F), it was the hottest start to August in Texas history. The drought in the southern U.S. is exceptional, as can be seen in the map below; see also the 12-week animation of the U.S. drought monitor.

Figure: U.S. drought monitor map
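The forecasting issue described above is easy to see in a toy calculation; a minimal sketch with invented demand figures (not ERCOT’s) showing how a trailing ten-summer average lags behind two unusually hot years:

```python
# Illustrative only: why a forecast based on the average of the past ten
# summers underestimates demand when the most recent summers run hot.
peaks_gw = [60, 61, 60, 62, 61, 62, 63, 62, 67, 68]  # invented peak demand, last 10 summers

forecast = sum(peaks_gw) / len(peaks_gw)   # trailing 10-summer average
actual = 68.3                              # another unusually hot summer (invented)

print(f"forecast {forecast:.1f} GW vs actual {actual:.1f} GW "
      f"-> shortfall {actual - forecast:.1f} GW")
```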

Texas has its own power grid, regulated and managed by the Electric Reliability Council of Texas (ERCOT). The Texas Interconnect supplies its own energy and is completely independent of the Eastern and Western Interconnects, which means that Texas can’t get help from other places when it runs short of power.

On the second of August ERCOT even put out a notice saying the state’s reserve levels had dropped below 2,300 megawatts, putting an Energy Emergency Alert level 1 into effect. “We are requesting that consumers and businesses reduce their electricity use during peak electricity hours from 3 to 7 p.m. today, particularly between 4 and 5 p.m. when we expect to hit another peak demand record,” said Kent Saathoff, vice president of system planning and operations. “We do not know at this time if additional emergency steps will be needed.” ERCOT only has a peak capacity of 73,000 megawatts this time of year, and about 3,000 megawatts is offline for repairs at any given time. With roughly 70,000 megawatts effectively available and an all-time record peak demand of 68,296 megawatts, reserves were squeezed to well below the 2,300-megawatt alert threshold. ERCOT thus narrowly avoided instituting rolling blackouts.

Figure: Texas energy demand

According to an Aug. 2 blog article by Elizabeth Souder of the Dallas Morning News, “The high temperatures also caused about 20 power plants to stop working, including at least one coal-fired plant and natural gas plants.” Souder noted that a spokesman for ERCOT, “said such outages aren’t unusual in the hot summer…”

The demand for energy sent prices sky high, topping out at $2,500 per megawatt-hour on Friday afternoon, more than 50 times the on-peak wholesale average, according to the U.S. Energy Information Administration.

Figure: ERCOT energy prices

The power plants are also under siege in a different way. The drought exposes a structural problem with the U.S. energy sector: it needs a lot of water to operate. Power plants account for 49 percent of the nation’s water withdrawals (according to the U.S. Geological Survey). Levels of “extreme” and “exceptional” drought grew to cover 94.27 percent of the state of Texas. The drought and triple-digit temperatures (F) have broken numerous records and already left the southern Plains and Mississippi Valley struggling to meet demand for power and water. A prolonged drought such as the one in Texas can force power plants to shut down because their supply of circulating cooling water runs out or the cooling water is not cool enough (as happened in 2007, when several power plants had to shut down or run at lower capacity because there was not enough water). As shown in a study from the University of Texas at Austin, alternative cooling technologies, such as cooling towers and hybrid wet–dry or dry cooling, present opportunities to reduce water diversions.

Although we didn’t hear much from the data center operators about the current threat to the power grid, the Texas case shows very clearly the interdependency between data centers, as huge energy consumers, and the power grid, the water distribution systems, and weather and climate conditions.

Data centers are part of a complex electrical power value chain. Most people are not aware of this value chain and the energy losses in it. As a customer of cloud computing and/or data center services, but also as a data center provider, you must have a good understanding of the power grid to appreciate the risks at stake in terms of resiliency and business continuity. The power grid and water distribution systems are struggling to survive a record-breaking drought across the southern United States. That is also a wake-up call for data center users and providers to rethink the energy efficiency and energy consumption of their data centers.

Saving a kilowatt at the end of this power value chain saves a lot of energy and can offer some relief to the current power grid. It can be shown that saving 1 unit of power consumption in information processing saves about 98 units upstream in the power value chain. See the blog entry Following the data center energy track.
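The ratio comes from multiplying the losses of every stage in the chain. As a rough sketch only, with purely illustrative stage efficiencies (not the figures behind the ~98:1 number in the cited blog entry), this is how the upstream factor builds up:

```python
# Illustrative only: how modest per-stage efficiencies multiply into a large
# upstream energy requirement for every unit of useful computation. The
# efficiencies below are invented for the sketch; they do not reproduce the
# ~98:1 figure from the cited blog entry exactly.
stages = {
    "power plant (fuel to electricity)": 0.35,
    "transmission and distribution":     0.93,
    "UPS and power distribution":        0.90,
    "facility overhead (1 / PUE of 2)":  0.50,
    "server power supply and fans":      0.80,
    "useful work per watt delivered":    0.20,
}

efficiency = 1.0
for stage, eff in stages.items():
    efficiency *= eff

upstream_per_unit = 1.0 / efficiency
print(f"End-to-end efficiency: {efficiency:.3f}")
print(f"Primary energy per unit of useful IT work: ~{upstream_per_unit:.0f} units")
print(f"So saving 1 unit at the end avoids ~{upstream_per_unit - 1:.0f} units upstream")
```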

Amazon and Microsoft in Dublin down, resiliency of cloud computing

A lightning strike in Dublin, Ireland, has caused downtime for many sites using Amazon’s EC2 cloud computing platform, as well as users of Microsoft’s BPOS.
Amazon said that lightning struck a transformer near its data center, causing an explosion and fire that knocked out utility service and left it unable to start its generators, resulting in a total power outage.
Some quotes from the Amazon dashboard (Amazon Elastic Compute Cloud (Ireland)):

11:13 AM PDT We are investigating connectivity issues in the EU-WEST-1 region.

3:01 PM PDT A quick update on what we know so far about the event. What we have is preliminary, but we want to share it with you. We understand at this point that a lightning strike hit a transformer from a utility provider to one of our Availability Zones in Dublin, sparking an explosion and fire. Normally, upon dropping the utility power provided by the transformer, electrical load would be seamlessly picked up by backup generators. The transient electric deviation caused by the explosion was large enough that it propagated to a portion of the phase control system that synchronizes the backup generator plant, disabling some of them. Power sources must be phase-synchronized before they can be brought online to load. Bringing these generators online required manual synchronization. We’ve now restored power to the Availability Zone and are bringing EC2 instances up. We’ll be carefully reviewing the isolation that exists between the control system and other components. The event began at 10:41 AM PDT with instances beginning to recover at 1:47 PM PDT.

Notice the 30-minute difference between the first issue message on the dashboard (11:13) and the stated start of the event, 10:41.

11:04 PM PDT We know many of you are anxiously waiting for your instances and volumes to become available and we want to give you more detail on why the recovery of the remaining instances and volumes is taking so long. Due to the scale of the power disruption, a large number of EBS servers lost power and require manual operations before volumes can be restored. Restoring these volumes requires that we make an extra copy of all data, which has consumed most spare capacity and slowed our recovery process. We’ve been able to restore EC2 instances without attached EBS volumes, as well as some EC2 instances with attached EBS volumes. We are in the process of installing additional capacity in order to support this process both by adding available capacity currently onsite and by moving capacity from other availability zones to the affected zone. While many volumes will be restored over the next several hours, we anticipate that it will take 24-48 hours (emphasis made by Infrarati) until the process is completed. In some cases EC2 instances or EBS servers lost power before writes to their volumes were completely consistent. Because of this, in some cases we will provide customers with a recovery snapshot instead of restoring their volume so they can validate the health of their volumes before returning them to service. We will contact those customers with information about their recovery snapshot.

Remarkably, as stated in the Irish Times, a spokeswoman for the Electricity Supply Board (ESB), Ireland’s premier electricity utility, said the incident occurred at 6:15 PM on Sunday and caused a power outage in the area for about an hour. However, she said power to Amazon was interrupted for “less than a second before an automatic supply restoration kicked in.”

Microsoft doesn’t use a public dashboard, but its Twitter feed stated on Sunday 7 August, 23:30 CET: “Europe data center power issue affects access to #bpos”. Four hours later there was the tweet “#BPOS services are back online for EMEA customers”. It is a pity that there is no explanation of how their data center also went down. Was the cause the same as the one that brought the Amazon data center down?

The idea of cloud computing is basically that the offered services are location independent. The customer doesn’t have to worry about, or even know, the location where the services are produced. He doesn’t even have to know how the services are provided (the inner workings of the provided services).
The incident in Dublin shows that, at the moment, these assumptions are wrong. As a customer of cloud computing services you still need a good understanding of the location and working of the provided services to appreciate the risks at stake in terms of resiliency and business continuity. Only then can you make the proper choices about how cloud computing services can help your organization or business without business continuity surprises. Proper risk management when using cloud computing services deserves more attention.

UPDATE 10 August

Now, three days later, Amazon, as shown on its dashboard, is still struggling with recovery. InformationWeek has an interesting article about the complexity of failover design. According to this article:

It’s still possible that having the ability to fail-over to a second availability zone within the data center would have saved a customer’s system. Availability zones within an Amazon data center typically have different sources of power and telecommunications, allowing one to fail and others to pick up parts of its load. But not everyone has signed up for failover service to a second zone, and Amazon spokesman Drew Herdener declined to say whether secondary zones remained available in Dublin after the primary zone outage. (emphasis made by Infrarati)
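Failover to a second availability zone is something the customer has to architect explicitly. As a rough illustration only, using today’s boto3 SDK with made-up AMI and zone names (not anything from the Amazon or InformationWeek reports), spreading instances over two zones of eu-west-1 looks roughly like this:

```python
# Rough sketch: spreading capacity over two availability zones so that a
# single-zone outage does not take out all instances. Uses the current
# boto3 SDK; the AMI id and zone names are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

AMI_ID = "ami-0123456789abcdef0"       # placeholder image id
ZONES = ["eu-west-1a", "eu-west-1b"]   # two zones, separate power and network paths

for zone in ZONES:
    ec2.run_instances(
        ImageId=AMI_ID,
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": zone},
    )
```

Placing capacity in two zones is only half the work; detecting a zone failure and shifting load (health checks, DNS or load balancer failover) is still the customer’s responsibility.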

UPDATE 11 August

ESB Networks confirmed that it suffered a failure in a 110kV transformer in Citywest, Dublin, at 6:16 p.m. local time on Sunday, August 7. The Irish Times revealed some new information about the outage.

“The cause of this failure is still being investigated at this time, but our initial assessment of lightning as the cause has now been ruled out,” a statement from ESB Networks said. The ESB also said on Monday that Amazon was interrupted for “less than a second before an automatic supply restoration kicked in”. Yesterday the ESB confirmed that Amazon was one of about 100 customers affected by the ongoing service interruption. “This initial supply disruption lasted for approximately one hour as ESB Networks worked to restore supply. There was an ongoing partial outage in the area until 11pm. The interruption affected about 100 customers in the Citywest area, including Amazon and a number of other data centers.”

A second Amazon data centre in south Dublin experienced a “voltage dip which lasted for less than one second”, the ESB said yesterday. However, this data centre was not directly affected by the power cut. This one-second voltage dip that had been cited added to the confusion about Sunday’s events.

ESB made it clear that in referencing a lightning strike, Amazon was sharing its best information at the time. “Amazon accurately reported the information which had been passed to them from workers at the site,” said Marguerite Sayers, Head of Asset Management at ESB Networks. “Both the explosion and fire were localized to the bushings or insulators of the transformer and did not require the attendance of the emergency services. The extent of associated internal damage to the transformer was serious and resulted in supply interruption to a number of customers, and also impacted Amazon’s systems, as they have reported.”

The article confirms that you as a customer need a good understanding of the location and working of the provided services to appreciate the risks at stake in terms of resiliency and business continuity. The question arises, however, to what level you must have this knowledge. It looks like the outage in Dublin is not only about IT design but also about facility engineering. So how far does the responsibility of the customer extend, and where does the responsibility of the provider begin?

Power grid crisis: Japanese data centers, saved by the bell?

Ten days ago we paid some attention to the power grid crisis in Japan and its impact on data centers. More news has now become available. In the aftermath of the big earthquake, power shortage remains a big problem in Japan. Tokyo’s government is concerned that the north will not be able to maintain critical power provision during the summer. The Japanese government recently issued papers showing the forecast supply and demand for electricity in the Tokyo region: supply capacity for TEPCO was 55.20 GW against an expected demand of 60 GW, while Tohoku EPCO could produce 12.3 GW against a demand of 14.80 GW.

Starting July 1, all facilities that draw more than 500 kW are ordered to reduce their peak power usage by 15 percent from a year earlier. Some government buildings will be asked to save more, in some cases up to 21%, on weekdays between 1 July and 30 September, from 9am to 8pm. In most cases, businesses are required to decrease the amount of lighting used and cut down on staff hours at work, or risk being fined. Fines for noncompliance are high: the equivalent of US$12,500 for each hour over the limit.
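To make the order concrete, here is a minimal sketch of the arithmetic a facility manager faces. The 15 percent target and the roughly US$12,500-per-hour fine are taken from the reports above; the consumption figures are invented for illustration:

```python
# Illustrative only: computing the allowed peak under a 15 percent reduction
# order and the fine exposure for hours spent over the limit.
LAST_YEAR_PEAK_KW = 2_000        # invented: last summer's peak demand
REDUCTION = 0.15                 # mandated reduction versus a year earlier
FINE_PER_HOUR_USD = 12_500       # approximate fine for each hour over the limit

allowed_peak_kw = LAST_YEAR_PEAK_KW * (1 - REDUCTION)

hourly_demand_kw = [1_650, 1_700, 1_720, 1_690, 1_640]  # invented afternoon readings
hours_over = sum(1 for kw in hourly_demand_kw if kw > allowed_peak_kw)

print(f"Allowed peak: {allowed_peak_kw:.0f} kW")
print(f"Hours over the limit: {hours_over}, fine exposure: ${hours_over * FINE_PER_HOUR_USD:,}")
```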

More than 70% of the country’s data centers are located in the Tokyo area.

Originally the Japan Data Center Council (JDCC) proposed grouping Japan’s data center operations into one collective for the purpose of managing the proposed rolling blackouts. It suggested that this would reduce problems that would occur if each individual data center had to run according to the blackouts each day, a measure that would increase maintenance and other costs considerably.

It is said the plan was scrapped because many data centers actually run on diesel generators, which take longer to start up, require more uninterruptible power supply capacity and cooling, and have more complicated maintenance requirements.

The cuts for data centers will be capped at 5%. JDCC calculations found that anything more than this would require time for data centers to carry out server consolidation, renew equipment and virtualize further to accommodate the electricity cuts.

Data center operators have fought the cap with some success. They argued their facilities are critical infrastructure and that many had already slashed their consumption before the earthquake as part of energy-efficiency projects. Also, companies have been “evacuating” servers and storage from Tokyo offices to their data centers because of power shortages in the capital, making it even harder to reduce demand, Yamanaka (a member of the Japan Data Center Council’s steering committee) said at the DatacenterDynamics San Francisco 2011 conference.

“We told the government it was physically impossible” to meet the reduction targets, he said. The government eventually backed down and has reduced the target for data centers to between zero and 15 percent, depending on how much they reduced energy use the year before.

Data centers in Japan will now join railways and hospitals in not having to comply with such strict measures, after the Japan Data Center Council won approval for the measures to be lifted.

They will, however, be judged on the operations the data centers are responsible for. Those running banks and financial services, for example, will enjoy continued power use. Others may see required reductions varying from 0% to 15%, depending on their business case.

“In order to maintain the policy of avoiding planned blackouts, we will steadily implement installation of additional power capacity as we have planned, and we continuously do our best efforts to secure supply capacity,” Yamanaka said.

Data centers are also trying to find ways to cope with long-term power shortages and the resulting price hikes, and to replenish their fuel supplies more quickly. Service providers may alter contracts so that customers shoulder some of the increased cost of higher energy prices.