Datacenters need another perspective on security

As stated by Intel: “Changing demands for bandwidth, processing power, energy efficiency and storage – brought on by such trends as cloud computing, big data, increased services and more mobile computing devices hitting the network – are driving the need for new architectures in the data center.”

Therefore we see that the datacenter world is making a transition from an artisanal mode of operation to an industrialized mode of operation. To make this industrialization of datacenters possible there is a need for uniformization, standardization, and automation in order to reap the benefits of economies of scale. One of the current big things in this datacenter transformation is DCIM.

Until recently there was a disconnect between the facility and IT infrastructure in the datacenter. To get rid of the limited visibility and control over the physical layer of the data center we see the rise of a new kind of system: the Data Center Infrastructure Management (DCIM) system.

You could say that a DCIM system is the man in the middle, a broker between the demands of the IT world and the supply of power, cooling, etc. from the Facility world. The DCIM is layered on top of the so-called SCADA system, where SCADA stands for Supervisory Control And Data Acquisition: the computerized control systems that are at the heart of modern industrial automation and control systems.

So currently DCIM is a hot topic, and the added value of the different flavors and implementations of DCIM systems is heavily discussed.

But something is missing. The world moves rapidly towards the digital age, where information technology forms a crucial aspect of most organizational operations around the world, and where datacenters provide the very foundation of the IT services that are delivered. Therefore datacenters can be considered critical infrastructure: assets that are essential for the functioning of a society and economy. But how are these assets protected? And here we are not talking about the physical security of a datacenter, or about how safely your business data is stored and processed in a datacenter. Here we are talking about the security of the facility control systems: the cooling, the power, etc.

Beware that DCIM functionality is not only about passive monitoring and dashboards but also about active control and automation. The information obtained with SCADA systems will become crucial to control the facility side of the infrastructure and even the IT equipment. With DCIM the traditionally standalone SCADA and Building Management Systems (BMS) get connected and integrated with IP networks and IT systems. But it also works the other way around: SCADA and BMS become accessible by means of these IP networks and IT systems. Misuse of these IP networks and IT systems therefore creates the risk of a (partial) denial of service or damaged data integrity of your DCIM and SCADA/BMS systems, and thus the disabling of a critical infrastructure: the datacenter.

In most organizations SCADA and BMS security are not yet in scope of the activities of the Corporate Information Security Officer (CISO). But awareness is growing. Although not specifically focused on datacenters, the following papers are very interesting:

From the National Institute of Standards and Technology there is the Guide to Industrial Control Systems Security, from the National Cyber Security Centre of The Netherlands the Checklist security of ICS/SCADA systems, and there is the white paper of Trend Micro.

So read them, make your own checklist, and get this topic on the datacenter agenda!

Datacenters: The Need For A Monitoring Framework

For proper usage of, and collaboration between, BMS, DCIM, CMDB, etc. the use of an architectural framework is recommended.

CONTEXT

A datacenter is basically a value stack: a supply chain of stack elements where each element is a service component (People, Process and Technology that add up to a service). For each element in the stack the IT organization has to assure the quality as agreed on. In essence these quality attributes were performance/capacity, availability/continuity, confidentiality/integrity, and compliance, and nowadays also sustainability. One of the greatest challenges for the IT organization was and is to coherently manage these quality attributes for the complete service stack or supply chain.

Currently a mixture of management systems is used to manage the datacenter service stack: BMS, DCIM, CMDB, and System & Network Management Systems.

GETTING RID OF THE SILOS

As explained in “Datacenters: blending BIM, DCIM, CMDB, etc.” we are still talking about working in silos, where each of the participants involved in the life cycle of the datacenter is using its own information sets and systems. To achieve real overall improvements (instead of local optimizations) better collaboration and information exchange between the different participants is needed.

FRAMEWORK

To steer and control datacenter usage successfully, a monitoring system should be in place. Accepting the fact that the participants are using different systems, we have to find a way to improve the collaboration and information exchange between those systems. Therefore we need some kind of reference, an architectural framework.

For designing an efficient monitoring framework, it is important to assemble a coherent system of functional building blocks or service components. Loose coupling and strong cohesion, encapsulation, and the use of the Facade and Model–View–Controller (MVC) patterns are strongly recommended because of the many proprietary solutions that are involved.
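To make the Facade idea concrete, here is a minimal sketch in Python; the vendor classes and their methods are purely hypothetical stand-ins for proprietary metering APIs, not real products.

```python
# Minimal sketch of the Facade idea: hide proprietary metering APIs behind one
# standard 'Facility usage service'. The vendor classes are hypothetical examples.

class VendorAMeter:
    def read_kw(self) -> float:        # proprietary call, reports kilowatts
        return 42.0

class VendorBMeter:
    def current_load(self) -> dict:    # proprietary call, reports watts
        return {"load_w": 39500}

class FacilityUsageService:
    """Facade: one uniform interface (kW) regardless of the vendor behind it."""

    def __init__(self, meters):
        self._meters = meters

    def total_power_kw(self) -> float:
        total = 0.0
        for meter in self._meters:
            if isinstance(meter, VendorAMeter):
                total += meter.read_kw()
            elif isinstance(meter, VendorBMeter):
                total += meter.current_load()["load_w"] / 1000.0
        return total

if __name__ == "__main__":
    service = FacilityUsageService([VendorAMeter(), VendorBMeter()])
    print(f"Facility power: {service.total_power_kw():.1f} kW")  # 81.5 kW
```

The reporting side only ever talks to the Facade, so a metering vendor can be swapped without touching the rest of the framework.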

BUILDING BLOCKS

Based on an earlier blog about energy monitoring, a short description of the most common building blocks is given below:

  • Most vendors have their own proprietary APIs to interface with the metering devices. Because metering differs within and between data centers, these differences should be encapsulated in standard ‘Facility usage services‘: services for the primary, secondary and tertiary power supply and usage, the cooling, and the air handling.
  • For IT infrastructure (servers, storage and network components) usage we have the same kind of issues, so the same recipe, encapsulation of proprietary APIs in standard ‘IT usage services‘, must be used.
  • Environmental conditions outside the data center, the weather, have their influence on the data center, so proper information about this must be made available by a dedicated Outdoor service component.
  • For a specific data center a DC Usage Service Bus must be available to have a common interface for exchanging usage information with reporting systems.
  • The DC Data Store is a repository (Operational Data Store or Data Warehouse) for datacenter usage data across data centers.
  • The Configuration management database(s) (CMDB) is a repository with the system configuration information of the Facility Infrastructure and the IT infrastructure of the data centers.
  • The Manufacturers’ specification databases store specifications/claims of components as provided by the manufacturers.
  • The IT capacity database stores the available capacity (processing power and storage) size that is available for a certain time frame.
  • The IT workload database stores the workload (processing power and storage) size that must be processed in a certain time frame.
  • The DC Policy Base is a repository with all the policies, rules, targets and thresholds about the datacenter usage.
  • The Enterprise DC Usage Service Bus must be available to have a common interface for exchanging policies, workload, capacity, CMDB, manufacturers’ and usage information of the involved data centers with reporting systems.
  • The Composite services deliver different views and reports of the datacenter usage by assembling information from the different basic services by means of the Enterprise Bus.
  • The DC Usage Portal is the presentation layer for the different stakeholders that want to know something about the usage of the Datacenter.

 DC Monitoring Framework

ARCHITECTURE APPROACH

Usage of an architectural framework (reference architecture) is a must to get a monitoring environment working. The modular approach, focused on standard interfaces, gives the opportunity to “rip and replace” components. It also gives the possibility to extend the framework with other service components. The service bus provides a standard exchange of data (based on messages) between the applications and prevents the creation of dedicated, proprietary point-to-point communication channels. Also, to get this framework working, a standard data model is mandatory.
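As an illustration of such a standard data model, a minimal sketch of what a usage message on the DC Usage Service Bus could look like is given below; the field names are assumptions chosen for this example, not a published schema.

```python
# Sketch of a standard usage message for the DC Usage Service Bus.
# The field names form an illustrative data model, not a published standard.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class UsageMessage:
    datacenter_id: str   # which data center reports
    component_id: str    # CMDB identifier of the reporting component
    metric: str          # e.g. "power_kw" or "inlet_temperature_c"
    value: float
    timestamp: str       # ISO 8601, UTC

    def to_json(self) -> str:
        return json.dumps(asdict(self))

if __name__ == "__main__":
    message = UsageMessage(
        datacenter_id="DC-01",
        component_id="CRAC-07",
        metric="power_kw",
        value=12.4,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    print(message.to_json())  # ready to publish on the service bus
```

Because every service component produces and consumes the same message shape, the composite services and the portal do not need to know anything about the proprietary systems behind the bus.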

Datacenters: blending BIM, DCIM, CMDB, etc.

How do you manage the life cycle of a datacenter in a rapidly changing environment where so many stakeholders are involved?

Context

A datacenter is a very special place where three different worlds and groups of people meet: there is the facility group whose focus is on the building, there is the IT infrastructure group focused on the IT equipment housed within it, and there is the IT applications group focused on the applications that run on the IT equipment. All with different objectives and incentives.

This worked fine when changes were highly predictable and came relatively slowly. But times have changed. Business demands drive the usage of datacenters and these demands have changed; large dynamic data volumes, stringent service-level demands, ever-higher application availability requirements and changing environmental requirements must be accommodated more swiftly than ever.

Business demands and rapidly advancing information technology have led to constant replacement of IT infrastructure. This pace of replacement is not in sync with the pace of change of the site infrastructure. The components for power, cooling and air handling last longer (about 10 years) than IT infrastructure (two to five years). The site infrastructure often ends up being mismatched with the facility demands of the IT infrastructure. While technically feasible, changing the site infrastructure of data centers currently in operation may not always make sense. For some data centers, the cost savings do not justify the cost of renewing the site infrastructure. For other data centers, the criticality of their function to the business simply prohibits downtime and inhibits facility managers from making major overhauls to realise improvements. This makes it difficult to continually optimise data centers in such a rapidly changing environment.

IT Management

One of the most significant challenges for the IT organisation was and is to coherently manage the quality attributes for the complete IT service stack or IT supply chain (including the facility / site infrastructure).

The IT department already tried to manage the IT environment with System & Network Management Systems and Configuration Management Databases (CMDBs), while the Facility department uses Building Management Systems (BMS) to monitor and control the equipment of an entire building. Until recently there was a disconnect between the facility and IT infrastructure. To get rid of the limited visibility and control over the physical layer of the data center we see the rise of a new kind of system: the Data Center Infrastructure Management (DCIM) system.

But there is still another gap to be bridged. The power and cooling capacity and resources of a data center are already largely set by the original MEP (Mechanical, Electrical, Plumbing) design and the data center location choice. The Facility/MEP design sets an ‘invisible’ boundary for the IT infrastructure. And just as in the IT world, in the Facility world there is knowledge and information loss between the design, build and production/operation phases.

Knowledge Gaps

BIM

To solve this issue, the Facility world is using more and more Building Information Modeling (BIM) systems. BIM is a model-centric repository that supports the business process of planning, designing, building and maintaining a building; in other words, a system to facilitate coordination, communication, analysis and simulation, project management and collaboration, asset management, maintenance and operations throughout the building life cycle.

The transition to a BIM-centric design approach fundamentally changes the Architecture, Engineering and Construction (AEC) process and workflow in the way project information is shared, coordinated, and reviewed. But it also extends the workflow by integrating one of the most important players in the AEC workflow: the operators.

Dynamic information about the building, such as sensor measurements and control signals from the building systems, can be incorporated within BIM to support analysis of building operation and maintenance.

Working in Silos

Although some local improvements in sharing information are and can be made with BIM, DCIM, CMDB and System & Network Management Systems, we are still talking about working in silos. The different participants that are involved in the life cycle of the datacenter are using their own information sets and systems. This is a repeating process: from the owner to the architect, to the design team, to the construction manager, the contractor and the subcontractors, to the different operators and, ultimately, back to the owner.

Integrated processes and life cycle management

If we want to achieve general improvements during the complete life cycle of the data center, based on key performance indicators (KPIs) such as cost, quality, on-time delivery, productivity, availability, and energy efficiency, better collaboration and information exchange between the different participants is needed.

BIM, BMS, DCIM, CMDB and System & Network Management Systems do have an overlap in scope but also have their own focus: life cycle, static and dynamic status information of facility, IT infrastructure and software components.

Silo Buster

We all know that one-size-fits-all doesn’t work and/or is not flexible enough. So what is needed is collaboration and interoperability: getting rid of the silo approach by focusing on the exchange of information between these different systems. There is a need for modularly designed management systems with open APIs, so that customers/users can make their own choice about which job is done by which system and still have the option of an easy exchange of information (retrieval or feed).
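As a hedged illustration of what such an exchange could look like, the sketch below pulls one asset record from a DCIM tool and feeds it into a CMDB over REST-style open APIs; the endpoints, field names and mapping are hypothetical and will differ per product, and the third-party `requests` library is assumed.

```python
# Sketch of exchanging one asset record between a DCIM tool and a CMDB via
# open, REST-style APIs. Endpoints, field names and the mapping are hypothetical;
# real products expose different APIs. Assumes the third-party 'requests' library.
import requests

DCIM_ASSET_URL = "https://dcim.example.com/api/assets/rack-12"      # hypothetical endpoint
CMDB_CI_URL = "https://cmdb.example.com/api/configuration-items"    # hypothetical endpoint

def sync_asset() -> None:
    asset = requests.get(DCIM_ASSET_URL, timeout=10).json()   # retrieve from the DCIM tool
    configuration_item = {                                     # map to the CMDB's fields
        "name": asset["name"],
        "location": asset["room"],
        "power_capacity_kw": asset["power_kw"],
    }
    requests.post(CMDB_CI_URL, json=configuration_item, timeout=10)  # feed the CMDB

if __name__ == "__main__":
    sync_asset()
```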

This will revolutionize the way data center information is shared, coordinated, and reviewed and will affect workflows, delivery methods, and deliverables in a positive way.

Unifying ideas and initiatives: Data Center Stack Framework & OpenDCME

The current indexes for data center performance, such as DCiE, EUE and PUE, are not sufficient to drive data center efficiency. These indexes focus only on the power or energy consumption of the facilities. Each metric in itself says nothing about how efficient a data center really is. In order to drive and improve efficiency, a common framework that can describe any data center, anywhere, doing anything is required. The next step is to apply industry-established metrics to each block that is running in the data center. The combination of a framework and the metrics can form the basis of real data center performance monitoring.

And here two things come together.

Data Center Pulse (DCP), a non-profit data center industry community founded on the principle of sharing best practices among its membership, is working on the Standardized Data Center Stack Framework Proposal. The goals of the Stack are to treat the data center as a common system which can easily be described and measured, and to provide a common framework to describe, communicate, and innovate data center thinking between owner/operator peers and the industry. So the aim is simple: provide one common framework that will describe any data center, anywhere, doing anything. The next step is to apply industry-established metrics to each block that is running in the data center.

Datacenter Pulse Stack Framework

Another initiative is the open source Open Data Center Measure of Efficiency (OpenDCME). In this model 16 KPIs that span the data center are used to measure data center efficiency. As stated: “This first version of the OpenDCME model is based on, amongst others, the EU Code of Conduct for Data Centres best practices in combination with the feedback of applying the model to a large number of data centers.” Mansystems, a European IT specialist in service management, consultancy & support, created and released OpenDCME. The proposed measures belong to the community and are open for contribution under the Creative Commons license agreement. The model consists of four domains:

  1. the IT assets that are located in the data center,
  2. the efficiency of the IT assets,
  3. the Availability, Performance and Capacity of the IT assets,
  4. the efficiency of data center IT processes.

The radar plot shown below presents the four domains and the 16 KPIs (four per domain). The OpenDCME model, in its current version, does not tell you HOW to measure the 16 KPIs.

OpenDCME model

Comparing the Stack Framework and the OpenDCME model you can see that the two initiatives are complementary. Bringing these two initiatives together can accelerate the development of performance monitoring and management of data centers.

Let’s see what happens…


Energy usage, Monitoring and Green IT

Energy usage

Is there a need to do something about the rising energy needs of data centers? Yes there is; some examples:

  • Increased energy costs for organizations;
  • Increased capital costs for expansion and construction of data centers;
  • Increased strain on the existing power grid;
  • Regulations, standards and compliance;
  • Corporate reputation;

For energy efficiency, PUE and DCiE are among the most commonly used metrics and formulae (http://thegreengrid.org/). But how do you get these kinds of figures for your company? A data center is a complex system, so energy metering isn’t easy.

Once electricity is supplied to a data center, various devices consume the electrical power. From a power perspective a data center has a supply chain that consists of four large building blocks: the IT infrastructure (servers, storage and network), the primary power supply (UPS, PDU, etc.), the secondary support supply (cooling, generator, air handling) and the tertiary support supply (lighting, and everything else). Virtually all power consumed by the IT infrastructure is converted to heat. Typically about thirty to fifty percent of the total power usage in a data center represents the load placed by the IT infrastructure, while the remaining percentage goes to cooling, power distribution, lighting, etc.
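As a worked example of these formulae (PUE = total facility power / IT equipment power, DCiE = 1 / PUE expressed as a percentage), the small calculation below uses purely illustrative numbers in which the IT share is 40 percent.

```python
# Worked example of the Green Grid metrics mentioned above:
#   PUE  = total facility power / IT equipment power
#   DCiE = IT equipment power / total facility power (i.e. 1 / PUE), as a percentage
# The numbers are illustrative, not measurements from a real data center.

it_power_kw = 400.0          # servers, storage and network
facility_power_kw = 1000.0   # IT load plus cooling, power distribution, lighting, ...

pue = facility_power_kw / it_power_kw
dcie = (it_power_kw / facility_power_kw) * 100

print(f"PUE  = {pue:.2f}")    # 2.50: every watt of IT load needs 1.5 W of overhead
print(f"DCiE = {dcie:.0f}%")  # 40%: the IT share of the total facility power
```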

Energy Paths

A practical example of using this kind of metrics is given by Google who made their energy usage public (http://www.google.com/corporate/green/datacenters/measuring.html).

IT infrastructure is basically a value stack: a supply chain of stack elements that act as service components (People, Process and IT that add up to an IT service). For each element in the stack the IT organization has to assure the quality as agreed on. In essence these quality attributes were performance, availability, confidentiality and integrity. One of the biggest challenges for the IT organization was and is to coherently manage these quality attributes for the complete service stack or supply chain. Energy as a quality attribute is a new kid on the block. This attribute is composed of the sub-attributes Power, Cooling, and Floor Space. These attributes are not independent of each other. For a given data center these resources are constrained; therefore, together, these attributes form a certain threshold. If the demand for IT capacity reaches this threshold, further growth of the IT load is inhibited for technical (overheating, not enough power) and/or financial (excessive capital investment) reasons.

Framework

To improve the energy efficiency of existing data centers, as well as to make decisions on new data centers, a number of metrics are being used: Power Usage Effectiveness (PUE), Data Center Infrastructure Efficiency (DCiE) and Data Center Productivity (DCP). Ideally, these metrics and processes will help determine if the existing data center can be optimized before a new data center is needed. To steer and control power usage successfully, a power usage monitoring system should be in place.

For designing an efficient power usage monitoring framework, it is important to assemble a coherent system of functional building blocks or service components. Loose coupling and strong cohesion, encapsulation, and the use of the Facade and Model–View–Controller (MVC) patterns are strongly recommended because of the many proprietary energy metering solutions.

Building blocks

  • Most vendors have their own proprietary APIs to interface with the metering devices. Because energy metering differs within and between data centers, these differences should be encapsulated in standard ‘Power services‘: services for the primary, secondary and tertiary power supply and usage.
  • For the IT infrastructure (servers, storage and network components) power usage we have the same kind of issues, so the same recipe, encapsulation of proprietary APIs in standard ‘IT Power services‘, must be used.
  • Environmental conditions outside the data center, the weather, have their influence on the power consumption of the data center, so proper information about this must be made available by a dedicated Outdoor service component.
  • For a specific data center a DC Energy Usage Service Bus must be available to have a common interface for exchanging energy usage information with reporting systems.
  • The Energy Data Store is a repository (Operational Data Store or Data Warehouse) for energy usage data across data centers.
  • The Configuration management database(s) (CMDB) is a repository with the system configuration information of the primary, secondary and tertiary power supply and the IT infrastructure of the data centers.
  • The Manufacturers’ specification databases store specifications/claims of the energy usage of components as provided by the manufacturers.
  • The IT capacity database stores the available capacity (processing power and storage) size that is available for a certain time frame.
  • The IT workload database stores the workload (processing power and storage) size that must be processed in a certain time frame.
  • The Energy Policy Base is a repository with all the policies, rules, targets and thresholds about energy usage.
  • The Enterprise DC Energy Usage Service Bus must be available to have a common interface for exchanging policies, workload, capacity, CMDB, manufacturers’ and energy usage information of the involved data centers with reporting systems.
  • The Composite services deliver different views and reports of the energy usage by assembling information from the different basic services by means of the Enterprise Bus.
  • The Energy Usage Portal is the presentation layer to the different stakeholders that want to know something about the IT energy usage.

Usage of an architectural framework is a must to get an energy monitoring environment working. This assembly of service components gives you the opportunity to see the average or instantaneous energy usage and compare it with the average or instantaneous IT workload and the available IT capacity. Comparisons with the energy policies and the manufacturers’ specifications/claims of energy usage are also possible.
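A minimal sketch of such a comparison is given below; all numbers, thresholds and sources are illustrative assumptions, not real policies or specifications.

```python
# Minimal sketch of comparing measured energy usage with a policy threshold and
# a manufacturer's claim. All numbers and sources are illustrative assumptions.

policy_max_it_kw = 500.0         # from the Energy Policy Base
claimed_kw_per_server = 0.35     # from the manufacturers' specification database
server_count = 1200              # from the CMDB
measured_it_kw = 460.0           # instantaneous reading from the IT Power services

expected_kw = claimed_kw_per_server * server_count

if measured_it_kw > policy_max_it_kw:
    print("Policy violation: measured IT load exceeds the energy policy threshold")
elif measured_it_kw > expected_kw * 1.1:
    print("Measured usage is more than 10% above the manufacturers' claims")
else:
    print("Measured usage is within policy and close to the claimed figures")
```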

Energy Monitoring Architecture


Monitoring, part II

Do you get what you want? Do you deliver what you promised?

Operationalizing your vision and living up to the promise requires a certain kind of monitoring, just to be sure that everything goes fine or that appropriate actions are taken. This monitoring is part of your quality management. There are a lot of people outside the IT world who spend time and effort on this subject, so why don’t we borrow some ideas instead of re-inventing the wheel again? A well-known acronym is PDCA. PDCA (plan-do-check-act) is an iterative four-step problem-solving process typically used in business process improvement. It is also known as the Deming cycle. This is the improved version of what is also called the Wright cycle (guess-do-crash-fix).

  • Plan: Establish the objectives and processes necessary to deliver results in accordance with the expected output. By making the expected output the focus, it differs from other techniques in that the completeness and accuracy of the specification is also part of the improvement.
  • Do: Implement the new processes, often on a small scale if possible.
  • Check: Measure the new processes and compare the results against the expected results to ascertain any differences.
  • Act: Analyze the differences to determine their cause.

The Check and Act steps are sometimes captured by the notion of Monitoring. These are the steps that determine where to apply changes that will lead to improvement. The P-D-C-A steps can be repeated again and again in a self-chosen time frame.
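As a toy illustration of the Check and Act steps applied to a monitored value, the sketch below compares sampled measurements against a planned target; the target, the sampling and the ‘corrective action’ are placeholders.

```python
# Toy sketch of an iterative PDCA-style monitoring loop. The planned target,
# the measurement and the corrective action are placeholders for illustration.
import random

PLANNED_TARGET_MS = 200.0            # Plan: the agreed, expected output

def run_process() -> float:          # Do: run the process (here: sample a response time)
    return random.uniform(150.0, 260.0)

def act_on(deviation_ms: float) -> None:   # Act: analyze the difference, adjust plan/process
    print(f"  deviation of {deviation_ms:.0f} ms -> plan a corrective action")

for cycle in range(3):               # repeat the cycle in a self-chosen time frame
    measured_ms = run_process()
    print(f"cycle {cycle}: measured {measured_ms:.0f} ms")
    if measured_ms > PLANNED_TARGET_MS:    # Check: compare the result with the expected output
        act_on(measured_ms - PLANNED_TARGET_MS)
```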

Need To Know

Maslow is noted for his conceptualization of the hierarchy of human needs. His hierarchy of needs is ordered by importance. It is often depicted as a pyramid consisting of different levels: the lowest level is associated with physical needs, while the uppermost level is associated with self-actualization needs, particularly those related to identity and purpose. The higher needs in this hierarchy only come into focus when the lower needs in the pyramid are met.

We can use Maslow’s hierarchy as an analogy for monitoring in the IT world. You want and need assurance at the environment, availability and performance levels. If these things are not in place you can’t focus on business services and give assurance about communication and information. At the lowest levels we talk mostly about IT Operations, at the mid level about Information Management and at the highest level about Business Service Management. By using a layered model of monitoring needs you give focus to the different needs and also provide a transparent and structured view of how to organize monitoring.
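A small sketch of this layered idea is given below: a higher layer is only evaluated when the layers below it are healthy. The layer names follow the text; the checks themselves are placeholders.

```python
# Sketch of layered monitoring: a higher layer is only assessed when the layers
# below it are healthy, analogous to Maslow's hierarchy. The checks are placeholders.

def environment_ok() -> bool:      return True    # housing, power, cooling
def availability_ok() -> bool:     return True    # systems up and reachable
def performance_ok() -> bool:      return False   # response times within target
def information_ok() -> bool:      return True    # correct, timely information delivery
def business_service_ok() -> bool: return True    # end-to-end business service level

LAYERS = [
    ("Environment", environment_ok),                   # IT Operations
    ("Availability", availability_ok),                 # IT Operations
    ("Performance", performance_ok),                   # IT Operations
    ("Information & communication", information_ok),   # Information Management
    ("Business service", business_service_ok),         # Business Service Management
]

for name, check in LAYERS:
    if not check():
        print(f"Attention needed at the '{name}' layer; higher layers are not assessed yet")
        break
    print(f"{name}: OK")
```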

Monitoring

Monitoring is knowing, but …

  • Simple: no complex models, because people don’t like complexity
  • Visual: create a mental picture by visualizing, because people don’t like the burden of tons of paper reports
  • Memorable: catch the most important properties, so no more large handbooks that don’t work
  • Effective: maximize information throughput and delivery, so no more discussions about how to interpret the figures
  • Active: puts you to thinking, leads to action

But beware: you can’t summarize complex systems in just a few one-liners. Still, by improving fact finding you win time, time you can spend on thinking about the information gained and on how to give direction to the desired improvements. Deming advocated that all managers need to have what he called a System of Profound Knowledge, consisting of four parts:

  • Appreciation of a system: understanding the overall processes involving suppliers, producers, and customers (or recipients) of goods and services;
  • Knowledge of variation: the range and causes of variation in quality, and use of statistical sampling in measurements;
  • Theory of knowledge: the concepts explaining knowledge and the limits of what can be known;
  • Knowledge of psychology: concepts of human nature.

Assurance Framework

The different levels of assurance can be put in a framework, an assurance stack, where each layer is the foundation for the layer on top of it. Environmental, availability and performance assurance are typical issues for IT Operations. Information and communication assurance are topics for Information Management. Service assurance, business service monitoring (BSM), is part of Business Management. This framework can give direction and structure to the general concept of monitoring.

IT Maslow


Monitoring

Something new on the horizon?

The separation of powers, also known as Trias Politica, is a governance model for democratic states. Under this model, the state is divided into branches, each with separate and independent powers and areas of responsibility. The normal division is into an executive, a legislative and a judicial branch. Montesquieu [1689-1755] specified that “the independence of the judiciary has to be real, and not apparent merely”. The judiciary is generally seen as the most important of the powers, independent and unchecked.

Checks and Balances

To prevent one branch from becoming supreme, and to induce the branches to cooperate, governance systems that employ a separation of powers need a way to balance each of the branches. Typically this is accomplished through a system of “checks and balances”, the origin of which, like separation of powers itself, is specifically credited to Montesquieu. Checks and balances allow for system-based regulation that permits one branch to limit another.

IT Powers

If we look at how IT is currently organized, where is the separation of powers? In a lot of cases the executive and judiciary branches are mingled. Mostly there is no clearly stated legislation that gives direction to the executive branch and gives the judiciary branch the means to evaluate and to pass a verdict. In certain sourcing models the separation of powers gets even worse: executive, legislative and judiciary branches are intermingled and partly placed with other, external organizations (outsourcing). For the judiciary branch there is sometimes, ad hoc, the use of auditing, and with auditing it is not uncommon to start with a discussion about what legislation should be referred to.

Back on track

So why do we treat IT differently? Shouldn’t we separate the vision and strategy from the operation/execution, and shouldn’t we separate the operation/execution from the monitoring and evaluation? Let’s make IT more transparent with a governance model that separates powers and establishes checks and balances.

Separation Of Powers
