Simon Bearne, commercial director at Next Generation Data (NGD), outlines why ‘black testing’ is the only real certainty when it comes to ensuring your UPS system can handle the pressures of an outage.
Robust and continuously available power is amongst the most important criteria when selecting a data centre. However, finding a facility that ticks all the right boxes is often easier said than done, especially in increasingly power-strapped metro areas.
Power outages are generally caused by a loss of power in the distribution network. This could be triggered by a range of factors, from construction workers accidentally cutting through cables, to power equipment failure, adverse weather conditions, or human error.
Having an N+1 redundancy infrastructure in place is therefore critical to mitigating outages due to equipment failure. Simply put, N+1 means there is more equipment deployed than needed and so allows for single component failure.
The ‘N’ stands for the number of components necessary to run your system and the ‘+1’ means there is additional capacity should a single component fail. A handful of facilities go further. NGD, for example, has more than double the equipment needed to supply contracted power to customers, split into two power trains on either side of the building, each of which is N+1. The two trains are completely separated, with no common points of failure.
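The difference between a single N+1 installation and two separated N+1 power trains can be sketched in a few lines of Python. This is only an illustration: the function names and the figure of four required units are hypothetical, and real capacity planning would also account for unit ratings and load growth.

```python
def spare_units(installed: int, required: int) -> int:
    """Units that can fail before the load can no longer be carried."""
    if required < 1 or installed < 0:
        raise ValueError("required must be >= 1 and installed >= 0")
    return installed - required


def is_n_plus(installed: int, required: int, spares: int = 1) -> bool:
    """True if the installation has at least `spares` units beyond N."""
    return spare_units(installed, required) >= spares


def survives_train_loss(trains: list[int], required: int) -> bool:
    """True if, after losing any one whole power train, at least one
    remaining train is still N+1 on its own."""
    if len(trains) < 2:
        return False
    for lost in range(len(trains)):
        remaining = trains[:lost] + trains[lost + 1:]
        if not any(is_n_plus(units, required) for units in remaining):
            return False
    return True


# Hypothetical example: the load needs N = 4 units.
N = 4
single_train = N + 1          # N+1: one train with one spare unit
two_trains = [N + 1, N + 1]   # two separated trains, each N+1 (2N+2 in total)

print(is_n_plus(single_train, N))          # a single unit may fail
print(survives_train_loss(two_trains, N))  # a whole train may be isolated
print(survives_train_loss([N + 1], N))     # a lone train cannot be isolated
```

The last check is the one that matters for testing: only when a second, independent N+1 train exists can one side be deliberately cut off without leaving the load unprotected.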
Physical location is also a key consideration. If possible, don’t use or locate a data centre on or near a flood plain. Furthermore, choose a site where power delivery from the utilities will not be impaired. With this in mind, find out how the power is actually routed through the electricity distribution network to the data centre. This is often overlooked but is critical. In some cases, the cable routing can be somewhat messy.
But even with these precautions, a facility still isn’t necessarily 100% ‘outage proof’. All data centre equipment has an inherent possibility of failure. Studies show that a proportion of failures are caused by human mismanagement of functioning equipment. This puts a huge emphasis on engineers being well trained and, critically, having the confidence and experience to know when to intervene and when to allow the automated systems to do their job. They must also be skilled in performing concurrent maintenance and minimising the time during which systems are running with limited resilience.
Far greater emphasis should be placed on engineers reacting quickly when a component failure occurs, rather than assuming that inbuilt resilience will solve all problems. This demands high-quality training for engineering staff, predictive diagnostics, watertight support contracts and sufficient on-site spares.
When it comes to data centre critical power infrastructure, regular full-scale ‘black testing’ is the only way to be sure the systems will function correctly in the event of a real problem. Hoping for the best in the event of real-life loss of mains power simply isn’t an option.
However, not all data centres do this regularly. Some will have procedures to test their installations but rely on simulating total loss of incoming power. This isn’t completely foolproof, as the generators remain on standby and the equipment in front of the UPS systems stays on. This means that the cooling system and the lighting keep functioning during testing.
Absolute proof comes with black testing. Every six months, NGD isolates incoming mains grid power and, for up to sixteen seconds, the UPS takes the full load while the emergency backup generators kick in. Clearly, it’s done under strictly controlled conditions where power is only cut to one side of a 2N+2 infrastructure.
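The reasoning behind the sixteen-second window can be illustrated with a rough ride-through check: the UPS must be able to carry the full load for longer than the generators take to pick it up. The sketch below is an assumption-laden simplification. The battery energy, load and safety factor are hypothetical figures, and it ignores battery ageing, discharge-rate effects and inverter losses; only the sixteen-second gap comes from the text above.

```python
def ups_autonomy_seconds(usable_energy_kwh: float, load_kw: float) -> float:
    """Rough time the UPS can carry the load, in seconds (idealised)."""
    if load_kw <= 0:
        raise ValueError("load_kw must be positive")
    return usable_energy_kwh * 3600.0 / load_kw


def bridges_gap(autonomy_s: float, gap_s: float, safety_factor: float = 2.0) -> bool:
    """True if the autonomy covers the generator start-up gap with margin."""
    return autonomy_s >= gap_s * safety_factor


GENERATOR_GAP_S = 16.0     # "up to sixteen seconds", as described above
usable_energy_kwh = 50.0   # hypothetical usable battery energy
load_kw = 1000.0           # hypothetical full load on the tested side

autonomy = ups_autonomy_seconds(usable_energy_kwh, load_kw)
print(f"Estimated autonomy: {autonomy:.0f} s")
print("Covers generator start-up:", bridges_gap(autonomy, GENERATOR_GAP_S))
```

A calculation like this only says what should happen on paper. The point of a black test is to confirm that it does happen with the real load attached.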
Uptime checklist:
- Ensure N+1 redundancy at a minimum, but ideally 2N+x redundancy of critical systems to support separation, testing and concurrent access.
- Improving mean time to failure (MTTF) will deliver significant returns on backup system availability and reliability, and on overall facilities uptime performance.
- Utilise predictive diagnostics, ensure fit-for-purpose support contracts, and hold appropriate spares stock on-site.
- Regularly black test UPS and generator backup systems rather than waiting for a real-life loss of mains power.
- Continuous training and regular practice will ensure staff are clear on spotting incipient problems and responding to real-time problems: what to do, and when and when not to intervene.
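The checklist points on MTTF, on-site spares and quick engineer response can be tied together with the textbook steady-state availability formula, availability = MTTF / (MTTF + MTTR), where MTTR is the mean time to repair. The sketch below uses purely hypothetical figures for a single non-redundant component, to show how shortening repair time alone changes expected downtime.

```python
HOURS_PER_YEAR = 8760.0


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a single repairable component."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mttf_hours must be > 0 and mttr_hours >= 0")
    return mttf_hours / (mttf_hours + mttr_hours)


def annual_downtime_hours(mttf_hours: float, mttr_hours: float) -> float:
    """Expected hours per year the component is unavailable."""
    return (1.0 - availability(mttf_hours, mttr_hours)) * HOURS_PER_YEAR


# Hypothetical figures: same component, two different repair regimes.
mttf = 10_000.0      # mean time to failure, in hours
slow_repair = 8.0    # e.g. waiting for an off-site spare
fast_repair = 2.0    # e.g. spare held on-site, trained staff on hand

for label, mttr in (("slow repair", slow_repair), ("fast repair", fast_repair)):
    print(f"{label}: availability {availability(mttf, mttr):.5f}, "
          f"downtime {annual_downtime_hours(mttf, mttr):.2f} h/year")
```

In a redundant design the component's downtime does not translate directly into an outage, but it is the time during which the system runs with reduced resilience, which is why repair speed matters as much as failure rate.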
No data centre is immune to an outage, and a ‘wait and see’ approach to disaster recovery is never the answer if maximum uptime is to be achieved. A controlled outage in the form of black testing will give you peace of mind that your UPS system is up to the job, without causing any potentially disastrous disruption to your systems.