Testing your facility needn’t be a headache, nor should data centre operators need to curl their toes at the thought of engineers ‘turning anything off’. Here Tom Blundy, senior consultant at Keysource, explains how preventative maintenance can help take the stress out of the test.
It is well documented that interruptions in power supply to data centres have the potential to cause operational and financial chaos, so it is understandable that many organisations have concerns when their engineers look to undertake work which might involve shutting down equipment.
However, properly planned and implemented preventative maintenance strategies can minimise the likelihood of unscheduled breakdowns, outages and UPS failures.
In addition, it is important to remember that periodic inspection and testing is also a legal requirement and can, in my experience, make the difference between being fully operational and an outage.
Under the Electricity at Work Regulations 1989, it is mandatory for electrical installations to be ‘in a safe condition and the frequency and nature of the maintenance must be such as to prevent danger so far as is reasonably practicable.’
The testing is aimed at providing confidence that the equipment is safe, with users protected against electric shocks, burns, damage by fire and heat that may arise from defects or any general deterioration that may cause safety to be impaired. Under BS7671:2018 Inspection & Testing Guidance Note 3, the frequency should not exceed five years for commercial installations.
Simply put, the legal responsibility is with the person who has to any extent control of the premises and they must decide what is reasonably practicable for their building or business and the frequency of the testing.
However, testing cannot be ruled out unless it involves grossly disproportionate sacrifices, and herein lies the challenge for data centres that provide critical, high uptime services to important businesses or clients.
We already know that periodic testing and maintenance can increase rather than decrease overall uptime and revenues; it is, after all, vital to business continuity and employee safety. So, what can be done?
One important concession to note is that ‘sample testing’ is an acceptable methodology under the BS7671:2018 Inspection & Testing Guidance Note 3.
This allows businesses to avoid much of the expense and disruption of testing every single board, circuit, busbar and switch in the entire installation.
As long as the testing samples are representative of the rest of the installation and an absolute minimum of 10% of the installation is tested, this can be seen as adequate.
If a site has been developed over many years in an evolving manner, then the samples should be expanded to be representative of each phase of the development.
There are inevitably some caveats though. Because every installation is different, the age, condition, maintenance regime, operating conditions and overall quality must all be considered to determine the appropriate sample size.
If the test results are poor, the sample sizes will need to be increased. The initial samples must include more than 20% of the switchgear internals, more than 25% of the final distribution boards and more than 10% of the final circuit accessories, and must avoid ‘sampling samples’.
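As a rough illustration of the initial sample thresholds quoted above, the short sketch below turns the percentages into minimum item counts for a given inventory. The category names, function names and example quantities are hypothetical, and reading ‘more than’ as a strict inequality is an assumption; the result is only a starting point that would still need adjusting for the age, condition and test history of the installation.

```python
# Minimum initial sample percentages as quoted in the article.
# The dictionary keys are illustrative labels, not terms from BS7671.
MIN_SAMPLE_PERCENT = {
    "switchgear_internals": 20,
    "final_distribution_boards": 25,
    "final_circuit_accessories": 10,
}


def minimum_sample_size(total_items, percent):
    """Smallest whole number of items strictly greater than percent% of the total."""
    if total_items <= 0:
        return 0
    # Integer arithmetic avoids floating point rounding at exact boundaries.
    return min(total_items, (total_items * percent) // 100 + 1)


def initial_sample_plan(inventory):
    """Map each category to the minimum number of items to include in the first sample."""
    return {
        category: minimum_sample_size(inventory.get(category, 0), percent)
        for category, percent in MIN_SAMPLE_PERCENT.items()
    }


# Hypothetical site inventory, for illustration only.
plan = initial_sample_plan(
    {
        "switchgear_internals": 10,
        "final_distribution_boards": 40,
        "final_circuit_accessories": 505,
    }
)
print(plan)
```

If the first round of results is poor, the same function could simply be re-run with higher percentages to size the expanded sample.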
Another useful approach, although this must be a design feature of the installation, is to test separate power streams individually. This would mean that there should be no site-wide downtime during the testing, allowing power ‘supply A’ to be tested whilst ‘supply B’ continues to serve the site, and vice versa.
The most thorough, but also the most disruptive, option is to test as much of the installation as possible, including boards, circuits and busbars/tap offs, with equipment being shut down as needed for testing.
Often, this can be managed so that much of the testing of boards and circuits does not impact normal operations. However, testing of busbars and tap offs can be more problematic.
These are often on the output of UPS systems within the data halls, where IT equipment would be impacted, so sequencing the A and B power supplies would help here.
Adopting a ‘sampling’ approach would further help to limit the impact in the data halls. In facilities using busbars, a viable option is to select a representative sample to reduce the number of shutdown events, and to sequence the testing one bar at a time so that the A and B supplies are not simultaneously compromised.
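The one-bar-at-a-time sequencing described above can be sketched in a few lines. This is a minimal illustration only: the busbar names, the Busbar type and both functions are hypothetical, and a real test programme would also account for customer loads, change windows and permits to work.

```python
from typing import NamedTuple


class Busbar(NamedTuple):
    name: str    # hypothetical busbar label
    supply: str  # "A" or "B"


def sequence_tests(sample):
    """Return one shutdown window per busbar, all A-side bars first, then B-side."""
    ordered = sorted(sample, key=lambda bar: (bar.supply, bar.name))
    return [[bar] for bar in ordered]


def supplies_overlap(schedule):
    """True if any single window would take down bars on more than one supply."""
    return any(len({bar.supply for bar in window}) > 1 for window in schedule)


# Hypothetical representative sample chosen for testing.
sample = [Busbar("BB-B1", "B"), Busbar("BB-A1", "A"), Busbar("BB-A2", "A")]
schedule = sequence_tests(sample)

for number, window in enumerate(schedule, start=1):
    print(number, [bar.name for bar in window])
```

Because each window contains a single bar, the opposite supply stays live throughout; the overlap check is there as a guard in case windows are later merged to save time.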
It is even possible, under certain circumstances, for specific clients to be impacted in a shared data hall, or alternatively protected, as long as the testing sample is representative and meets the guidelines.
However, it is worth noting that issues could arise where the IT equipment is concerned, as customers may not necessarily be connected to both power supplies A and B.
Unfortunately, the options to reduce downtime for these customers are limited. Where busbars are used, mitigation could be found in the use of thermal imaging and ultrasonic surveys.
A thermal imaging and/or ultrasonic inspection can give an indication of overheating or vibrations associated with the deterioration of conductor insulation or loose joints.
If any issues are found, then further investigation and inspection would be required. However, whilst this type of ‘hands off’ approach can reduce the downtime associated with testing, it is seen as an additional tool, not a BS7671-recognised substitute for periodic inspection and testing.
In summary, in the appropriate circumstances, periodic inspection and testing can be completed with limited disruption to the business and, in fact, will increase uptime.