Mission Critical

Mission critical facilities are like other facilities in that they have electro-mechanical equipment that must be maintained. The difference is that the operators of mission critical facilities, owing to the extremely high availability requirements set by management, have to pay much closer attention to the equipment so that it does not fail. This requires dual-path power supply systems (for redundancy) and regular testing of the systems.

Systems

* Dual-power technology requires two completely independent electrical systems tied together with switchgear. When the normal source of power fails, these dual-path power supply systems quickly switch to a back-up source. A UPS system keeps the power flowing until the normal source is restored or another source is brought on-line and synchronized. Usually, the UPS, through a PDU or power distribution unit (see figures 1, 2 and 3), takes AC power, converts it to DC where a bank of batteries is tied in, and then inverts it back to AC to feed the computer hardware. Since the systems often cannot be tested on-line, they must be tested during "maintenance windows" (planned outages or times when the impact of testing is low) so that simulations can be run. Resistive load testing, in which power is drawn through a load bank, is used to simulate the full load and test all equipment on the floor. Any problems found during the infrared survey are repaired immediately, and the system is rechecked before the equipment is put back on-line.

Figure 1 – Typical PDU in a data center with load bank test being run.

Figure 2 – SCR connection on an inverter assembly at over 550 °F.

Figure 3 – Bolted/crimped connector on an output filter.

* Battery back-up systems (see figure 4) must be checked in a real-time battery discharge situation to fully simulate an actual loss of the normal source of power. The batteries, connections, cables, switches and charging systems are checked for unwanted heating conditions.

* Uniform cooling of all data center server, storage, and computer equipment is essential for proper operation. The design objective of the cooling system is to provide a clear path from the source of the cooled air to the equipment and back to the cooling unit. This issue has received much attention lately, as miniaturization of the equipment and economic pressures have increased the amount of heat generated per square foot of floor space and per cubic foot of rack space in the server racks. This hardware is sensitive to heat and humidity, and some new designs are being tested so that failures do not occur solely due to environmental conditions (see figure 5). How perfect an application for IR!

* Utility main power supplies are typically owned by the local power company but are sometimes owned by the user. A looped system feeds power from two different power company substations and can be "back fed" if the power is out on the primary. No matter who the technical owner of the utility equipment is, it must be checked with IR like all other components (see figure 6).

* Mechanical systems have the same stringent requirements as the electrical systems. Again, these are met through redundancy and failure-prevention engineering.
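
The effect of redundancy on availability can be illustrated with a short calculation. The sketch below assumes that the redundant paths are identical, that they fail independently of each other and that any single path can carry the full load. Real dual-path systems share some components (the switchgear, for example), so the result is better read as an upper bound than as a prediction. The function name and the 99% figure are illustrative only.

```python
def parallel_availability(path_availability: float, paths: int = 2) -> float:
    """Availability of `paths` identical, independent paths when any one suffices.

    Assumption: failures are independent, so the system is down only when
    every path is down at the same time.
    """
    if not 0.0 <= path_availability <= 1.0:
        raise ValueError("path_availability must be between 0 and 1")
    if paths < 1:
        raise ValueError("paths must be at least 1")
    unavailability = (1.0 - path_availability) ** paths
    return 1.0 - unavailability


if __name__ == "__main__":
    # Hypothetical example: each path is available 99% of the time.
    single = 0.99
    dual = parallel_availability(single, paths=2)
    print(f"single path: {single:.4%}, dual path: {dual:.4%}")
```

Under these assumptions, a second path turns two nines into four, which is why the remaining risk tends to sit in the shared components and connections that testing is meant to find.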

Accountability

There must be total accountability for all infrared survey results, especially those for equipment associated with the UPS, computer and server systems. This can be accomplished by recording the entire survey on digital videotape and/or capturing fully-radiometric images of all equipment, whether problems exist or not. In either case, a data log of all equipment surveyed must be created, including a time/date stamp for each piece of equipment. Documentation is very important.
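
As one possible illustration of such a data log, the sketch below records one entry per piece of equipment, whether or not a problem was found, and writes the log out as CSV. The field names, the equipment identifier and the image file name are hypothetical; they are not drawn from any particular survey software.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone


@dataclass
class SurveyRecord:
    equipment_id: str    # hypothetical tag, e.g. "PDU-1"
    surveyed_at: str     # time/date stamp in ISO 8601 format
    problem_found: bool  # logged for every item, problem or not
    image_file: str      # reference to the radiometric image or video
    notes: str = ""


def log_equipment(log, equipment_id, problem_found, image_file, notes="", when=None):
    """Append a time-stamped record to the log and return it."""
    when = when or datetime.now(timezone.utc)
    record = SurveyRecord(equipment_id, when.isoformat(), problem_found, image_file, notes)
    log.append(record)
    return record


def write_csv(log, stream):
    """Write all records to an open text stream as CSV with a header row."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(SurveyRecord)])
    writer.writeheader()
    for record in log:
        writer.writerow(asdict(record))


if __name__ == "__main__":
    survey_log = []
    log_equipment(survey_log, "PDU-1", False, "pdu1.jpg")
    log_equipment(survey_log, "BATT-A", True, "batt_a.jpg", notes="loose lug on main breaker")
    out = io.StringIO()
    write_csv(survey_log, out)
    print(out.getvalue())
```

Logging equipment with no findings alongside equipment with problems is what makes the record usable as proof that everything was actually surveyed.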

Figure 4 – Small battery bank with a loose lug connection on the main breaker.

Figure 5 – Server rack designs being tested for heat dissipation.

Figure 6 – Pad-mounted transformer with loose connection on line side.

Summary

To achieve five nines availability, it is essential that competent IR testing be performed on all electrical and mechanical systems in conjunction with other testing and in cooperation with management and maintenance personnel.
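
To put "five nines" in perspective, the arithmetic below converts an availability figure into the downtime it allows per year. It is a simple sketch that assumes a 365-day year and counts all downtime, planned or unplanned; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def allowed_downtime_minutes(availability: float) -> float:
    """Minutes of downtime per year permitted at a given availability (0 to 1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, value in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
        print(f"{label}: {allowed_downtime_minutes(value):.2f} minutes per year")
```

At 99.999% availability the budget is a little over five minutes per year, which leaves essentially no room for an unplanned failure of a single connection.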

If you maintain an office building, manufacturing facility or any other type of facility where uptime is important, you should take time to follow what is happening with data centers, as they are among the most mission critical of all operations.

Author Biography

Gregory R. Stockton is president of Stockton Infrared Thermographic Services, Inc. Based in Randleman, NC, the corporation operates six application-specific divisions. Greg has been a practicing infrared thermographer since 1989. He is a Certified Infrared Thermographer with twenty-six years of experience in the construction industry, specializing in maintenance and energy-related technologies. Mr. Stockton has published eleven technical papers on the subject of infrared thermography and written numerous articles about applications for infrared thermography in trade publications. He is a member of the Program Committee of SPIE (Society of Photo-Optical Instrumentation Engineers) Thermosense and Chairman of the Buildings & Infrastructures Session at the Defense and Security Symposium.

Copyright © November 2005

Stockton Infrared Thermographic Services, Inc. (www.stocktoninfrared.com) and Uptime® Magazine (http://www.uptimemagazine.com)
