
Hidden Failure

Q How would you describe the concept of hidden failure?

A A hidden failure is a failure that has already occurred but, under normal circumstances, goes unnoticed by the operator or operating crew. It does not become evident on its own; it typically surfaces only after a second failure occurs and the original cause is identified during root cause analysis.

Q Can you cite an example of a hidden failure?

A A boiler that NORMALLY operates below 180 psi has a safety valve set to open when pressure exceeds 200 psi. Unfortunately, the safety valve is stuck (failed state) and will not open, but no one knows this, because under normal circumstances the boiler never reaches that limit. Since the valve is not required to operate under normal conditions, there is no evidence of the failure and nobody knows it has occurred. That is why we call it a "hidden" failure: it has happened and nobody is aware of it. It will only become evident if something else happens, for example excess pressure (which is NOT normal). We then face a multiple failure, which can be catastrophic. (In a worst-case scenario, the boiler explodes, causing fatalities.)

Q What impact can hidden failures have on areas like safety and the environment?

A Because protective devices are especially prone to hidden failures, these failures can have a very high impact, often with DRAMATIC consequences for safety (somebody could be hurt or killed) or for the environment (an environmental regulation could be violated).

Q Can a hidden failure have an impact on production?

A Yes! Safety and environmental incidents obviously affect production, but even when no safety or environmental risk is at stake, hidden failures can have dramatic production impacts. This is especially true of redundant equipment. For example, suppose the standby water pump, which "protects" the service pump by running only if the service pump fails, is itself in a failed state. That failure is hidden: under normal circumstances the standby pump does not run, so nobody knows it has failed. It only matters if and when the primary service pump fails (multiple failure). An outage caused solely by such a multiple failure, the consequence of a hidden failure, would certainly impact production.
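The production argument can be made concrete with a little arithmetic. The sketch below is a minimal illustration, not a method described in the interview: it assumes the service pump fails at a constant average rate, that the standby pump's hidden failed state is independent of the service pump, and that the fraction of time the standby pump is unavailable is known. Under those assumptions, the expected rate of multiple failures is the demand rate multiplied by that unavailability. The function name and the figures are hypothetical.

```python
"""Hypothetical sketch: how often a hidden failure becomes a multiple failure."""


def multiple_failure_rate(demand_rate_per_year: float,
                          protective_unavailability: float) -> float:
    """Expected multiple failures per year.

    demand_rate_per_year: how often the protected function (the service pump)
        fails and therefore "demands" the protective device.
    protective_unavailability: fraction of time (0..1) the protective device
        (the standby pump) sits in a hidden failed state.

    Assumes the two failures are independent of each other.
    """
    if demand_rate_per_year < 0:
        raise ValueError("demand rate cannot be negative")
    if not 0.0 <= protective_unavailability <= 1.0:
        raise ValueError("unavailability must be between 0 and 1")
    return demand_rate_per_year * protective_unavailability


if __name__ == "__main__":
    # Hypothetical figures: the service pump fails once every two years,
    # and the standby pump is in a hidden failed state 5% of the time.
    rate = multiple_failure_rate(0.5, 0.05)
    print(f"{rate:.3f} multiple failures per year")  # 0.025
    print(f"roughly one every {1 / rate:.0f} years")  # 40
```

The product form shows why hidden failures matter: halving the time the standby pump spends undetected in a failed state halves the rate of multiple failures, without touching the service pump at all.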

Q What is the difference between a Hidden Failure and a Potential Failure?

A A Potential Failure is "a clear warning" that a failure has started to occur. Because the failure mode has not yet produced a functional failure, it allows us to PREDICT the failure: the equipment still fulfills its function, but there is a warning that something has started to fail. (Condition-based maintenance = prediction.)

A Hidden Failure, by contrast, has already occurred, so the affected equipment will not fulfill its function when called upon (the alarm will not sound, for example). Because the failure is hidden, nobody knows about it until another failure occurs, and by then the consequences ("multiple failure") may be catastrophic.

Q Can you explain why this potential source of a reliability problem has gone largely unaddressed?

A Awareness of this serious issue, and of the importance of the reliability approach as a whole, is growing, but unfortunately these very important matters still do not receive enough attention. More awareness must be generated through training at all company levels. As a first step, CEOs should know and understand these problems: it is not enough for the technical staff to learn more about these subjects if they are not understood and supported at the higher levels of the organization.

Q What can be done to raise the level of understanding about hidden failures and their impact?

A As I said in my previous answer, more knowledge and information transfer at all company levels is mandatory. I am convinced that until we recognize that certain basic concepts belong in basic education, starting as early as kindergarten, concepts learned "the hard way" as adults, or by costly "trial and error," will translate into visible improvement far too slowly.

Q Currently, we conduct a great deal of asset inspection, searching for impending problems so they can be corrected in advance of a major failure. How will a better understanding of hidden failures change our approach?

A It should be understood that inspecting for hidden failures is conceptually DIFFERENT from predicting failures through condition monitoring, so a different kind of inspection task is needed. In RCM (Reliability-Centered Maintenance), this is called the "failure-finding task": a scheduled check of whether a hidden failure has already occurred. By implementing failure-finding tasks for hidden failures, using the correct methodology and the correct frequency (FFI, the Failure-Finding Interval), the reliability effort gains a major and necessary enhancement. To get this done, awareness and a clear understanding of the problem and its importance are mandatory in the first place, and that understanding must extend to all management and shop-floor levels of the organization.
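The Failure-Finding Interval can be illustrated with a simple model. This is a sketch under stated assumptions, not the interviewee's method: it assumes the protective device fails at a constant rate (exponential life with mean time between failures MTBF), and that each failure-finding test is perfect, takes negligible time and restores the device to as-good-as-new. Under those assumptions, the average fraction of time the device sits in a hidden failed state with test interval T is U(T) = 1 - (1 - exp(-T/MTBF)) * MTBF / T, which is close to T / (2 * MTBF) when T is much shorter than MTBF. Solving that approximation for T gives the rule of thumb FFI = 2 * U * MTBF commonly cited in RCM literature. The function names and example figures are hypothetical.

```python
"""Hypothetical sketch: Failure-Finding Interval (FFI) for one protective device.

Assumptions: constant failure rate (exponential life), and perfect,
instantaneous failure-finding tests that restore the device to as-good-as-new.
"""
import math


def average_unavailability(interval: float, mtbf: float) -> float:
    """Mean fraction of time the device is in a hidden failed state.

    U(T) = 1 - (1 - exp(-T/mtbf)) * mtbf / T
    math.expm1 computes exp(-x) - 1 accurately when x is small.
    """
    x = interval / mtbf
    return 1.0 + math.expm1(-x) / x


def ffi_rule_of_thumb(tolerable_unavailability: float, mtbf: float) -> float:
    """FFI = 2 * U * MTBF, from U(T) ~= T / (2 * mtbf) for small T."""
    return 2.0 * tolerable_unavailability * mtbf


def ffi_exact(tolerable_unavailability: float, mtbf: float) -> float:
    """Solve average_unavailability(T) == U for T by bisection.

    U(T) rises monotonically from 0 toward 1, so bisection converges.
    """
    if not 0.0 < tolerable_unavailability < 1.0:
        raise ValueError("tolerable unavailability must be between 0 and 1")
    lo, hi = 0.0, 1e6 * mtbf  # only midpoints are evaluated, so lo=0 is safe
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if average_unavailability(mid, mtbf) < tolerable_unavailability:
            lo = mid  # still within tolerance: the interval can grow
        else:
            hi = mid  # too much hidden downtime: the interval must shrink
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical safety valve: MTBF of 10 years; the organization tolerates
    # it being in a hidden failed state at most 1% of the time.
    mtbf_years, u = 10.0, 0.01
    print(f"rule of thumb: test every {ffi_rule_of_thumb(u, mtbf_years):.3f} years")  # 0.200
    print(f"exact model:   test every {ffi_exact(u, mtbf_years):.3f} years")
```

For a small tolerable unavailability the two answers nearly coincide (here about 0.2 years, or roughly 73 days). The rule of thumb gives a slightly shorter, more conservative interval because the linear approximation slightly overstates the unavailability.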

Q Can prediction be economically built into asset design?

A Prediction MUST be built into asset design! In the not-too-distant future, design practice informed by reliability concepts will seek (and usually ensure) the "predictability" of ALL failures. If a failure MAY occur, designing the asset so that the unavoidable failure is predictable is the only way to avoid unwanted downtime. This is precisely the direction we must take if we want real CHANGE. Of course, the effort must always be "economically sound." Fortunately, life-cycle costing is being brought into the asset management arena by PAS 55 (soon to become the ISO 55000 standard), which takes this concept explicitly into account. However, we must always remember: whatever we do to ensure reliability must fulfill BOTH conditions, "technically feasible" AND "worth doing!"
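The "technically feasible AND worth doing" test can be expressed as a simple decision rule. This is a hypothetical sketch for economic consequences only: it assumes technical feasibility is judged separately by engineers and recorded as yes/no, and it treats a task as worth doing when its yearly cost is lower than the yearly cost of the failures it is expected to avoid. For safety and environmental consequences, RCM instead asks whether the task reduces risk to a tolerable level, which this sketch does not attempt. The class, field names and figures are illustrative assumptions.

```python
"""Hypothetical sketch: "technically feasible AND worth doing" (economic consequences only)."""
from dataclasses import dataclass


@dataclass
class ProposedTask:
    name: str
    technically_feasible: bool        # judged separately by engineers
    tasks_per_year: float             # how often the task is performed
    cost_per_task: float              # labor, parts and downtime for one execution
    failures_avoided_per_year: float  # expected failures the task prevents
    cost_per_failure: float           # repair plus lost production per failure

    def annual_cost(self) -> float:
        return self.tasks_per_year * self.cost_per_task

    def annual_benefit(self) -> float:
        return self.failures_avoided_per_year * self.cost_per_failure

    def worth_doing(self) -> bool:
        return self.annual_benefit() > self.annual_cost()

    def should_implement(self) -> bool:
        # BOTH conditions must hold.
        return self.technically_feasible and self.worth_doing()


if __name__ == "__main__":
    # Hypothetical monthly inspection: 12 x 200 = 2,400 per year,
    # avoiding 0.5 failures per year at 50,000 each = 25,000 per year.
    task = ProposedTask("monthly vibration check", True, 12, 200.0, 0.5, 50_000.0)
    print(task.should_implement())  # True
```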

Q As industry becomes more automated, will the prevalence of hidden failures increase and, if so, what can be done to address this problem?

A Yes. As more automation is introduced, along with much higher expectations for safety, the environment, quality, productivity, resource utilization, ROI, and equipment life (beyond uptime and cost control alone), there will be more and more protective devices prone to HIDDEN FAILURES! This should NOT become a "problem" as long as there is full awareness and understanding of the issues, training for action, and the knowledge to actually SOLVE them.

Steve Thomas
