Reliabilityweb Hidden Failure Q&A

Hidden Failure Q&A - Henry Ellmann

Q How would you describe the concept of hidden failure?

A A hidden failure is a failure that has already occurred but, under normal circumstances, goes unnoticed by the operator or operating crew. It does not become evident on its own; it is revealed only when another failure occurs and root cause analysis traces the event back to the original failure.

Q Can you cite an example of a hidden failure?

A A boiler that NORMALLY operates at a pressure below 180 psi has a safety valve that opens when pressure is above 200 psi. Unfortunately, the safety valve is stuck (failed state) and will not operate, but no one knows this since under normal circumstances the boiler operates below that limit. Since under normal conditions the safety valve is not required to operate, there would be no evidence of such failure and nobody would know that it has occurred. Therefore, we call it a "hidden" failure -- it has happened and nobody is aware of it. It will only become evident if something else happens, for example excess pressure (which is NOT normal). We then face a multiple failure, which can be catastrophic. (In a worst case scenario, the boiler blows up, causing fatalities.)
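
As an illustration only (not part of the original answer), the boiler example can be sketched as a tiny model. The class and function names are hypothetical, and the only figure taken from the example is the 200 psi opening pressure. The sketch shows that a healthy valve and a stuck valve look identical to the crew at normal pressure and differ only when an overpressure occurs.

```python
from dataclasses import dataclass

SET_PRESSURE_PSI = 200.0  # from the example: the valve should open above 200 psi


@dataclass
class SafetyValve:
    """Hypothetical model of the safety valve in the boiler example."""
    stuck: bool = False  # True models the hidden (already failed) state

    def opens(self, pressure_psi: float) -> bool:
        # A stuck valve never opens, whatever the pressure.
        return pressure_psi > SET_PRESSURE_PSI and not self.stuck


def observe(valve: SafetyValve, pressure_psi: float) -> str:
    """What the operating crew would see at a given boiler pressure."""
    if pressure_psi <= SET_PRESSURE_PSI:
        # The valve is not called on, so its state leaves no evidence.
        return "normal operation"
    if valve.opens(pressure_psi):
        return "pressure relieved"
    return "multiple failure"


healthy = SafetyValve()
failed = SafetyValve(stuck=True)

for pressure in (170.0, 210.0):  # one normal reading, one overpressure
    print(pressure, "|", observe(healthy, pressure), "|", observe(failed, pressure))
```

At 170 psi both valves produce the same observation, which is exactly why the failure is "hidden"; only the 210 psi case separates them.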

Q What impact can hidden failures have on areas like safety and the environment?

A Since protection devices are prone to hidden failures, these failures can have a very high impact, often with DRAMATIC consequences for safety (somebody could get hurt or killed) or for the environment (an environmental regulation could be violated).

Q Can a hidden failure have impact on production?

A Yes! Safety and environmental incidents obviously impact production, but even when no safety or environmental risks are at stake, hidden failures can have dramatic production impacts. This is especially true when redundant equipment is in a failed state. For example, suppose the standby water pump, which "protects" the service pump by acting as its backup and operates only when the service pump fails, is itself in a failed state. The failure of the standby pump is a hidden failure, since on its own and under normal circumstances nobody would know it had failed. It only matters if and when the primary service pump fails (multiple failure). This type of event, a multiple failure made possible by a hidden failure, would certainly impact production.
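
As a rough illustration (not part of the original answer), the arithmetic behind the pump example can be sketched as follows. It assumes the common approximation that the rate of multiple failures equals the failure rate of the protected function multiplied by the fraction of time the protective device is in a failed state, and that the two failures are independent. The function name and all numbers are hypothetical.

```python
def multiple_failure_rate(protected_failures_per_year: float,
                          standby_unavailability: float) -> float:
    """Approximate expected multiple failures per year.

    protected_failures_per_year: how often the service pump fails (assumed).
    standby_unavailability: fraction of time the standby pump is in a
        hidden failed state, between 0 and 1 (assumed).
    """
    if protected_failures_per_year < 0:
        raise ValueError("failure rate must be non-negative")
    if not 0.0 <= standby_unavailability <= 1.0:
        raise ValueError("unavailability must be between 0 and 1")
    return protected_failures_per_year * standby_unavailability


# Hypothetical figures: the service pump fails once every two years on
# average, and the standby pump is unavailable 2% of the time.
rate = multiple_failure_rate(0.5, 0.02)
print(f"Expected multiple failures per year: {rate:.3f}")
print(f"Mean time between multiple failures: {1 / rate:.0f} years")
```

The product makes the point of the answer visible: the less often anyone checks the standby pump, the higher its unavailability and the more often a service pump failure turns into a production loss.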

Q What is the difference between a Hidden Failure and a Potential Failure?

A A Potential Failure is "a clear warning" that a failure has started to occur. It allows us to PREDICT the failure since the failure mode has not yet generated a functional failure. The equipment is still fulfilling its function, but there is a warning that something has started to fail. (Condition-based maintenance = prediction)

A Hidden Failure is a failure that has already occurred and, as such, the affected equipment will not fulfill its function (the alarm will not sound, for example). Since it is hidden, nobody would know until another failure occurs, which by then may have catastrophic consequences ("multiple failure").

Q Can you explain why this potential source of a reliability problem has gone largely unaddressed?

A While awareness of this serious issue is growing, as is awareness of the importance of the whole reliability approach, there is unfortunately still not enough "calling to conscience" on such important matters. More awareness must be generated through training at all company levels. As a first step, CEOs should know and understand these problems. It is not enough for the technical staff to acquire better knowledge of these subjects if they are not understood and supported at the higher levels of the organization.

Q What can be done to raise the level of understanding about hidden failures and their impact?

A As I said in my answer to the previous question, more knowledge and information transfer at all company levels is mandatory. I am convinced that certain basic concepts should be included in basic training starting at kindergarten. Until that is realized, concepts learned "the hard way" as adults, or by costly "trial and error," will evolve too slowly into visible improvement.

Q Currently, we conduct a great deal of asset inspection searching for impending problems so they can be corrected in advance of a major failure. How will a better understanding of hidden failure change our approach?

A It should be understood that the correct handling of hidden failure inspection is conceptually DIFFERENT from failure prediction by condition monitoring. A different form of inspection task will become necessary. In RCM, this is called the "failure finding task" for the hidden failures. By implementing failure finding tasks for hidden failures, using the correct methodology and correct frequency (FFI - Failure Finding Interval), a major and necessary enhancement of the reliability effort will be achieved. To get this done, awareness and a clear understanding of the problem and its importance are mandatory in the first place. Further, this level of understanding must take place at all management and floor levels within the organization.
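
As an illustration (not part of the original answer), one commonly cited approximation for the failure finding interval is FFI = 2 x (tolerable unavailability) x (MTBF of the protective device). The sketch below assumes a constant failure rate, a check that takes negligible time and does not itself damage the device, and a low target unavailability (a few percent at most). The function names and numbers are hypothetical.

```python
import math


def failure_finding_interval(mtbf: float, target_unavailability: float) -> float:
    """Approximate FFI, in the same time unit as mtbf.

    Uses the linear approximation FFI = 2 * U * MTBF, which is only
    reasonable for small target unavailability U.
    """
    if mtbf <= 0:
        raise ValueError("mtbf must be positive")
    if not 0.0 < target_unavailability < 1.0:
        raise ValueError("target_unavailability must be between 0 and 1")
    return 2.0 * target_unavailability * mtbf


def average_unavailability(mtbf: float, interval: float) -> float:
    """Time-averaged unavailability of a device checked every `interval`.

    Assumes a constant failure rate and restoration to as-good-as-new
    whenever a check finds the device failed.
    """
    x = interval / mtbf
    return 1.0 - (1.0 - math.exp(-x)) / x


# Hypothetical figures: the protective device has an MTBF of 10 years and
# we tolerate it being in a failed state 1% of the time.
ffi_years = failure_finding_interval(mtbf=10.0, target_unavailability=0.01)
print(f"Failure finding interval: {ffi_years * 12:.1f} months")
print(f"Resulting average unavailability: "
      f"{average_unavailability(10.0, ffi_years):.4f}")
```

The second function is included only as a check on the approximation: for small intervals relative to MTBF, the resulting unavailability should land close to the target.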

Q Can prediction be economically built into asset design?

A Prediction MUST be built into asset design! New designs informed by reliability concepts will, in the not-too-distant future, seek (and most often assure) "predictability" of ALL failures. If a failure MAY occur, then designing the asset so that the unavoidable failure is predictable is the only way to avoid undesired downtime in the future. This is precisely the direction in which we must go if real CHANGE is sought. Of course, the effort must always be "economically sound." The concept of life cycle costing is fortunately being introduced into the asset management arena by PAS 55 (soon to become the ISO 55000 standard), which takes this concept explicitly into consideration. However, we must always remember: Whatever we do to ensure reliability must fulfill BOTH conditions -- "technically feasible" AND "worth doing!"

Q As industry becomes more automated, will the prevalence of hidden failures increase and, if so, what can be done to address this problem?

A As more and more automation is introduced, along with much higher expectations for safety, the environment, quality, productivity, proper resource utilization, ROI, and equipment life, beyond uptime and cost control, there will be more and more protective devices prone to HIDDEN FAILURES! This should NOT become a "problem" if there is full awareness and understanding of the issues involved, along with training to act on them and SOLVE them knowledgeably.
