How Things Fail

"Reliability has two broad ranges of meanings:

  1. Qualitatively-operating without failure for long periods of time just as the advertisements for sale suggest, and
  2. Quantitatively-where life is predictable long and measurable in test to assure satisfactory field conditions are achieved to meet customer requirements.

Reliability is concerned with failure-free operation for periods of time, whereas quality is concerned with avoiding non-conformances at a specified time prior to shipment thus reliability measures a dynamic situation but quality measures a static situation. As in physics, statics is easier to understand and calculate than dynamics which involves higher levels of math and greater mental capabilities for comprehension."

- H. Paul Barringer

We study failed items for the same reason we do autopsies on humans: we want the data and we want it categorized correctly for making important decisions.

Recording a failure requires:

  1. A time origin that is unambiguously defined;
  2. A scale for measuring the passage of whatever motivates failure (time, starts, stops, etc.);
  3. A definition of failure that is entirely clear, so the event can be recorded consistently.
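The three requirements above can be captured in a single record type. The following is a minimal sketch, not a prescribed schema: the class name `FailureRecord` and all of its field names are hypothetical, chosen only to show that each requirement maps to something that can be checked when the event is logged.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FailureRecord:
    """One failure event (hypothetical schema, for illustration only)."""

    asset_id: str
    time_origin: datetime    # 1. when the clock started (e.g. install or last repair)
    failed_at: datetime      # when the failure was observed
    scale: str               # 2. unit of exposure, e.g. "operating_hours" or "starts"
    exposure: float          # amount of that scale accumulated since time_origin
    failure_definition: str  # 3. the criterion that made this event a failure

    def __post_init__(self) -> None:
        # Reject records that leave any of the three requirements ambiguous.
        if self.failed_at < self.time_origin:
            raise ValueError("failure cannot precede the time origin")
        if not self.scale.strip():
            raise ValueError("a measurement scale must be named")
        if self.exposure < 0:
            raise ValueError("exposure cannot be negative")
        if not self.failure_definition.strip():
            raise ValueError("the failure definition must be stated")


if __name__ == "__main__":
    record = FailureRecord(
        asset_id="PUMP-101",  # made-up asset tag
        time_origin=datetime(2023, 1, 10),
        failed_at=datetime(2023, 6, 2),
        scale="operating_hours",
        exposure=1250.0,
        failure_definition="discharge flow below 80% of rated flow",
    )
    print(record)
```

Validating at the point of entry is a design choice: a record with a missing origin, scale, or definition is far harder to correct months later, when the data is being categorized for a decision.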

Failures during an asset's life can be attributed to the following causes:

Design Failures: This class of failures results from inherent design flaws in the asset or system. In a well-designed system, it should make only a very small contribution to the total number of failures. Research by Winston Ledet (outlined in Don't Just Fix It, Improve It! A Journey to the Precision Domain) showed that approximately 20% of corrective work orders could be traced to poor design, build, and installation.

Failure patterns (Courtesy of Reliabilityweb.com)

Infant Mortality: This class of failures causes new (and repaired) assets to fail. In "Reliability-Centered Maintenance" by Nowlan and Heap, up to 72% of failures fall in the "worse new" or "worse repaired" (infant mortality) category.

Infant mortality random failure pattern (Courtesy of Reliabilityweb.com)

Random Failures: Random failures can occur at any point in the life of an asset. These failures are also referenced in "Reliability-Centered Maintenance" by Nowlan and Heap. Roughly 77-92% of failures are random in pattern.

Wear Out: Once an asset has reached the end of its useful life, degradation of component characteristics will cause it to fail. Ledet's research found that "wear out" is the cause of 12% or less of corrective work orders. Nowlan and Heap and related research show that 8-23% of failures are wear-out related.

The following graph shows the contribution of the different failure modes to the overall failure rate.


Contribution of different failure modes towards component failure
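One way to see how such contributions add up is to model each failure mode with its own hazard (failure-rate) function and sum them. The sketch below is an illustration under stated assumptions, not the method behind the figure: it assumes independent failure modes, uses the Weibull hazard as a common modelling choice (shape below 1 for infant mortality, equal to 1 for random, above 1 for wear out), and the parameter values in `MODES` are invented purely to make the shapes visible.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta and eta must all be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


# Illustrative (shape beta, scale eta) pairs; assumed values, not taken from the article.
MODES = {
    "infant_mortality": (0.5, 2000.0),  # beta < 1: failure rate falls with age
    "random": (1.0, 1000.0),            # beta = 1: failure rate is constant
    "wear_out": (4.0, 1500.0),          # beta > 1: failure rate rises with age
}


def failure_rate_contributions(t: float) -> dict:
    """Hazard contributed by each failure mode at age t."""
    return {name: weibull_hazard(t, beta, eta) for name, (beta, eta) in MODES.items()}


def overall_failure_rate(t: float) -> float:
    """Total hazard at age t, assuming the modes act independently (hazards add)."""
    return sum(failure_rate_contributions(t).values())


if __name__ == "__main__":
    for age in (10.0, 500.0, 2000.0):
        parts = failure_rate_contributions(age)
        dominant = max(parts, key=parts.get)
        print(f"age={age:7.0f}  total={overall_failure_rate(age):.6f}  dominant={dominant}")
```

With these assumed parameters the total rate is high early, lower in mid-life, and high again late in life, with a different mode dominating in each period. Real assets need not follow this combined shape; the percentages cited above indicate that most failures do not show a wear-out pattern.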

Where does preventive maintenance fit with these patterns?

Where does what some call "predictive" maintenance, but we call asset condition management, fit in?

Where does prescriptive maintenance fit with these patterns?

What else should we understand about failure?

Find Terrence O'Hanlon on LinkedIn.