More specifically, without failure modes, all you have is a bunch of codes. The failure mode may be the most overlooked data element, as most computerized maintenance management system (CMMS) products do NOT capture it. This critical data element allows one to: (1) derive the ideal maintenance tactic, (2) quickly compare work order (WO) failure modes with reliability-centered maintenance (RCM) analysis failure modes, and (3) support drill-down on asset worst offenders for basic failure analysis. Failure data and failure analysis provide a holistic process for continuous improvement using the CMMS. Within the entire CMMS product, this may be the feature with the greatest potential return on investment.
What Would a Reliability Team Do in Support of Asset Reliability?
Note: A best-in-class organization would demand accurate failure data and would routinely leverage data in the CMMS to make more informed decisions. Its reliability team would include representatives from operations; health, safety, and environment (HSE); maintenance; and engineering.
RCM Objectives - Simplified - The objectives of RCM analysis are to: (1) preserve asset value, (2) identify failure modes that can affect system function, (3) prioritize the failure modes, and then (4) select the applicable maintenance task.
Failure Modes Driven Strategy - This is a philosophy that determines maintenance tactics (stored in the preventive/predictive maintenance, or PM/PdM, library) based on failure modes. Without this information, the maintenance and reliability staff must rely on OEM manuals and/or experienced staff for input.
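At its simplest, a failure-modes-driven strategy behaves like a lookup: given a failure mode, retrieve the tactic stored in the PM/PdM library, and fall back to OEM manuals or experienced staff when nothing is on file. The sketch below is illustrative only; it assumes a hypothetical in-memory library keyed by a (component-part, component-problem, cause) tuple, whereas a real CMMS would keep this in its own tables.

```python
# Hypothetical PM/PdM library keyed by failure mode:
# (component-part, component-problem, cause). The entry is illustrative only.
PM_PDM_LIBRARY = {
    ("BEARING", "SEIZED", "LACK OF LUBRICATION"): "PM: scheduled lubrication route",
}

def select_tactic(failure_mode):
    """Return the stored tactic for a failure mode, or flag the fallback
    to OEM manuals and experienced staff when no tactic is on file."""
    return PM_PDM_LIBRARY.get(
        failure_mode,
        "No tactic on file: consult OEM manual / experienced staff",
    )

print(select_tactic(("BEARING", "SEIZED", "LACK OF LUBRICATION")))
print(select_tactic(("SEAL", "LEAKING", "WORN")))
```

Keying on the full three-part failure mode, rather than on the asset alone, is what lets different failures of the same asset map to different tactics.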
Types of Failure Analysis - There are two failure analysis techniques: root cause analysis (RCA) and basic failure analysis. Basic failure analysis is performed within the CMMS by running Pareto-style reports that permit drill-down by component and problem, then by cause. The reliability team might run a Pareto-style failure analytic at each meeting to focus on the worst offenders.
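A Pareto-style report ranks failures from most to least frequent and shows the cumulative percentage, and drill-down simply re-runs the same ranking on a filtered subset. The following is a minimal sketch, assuming closed corrective work orders are available as dictionaries with validated fields; the asset IDs and codes are hypothetical.

```python
from collections import Counter

# Hypothetical closed corrective work orders with validated failure fields.
work_orders = [
    {"asset": "P-101", "component": "BEARING", "problem": "SEIZED", "cause": "LACK OF LUBRICATION"},
    {"asset": "P-101", "component": "BEARING", "problem": "SEIZED", "cause": "LACK OF LUBRICATION"},
    {"asset": "P-101", "component": "SEAL", "problem": "LEAKING", "cause": "WORN"},
    {"asset": "F-201", "component": "DRIVE BELT", "problem": "BROKEN", "cause": "MISALIGNMENT"},
]

def pareto(rows, key):
    """Count failures by `key`; return (value, count, cumulative %) worst first."""
    counts = Counter(row[key] for row in rows)
    total = sum(counts.values())
    result, running = [], 0
    for value, n in counts.most_common():
        running += n
        result.append((value, n, round(100 * running / total, 1)))
    return result

def drill_down(rows, **filters):
    """Keep only rows matching every filter, e.g. asset="P-101"."""
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]

print(pareto(work_orders, "asset"))
# [('P-101', 3, 75.0), ('F-201', 1, 100.0)]
worst = pareto(work_orders, "asset")[0][0]
print(pareto(drill_down(work_orders, asset=worst), "component"))
# [('BEARING', 2, 66.7), ('SEAL', 1, 100.0)]
print(pareto(drill_down(work_orders, asset=worst, component="BEARING"), "cause"))
# [('LACK OF LUBRICATION', 2, 100.0)]
```

The same `pareto` call works at every level of the drill-down, which is only possible because each field holds a validated code rather than free text.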
Failure Code Hierarchy – At the Asset Level - This four-level hierarchy is a common design in some CMMS products, and this type of failure coding is usually applied at the asset level.
- Level 1, Failure Class: Examples include pump, boiler, autoclave, heat exchanger, and fan.
- Level 2, Asset-Problem: This should be the first sensory observation by the O&M technician, such as “the pump is noisy.”
- Level 3, Cause: Organizations differ as to how this field is used. Some enter the component, and some might enter a cause such as “Lack of lubrication.”
- Level 4, Remedy: Repair, replace, clean, etc.
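One way to see why a hierarchy keeps entries consistent is to model it as nested lookups in which each level narrows the valid choices for the next. The sketch below assumes the four levels are failure class, asset-problem, cause, and remedy; all codes are hypothetical.

```python
# Hypothetical four-level hierarchy:
# failure class -> asset-problem -> cause -> list of remedies.
FAILURE_HIERARCHY = {
    "PUMP": {
        "NOISY": {
            "LACK OF LUBRICATION": ["REPAIR", "REPLACE"],
            "CAVITATION": ["REPAIR"],
        },
        "STOPPED RUNNING": {
            "LACK OF LUBRICATION": ["REPLACE"],
        },
    },
    "FAN": {
        "VIBRATING": {
            "IMBALANCE": ["CLEAN", "REPAIR"],
        },
    },
}

def is_valid_path(failure_class, problem, cause, remedy):
    """True only if each level is a valid child of the level above it."""
    causes = FAILURE_HIERARCHY.get(failure_class, {}).get(problem, {})
    return remedy in causes.get(cause, [])

print(is_valid_path("PUMP", "NOISY", "CAVITATION", "REPAIR"))  # True
print(is_valid_path("FAN", "NOISY", "CAVITATION", "REPAIR"))   # False: FAN has no NOISY problem
```

Adding a component level to this tree would multiply its size by the number of possible components per class, which is why it is usually kept out of the hierarchy.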
Failure Mode - The failure mode contains three distinct pieces of information: the component-part, the component-problem, and the cause code. Each piece should come from a validated field. All three pieces are then concatenated on the work order at job completion.
Assumption: A work order exists and is tied to an asset (e.g., a fuel pump) that has stopped running. The technician discovers that the bearing has seized. Further investigation shows that the failure occurred because of a lack of lubrication.
Component-Part - This is the “failed” component, and it is critical to the failure mode. The component is usually not known until the job is done. Examples include impeller vanes, bearing, casing, wear rings, drive belts, rotor, gauge, seal, shaft, fitting, and mounting nut. Note: It is best not to put the component in the failure code hierarchy because of the large number of possible components.
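Component-Problem - The component-problem can differ from the asset-problem (in the fuel pump example, the asset-problem is that the pump stopped running, while the component-problem is that the bearing seized), so it needs to be its own field.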
Cause Code Which Caused the Component Failure - This information may be the hardest to capture, but it is the most important. Without knowing the cause, it is quite possible the new bearing you just installed could fail 2-3 months later because the true cause was not resolved.
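Putting the three pieces together, the failure mode can be built by validating each field against its choice list and then concatenating. This is a minimal sketch: the choice lists, the `" / "` separator, and the function name are hypothetical, and a CMMS would enforce validation through its own domain or value-list feature.

```python
# Hypothetical validated choice lists; a CMMS would store these as value lists.
COMPONENTS = {"BEARING", "IMPELLER VANES", "SEAL", "SHAFT", "WEAR RINGS"}
PROBLEMS = {"SEIZED", "LEAKING", "NOISY", "WORN", "BROKEN"}
CAUSES = {"LACK OF LUBRICATION", "MISALIGNMENT", "CONTAMINATION"}

def build_failure_mode(component, problem, cause, sep=" / "):
    """Concatenate the three validated pieces into one failure mode string.

    Raises ValueError if any piece is not on its validated list.
    """
    for value, allowed, name in (
        (component, COMPONENTS, "component-part"),
        (problem, PROBLEMS, "component-problem"),
        (cause, CAUSES, "cause"),
    ):
        if value not in allowed:
            raise ValueError(f"{name} {value!r} is not a validated code")
    return sep.join((component, problem, cause))

# Fuel pump example: the bearing seized because of a lack of lubrication.
print(build_failure_mode("BEARING", "SEIZED", "LACK OF LUBRICATION"))
# BEARING / SEIZED / LACK OF LUBRICATION
```

Rejecting an unlisted code such as `"BRG"` at entry time is what keeps later Pareto counts from splitting one failure across several spellings.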
Maintenance Tactics - Maintenance tactics (strategies) are selected to address the failure modes. Inside the CMMS, these are part of the PM/PdM library.
From the above, note that new fields may need to be added to the main work order screen: failed component, component problem, and cause codes.
Basic Failure Analysis Is Part of Reliability-Centered Maintenance (RCM)
In the following diagram, you will see the five main components of RCM.
Basic failure analysis leverages historical failure data to help you identify the worst offenders in terms of recurring events. It is just one of several tools reliability professionals use to eliminate defects.
So, What’s the Problem? Why Is Failure Data Being Overlooked?
Failure data includes many elements, such as asset condition, type-failure, failure mode, downtime, and actual costs. These data elements feed the failure analytics. Yet 70 percent of all CMMS installations have never successfully performed basic failure analysis using Pareto-style analytics. That this percentage has not changed in decades is an issue in itself. Here are some possible reasons:
Failure mode is not understood
Within the CMMS community, the definition and importance of the failure mode are not well understood. Also note that the CMMS itself may not be set up to capture this data.
Failure analysis is not understood
The user community has different definitions of “failure analysis.” To some, it could mean reviewing a stack of work orders in a conference room.
Failure analytic report does not exist
You should not assume the out-of-the-box CMMS has a decent failure analytic, if it has one at all. Therefore, the stakeholders have to define and build this report. The report should provide multiple ways to sort, select, and drill down on failure data, including mean time between failures (MTBF), failure occurrences, downtime, age, and annual cost/replacement cost.
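A failure analytic of this kind is essentially a per-asset summary that can be re-sorted by any metric. The sketch below makes several assumptions: MTBF is simplified to the observation window divided by the failure count, "annual cost/replacement cost" is read as a ratio, and all asset data is hypothetical. Age could be added the same way as another field.

```python
from dataclasses import dataclass

@dataclass
class AssetHistory:
    asset: str
    failures: int            # failure occurrences in the window
    downtime_hours: float    # total downtime in the window
    repair_cost: float       # total corrective cost in the window
    replacement_cost: float
    period_days: int = 365   # observation window (assumed one year)

    @property
    def mtbf_days(self) -> float:
        # Simplified MTBF: observation window divided by failure count.
        return self.period_days / self.failures if self.failures else float("inf")

    @property
    def cost_ratio(self) -> float:
        # Annualized repair cost as a fraction of replacement cost.
        annual = self.repair_cost * 365 / self.period_days
        return annual / self.replacement_cost

history = [
    AssetHistory("P-101", failures=6, downtime_hours=40, repair_cost=18_000, replacement_cost=30_000),
    AssetHistory("F-201", failures=2, downtime_hours=75, repair_cost=4_000, replacement_cost=50_000),
    AssetHistory("B-301", failures=3, downtime_hours=12, repair_cost=9_000, replacement_cost=250_000),
]

def report(rows, sort_key, descending=True):
    """Sort the offender list by any field or property name."""
    return sorted(rows, key=lambda r: getattr(r, sort_key), reverse=descending)

for r in report(history, "failures"):
    print(r.asset, r.failures, round(r.mtbf_days, 1), r.downtime_hours, round(r.cost_ratio, 2))

print([r.asset for r in report(history, "downtime_hours")])
# ['F-201', 'P-101', 'B-301']
# For MTBF, the worst offender has the *lowest* value, so sort ascending.
print([r.asset for r in report(history, "mtbf_days", descending=False)])
# ['P-101', 'B-301', 'F-201']
```

Note that the worst offender changes with the metric: F-201 leads on downtime while P-101 leads on occurrences, MTBF, and cost ratio, which is why the report needs multiple sort options.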
Reliability Team was not established
Without a reliability team, there is really no one to hand the failure analytic to. The team's review should be a formal event, held every month, that begins with an analysis of the worst offenders as indicated by the failure analytic.
Free-format fields were often used to store “failure data” in lieu of validated fields
Sometimes the shop floor (O&M staff) believes that free-format text describing exactly what they found and did is adequate failure history. This may help at the working level, but you cannot run failure analytics against text. Thus, 10 years of free-format text entry means 10 years of lost (Pareto-style) failure analytic capability.
Please note: Free-format text is still encouraged for problem descriptions, actions performed, and future recommendations – coupled with validated failure data.
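A tiny example shows why text defeats the analytic. The entries below are hypothetical descriptions of the same recurring bearing failure: grouped as free text they look like three unrelated one-off events, while the validated failure mode rolls up into a single offender.

```python
from collections import Counter

# Hypothetical completion entries for the same recurring failure.
free_text = [
    "brg seized, replaced",
    "Bearing locked up - no grease",
    "replaced seized bearing",
]
validated = [
    ("BEARING", "SEIZED", "LACK OF LUBRICATION"),
    ("BEARING", "SEIZED", "LACK OF LUBRICATION"),
    ("BEARING", "SEIZED", "LACK OF LUBRICATION"),
]

# Free text: three "different" failures, each seen once.
print(Counter(free_text).most_common(1))   # [('brg seized, replaced', 1)]
# Validated codes: one failure mode that recurred three times.
print(Counter(validated).most_common(1))   # [(('BEARING', 'SEIZED', 'LACK OF LUBRICATION'), 3)]
```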
Failure Mode Is Critical for Multiple Reasons
CMMS failure data includes many data elements, but the most important one is the failure mode. The first significant benefit of capturing the failure mode on the work order at job completion is that it can be readily compared with the failure modes library from the RCM analysis.
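Once both sides use the same three-part failure mode, the comparison reduces to set operations. The sketch below assumes a hypothetical RCM library and a list of failure modes captured on completed work orders.

```python
# Hypothetical RCM analysis failure-modes library for one asset class.
rcm_library = {
    ("BEARING", "SEIZED", "LACK OF LUBRICATION"),
    ("SEAL", "LEAKING", "WORN"),
    ("IMPELLER VANES", "WORN", "CAVITATION"),
}

# Failure modes captured on work orders at job completion.
wo_failure_modes = [
    ("BEARING", "SEIZED", "LACK OF LUBRICATION"),
    ("SHAFT", "BROKEN", "MISALIGNMENT"),
    ("BEARING", "SEIZED", "LACK OF LUBRICATION"),
]

observed = set(wo_failure_modes)
# Failure modes the RCM analysis anticipated that still occurred.
anticipated = observed & rcm_library
# Failure modes that occurred but were never analyzed.
not_in_rcm = observed - rcm_library
# Analyzed failure modes with no recorded occurrence on these work orders.
not_observed = rcm_library - observed

print(sorted(not_in_rcm))   # [('SHAFT', 'BROKEN', 'MISALIGNMENT')]
```

Failure modes in `not_in_rcm` are natural candidates to add to the RCM library, so that a maintenance tactic can be selected for them.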
Assuming the failure mode is established during RCM analysis, this information can be used to determine the maintenance tactic which feeds the CMMS PM/PdM library.
This illustration shows the three elements of the failure mode and how that information can be used.
One might ask why the failure mode was never built into CMMS designs; we may never know the answer. Although the work order is written to the asset, it is very important to identify the failed component.
Failure Code Hierarchy (FCH) design
Fully loading a hierarchy for every possible failure class with every possible component could take substantial time. Because of this complexity, staff frequently decide not to build the complete hierarchy, let alone populate it on the work order.
Because capturing accurate cause codes is so important, it helps if the user can “drill into” the true cause, and this design requires a cause code hierarchy.
Setting Up Cause Codes
The cause codes are very important because they are the third piece of the failure mode, and without the true cause, the failure will most likely happen again. The maintenance technician would fill in Cause-1, the maintenance supervisor would fill in Cause-2, and the reliability engineer would fill in Cause-3. Once Cause-3 is filled in, the failure mode is complete. Note, however, that unless Cause-4 is also determined, the same failure might happen next month.
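A cause code hierarchy can be validated the same way as the failure code hierarchy: each cause must be a valid child of the one above it, and levels must be filled in order as the work order passes from technician to supervisor to reliability engineer. The tree, codes, and function below are a hypothetical sketch, not a specific CMMS feature.

```python
# Hypothetical cause-code hierarchy; each level narrows the one above it.
# Per the text: technician -> Cause-1, supervisor -> Cause-2,
# reliability engineer -> Cause-3; Cause-4 is the fourth (deepest) field.
CAUSE_TREE = {
    "LUBRICATION": {
        "LACK OF LUBRICATION": {
            "MISSED PM": {
                "PM NOT IN LIBRARY": {},
                "PM OVERDUE": {},
            },
        },
    },
    "OPERATION": {
        "OVERLOAD": {},
    },
}

def validate_causes(causes):
    """Validate [Cause-1, Cause-2, Cause-3, Cause-4] (None = not filled in).

    Returns how many levels are filled in. Raises ValueError if a code is not
    a valid child of the level above, or if a level is skipped.
    """
    node, depth = CAUSE_TREE, 0
    for level, code in enumerate(causes, start=1):
        if code is None:
            break
        if code not in node:
            raise ValueError(f"Cause-{level} {code!r} is not valid at this level")
        node, depth = node[code], level
    # Anything filled in after the first empty level means a level was skipped.
    if any(c is not None for c in causes[depth:]):
        raise ValueError("cause levels must be filled in order")
    return depth

print(validate_causes(["LUBRICATION", "LACK OF LUBRICATION", None, None]))  # 2
print(validate_causes(["LUBRICATION", "LACK OF LUBRICATION", "MISSED PM", "PM OVERDUE"]))  # 4
```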
Unraveling the Puzzle
Validated fields are needed for capturing failure data, because they prevent erroneous entries. Failure data fields should be mandatory whenever a repair activity was performed (i.e., one involving a functional failure). Remember that for every work order completed (or closed out) without failure data, it will be nearly impossible to go back and recover that information.
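The "mandatory when a repair was performed" rule can be enforced as a closeout check. The sketch below assumes hypothetical field names and treats Cause-1, the technician's entry, as the cause field required at completion; other work orders close without failure data.

```python
# Failure-data fields required at closeout (names are hypothetical).
REQUIRED_FAILURE_FIELDS = ("component", "component_problem", "cause_1")

def closeout_errors(wo):
    """Return reasons a work order may not be closed; an empty list means OK.

    Assumed rule: failure data is mandatory only when a repair was performed
    for a functional failure.
    """
    if not (wo.get("repair_performed") and wo.get("functional_failure")):
        return []
    return [f"missing {name}" for name in REQUIRED_FAILURE_FIELDS if not wo.get(name)]

repair_wo = {
    "repair_performed": True,
    "functional_failure": True,
    "component": "BEARING",
    "component_problem": "SEIZED",
    "cause_1": "",  # technician left the cause blank
}
print(closeout_errors(repair_wo))                    # ['missing cause_1']
print(closeout_errors({"repair_performed": False}))  # []
```

Blocking closeout at this point is cheaper than any later cleanup, since the information is only reliably known while the job is fresh.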
With that said, how do you move forward? Here is a set of steps:
- Configure the work order screen to capture the failure mode using three fields: component, component-problem, and cause codes.
- Establish a new failure hierarchy where level-1 = “ASSET.” Apply this value to all assets.
- Build out a generic list of problem codes that works for all types of assets.
- Use the work order classification field as “Failed Component.” Set up a three-level classification, e.g., PUMP\MECH\IMPELLER.
- Add a new field titled “Suggested Add for Missing Component.” When the WO classification is missing the component, the user fills in this field, which is routed to the planner for review.
- Add a new field titled “Component-Problem.” It uses the same choice list as asset-problem.
- Add four new cause fields.
- Design and build the failure analytic, called the asset offender report. This report would dynamically build the failure mode by concatenating the component-part, component-problem, and cause.
- Create a new application to store failure modes against which the WO completion data can be compared.
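The classification and "Suggested Add" steps above can be sketched as one small function: take the last level of the three-level classification as the failed component, and fall back to the technician's suggestion, flagged for planner review, when the classification is incomplete. The function name and return shape are hypothetical.

```python
def failed_component(classification, suggested_add=None):
    r"""Return (component, needs_planner_review) for a work order.

    `classification` is a three-level path such as PUMP\MECH\IMPELLER;
    the last level is treated as the failed component. If the path is
    incomplete, the "Suggested Add for Missing Component" entry is used
    instead and flagged for planner review.
    """
    levels = [p for p in (classification or "").split("\\") if p]
    if len(levels) == 3:
        return levels[2], False
    if suggested_add:
        return suggested_add.strip().upper(), True
    raise ValueError("no failed component recorded")

print(failed_component(r"PUMP\MECH\IMPELLER"))       # ('IMPELLER', False)
print(failed_component(r"PUMP\MECH", "wear ring"))   # ('WEAR RING', True)
```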
Setting Up Generic Problem Codes
The assumption here is that all assets have basically the same set of problem codes. You should be able to create a standard set of problems with fewer than 20 choices.
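A single shared list can serve both the asset-problem and component-problem fields. The enum below is a hypothetical starter set (a few values echo the examples in this article); only the "fewer than 20 choices" rule comes from the text.

```python
from enum import Enum

class Problem(Enum):
    """Hypothetical generic problem codes shared by asset-problem
    and component-problem."""
    NOISY = "Noisy"
    STOPPED_RUNNING = "Stopped running"
    SEIZED = "Seized"
    LEAKING = "Leaking"
    OVERHEATING = "Overheating"
    VIBRATING = "Vibrating"
    BROKEN = "Broken"
    WORN = "Worn"
    OUT_OF_CALIBRATION = "Out of calibration"

# Guard the design rule that the standard list stays short.
assert len(Problem) < 20

# The same list validates both fields on the fuel pump work order.
asset_problem = Problem("Stopped running")   # lookup by value
component_problem = Problem["SEIZED"]        # lookup by name
print(asset_problem.name, component_problem.value)  # STOPPED_RUNNING Seized
```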
Failure Data Timeline
It is important to note that failure data is captured at different points in the WO chronology, and by different people. The operations (or maintenance) staff creating the initial request would specify the WO number, problem description, asset number, asset-problem, type-failure (full, partial, potential, defect), and worktype (= Corrective Maintenance). At job completion, the technician starts the failure mode capture, but the maintenance supervisor and reliability engineer may also be involved.
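The timeline can be written down as a map from stage to role and fields, which makes it easy to list what is still outstanding on a given work order. Stage names, field names, and the "CM" worktype value are hypothetical; the role assignments follow the request fields above and the Cause-1/2/3 assignments described under Setting Up Cause Codes.

```python
from enum import Enum

class TypeFailure(Enum):
    FULL = "full"
    PARTIAL = "partial"
    POTENTIAL = "potential"
    DEFECT = "defect"

# (stage, role, fields captured at that stage); names are hypothetical.
TIMELINE = [
    ("request", "operations/maintenance requester",
     ["wo_number", "description", "asset", "asset_problem", "type_failure", "worktype"]),
    ("completion", "technician", ["component", "component_problem", "cause_1"]),
    ("review", "maintenance supervisor", ["cause_2"]),
    ("analysis", "reliability engineer", ["cause_3"]),
]

def outstanding(wo):
    """List (stage, role, field) for every timeline field still empty on the WO."""
    return [(stage, role, f) for stage, role, fields in TIMELINE
            for f in fields if not wo.get(f)]

wo = {"wo_number": "1001", "description": "Fuel pump stopped running",
      "asset": "FUEL PUMP", "asset_problem": "STOPPED RUNNING",
      "type_failure": TypeFailure.FULL, "worktype": "CM",
      "component": "BEARING", "component_problem": "SEIZED",
      "cause_1": "LACK OF LUBRICATION"}
print(outstanding(wo))
# [('review', 'maintenance supervisor', 'cause_2'), ('analysis', 'reliability engineer', 'cause_3')]
```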
There are multiple ways to build a successful reliability program. However, the CMMS is a key element of that success. The stakeholders (e.g., the core team) are responsible for identifying and creating that vision, and the road map to get there. It’s always best to create a long-range plan that begins with a clear endgame.
You might request a new project called reliability improvement. And as part of that effort, early setup of failure data fields, coupled with staff training, is encouraged. If it takes more time to get the failure analytic designed and a reliability team in place, so be it. At least the correct failure data will be there when you are ready. The final result is a maintenance reliability program that preserves asset performance, promotes work force productivity (and job safety), and optimizes O&M cost.