Reliabilityweb Estimating Failure Avoidance Costs

The failure avoidance cost (FAC) can range from correcting a loss of system efficiency - typical of air and steam systems - to partial or complete functional failures. Some failures have the potential to incur huge costs related to capital, labor, services, product materials and equipment. The FAC method attempts to estimate the most likely cost of an avoided failure, using a standardized method for partial or complete failures.

The two basic elements of the FAC method can be easily explained through the P-F curve and a simple Risk Analysis matrix.

Savings are realized by avoiding failures that carry a high cost to repair and have an adverse effect on production, safety, or the environment. As shown in Figure 1, if condition monitoring identifies a problem at "early signal" points one through three, then the correction will have a minimal repair cost and little or no effect on production. The failure avoidance costs are estimated by comparing the predictive maintenance (PdM) costs to the higher failure costs incurred when equipment is allowed to run to failure (RTF).

What sets the FAC method apart is its use of a Risk Matrix to estimate these savings accurately. The matrix approximates the total risk for each failure avoidance case based on the conditions related to the failure and the criticality of the equipment. PdM early fault indicators can run the gamut from symptomatic (loose belt, looseness, unbalance) to problematic (defective bearing or gear), and the consequences can run from no effect to catastrophe, as shown in Figure 2.

By estimating the potential risk and assuming that there is a point P that occurs before point F on the P-F curve in Figure 1, failure avoidance costs can be estimated.

Probability and consequence define risk and are used to accurately determine the potential of the individual failure scenarios without having to resort to an all-or-nothing estimation method.

Consequence - Equipment/area specific information related to the range of severity of historical failures.
Probability - Potential for the event to occur based on the current conditions associated with the PdM monitoring results.

Information Collection

After an item is repaired as a result of a PdM find, information is collected related to the current installation and the identified failure mode. The following questions should be addressed: "If left to its own devices, what would normally happen if PdM did not find this problem?" in terms of consequences, and "What is the most likely range of scenarios for the failure mode identified by condition monitoring?" in terms of probability. Failure avoidance savings are calculated by subtracting the PdM repair costs from the total of the three "most likely" minor, moderate and severe case scenarios.
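The subtraction described above can be written compactly. The following is a sketch only: it assumes that each scenario cost is weighted by the historical frequency of that scenario, as the Risk Determination section suggests, and the symbols $p_i$, $C_i$ and $C_{\text{PdM}}$ are introduced here for illustration and do not appear in the original worksheets.

```latex
\text{FAC savings} \;=\; \sum_{i \,\in\, \{\text{minor},\ \text{moderate},\ \text{severe}\}} p_i \, C_i \;-\; C_{\text{PdM}},
\qquad \sum_i p_i = 1
```

Here $C_i$ is the estimated run-to-failure cost of scenario $i$ (repair plus production loss), $p_i$ is the share of historical failures with that outcome, and $C_{\text{PdM}}$ is the cost of the planned, PdM-originated repair.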

To estimate the risk of allowing the equipment to run to failure, the following information is collected, which can then be used to build a reference table for future cases in the area (see Tables 1 and 2).

"Just the FAC's ma'am"

The answers to these questions are normally available from common maintenance knowledge, PdM information forms, a Criticality Assessment database, the CMMS system, or by interviewing a production partner. The information can be stored in a common lookup table for future reference. Tabulating standard production downtime costs by area, as well as the maintenance cost for repairing common components like motors, compressors, pumps, or air handlers, will save time in generating future failure avoidance cases.

A Sample Failure Avoidance Case

Air Handler Example:

This is an example of an air handler defect shown in Figure 3 that was detected through normal periodic vibration monitoring. This unit was misaligned during an earlier maintenance repair. The misalignment caused looseness in the air handler bearing and the shaft was beginning to be cut as a result of the condition.

The failure scenarios were based on the information collected and the inspection of the air handling unit. The unit was destined to fail in one form or another. Because of the location of the unit and the type of failure, the problem would most likely have been detected only after the damage to the shaft had become well developed. There was a slim chance that the bearing defect would have been detected before the failure progressed to shaft damage, and a low probability that the failure would have become catastrophic once the shaft was cut further.

The vibration trend shows that this equipment was deteriorating at an accelerating pace and had gone undetected by operations up to this point. The information related to probability and consequences is summarized below.

The three scenarios were researched and summarized in Table 3, on the right, compared to the planned and scheduled PdM originated repair. The only remaining step is to assign the risk factor for this failure mode.

Risk Determination

The matrix in Figure 4 was constructed from historic failure data that showed increasing risk carries a higher cost. Low probability and low consequence events almost always result in a minor cost, while high consequence and high probability events almost always result in the highest costs.

The information from the sample case indicates a relatively "low" level of consequence because:

the evolution of the failure is slow;
the equipment is in a relatively accessible area;
the failure will likely be detected through noise generation prior to total failure.

Because failure is inevitable once the failure mode is initiated, and the damage to the bearing and shaft is irreversible, the probability of failure is determined to be "high."

Based on the low consequence and high probability, the total risk falls in the lower right-hand corner of the risk matrix shown in Figure 4, which has this distribution of outcomes:

In 30% of all historical failures with a similar risk profile, a minor failure occurs (emergency bearing replacement).
In 60% of failures, moderate failure occurs (emergency bearing and shaft replacement).
In 10% of failures, severe failure occurs (emergency bearing and shaft replacement and rotor/duct rework is required).
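The lookup step above can be sketched as a small table keyed by consequence and probability. This is a minimal illustration, not the authors' matrix: the names `RISK_MATRIX` and `scenario_distribution` are hypothetical, and only the one cell described in the text (low consequence, high probability) is filled in, since the other cells of Figure 4 would have to come from a plant's own historical failure data.

```python
# Hypothetical lookup of scenario distributions keyed by (consequence, probability).
# Only the cell discussed in the text is populated; the remaining cells are
# intentionally left out rather than guessed.
RISK_MATRIX = {
    ("low", "high"): {"minor": 0.30, "moderate": 0.60, "severe": 0.10},
}


def scenario_distribution(consequence, probability):
    """Return the minor/moderate/severe split for one risk matrix cell."""
    key = (consequence.lower(), probability.lower())
    if key not in RISK_MATRIX:
        raise KeyError(f"no historical distribution recorded for {key}")
    distribution = RISK_MATRIX[key]
    # The three outcomes are meant to cover every failure in the cell.
    if abs(sum(distribution.values()) - 1.0) > 1e-9:
        raise ValueError(f"distribution for {key} does not sum to 1")
    return distribution


air_handler = scenario_distribution("Low", "High")
print(air_handler)
```

Raising an error for an unpopulated cell, instead of returning a default, keeps an analyst from silently applying a distribution that was never derived from historical data.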

The data collected from the previous table and the risk percentages are now entered into the spreadsheet in Table 4. Actual repair costs, as well as previous average repair costs, are researched and entered to provide the most accurate estimate of the mean time to repair (MTTR) for each failure case.

This information is used to calculate production losses based on average downtime duration. While this process appears cumbersome, most of the data is available from the CMMS, the criticality database, or historical tables from previous FAC sheets, and a case takes about two to three hours to complete.
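The spreadsheet arithmetic can be approximated in a few lines. The sketch below assumes that each scenario's cost is its repair cost plus downtime hours multiplied by an hourly production loss rate, and that the scenarios are weighted by the 30/60/10 split given earlier. Every dollar amount and duration is a hypothetical placeholder, not a value from Table 4, and the function names are illustrative.

```python
# All dollar amounts and durations are hypothetical placeholders,
# not the values from Table 4.
DOWNTIME_COST_PER_HOUR = 1_000.0  # assumed production loss rate for the area
PDM_REPAIR_COST = 1_500.0         # assumed cost of the planned, scheduled repair

# scenario -> (probability, repair cost in USD, downtime in hours)
# Probabilities are the low-consequence / high-probability split from the text.
scenarios = {
    "minor": (0.30, 2_000.0, 4.0),
    "moderate": (0.60, 10_000.0, 12.0),
    "severe": (0.10, 50_000.0, 48.0),
}


def scenario_cost(repair_cost, downtime_hours, rate=DOWNTIME_COST_PER_HOUR):
    """Repair cost plus production loss for a single scenario."""
    return repair_cost + downtime_hours * rate


def expected_rtf_cost(scenarios):
    """Probability-weighted cost of letting the equipment run to failure."""
    return sum(
        probability * scenario_cost(repair, hours)
        for probability, repair, hours in scenarios.values()
    )


def fac_savings(scenarios, pdm_cost):
    """Expected run-to-failure cost minus the PdM-originated repair cost."""
    return expected_rtf_cost(scenarios) - pdm_cost


savings = fac_savings(scenarios, PDM_REPAIR_COST)
print(f"Expected RTF cost:     ${expected_rtf_cost(scenarios):,.2f}")
print(f"Estimated FAC savings: ${savings:,.2f}")
```

Keeping the downtime rate as a single per-area constant mirrors the idea of tabulating standard production downtime costs once and reusing them across cases.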

This $235,356 savings case (the average documented savings amount is $200,000) becomes a single record in the total PdM savings database, and the following data fields are associated with the avoided failure:

Technology used
Date of the failure repair
Classification of equipment (fan, pump, etc.)
Plant area affected
Root cause of the failure mode
Recommended corrective action.
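One way to picture such a record is as a small typed structure. This is a sketch under stated assumptions: the class name `FacRecord` and its field names are hypothetical and are not the authors' database schema. The technology, root cause and savings amount come from the air handler case; the date, plant area and corrective action are placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FacRecord:
    """One avoided-failure case; field names are illustrative only."""
    technology: str         # PdM technology that found the problem
    repair_date: date       # date of the failure repair
    equipment_class: str    # fan, pump, etc.
    plant_area: str         # plant area affected
    root_cause: str         # root cause of the failure mode
    corrective_action: str  # recommended corrective action
    savings_usd: float      # estimated failure avoidance savings


example = FacRecord(
    technology="vibration",
    repair_date=date(2011, 1, 1),          # placeholder date, not from the article
    equipment_class="air handler",
    plant_area="Area A",                   # placeholder area name
    root_cause="misalignment during an earlier repair",
    corrective_action="verify alignment after maintenance",  # hypothetical
    savings_usd=235_356.0,
)
print(example)
```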

The goal of Grifols, Inc.'s PdM department is not to collect data, analyze data, or even diagnose problems. The mission is to make appropriate permanent repairs to extend equipment life in order to meet production requirements. To meet this goal, it is important to focus on what actions should be taken to avoid the types of failures that we encounter on a daily basis.

By combining the results of all technologies and all individual failures, distribution charts can be generated that are sorted by date, equipment type, location, failure mode, or root cause. Because this information is monetized instead of just counted, the results of all PdM technologies are judged by how they impact the bottom line. Figure 5 summarizes the savings realized by year, and Figure 6 is a distribution of the failures avoided by the corrective actions needed to avoid future occurrences.
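The monetized roll-up amounts to grouping the savings records by one field and summing the dollar amounts. The sketch below uses a handful of made-up records; the categories, amounts and the helper name `savings_by` are all hypothetical and do not reproduce the data behind Figures 5 and 6.

```python
from collections import defaultdict

# Hypothetical records; categories and amounts are placeholders.
records = [
    {"technology": "vibration", "equipment": "fan", "root_cause": "misalignment", "savings": 120_000},
    {"technology": "ultrasound", "equipment": "steam trap", "root_cause": "wear", "savings": 15_000},
    {"technology": "vibration", "equipment": "pump", "root_cause": "lubrication", "savings": 80_000},
    {"technology": "infrared", "equipment": "motor", "root_cause": "misalignment", "savings": 40_000},
]


def savings_by(records, field):
    """Total savings per value of `field`, largest total first."""
    totals = defaultdict(int)
    for record in records:
        totals[record[field]] += record["savings"]
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


by_technology = savings_by(records, "technology")
by_root_cause = savings_by(records, "root_cause")
print(by_technology)
print(by_root_cause)
```

Summing dollars rather than counting cases is what lets a single large avoided failure outweigh many small finds in the resulting chart.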

We can also estimate ROI for each technology based on the actual failures and inefficiencies corrected, as shown in Figure 7.
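The article does not state how ROI is computed. A common convention, assumed here, is net savings divided by program cost. The function name `roi` and all figures below are hypothetical placeholders rather than the values behind Figure 7.

```python
def roi(total_savings, program_cost):
    """Return on investment as (savings - cost) / cost. Assumed convention."""
    if program_cost <= 0:
        raise ValueError("program_cost must be positive")
    return (total_savings - program_cost) / program_cost


# Hypothetical annual savings and program costs per technology (USD).
savings = {"vibration": 200_000, "ultrasound": 15_000}
costs = {"vibration": 50_000, "ultrasound": 10_000}

roi_by_technology = {tech: roi(savings[tech], costs[tech]) for tech in savings}
for tech, value in roi_by_technology.items():
    print(f"{tech}: {value:.0%}")
```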

The PdM department monitors over 600 machine trains, and 15% to 20% of the machines monitored have had a case completed against them. The number of successful repairs averages two to three cases per month, and the estimated avoided failure savings have totaled over $25 million. Since the baseline readings taken in 2008, the share of machines on the "watch list" has dropped from 15% to below 4%.

The legitimacy of the FAC methodology rests on the fact that savings are estimated only for failures that were avoided through a condition monitoring assessment, where the equipment was returned to service without requiring further action. The worst case scenario is not automatically treated as the most likely outcome of a failure; instead, real plant historical data is used to estimate how often each type of scenario will occur.

Michael Cook, MSMRE from UT/Monash, is currently the Maintenance Reliability Manager at the Grifols Inc. (formerly Talecris) Clayton, NC plant. Mike has 22 years of leadership experience in the U.S., Mexico and Canada implementing Predictive Maintenance and Reliability Engineering programs in a variety of industries, previously with Duke/Progress Energy. The plant Reliability Group won the Emerging PdM and Ultrasound Programs of the year in 2010 through Uptime Magazine. www.grifols.com

Michael Muiter is a Senior Reliability Engineer for Grifols Inc., Clayton, NC. Mike has 25 years of hands-on experience in Maintenance and Reliability Engineering in the steel processing, motorcycle manufacturing (Harley-Davidson), automotive (General Motors), textile and bio-pharmaceutical industries. Mike is a Certified Maintenance and Reliability Professional (CMRP), a certified Level I Vibration Analyst and a Level II IR Thermographer. Mike managed the award-winning Ultrasound Program of the year in 2010 through Uptime Magazine. www.grifols.com
