by Jim Fitch
Machine condition monitoring requires a proper foundation built on understanding and aligning criticality analysis and failure mode analysis. Sadly, at most plants, condition monitoring consists of multiple technologies cobbled together in an attempt to enhance machine reliability.
Alignment greatly helps optimize the deployment of activities and spending to minimize waste and redundancy. It also keeps maintenance and reliability professionals on the same page by providing a clear understanding of what's being done and why.
It is intuitively obvious that smart maintenance decisions require a heightened sense of both the probability and the consequences of machine failure. For instance, when lubricants fail, there are real consequences that are, at least initially, independent of machine failure. These include lubricant replacement costs (e.g., material, labor and flushing) and associated downtime, and they can arise even when the machine itself is perfectly healthy and operating. Of course, failing to replace a defective lubricant in time will invariably lead to dire machine failure consequences. For some machines, these cascading events can produce enormous collateral damage and financial hardship for an organization.
The method presented in this article is believed to be the first truly rationalized and unified approach to condition monitoring based on both machine and lubricant failure mode ranking and criticality analysis. The condition monitoring methods and technologies being integrated include oil analysis (real-time, portable and laboratory), field inspections (advanced methods providing frequent and comprehensive assessments) and other portable and real-time condition monitoring technologies (e.g., thermography and vibration analysis).
This approach is important enough that it deserves a name: unified condition monitoring (UCM). What makes UCM different from other strategies is:
- Periodic condition monitoring technologies and methods for each machine are integrated and optimized.
- Periodicity for each technology and method is optimized.
- The method of optimization is based on criticality analysis and failure mode ranking.
The optimum reference state (ORS) concept is a central theme in condition monitoring. It defines the specific machine and lubricant conditions we seek to monitor and control. The ORS is a state of preparedness and condition readiness that enables lubrication excellence and machine reliability; it gives the machine and its work environment reliability DNA as it relates to lubrication. Of course, the ORS can easily be applied to other reliability objectives, too. For lubrication, the enabling attributes of the ORS are:
- People Preparedness. People are trained to modern lubrication skill standards and have certified competencies.
- Machine Preparedness. Machines have the necessary design and accouterments for quality inspection, lubrication, contamination control, oil sampling, etc.
- Precision Lubricants. Lubricants are correctly selected across key physical, chemical and performance properties, including base oil, viscosity, additives, film strength, oxidation stability, etc.
- Precision Lubrication. Lubrication procedures, frequencies, amounts, locations, etc., are precisely designed to achieve reliability objectives.
- Oil Analysis. This includes optimal selection of the oil analysis lab, test slate, sampling frequency, alarm limits, troubleshooting rationale, etc.
These ORS attributes are simple, fundamental changes that are within a plant's ability to modify and manage. They are definable, measurable, verifiable and controllable.
Failure Mode Ranking
Ranking failure modes helps customize and optimize the condition monitoring strategy; in other words, it yields the greatest benefit for the least possible cost and risk. According to the Pareto principle, the top 20 percent of failure causes are responsible for roughly 80 percent of failure occurrences. It only makes sense, then, to focus resources and condition monitoring on that top 20 percent.
Failure modes and failure root causes are closely associated and often the same. For instance, abrasive wear may be the failure mode, but particle contamination is the root cause. Ignorance, culture, insufficient maintenance and poor machine design are all possible preexisting conditions that individually or collectively lead to contamination. Because you can always dig for a deeper level of cause, the terms failure mode and root cause are used interchangeably here for simplicity.
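To make the Pareto screen concrete, here is a minimal Python sketch. It assumes you have a count of failure occurrences per cause, for example from work orders or failure history. The function name `pareto_top_causes` and the counts are hypothetical and serve only as an illustration. The function sorts causes by occurrence count and keeps the smallest set whose cumulative share reaches a threshold, which defaults to 80 percent.

```python
from __future__ import annotations


def pareto_top_causes(counts: dict[str, int], share: float = 0.8) -> list[str]:
    """Return the fewest highest-count causes whose cumulative share of all
    failure occurrences reaches `share` (0.8 = 80 percent)."""
    total = sum(counts.values())
    if total == 0:
        return []
    # Most frequent causes first; ties keep their original (insertion) order.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    selected: list[str] = []
    cumulative = 0
    for cause, n in ranked:
        selected.append(cause)
        cumulative += n
        if cumulative / total >= share:
            break
    return selected


# Hypothetical failure-occurrence counts for one machine (not real data).
history = {
    "particle contamination": 42,
    "water contamination": 21,
    "misalignment": 12,
    "aeration and foam": 9,
    "overloading": 7,
    "wrong viscosity": 5,
    "oxidation from heat": 4,
}

print(pareto_top_causes(history))
```

With these made-up counts, four of the seven causes are needed to reach 80 percent. The 80/20 split is a rule of thumb, so the `share` parameter lets you set the cutoff explicitly.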
Figure 1 shows the relationship between machine and lubricant failure. On the left are common causes (failure modes) of lubricant failure and machine failure. For example, heat, aeration and contaminants are known to be highly destructive to lubricants. In a similar sense, overloading, misalignment and contamination can abruptly cause a machine to fail. Note how contamination not only can fail a lubricant, but also a machine directly without the need to harm the lubricant first.
It is best not only to list failure causes but also to rank them in terms of probability and severity, which helps allocate resources by priority. Lubricant and machine failures each lead to specific consequences, which are listed on the right in Figure 1. The two sets of consequences are distinct: lubricant failure consequences include oil replacement costs, downtime during the oil change, labor to change the oil and flushing costs, while machine failure consequences relate to safety, spare parts, labor to repair and downtime (e.g., production losses).
The overall lubricant criticality (OLC) defines the importance of lubricant health and longevity as influenced by the probability of premature lubricant failure and the likely consequences for both the lubricant and the machine. The overall machine criticality (OMC) defines the likelihood and consequences of machine failure alone. Like many methods, the approach for calculating OLC and OMC is not an exact science, but it is nevertheless grounded in solid principles of applied tribology and machine reliability.
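One simple way to turn probability and severity into a ranking is to score each on an ordinal scale and sort by their product. The sketch below assumes 1-to-5 scales and a probability-times-severity score. This is a common risk-matrix convention, not the scoring method used in this series. The class, function and scores are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FailureCause:
    name: str
    probability: int  # assumed ordinal scale: 1 (rare) to 5 (frequent)
    severity: int     # assumed ordinal scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Assumed scoring rule: probability times severity.
        return self.probability * self.severity


def rank_causes(causes: list[FailureCause]) -> list[tuple[int, str, int]]:
    """Return (rank, name, score) tuples, highest score first."""
    ordered = sorted(causes, key=lambda c: c.score, reverse=True)
    return [(rank, c.name, c.score) for rank, c in enumerate(ordered, start=1)]


# Hypothetical scores for a reciprocating compressor (illustration only).
causes = [
    FailureCause("water contamination", probability=4, severity=4),
    FailureCause("particle contamination", probability=5, severity=4),
    FailureCause("misalignment", probability=3, severity=4),
    FailureCause("aeration and foam", probability=3, severity=2),
]

for rank, name, score in rank_causes(causes):
    print(rank, name, score)
```

Python's sort is stable, so causes with equal scores keep the order in which they were entered. If consequences should dominate, you could add a secondary tie-break on severity.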
Building the Surveillance Planning Table
Figure 2 shows an example of a surveillance planning table (SPT) for a given machine, in this case a reciprocating compressor. The SPT defines the degree of surveillance (for instance, oil analysis and inspection) for each of the ranked failure modes, which are ranked from one to seven on the left side of the table. Tribology analysts and reliability professionals are best suited to assign this ranking for individual machines. The list shown in Figure 2 is hypothetical and uses the compressor example to illustrate how to build an SPT.
Across the top is the OMC range from 10 to 100 (see Part 1 for calculating the OMC score). A score of 100 represents high criticality from the standpoint of probability of failure and consequences of failure. In this example, the arrow shows the compressor to have an OMC score of 80. There are seven color-coded condition monitoring zones corresponding to time-based surveillance levels that range from CM1 (real time) to CM4 (monthly) to CM7 (never). For an OMC of 80, the condition monitoring zones range from CM1 to CM4.
The only things that change from machine to machine on the SPT are the failure mode rankings and the placement of the arrow corresponding to the OMC score; otherwise, all SPTs look exactly the same. For instance, the compressor has particle contamination as its highest-ranked failure mode. With an OMC of 80, the intersecting box shows a CM1 condition monitoring zone, which corresponds to real-time surveillance. As Figure 2 shows, CM1 calls for real-time sensors (A) and monthly oil analysis (D) from the test and inspection categories list. Numerous online particle counters on the market could conveniently be used for CM1 surveillance. Water contamination, on the other hand, merits a CM2 surveillance level, which can be met with daily inspections and monthly oil analysis.
Figure 3 presents a similar SPT, but specifically for the lubricant. The lubricant failure mode ranking is on the left and the overall lubricant criticality is across the top. In this case, the OLC score is 70, which has condition monitoring zones ranging from CM2 to CM4.
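Every SPT has the same shape, so reading one amounts to a table lookup keyed by the criticality score (OMC or OLC) and the failure-mode rank. The sketch below is hypothetical. Only the end points of the two columns come from the text: OMC 80 spans CM1 to CM4, and OLC 70 spans CM2 to CM4. The intermediate zones are placeholders, and the assumption that columns step by 10 is ours.

```python
from __future__ import annotations

# Hypothetical SPT columns: zone for failure-mode ranks 1..7 at a given score.
# Only the end points come from the text (OMC 80: CM1 to CM4; OLC 70: CM2 to
# CM4); the intermediate entries are placeholders, not values from the figures.
SPT_COLUMNS: dict[int, list[str]] = {
    70: ["CM2", "CM2", "CM3", "CM3", "CM3", "CM4", "CM4"],
    80: ["CM1", "CM2", "CM2", "CM3", "CM3", "CM4", "CM4"],
}


def cm_zone(score: int, rank: int) -> str:
    """Return the CM zone for a criticality score (OMC or OLC) and failure-mode rank."""
    if not 10 <= score <= 100:
        raise ValueError("criticality score must be between 10 and 100")
    if not 1 <= rank <= 7:
        raise ValueError("failure-mode rank must be between 1 and 7")
    column = (score // 10) * 10  # assumption: columns step by 10
    if column not in SPT_COLUMNS:
        raise KeyError(f"no column for score {column} in this sketch")
    return SPT_COLUMNS[column][rank - 1]


# Machine SPT: particle contamination is rank 1 (from the text); placing
# water contamination at rank 2 is an assumption.
machine_ranking = ["particle contamination", "water contamination"]
for rank, mode in enumerate(machine_ranking, start=1):
    print("machine:", mode, cm_zone(80, rank))

# The lubricant SPT uses the same lookup with the OLC score.
print("lubricant rank 1:", cm_zone(70, 1))
```

Keeping the table as data rather than logic mirrors the point that all SPTs look the same: only the score and the failure-mode ranking change per machine.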
Combining Machine and Lubricant SPTs
Figure 4 shows the SPTs for both the machine and the lubricant in a single unified table. The machine failure modes (MFM) and lubricant failure modes (LFM) are listed across the top, with the corresponding condition monitoring surveillance zones just below. Down the left are the various oil analysis tests and inspections that satisfy the condition monitoring requirements for each failure mode. This list was developed based on the available and required technologies and methods. The legend lists specific surveillance types (e.g., lab testing or inspection) and periodicity (i.e., frequency of use).
By referring to the condition monitoring zones under each failure mode, the surveillance type(s) and periodicity can be properly selected and optimized. For instance, under particle contamination is the R designation for real time and L4 for monthly laboratory analysis. Under aeration and foam is the F3 designation for weekly field inspections of the compressor's sight glass. Misalignment is monitored using multiple methods, including elemental analysis of wear metals (monthly laboratory analysis), ferrous density analysis (monthly), wear particle identification (on exception based on elemental analysis and ferrous density), magnetic plug inspections (weekly) and vibration analysis (weekly). These tests and inspections can be easily rationalized and streamlined to improve efficiency and reduce costs.
All tests and inspections can be condensed into a single condition monitoring work plan for the compressor, as seen in Figure 5. The tests and methods needed are clearly shown, as well as the frequency for the four main monitoring categories: real-time sensors, field inspections/tests, on-site lab testing and full-service lab testing. This work plan is the final product of the UCM strategy.
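The consolidation step can be sketched as follows: group each failure mode's tests by monitoring category, then merge any test required by more than one failure mode and keep the most frequent period. The entries follow the compressor example in the text. However, the category assigned to each test, the generic test names for the water and particle checks, and the merge rule are assumptions made for illustration.

```python
from collections import defaultdict

# Periods from most to least frequent; "on exception" means triggered by other results.
PERIOD_ORDER = ["real time", "daily", "weekly", "monthly", "on exception"]

# (failure mode, category, test or inspection, period) for the compressor example.
# Category assignments and the generic test names are assumptions.
REQUIREMENTS = [
    ("particle contamination", "real-time sensors", "online particle counter", "real time"),
    ("particle contamination", "full-service lab testing", "oil analysis", "monthly"),
    ("water contamination", "field inspections/tests", "field inspection", "daily"),
    ("water contamination", "full-service lab testing", "oil analysis", "monthly"),
    ("aeration and foam", "field inspections/tests", "sight glass inspection", "weekly"),
    ("misalignment", "full-service lab testing", "elemental analysis (wear metals)", "monthly"),
    ("misalignment", "on-site lab testing", "ferrous density analysis", "monthly"),
    ("misalignment", "full-service lab testing", "wear particle identification", "on exception"),
    ("misalignment", "field inspections/tests", "magnetic plug inspection", "weekly"),
    ("misalignment", "field inspections/tests", "vibration analysis", "weekly"),
]


def build_work_plan(requirements):
    """Group tests by category, merging duplicates and keeping the most frequent period.

    Returns {category: {test: (period, set_of_failure_modes_covered)}}.
    """
    plan = defaultdict(dict)
    for mode, category, test, period in requirements:
        if test in plan[category]:
            current, modes = plan[category][test]
            # Keep whichever period is more frequent (earlier in PERIOD_ORDER).
            if PERIOD_ORDER.index(period) < PERIOD_ORDER.index(current):
                current = period
            plan[category][test] = (current, modes | {mode})
        else:
            plan[category][test] = (period, {mode})
    return dict(plan)


for category, tests in build_work_plan(REQUIREMENTS).items():
    print(category)
    for test, (period, modes) in sorted(tests.items()):
        print(f"  {test}: {period} (covers {', '.join(sorted(modes))})")
```

Here the monthly oil analysis required for both particle and water contamination collapses into one line item. That is the kind of streamlining a single work plan is meant to capture.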
Using the Unified Condition Monitoring Model
From the preceding information, you can see how nearly all decisions related to periodic condition monitoring depend on four factors: overall machine criticality, overall lubricant criticality, machine failure modes and lubricant failure modes. These factors influence what to test, when to test and how to test. In relation to oil analysis, these factors affect where to sample, how often to sample, which tests to conduct, which alarms to set and the general data interpretation strategy.
UCM is an overarching principle that can be adapted for many applications and uses in the reliability field. The more you know about machine-specific failure modes and criticality, the better you can plan and optimize condition monitoring across multiple technologies within both predictive and proactive schemes. On the surface, these foundation pieces can seem time-consuming and arduous, but in the long run, you gain by reducing costs and optimizing benefits. These are solid and wise reliability investments indeed.