Although more and more industrial plants have incorporated reliability into their vocabulary, in many cases something has been lost in translation. More often than not, when asked about their asset reliability program, maintenance and reliability organizations do not have a process in place to document asset failures, specifically through failure coding in their computerized maintenance management system (CMMS). The goal of this article is to shed light on the long-lasting benefits of documenting failure data so that organizations that do not do so become the exception rather than the rule.
In this age of the Industrial Internet of Things (IIoT), organizations can be overwhelmed by the volume of data produced from multiple sources and by the challenge of transforming it into meaningful information. Modern diagnostic tools are excellent at detecting process and machinery problems, but on their own they deliver only a short-term benefit. Although predictive in nature, detecting a problem and then simply replacing the failing part is still a largely reactive approach. Unfortunately, maintenance is often tasked solely with replacing failed components so that short-term plant production goals are met. What’s missing is properly investigating and documenting failures to support the reliability organization’s ability to improve long-term plant availability.
When asset failure information is organized, classified and documented the good old-fashioned way, whether through a simple component failure analysis or a formal root cause failure analysis (RCFA), it pays huge dividends in your ability to improve your plant’s overall reliability. Knowing how equipment fails allows you to put effective plans in place to eliminate or mitigate future failures and improve equipment reliability. The secondary, longer-term benefit of enabling technologies such as diagnostic tools is not realized until the data they generate is analyzed to determine the root cause and the corrective actions targeting the failure mode(s) are implemented.
You can leverage your CMMS investment by building an accurate asset hierarchy that is logical and user-friendly for those who generate work notifications and that also provides them with equally important asset class-specific failure codes. These failure codes help communicate the apparent cause of the problem from the requester’s vantage point. Failure codes should include a problem, a cause and the remedy taken to restore the asset’s normal design function. The International Organization for Standardization (ISO) offers ISO 14224, a very detailed standard covering the collection of equipment reliability and maintenance data, including failure data, which you can use as a guideline or adopt in full.
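As a rough illustration of keeping problem, cause and remedy codes specific to an asset class, the Python sketch below models one class’s code library. The names `FailureCode` and `AssetClassCodes`, the `CENTRIFUGAL_PUMP` class and every code value are hypothetical examples, not taken from ISO 14224 or from any CMMS product; it is one possible structure under those assumptions (Python 3.9 or later).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FailureCode:
    code: str
    description: str  # explains which symptoms the code is meant to cover


@dataclass
class AssetClassCodes:
    """Hypothetical failure code library for a single asset class."""
    asset_class: str
    problems: dict[str, FailureCode] = field(default_factory=dict)
    causes: dict[str, FailureCode] = field(default_factory=dict)
    remedies: dict[str, FailureCode] = field(default_factory=dict)

    def _table(self, kind: str) -> dict[str, FailureCode]:
        # kind must be "problem", "cause" or "remedy"
        return {"problem": self.problems, "cause": self.causes, "remedy": self.remedies}[kind]

    def add(self, kind: str, code: str, description: str) -> None:
        self._table(kind)[code] = FailureCode(code, description)

    def is_valid(self, kind: str, code: str) -> bool:
        """True if the code is defined for this asset class and this kind of entry."""
        return code in self._table(kind)


# Hypothetical entries for a centrifugal pump class.
pump = AssetClassCodes("CENTRIFUGAL_PUMP")
pump.add("problem", "LEAK", "External leakage of process fluid at seal or casing")
pump.add("problem", "VIB", "Vibration above the alert level")
pump.add("cause", "SEAL_WEAR", "Worn mechanical seal faces")
pump.add("cause", "MISALIGN", "Shaft misalignment between pump and driver")
pump.add("remedy", "REPLACE", "Replace the failed component")
pump.add("remedy", "ADJUST", "Adjust or realign")

library = {pump.asset_class: pump}
print(library["CENTRIFUGAL_PUMP"].is_valid("problem", "LEAK"))       # True
print(library["CENTRIFUGAL_PUMP"].is_valid("problem", "SEAL_WEAR"))  # False: a cause, not a problem
```

Keying the library by asset class means a requester working on a pump is only offered pump-relevant codes, which is the user-friendliness the hierarchy is meant to provide.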
Each failure code should have a detailed text description explaining which symptoms it is intended to cover, to ensure it is used properly. Typically, the problem code is a required CMMS entry when a work order is generated, whereas cause and remedy codes become required fields only at work order close-out or completion, which enforces work process compliance. Wait to finalize failure coding on a work order until the failure analysis is complete so that the documented failure history is accurate.
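The staged rule described above (problem code at creation, cause and remedy at close-out, and no close-out before the failure analysis is finished) could be enforced with a check like the following minimal sketch. The `WorkOrder` fields, the `analysis_complete` flag and the `REQUIRED` table are hypothetical and do not correspond to any particular CMMS configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkOrder:
    asset_class: str
    problem: Optional[str] = None
    cause: Optional[str] = None
    remedy: Optional[str] = None
    analysis_complete: bool = False  # hypothetical flag: failure analysis finished


# Assumed rule set: fields that must be filled before each work order stage.
REQUIRED = {
    "create": ("problem",),
    "close": ("problem", "cause", "remedy"),
}


def missing_fields(wo: WorkOrder, stage: str) -> list[str]:
    """Return what still has to be completed before the work order may enter `stage`."""
    missing = [name for name in REQUIRED[stage] if not getattr(wo, name)]
    # Hold close-out until the failure analysis is done so the codes reflect its findings.
    if stage == "close" and not wo.analysis_complete:
        missing.append("analysis_complete")
    return missing


wo = WorkOrder("CENTRIFUGAL_PUMP", problem="LEAK")
print(missing_fields(wo, "create"))  # []
print(missing_fields(wo, "close"))   # ['cause', 'remedy', 'analysis_complete']
```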
When building your failure code hierarchy, establish an accurate list of asset types or classes for the top level, followed by the corresponding asset-specific problems. Because it is unrealistic to list every possible problem, cause and remedy for each asset type, stick to the most common or realistic failures for your facility and operating environment, and use an “Other” code to capture the rest. Stipulate that details must be documented in the work order comments whenever the “Other” code is used. As you analyze these “Other” coded failures, you can perform a Pareto analysis to see whether a common failure mode deserves your attention and should be added to your existing library. Remember, your CMMS is a living program, as is your reliability program. Update and refresh it with current and relevant content based on the analysis of your reliability data.
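A minimal sketch of the “Other” workflow follows: a check that “Other” is only accepted with comments, and a Pareto ranking of categories assigned while reviewing those comments. The `OTHER` code value, the 80% cutoff and the review categories are assumptions chosen for illustration.

```python
from collections import Counter


def validate_other(problem: str, comments: str) -> bool:
    """An "OTHER" problem code is only acceptable when explanatory comments are present."""
    return problem != "OTHER" or bool(comments.strip())


def pareto(codes: list[str], cutoff: float = 0.8) -> list[tuple[str, int, float]]:
    """Rank codes by frequency with cumulative share, stopping once `cutoff` is reached."""
    counts = Counter(codes).most_common()
    total = sum(n for _, n in counts)
    result, running = [], 0
    for code, n in counts:
        running += n
        result.append((code, n, running / total))
        if running / total >= cutoff:
            break
    return result


# Hypothetical categories an analyst assigned after reading "Other" work-order comments.
other_review = ["COUPLING_WEAR"] * 6 + ["STRAINER_PLUG"] * 3 + ["GASKET"] * 1
for code, n, cum in pareto(other_review):
    print(f"{code:15s} {n:3d} {cum:6.0%}")
```

In this made-up data, coupling wear alone accounts for most “Other” entries, which would make it a candidate for its own code in the library.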
Ideally, a failure mode and effects analysis (FMEA) is performed up front, during asset acquisition, to identify all failure modes. The findings can then be used to create asset failure codes for the failures you anticipate you are most likely to encounter. For an existing facility, however, a valuable continuous improvement project is to systematically review critical assets and conduct an FMEA on any that have not yet had one.
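One way to turn FMEA findings into a starting code library is sketched below. The article does not prescribe a selection rule, so this sketch assumes each FMEA row carries an occurrence rating on a 1 to 10 scale and keeps only modes at or above a hypothetical threshold, leaving the rest to the catch-all “Other” code. `FmeaRow` and `seed_codes` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    asset_class: str
    failure_mode: str  # candidate problem code
    cause: str         # candidate cause code
    occurrence: int    # assumed likelihood rating, 1 (remote) to 10 (very likely)


def seed_codes(rows: list[FmeaRow], min_occurrence: int = 4) -> dict[str, dict[str, set[str]]]:
    """Build a starting problem/cause library per asset class from the more likely FMEA modes."""
    library: dict[str, dict[str, set[str]]] = {}
    for row in rows:
        if row.occurrence < min_occurrence:
            continue  # unlikely modes are left to the catch-all "OTHER" code
        entry = library.setdefault(row.asset_class, {"problem": set(), "cause": set()})
        entry["problem"].add(row.failure_mode)
        entry["cause"].add(row.cause)
    for entry in library.values():
        entry["problem"].add("OTHER")  # every class keeps a catch-all problem code
    return library


rows = [
    FmeaRow("CENTRIFUGAL_PUMP", "LEAK", "SEAL_WEAR", 7),
    FmeaRow("CENTRIFUGAL_PUMP", "VIB", "MISALIGN", 5),
    FmeaRow("CENTRIFUGAL_PUMP", "SEIZED", "LUBE_LOSS", 2),
]
codes = seed_codes(rows)
print(sorted(codes["CENTRIFUGAL_PUMP"]["problem"]))  # ['LEAK', 'OTHER', 'VIB']
```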
The point is that you should create these failure codes to proactively tackle the equipment failure issues that exist at your plant. Reliability analytics used for failure forecasting, whether Weibull analysis, Crow-AMSAA or even a simple Pareto analysis, are only as good as the quality and quantity of the data behind them. The information gathered through failure data analysis can then be used to identify maintenance strategies focused on root cause elimination through early detection and inspection, new component designs, or revised operating procedures, with the goal of improving safety and reliability. Better still, when this information is presented together with supporting total cost of failure data (e.g., lost production, penalties, expedite fees), it can reveal the true financial impact of asset failures and help you justify continuous improvement efforts to eliminate future failures of this type.
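To show why well-coded failure times matter to these analytics, here is a minimal Python sketch of two of the named methods. The Weibull fit uses median rank regression with Benard’s approximation and an ordinary least-squares fit of y on x, assuming complete data with no suspensions; the Crow-AMSAA fit uses the standard maximum-likelihood estimates for a single repairable system observed up to a fixed end time. Function names and sample times are hypothetical, and production analyses typically need suspensions, confidence bounds and goodness-of-fit checks that this sketch omits.

```python
import math


def weibull_mrr(times: list[float]) -> tuple[float, float]:
    """Two-parameter Weibull by median rank regression (complete data only).

    Returns (beta, eta): shape parameter and characteristic life.
    """
    t = sorted(times)
    n = len(t)
    xs = [math.log(ti) for ti in t]
    # Benard's approximation of the median rank of the i-th failure.
    ys = [math.log(-math.log(1 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                      # slope of the Weibull plot
    eta = math.exp(x_bar - y_bar / beta)  # time at which the fitted line reaches 63.2% failed
    return beta, eta


def crow_amsaa(failure_times: list[float], end_time: float) -> tuple[float, float]:
    """Time-terminated Crow-AMSAA MLE for one repairable system.

    failure_times are cumulative operating times at each failure (all <= end_time).
    Returns (beta, lam) for expected cumulative failures N(t) = lam * t**beta.
    """
    n = len(failure_times)
    beta = n / sum(math.log(end_time / ti) for ti in failure_times)
    lam = n / end_time ** beta
    return beta, lam


beta_w, eta_w = weibull_mrr([410.0, 720.0, 980.0, 1300.0, 1750.0])
beta_c, lam_c = crow_amsaa([100.0, 300.0, 700.0, 1500.0], end_time=2000.0)
print(f"Weibull beta={beta_w:.2f}, eta={eta_w:.0f} h")
print(f"Crow-AMSAA beta={beta_c:.2f}")
```

As a rule of thumb, a Weibull beta below 1 points to early-life failures, near 1 to random failures and above 1 to wear-out; a Crow-AMSAA beta below 1 suggests failures are arriving less often over time. Either reading is only meaningful if the failure times were recorded against the correct asset and failure mode.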
At first glance, this transformation process can seem intimidating, to say the least. Many managers decide that this type of analysis is just too much work and abandon the idea entirely. In doing so, they overlook the fact that, day to day, maintenance is really managed at the failure mode level. If instead you build the foundation of your reliability program around disciplined data collection and analytics, it will pay huge dividends by clearly communicating priorities for safety and reliability improvements in your plant. Most importantly, these improvements add real value to the bottom line by eliminating future failures.
Measuring the effectiveness of maintenance strategies is one key benefit of documenting failure data. Accurate documentation of failure events within your CMMS provides the foundation for accurate analysis of mean time between failures (MTBF), mean time to repair (MTTR) and, when lost production is also accounted for, the total cost of failure.
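The arithmetic behind these metrics is simple once the failure events are recorded. The sketch below assumes MTBF is total operating (up) time divided by the number of failures, MTTR is total repair time divided by the number of repairs, and the total cost of failure is repair cost plus lost production valued at a margin per unit plus other costs such as penalties or expedite fees. The `FailureEvent` fields and `margin_per_unit` are hypothetical names, and the figures are made up.

```python
from dataclasses import dataclass


@dataclass
class FailureEvent:
    repair_hours: float            # time from failure to restoration
    repair_cost: float             # labor and parts charged to the work order
    lost_production_units: float   # output lost while the asset was down
    other_costs: float = 0.0       # e.g. penalties, expedite fees


def failure_metrics(events: list[FailureEvent], operating_hours: float,
                    margin_per_unit: float) -> tuple[float, float, float]:
    """Return (MTBF, MTTR, total cost of failure) for one asset over one period."""
    n = len(events)
    mtbf = operating_hours / n
    mttr = sum(e.repair_hours for e in events) / n
    total_cost = sum(
        e.repair_cost + e.lost_production_units * margin_per_unit + e.other_costs
        for e in events
    )
    return mtbf, mttr, total_cost


events = [FailureEvent(8.0, 2500.0, 400.0), FailureEvent(12.0, 4000.0, 600.0, 1500.0)]
print(failure_metrics(events, operating_hours=8000.0, margin_per_unit=5.0))
# (4000.0, 10.0, 13000.0)
```

In this example, repair costs alone would suggest 6,500 for the two failures; adding lost production and fees doubles the figure, which is the kind of gap that helps justify elimination work.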
It has been observed that several failure analysis/reliability programs supported only by the maintenance department documented annual returns of six to 10 times the program’s cost, whereas failure analysis/reliability programs supported by the entire organization showed an annual return on investment of more than 50 times the program’s cost. This indicates that reliability is everyone’s responsibility.
To that point, it is important to educate your entire organization about how the decisions they make daily affect plant reliability and the company’s performance. Change management tools, like The Reliability Game®, can be very powerful in transforming your organization’s culture. These tools help deliver the message that proactive practices, like failure data documentation and analysis, add significant value to companies in a competitive world.
Successful reliability programs have strong plant leadership committed to creating a proactive work environment, because they understand how vital failure data analysis is to maximizing uptime at their facilities. It can’t be overstated how important documenting and analyzing failure data is to creating improved maintenance strategies that target specific failure modes. When this becomes common practice, your reliability program has truly shifted its modus operandi to a proactive approach.