Uptime Magazine caught up with Susan Lubell and Ricky Smith, authors of the recently published book, “Root Cause Analysis Made Simple.” They shared some of the questions about root cause analysis (RCA) that they hear most often, not only from maintenance practitioners, but also from the production operations teams and management whose support those practitioners need in their work.
Q: Why investigate failures? In other words, why bother with RCA?
The goal for any RCA is to prevent the recurrence of failures and/or to minimize the consequences (effects) of a failure. Unexpected equipment failures are not normal and should not be tolerated.
There are three main consequence categories to guide our investigation of failures: health and safety consequences; environmental consequences; and financial and production consequences. Some companies choose to add a fourth category focused on reputational risk. All failures that pose a risk to safety or the environment must be investigated. Addressing chronic or repeat failures that have a financial or production consequence can result in bottom-line savings for all types of facilities, such as manufacturing, oil and gas, mining, hospitals and food production.
Q: Will RCA really make a difference?
Maintenance technicians and professionals are frequently so busy fixing repetitive problems that they do not spend enough time preventing failures. We need to break out of this reactive approach to maintenance, not only to improve the safety and reliability of our facilities, but also to reduce our production costs.
To put things into perspective with a simplified business case, if a company produces 100,000 items per day (e.g., liters of soda, barrels of oil, tonnes of coal or fertilizer, boxes of cereal, etc.) with a profit margin of $1/item, that is 36.5 million items per year, so a 0.1 percent improvement in the volume produced per year translates into an additional $36,500 per year of profit. From an operating perspective in a continuous processing facility, 0.1 percent equals approximately nine hours of additional production time per year, based on 8,760 hours in a year.
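For readers who want to rerun this business case with their own numbers, here is a minimal sketch of the arithmetic. The inputs are the ones from the example above; the function and variable names are hypothetical and chosen only for illustration. It assumes production runs 365 days a year and that every additional item earns the same profit margin.

```python
# Simplified business case for a small improvement in production volume.
# All names here are illustrative assumptions, not taken from the book.

ITEMS_PER_DAY = 100_000      # daily production volume from the example
PROFIT_PER_ITEM = 1.00       # profit margin in dollars per item
IMPROVEMENT = 0.001          # 0.1 percent, expressed as a fraction
DAYS_PER_YEAR = 365          # assumes continuous, year-round production
HOURS_PER_YEAR = 8_760       # 365 days x 24 hours


def annual_gain(items_per_day, profit_per_item, improvement,
                days_per_year=DAYS_PER_YEAR):
    """Extra annual profit from a fractional increase in volume produced."""
    annual_items = items_per_day * days_per_year
    return annual_items * improvement * profit_per_item


def extra_hours(improvement, hours_per_year=HOURS_PER_YEAR):
    """Additional production hours per year that the improvement represents."""
    return hours_per_year * improvement


extra_profit = annual_gain(ITEMS_PER_DAY, PROFIT_PER_ITEM, IMPROVEMENT)
hours = extra_hours(IMPROVEMENT)

print(f"Extra profit per year: ${extra_profit:,.0f}")
print(f"Extra production hours per year: {hours:.2f}")
```

With the example inputs, 0.1 percent of 36.5 million items is 36,500 items, and 0.1 percent of 8,760 hours is 8.76 hours. Changing `PROFIT_PER_ITEM` or `IMPROVEMENT` shows how quickly the case scales for higher-margin products or larger reliability gains.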
Is it possible to achieve an extra nine hours of production per year from your facility? What benefits would you see from refocusing your maintenance staff on preventing failures, completing predictive maintenance routines and analyzing asset health instead of constantly executing repairs on a rush basis?
Q: How do I perform an RCA?
There are four fundamental steps to performing an RCA:
- Quantify the magnitude of the problem;
- Perform the analysis using the appropriate technique;
- Develop a list of options for solving the problem;
- Document the results and implement recommended actions.
The payback from performing an RCA comes from implementing the recommended actions and ensuring they have stopped the cycle of a repeat failure.
Q: Who needs to be involved?
Once the decision has been made to conduct an RCA, the next question is usually who should participate. An RCA typically brings five to six knowledgeable people together as a core team to investigate the failure using the evidence left behind by the failure event. The composition of this team should stress complementary skills and a shared, clearly understood purpose: identifying the root cause of the failure.
Q: What is the most important step in performing an RCA?
Preserving the broken parts and evidence! Information, including operating parameters, broken parts and components, and a record of what people heard, smelled, saw and felt, is absolutely critical to conducting an RCA. In the haste to get the facility back up and running, valuable evidence is quickly lost. Take pictures and preserve the broken parts as you’re dismantling and repairing the equipment. Even if you later decide that an equipment failure isn’t worth investigating, you’ll have the luxury of making this decision in the future, and you won’t regret a hasty decision made while dealing with the failure’s aftermath.
Here’s a diagram that demonstrates when you achieve return on investment (ROI) on your RCA efforts. Many people think that it’s when you investigate, but it’s actually when you prevent recurrence that your efforts pay off.