By Paul Dufresne
Recently I had a conversation with the members of our leadership group about the steps we need to take to improve the reliability posture of our business. I was charged with identifying, of all the initiatives out there, the critical few we need to focus on to improve our competitive position. With that in mind, I did some data mining, looked at our recent unplanned events and came to a not-so-unique conclusion. BASICS! We need to focus our attention on the basics of reliability. To improve our competitive position, we have to get better at the fundamentals. It is plain and simple: we need a disciplined approach to applying some basic methodologies that will help us improve. One common thread that is missing is engagement. We need engagement at every level of the organization.
Pareto Bad Actors - Focus on Chronic Failures
The first step in our journey was to identify and focus on our chronic failures. In any improvement initiative, you have to know where "ground zero" is so you can develop a path forward. For us, it was simple: Pareto your downtime events and use the "80/20" rule. For those who don't know it, the 80/20 rule is based on the Pareto principle, named after Italian economist Vilfredo Pareto. Applied to reliability, it means that roughly 80 percent of your failures are usually caused by 20 percent of your equipment. Applying the Pareto principle to process, plant failure and production data gives problem solving a great starting point with only simple data analysis. It provides early insight into problem causes and effects without intense or complex analysis. We need to understand what our 20 percent is in order to focus our efforts on the critical few that will have the greatest impact on the business. Do you know what your top 20 percent are?
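As a rough illustration of the Pareto step described above, the sketch below totals downtime per asset, ranks the assets worst first and keeps adding them until the chosen share of total downtime is covered. The function name `pareto_bad_actors`, the asset names and the hours are all hypothetical, and the 80 percent threshold is a rule of thumb rather than a fixed law.

```python
from collections import Counter


def pareto_bad_actors(events, threshold=0.80):
    """Return (asset, hours, cumulative_share) tuples, worst first,
    stopping once the cumulative share of downtime reaches `threshold`.

    `events` is an iterable of (asset, downtime_hours) pairs.
    """
    totals = Counter()
    for asset, hours in events:
        totals[asset] += hours

    grand_total = sum(totals.values())
    bad_actors = []
    cumulative = 0.0
    for asset, hours in totals.most_common():
        # Stop as soon as the assets already listed cover the threshold.
        if grand_total == 0 or cumulative / grand_total >= threshold:
            break
        cumulative += hours
        bad_actors.append((asset, hours, cumulative / grand_total))
    return bad_actors


# Hypothetical downtime events: (asset, hours lost)
events = [
    ("Pump P-101", 40), ("Conveyor C-7", 5), ("Pump P-101", 30),
    ("Compressor K-2", 15), ("Mixer M-3", 4), ("Fan F-9", 3), ("Valve V-12", 3),
]

for asset, hours, share in pareto_bad_actors(events):
    print(f"{asset}: {hours} h, cumulative {share:.0%}")
```

With this made-up data, two of the six assets account for 85 percent of the downtime, which is the kind of short list the rest of the effort can be pointed at.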
Apply RCFA and Measure Implementation of Recommendations
Once the Pareto analysis has been completed and we have identified our 20 percent, the next step is to conduct root cause failure analysis (RCFA). RCFA is a simple, yet disciplined process used to investigate, rectify and eliminate equipment failure, and it's most effective when directed at chronic failures. Your completed analysis is only as good as the person facilitating it and the cross-functional team assigned to it. Employee involvement is crucial to an effective RCFA. You must have the right people on the team to complete the investigation and analysis. To get the true value out of the RCFA, we must realize that most failures have roots in three layers: first the physical component, then human error and finally the latent root of the problem. The latent root is always the true cause of the problem. With most failures, there will be some form of human error, whether someone failed to perform a task correctly or missed a step in the process. Driving to the "true" root cause can be a challenge, depending on the dynamics of your operating culture. Again, the entire team has to be engaged and set the expectation that all failures are avoidable, and then work to foster a culture in which that expectation takes root.
As the old saying goes, "what gets measured gets done," and so it is with RCFA recommendations. If you have experienced a repeat failure because you did not implement the recommendations of the analysis in a timely manner, then you understand what this adage means. Once the RCFA is completed and the recommendations are identified and prioritized, you must create an action log that lists each task, as well as the owner responsible for ensuring the recommendation is completed. Do not forget to put an expected date of completion beside the task as well. Once this is done, set up a task review meeting to track progress. Include in your metrics the number of tasks completed, along with the number of open tasks awaiting completion, and add that to your weekly review meeting. Remember, if it is true that "what gets measured gets done," then you have created an avenue to improve the situation.
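The action log described above can be as simple as a list of records with a task, an owner and an expected completion date. The sketch below is one minimal way to hold that log and produce the completed and open counts for the weekly review. The `Action` class, the field names and the sample entries are hypothetical, and the overdue count is an extra assumption that follows from recording a due date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Action:
    task: str
    owner: str
    due: date
    completed_on: Optional[date] = None  # None means the task is still open


def action_log_metrics(actions, today):
    """Summarize the log for the weekly review meeting."""
    open_actions = [a for a in actions if a.completed_on is None]
    return {
        "completed": len(actions) - len(open_actions),
        "open": len(open_actions),
        # Assumption: an open task past its expected date counts as overdue.
        "overdue": sum(1 for a in open_actions if a.due < today),
    }


# Hypothetical RCFA recommendations
log = [
    Action("Realign pump and motor", "J. Smith", date(2024, 2, 15), date(2024, 2, 12)),
    Action("Update seal installation procedure", "A. Lee", date(2024, 3, 1)),
    Action("Train crew on laser alignment", "M. Cruz", date(2024, 4, 1)),
]

metrics = action_log_metrics(log, today=date(2024, 3, 15))
print(metrics)
```

Tracking the same three numbers every week makes it easy to see whether recommendations are actually being closed out or simply accumulating.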
Visibility on the Health of Assets (P-F Curve)
John Moubray coined the phrase "P-F interval," which we know today through the "P-F curve." Put simply, the P-F curve shows the point at which a potential failure (P) can be identified before the functional failure (F) happens. The earlier you can identify the failure, the earlier you can take action. If your organization is not in a proactive maintenance state, or does not fully understand the value and impact of predictive maintenance (PdM), you now have the opportunity to make a dramatic impact on your business. The earliest indication that a problem is occurring gives you the ability to be proactive in your decisions and actions, based on the condition of the equipment. This information is crucial in allowing your work processes, such as gatekeeping, planning and scheduling, to work efficiently and have an effective impact on the organization. When the team understands the importance of the P-F curve and the impact it can have on an organization, it helps them become proactive in their maintenance and reliability posture. The results can be a less stressful life, improved reliability, systems working in harmony and lower costs. Ultimately, the quality of life of all involved is improved. Is your team engaged and do they know the health of their assets? (see Figure 3)
Disciplined/Efficient Execution of Work Processes
Even in the most structured organizations, many work processes can be chaotic and disorganized. There is incomplete or outdated documentation, duplication of effort, or different people carrying out the same process in slightly different ways. This is stressful for employees and costly for the organization. It was never the intention when the process was first set up; the process evolves over time as small changes occur and work-arounds are developed. Sometimes, the work is pieced together out of necessity and no one gives any strategic thought to how this will affect the big picture.
Because we are able to adapt our thinking to compensate for an inefficient process, we simply make it work. Where this impedes the operation, and the opportunity to make improvements, is when we go looking for data to help us make the critical decisions on where to focus our resources. If we follow the process and use our tools as they were designed, the vision of a more organized state becomes a reality. You then have an operation where employees have the tools they need and are empowered to execute flawlessly within the system, and cost savings are realized through more efficient processes and workflows. Ultimately, this will lead to having a profit center within your business. Is your team engaged and do they understand the importance of following the process? How efficient are your work processes? (see Figure 4)
Behavior Focused Metrics
Why do we use metrics? The truth is we use metrics in the hope of driving positive outcomes and behaviors in our organizations. Unfortunately, we can get into metric overload if we have a laundry list of metrics that we report on. One simple question to ask is, "What are the critical few metrics that would mean the most to your organization and add the greatest value?" For example, an effective way to monitor the impact of maintenance work on equipment reliability is to keep a list of all work done on an asset, showing the dates the maintenance was done and the type of maintenance performed. For each interaction with the equipment, record the work performed, the parts used or repaired, the failure evidence collected and observed, and the known causes of the maintenance work. When you see the same parts fail for the same reasons, it is a strong sign that reliability is being impacted by the quality of the maintenance work performed.
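The maintenance history described above lends itself to a very simple check: count how often the same part on the same asset fails for the same cause. The sketch below assumes each work record is a dictionary with `asset`, `part` and `cause` fields. Those field names, the `repeat_failures` helper and the sample records are hypothetical, not taken from any particular maintenance system.

```python
from collections import Counter


def repeat_failures(history, min_count=2):
    """Return (asset, part, cause) combinations seen at least `min_count` times."""
    counts = Counter((rec["asset"], rec["part"], rec["cause"]) for rec in history)
    return {key: n for key, n in counts.items() if n >= min_count}


# Hypothetical work history for one asset
history = [
    {"date": "2024-01-10", "asset": "Pump P-101", "type": "corrective",
     "part": "mechanical seal", "cause": "misalignment"},
    {"date": "2024-02-02", "asset": "Pump P-101", "type": "preventive",
     "part": "bearing", "cause": "scheduled replacement"},
    {"date": "2024-03-18", "asset": "Pump P-101", "type": "corrective",
     "part": "mechanical seal", "cause": "misalignment"},
]

for (asset, part, cause), n in repeat_failures(history).items():
    print(f"{asset}: {part} failed {n} times, cause: {cause}")
```

A repeat entry in this list is a prompt to look at how the previous repair was done, not proof on its own.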
Remember, there are two types of metrics, leading and lagging. Having the right mix of metrics is critical to the success of any organization. A general rule of thumb is to have two leading metrics for each lagging metric. We hear a lot about leading and lagging metrics, but what do they really mean? Lagging indicators or metrics are typically output oriented, easy to measure and understand, but hard to improve or influence, while leading indicators or metrics are typically input oriented, hard to measure and easy to influence. Let's illustrate this with a simple example: For many, a personal goal is to lose weight. Your weight is a clear lagging indicator that is easy to measure: you step on a scale and you have your answer. But how do you actually reach your goal? For weight loss, there are two "leading" indicators: 1. Calories taken in and 2. Calories burned. These two indicators are easy to influence, but very hard to measure. So having the right mix of leading and lagging metrics in place can assist you in achieving your goals. However, if you focus only on lagging metrics, it's like looking over your shoulder. You only see where you have been, not where you are going.
Conclusion: Don't Overload the Wagon!
At this point, you may be asking yourself, "This sounds great, but how do I start?" It's simple: start with a vision! Pull together a cross-functional team of people within your organization who have the same desire and drive to accomplish the mission. Focus on the basics first, follow the process and realize this is a marathon, not a sprint. You will have ups and downs as you go through this journey. Keep your eye on the prize and make sure you bring the team along with you, rather than marching perilously alone on your journey toward reliability excellence. Remember, the sustaining effort will take three times as long as you think it will. Start small, celebrate your successes and remember to have fun along the way. With leadership support, the vision of a better state and engagement by everyone on the team in focusing on the fundamentals and getting them right the first time, you will be successful in your journey.