Reliability-Centered Maintenance and Root Cause Analysis

by Mark Galley, Cause Mapping, ThinkReliability and Douglas J. Plucknette, Author, RCM Blitz

As plants around the world strive to reduce maintenance costs and prevent incidents and accidents, they often turn to various reliability tools to speed the road to improvement. Reliability tools first help identify where losses occur and then help develop procedures to mitigate those losses, thereby improving equipment reliability and performance.

One tool is Reliability-Centered Maintenance, or RCM. With safety and reliability in mind, it uses cause-and-effect relationships to identify potential component failures and a structured decision process to select the best maintenance strategies. Those strategies should ensure that equipment and processes function in accordance with their inherent safety and reliability capabilities.

The story behind Reliability-Centered Maintenance begins during the 1970s in, among other places, the commercial airline industry, including two executives at United Airlines, F. Stanley Nowlan and Howard Heap. Knowing that a single failure (a plane crash) could cause well over 100 deaths, industry members recognized they could not simply stand back, wait for failures and then adjourn to a conference room to figure out why. They needed a proactive tool that would identify potential failures and develop a strategy to eliminate them, or at least reduce their probability to an acceptable level. From this need, RCM emerged as an extremely effective, proactive tool for preventing failures before they happen. Since the days of Nowlan and Heap, RCM has evolved into several methodologies used at companies around the world to develop maintenance plans for asset care and reliability improvement.

The Uptime Elements® Reliability Engineering for Maintenance domain includes Reliability-Centered Maintenance [RCM], which asks what could happen. RCM differs from root-cause analysis [RCA], which examines what did happen.

RCA and RCM: Different but Complementary

While there are several different RCM and RCA methodologies, each with its own steps, companies, and especially the people participating in these reliability initiatives, should understand a simple but important distinction between them. RCM identifies all of the different ways a piece of equipment or process could fail, while RCA identifies the causes that explain why a piece of equipment or process did fail.

Put another way, a Reliability-Centered Maintenance effort asks, “What could cause the problem?” A root-cause analysis asks, “What did cause the problem?” These two questions help companies not only differentiate between the two methods but also understand their similarities. Both require an understanding of a piece of equipment’s function, its operating history, its most common failure modes and why they occur. Recognizing these similarities helps create a more coherent and effective reliability effort.

Root-cause analysis asks three questions that each focus on the failure that has already occurred:

  1. What was the problem?
  2. What were the causes of the problem?
  3. What action should be taken to prevent the problem from occurring again?

RCM, on the other hand, asks seven questions that focus on failures that could occur in the future:

  1. What are the functions of the equipment or process?
  2. How can it fail to provide the function?
  3. What causes each functional failure?
  4. What are the effects of each functional failure?
  5. How does each failure impact the goals?
  6. What action should be taken to predict and prevent each failure?
  7. What action should be taken if a proactive task cannot be determined?

Beyond asking RCM’s seven standard questions, companies should scrutinize how those questions are used. After working through all of these questions, some organizations feel the process became too long and involved, took too many people away from their work and, most importantly, delivered no benefit they could see. Other organizations ask the same seven questions and find that the process moved at a good pace, captured a huge amount of knowledge and, within a short time, reduced reactive maintenance and outages. Ultimately, the outcome depends on how an organization conducts RCM. Tools such as RCM Blitz™ (discussed below) can help speed the process and keep an organization focused.

Considering both root-cause analysis and reliability-centered maintenance often raises the question: If RCM is so proactive, why would you ever need root-cause analysis? While being proactive is extremely important for any organization, problems still arise daily. Ideally, being proactive makes reactive issues less and less significant, but it does not eliminate them. Aircraft regularly develop problems in flight that don’t result in a crash but simply require maintenance at the next stop. Organizations should strive to be effective both proactively and reactively.

RCM is not a one-time, discrete effort but part of ongoing, proactive continuous improvement. Conversely, a root-cause analysis is reactive, identifying the causes of a particular failure that occurred at a specific date and time. It is a discrete event. Yet both reactive and proactive strategies are needed. Understanding problems reactively, that is, knowing how equipment and processes have actually failed, is essential for taking specific steps to prevent similar failures from occurring in the first place. In other words, organizations learn from reactive responses to become more proactive.

Failure-Mode Fundamentals

RCM identifies all the failure modes, the ways a failure could happen, for a given piece of equipment or process. Just as a transportation mode (that is, a “mode of transportation”) is a way to get from one place to another, a failure mode is a way something can fail. In essence, it is simply a cause that produces an effect.

Some failures can have multiple failure modes, meaning there are different ways (that is, causes) that can produce the same negative effect. Failure modes can be identified for an overall system as a whole, and also by breaking the system down into parts or subsystems.

RCM requires breaking an operating system down into its parts, and then breaking these parts down further until failure modes/causes emerge, along with ways to prevent them from occurring.
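To make the breakdown concrete, the short Python sketch below models an operating system as a tree of subsystems and components and walks it to list every failure mode together with where it sits in the hierarchy. It is a minimal illustration under stated assumptions: the `Node` class, the `list_failure_modes` helper and the pump hierarchy are hypothetical examples, not part of RCM Blitz or any RCM standard.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A system, subsystem or component in a hypothetical equipment hierarchy."""
    name: str
    failure_modes: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def list_failure_modes(node: Node, path: tuple[str, ...] = ()) -> list[tuple[str, str]]:
    """Walk the hierarchy depth-first and return (location path, failure mode) pairs."""
    here = path + (node.name,)
    results = [(" > ".join(here), mode) for mode in node.failure_modes]
    for child in node.children:
        results.extend(list_failure_modes(child, here))
    return results


# Hypothetical example: a pump broken down into subsystems and components.
pump = Node("Pump", children=[
    Node("Motor", failure_modes=["Bearing seized", "Winding insulation failed"]),
    Node("Seal assembly", children=[
        Node("Mechanical seal", failure_modes=["Seal faces worn"]),
    ]),
])

for location, mode in list_failure_modes(pump):
    print(f"{location}: {mode}")
```

The recursion mirrors the text: keep breaking parts down until the failure modes emerge at the lowest level where they are recorded.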

In both root cause and RCM, people with first-hand knowledge of a system represent a valuable resource. They know how the system operates, what works and what doesn’t, and why the failure is a big deal. Most importantly, they can be a fountainhead of ideas for how to prevent the failure from occurring.

A front-line person who has endured five system failures during the past two years can give vital insight for RCM. Suppose a root-cause analysis was conducted for each of those five failures. The cause-and-effect data collected from each RCA provides information and experience that can be used for RCM. This avoids duplicating work between the two processes and, most importantly, makes the overall reliability effort more effective.

Reliability-centered maintenance captures all of the specific tasks identified to predict, prevent or mitigate each failure mode and organizes them in a detailed spreadsheet. Column labels correspond to the seven RCM questions, while rows detail the systems, subsystems and components involved. In this way, the table captures the failure modes, effects and causes. How detailed it becomes depends on the issue and the associated risks. A complete RCM for a single piece of equipment may take up only five pages, while a complex system might require more than 200. Data collected in this format can be shared with other databases, such as those for work processes and computerized maintenance management systems (CMMS).
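As a rough sketch of how such a worksheet might be structured and exported for use elsewhere, the Python example below defines one row per component and failure mode, with one column per RCM question, and writes the rows to CSV text. The `RcmRow` field names, the sample row and the CSV layout are all assumptions for illustration; a real CMMS will have its own import format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class RcmRow:
    """One worksheet row. Field names are hypothetical labels for the seven questions."""
    system: str
    subsystem: str
    component: str
    function: str            # Q1: What are the functions of the equipment?
    functional_failure: str  # Q2: How can it fail to provide the function?
    failure_mode: str        # Q3: What causes each functional failure?
    effect: str              # Q4: What are the effects of each functional failure?
    consequence: str         # Q5: How does each failure impact the goals?
    proactive_task: str      # Q6: What action predicts and prevents each failure?
    default_action: str      # Q7: What if no proactive task can be determined?


def to_csv(rows: list[RcmRow]) -> str:
    """Serialize worksheet rows to CSV text, one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(RcmRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Hypothetical sample row for illustration only.
worksheet = [
    RcmRow(
        system="Cooling water", subsystem="Pump", component="Motor bearing",
        function="Supply cooling water to the heat exchanger",
        functional_failure="No flow",
        failure_mode="Bearing seized from lack of lubrication",
        effect="Pump trips; heat exchanger loses cooling",
        consequence="Production line shuts down",
        proactive_task="Monthly vibration analysis; lubricate on schedule",
        default_action="Not required (proactive task identified)",
    ),
]
print(to_csv(worksheet))
```

CSV is used here only because most spreadsheets and many databases can import it; the field order follows the seven questions so the exported columns read in the same sequence as the worksheet.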

Reliability-centered maintenance offers tremendous value but, nevertheless, has its drawbacks. For one, it can take time and, without care, lose focus. To address this, RCM Blitz applies the principles of RCM while keeping the group focused on achieving timely results. Software tools also make it easy to build a Cumulative Cause Map quickly and accurately while the RCM itself is being conducted, or even as individual failures occur in the field.

The “Visual” Advantage

A root-cause analysis can be presented visually, organizing all of the causes of a particular incident. A Cause Map, with a single incident on the left connected to boxes branching out to the right, can detail each cause and show how the causes relate. In a Cause Map, all causes relate to each other in what are called “AND” relationships. Consider a cause-and-effect relationship in which one effect has two causes (Cause 1, Cause 2): both Cause 1 and Cause 2 must occur for the effect to happen.
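The AND logic can be expressed in a few lines of Python. In this minimal sketch, the `effect_occurs` function and the fire-triangle example are hypothetical illustrations, not Cause Mapping software; an effect is treated as having happened only if every one of its listed causes happened.

```python
def effect_occurs(effect: str, causes: dict[str, list[str]], facts: set[str]) -> bool:
    """Evaluate a single-incident cause map under AND logic.

    An item with no listed causes occurs only if it is among the observed facts;
    an effect with listed causes occurs only if every one of those causes occurs.
    """
    required = causes.get(effect)
    if not required:
        return effect in facts
    return all(effect_occurs(cause, causes, facts) for cause in required)


# Hypothetical cause map for one incident: the classic fire triangle.
cause_map = {"Fire": ["Heat source present", "Fuel present", "Oxygen present"]}

print(effect_occurs("Fire", cause_map, {"Heat source present", "Fuel present", "Oxygen present"}))  # True
print(effect_occurs("Fire", cause_map, {"Heat source present", "Oxygen present"}))  # False
```

The second call shows a consequence of AND logic: remove any single cause and the effect no longer occurs, so a solution can target any one of the causes in the chain.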

Reliability-centered maintenance, like root-cause analysis, can also use visual tools. Here, though, we consider multiple failure modes that could produce the same failure. These modes therefore have an “OR” relationship, since any one of them could lead to a future incident (for example, the effect or incident could happen from Cause 1 or Cause 2). A Cumulative Cause Map™ can capture these relationships across multiple potential failure modes.

RCA and RCM Working Together

These visual tools reveal how root-cause analysis and RCM can truly work together. As stated, root-cause analysis examines one issue at a time. Consider a machine that experiences three different failures over two years, with a root-cause analysis used each time to analyze the failure. From these RCAs come failure modes describing how the machine broke. These failure modes can then be placed on one larger analysis, a Cumulative Cause Map, which attempts to visually outline all possible causes and failures for a given piece of equipment, process or system.
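Combining several single-incident analyses can be sketched the same way. In the Python example below, each RCA contributes one set of causes that were jointly required (AND) for an effect, and analyses of the same effect become alternative branches (OR) on the cumulative map. The function names, the data layout and the gearbox example are hypothetical; the sketch only illustrates the AND-within, OR-across structure described above.

```python
from collections import defaultdict

# Hypothetical layout: effect -> set of alternative cause sets (each set is one AND branch).
CumulativeMap = dict[str, set[frozenset[str]]]


def build_cumulative_map(analyses: list[tuple[str, frozenset[str]]]) -> CumulativeMap:
    """Combine single-incident RCAs into one cumulative map.

    Each RCA supplies the causes that were jointly required (AND) for its effect;
    separate RCAs of the same effect become alternative branches (OR).
    """
    cumulative: defaultdict[str, set[frozenset[str]]] = defaultdict(set)
    for effect, causes in analyses:
        cumulative[effect].add(causes)  # an identical branch is stored only once
    return dict(cumulative)


def could_occur(effect: str, cumulative: CumulativeMap, conditions: set[str]) -> bool:
    """OR across branches: true if every cause in at least one branch is present."""
    return any(branch <= conditions for branch in cumulative.get(effect, set()))


# Hypothetical RCAs from three failures of the same gearbox over two years.
rcas = [
    ("Gearbox failed", frozenset({"Oil level low", "Low-level alarm not working"})),
    ("Gearbox failed", frozenset({"Misaligned coupling", "Alignment not checked after repair"})),
    ("Gearbox failed", frozenset({"Oil level low", "Low-level alarm not working"})),
]

ccm = build_cumulative_map(rcas)
print(len(ccm["Gearbox failed"]))  # 2 distinct failure paths
print(could_occur("Gearbox failed", ccm, {"Misaligned coupling", "Alignment not checked after repair"}))  # True
print(could_occur("Gearbox failed", ccm, {"Oil level low"}))  # False
```

Because each branch is stored as a frozenset inside a set, a repeat of an already recorded failure path merges into the existing branch instead of creating a duplicate, which is why only two distinct paths remain in the example.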

Such a map puts all of the information on one page, and a big page at that. A large RCM initiative displayed as a visual Cumulative Cause Map printed at a readable size (drawn manually on chart paper or produced on a computer with Microsoft Excel or other software) could take up a three-by-five-foot space or more. Taping the map to a wall makes it easy for people to mark it up and add what they know. It also makes it easier to see how the specific, detailed tasks for equipment or processes fit into the overall reliability picture.

Building a Cumulative Cause Map requires both a thorough RCM and a complete understanding of a visual root-cause methodology. Cause Mapping can assist RCM by identifying specific systems, components and failure modes. Note, though, that the Cumulative Cause Map™ does not combine RCM and RCA into a new methodology. Either root-cause analysis or Reliability-Centered Maintenance can be used on its own as a sound, proven approach to solving problems. Nevertheless, the Cumulative Cause Map™ does show the complementary link between the two methods and helps create a simpler, more coherent approach to reliability.

When implemented, the Cumulative Cause Map visually captures organizational knowledge and experience. It can be used as a troubleshooting guide, a communication tool across shifts and sites, a teaching tool, and even a way to maintain continuity when people leave a department. Want to know what employees at a sister facility did to improve equipment reliability? Examine the Cumulative Cause Map, an organization’s visual record of information and experience.

Douglas Plucknette

Doug Plucknette is the founder of Reliability Solutions, Inc., and has worked with large industrial companies worldwide, helping them improve their reliability and operational performance. He is the author of the books “Reliability Centered Maintenance Using RCM Blitz™” and “Clean, Green and Reliable,” and has published over 60 articles. He has been a featured speaker at numerous industry conferences.
