An Important Aspect in Root Cause Analysis
by David Gluzman
The root cause analysis (RCA) method uses linear and branching cause and effect approaches, asking multiple “why” questions to identify one or more low-level conditions leading to a failure. This allows development of a set of corrective actions that will prevent the failure in the future. The cause or causes at the lowest level are frequently termed the root cause(s). The purpose of this article is not to present a course on the well-known method, but rather to offer a view on certain details and suggest modifications that are crucial for clarification of the method.
Cause and effect mapping employs cause/effect block chains and formal logic operators between the blocks, thus allowing branching for easy visualization of the analysis process discussed in detail in the narrative. Most proponents of the method do not recommend differentiating the causes by using terms like “major cause,” “minor cause,” “immediate cause,” “intermediate cause,” etc. Instead of differentiating causes by importance or order, they suggest differentiating corrective actions on the basis of feasibility, effectiveness, etc. With this approach, after the analysis is complete and the cause and effect map is built, it is up to the stakeholders to decide which of the root causes deserve more attention. While this approach makes a lot of sense, some proponents also do not recommend using the term “contributing factor” as a type, which is the reason it is being addressed here.
Terms and definitions are frequently used rather loosely; borrowed from day-to-day usage, they ultimately lead to fuzzy discussions. To avoid confusion in the RCA process, the definitions used in the narrative and cause mapping will be introduced here first. I suggest using two terms: cause and contributing factor. The reason for using two terms rather than one will become clearer later in this article.
The dictionary defines the term “cause” as a producer of an effect. It means if there is an effect, there must be a cause(s) producing it. Therefore, for the purpose of RCA, the term will be defined here as:
CAUSE is a condition that produces an effect; eliminating a cause(s) will eliminate the effect.
The dictionary defines the term “contribute” as giving with others for a common purpose; helping to bring about a result; exacerbating something; acting as a factor. For instance, when people give to a heart disease research fund, it is said they contribute to this fund or purpose. However, if someone refuses to give, the fund’s assets may be smaller, but it will still exist. Therefore, for the purpose of RCA, the term is defined here as:
CONTRIBUTING FACTOR is a condition that influences the effect by increasing its likelihood, accelerating the effect in time, affecting severity of the consequences, etc.; eliminating a contributing factor(s) won’t eliminate the effect.
This concept will be demonstrated by analyzing a hypothetical traffic accident. Picture a failure event: a collision between a train and a car at a railroad crossing. The investigation revealed that the rail crossing had no signal lights/automatic gates; the car was equipped with a manual gearshift; the driver, showing inexperience, was switching gears while crossing the rails, thus wasting valuable time; and since the collision occurred during the twilight hours, visibility was not perfect. It is clear that the following conditions definitely affected this failure:
- intersecting traffic,
- no signal lights/automatic gates,
- inexperienced driver.
As a general comment, it is worthwhile to mention that during any investigation, the participants frequently label conditions that influenced the failure event as contributors. This likely happens because people invoke casual day-to-day terminology instead of the terminology established for the method. Such labeling may cause confusion because, in light of the cause and contributing factor definitions given previously, it may not be accurate. It will be shown below that the aforementioned conditions are, in fact, causes.
Note that all three conditions have to be true in order to produce a collision effect. On the other hand, if, for instance, the signal lights/automatic gates were present, the car wouldn’t be able to cross the rails and no collision would have occurred. The same is true for the two other conditions. Therefore, on the cause and effect map, these conditions have to be connected with a logical “AND” gate. As far as corrective actions are concerned, correcting one, two, or all three conditions will prevent future collisions. For instance, once an overpass is built, no intersecting traffic at the same elevation level is possible. Therefore, each of these conditions meets the definition of a cause (yellow shaded blocks in Figure 1). Obviously, to get to the fundamental level (root) cause, one can dig deeper, but this is not the point of the discussion.
Figure 1: Collision cause and effect map
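The “AND” relationship among the three causes can be checked with a short Python sketch. The function and variable names are illustrative assumptions rather than part of the original map; the code enumerates every non-empty set of corrected causes and checks whether the collision output is still true afterward.

```python
from itertools import combinations

# The three causes named in the article; the identifiers are illustrative.
CAUSES = (
    "intersecting traffic",
    "no signal lights/automatic gates",
    "inexperienced driver",
)

def collision(present: dict) -> bool:
    """AND gate: the collision occurs only if every cause is present."""
    return all(present[c] for c in CAUSES)

def corrective_sets():
    """Yield every non-empty combination of causes that could be corrected."""
    for size in range(1, len(CAUSES) + 1):
        yield from combinations(CAUSES, size)

def collision_after(corrected) -> bool:
    """Re-evaluate the AND gate after the given causes have been corrected."""
    present = {c: c not in corrected for c in CAUSES}
    return collision(present)

if __name__ == "__main__":
    for corrected in corrective_sets():
        print(corrected, "-> collision:", collision_after(corrected))
```

In this simplified model, each of the seven possible corrective combinations makes the collision output false, which is the behavior the “AND” gate is meant to express.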
On the other hand, the insufficient visibility condition is of a different nature. If true, it physically affects the collision by increasing its likelihood. Improved visibility alone may increase the likelihood of detecting the train earlier; however, it can’t reliably eliminate the collision. Note that the formal logic in the map properly represents this situation.
The “OR” gate output (failure event, Point C) replicates the status of the causes’ output (Point A) and disregards the status of the insufficient visibility condition (Point B). When at least one cause is corrected, Point A becomes false and the collision is eliminated. The opposite is also true. Correcting the insufficient visibility condition (which makes Point B false) coincides with a corrected failure event (Point C false) if and only if a cause is also corrected (Point A false). When Point A is true (meaning none of the causes is corrected), correcting only the insufficient visibility condition won’t eliminate the collision. Correcting insufficient visibility together with one or more causes combines both efforts toward eliminating the collision. This way, the map logic properly reflects the aforementioned differences in conditions. Refer to Table 1 for the truth table.
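Figure 1 and Table 1 are not reproduced here, so the following Python sketch enumerates a truth table under one reading of the description above: Point A is the output of the causes, Point B is the contributing factor, and the failure event is C = A OR (A AND B). That gate arrangement and the function names are assumptions made for illustration, not a transcription of the original figure.

```python
from itertools import product

def failure_event(a: bool, b: bool) -> bool:
    """Assumed gate arrangement: C = A OR (A AND B).

    a: Point A, output of the causes (true when no cause is corrected).
    b: Point B, insufficient visibility (the contributing factor).
    """
    return a or (a and b)

def truth_table():
    """Return one (A, B, C) row per combination of Point A and Point B."""
    return [(a, b, failure_event(a, b)) for a, b in product((False, True), repeat=2)]

if __name__ == "__main__":
    print(f"{'A':<6}{'B':<6}{'C':<6}")
    for row in truth_table():
        print("".join(f"{str(v):<6}" for v in row))
```

Under this assumption, C equals A in every row, which is consistent with the statement that the output replicates Point A and disregards Point B.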
Since correcting the insufficient visibility condition won’t eliminate future failures but will only reduce their probability, it meets the definition of a contributing factor (grey shaded block in Figure 1). On the other hand, building an overpass (thus making both the intersecting traffic condition and Point A false) will physically guarantee avoidance of a collision. Of course, real life is not black and white. No one can guarantee with 100 percent certainty that, for instance, the absence of signal lights/automatic gates was a cause as opposed to a contributing factor. After all, some drivers may choose to break through the wooden gates. In the end, it is up to the analyst to qualify a condition. In this particular case, the absence of automatic gates could also be qualified as a contributing factor, but the key point is that a condition has to be qualified one way or another based on whether correcting it can prevent the failure. As far as a corrective action is concerned for this particular example, the analyst may or may not suggest, for instance, installation of a more robust metal gate. Ultimately, the solution should be driven by process/equipment criticality.
Distinguishing a cause from a contributing factor makes the cause and effect map more complex. To simplify it, one could connect all conditions - in this case, the three causes and the contributing factor - with an “AND” gate and make a note that the visibility condition is a contributing factor. However, this won’t be correct from the formal logic perspective. Remember that the cause and effect map is a concise representation of the analysis path performed in the narrative portion by applying formal logic. It is even fair to say that the narrative has limited capability of describing complex, multi-branched logic relationships between the conditions, whereas the map offers a clear and accurate view. When looking at the map, one should be able to determine clearly which condition or combination of conditions can eliminate the problem and which ones can’t. As mentioned earlier, it is ultimately up to the stakeholders to decide which conditions they are willing to work on. If resources are available, correcting a contributing factor may also be an option if proven to be cost effective.
Another example involving equipment operation will reinforce the approach expressed earlier. Picture a hypothetical system consisting of a relatively small waste tank with no upper level control. The large inlet pump delivers Product #1 to the tank at a rate of 100 GPM whenever this product is available, which happens at random; this rate can’t be changed. The small inlet pump delivers Product #2 at a rate of 5 GPM, which can be reduced if needed. The outlet pump takes the waste from the tank at a rate of 70 GPM. When the large inlet pump is not delivering the product continuously, the outlet pump is capable of pumping the waste out of the tank, but there are times when the waste tank overflows, which then constitutes a failure.
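The flow rates above lend themselves to a quick balance check. The Python sketch below, with hypothetical function and parameter names, computes the net fill rate of the tank for a given pump combination. It assumes steady flows and ignores the tank volume, which the example does not specify.

```python
def net_fill_rate_gpm(
    large_inlet_on: bool,
    small_inlet_gpm: float = 5.0,
    large_inlet_gpm: float = 100.0,
    outlet_gpm: float = 70.0,
) -> float:
    """Return inflow minus outflow in GPM; a positive value means the level rises."""
    inflow = (large_inlet_gpm if large_inlet_on else 0.0) + small_inlet_gpm
    return inflow - outlet_gpm

if __name__ == "__main__":
    # Large inlet pump running, small pump at 5 GPM: 100 + 5 - 70
    print(net_fill_rate_gpm(True))                       # 35.0
    # Large inlet pump running, small pump shut off: 100 + 0 - 70
    print(net_fill_rate_gpm(True, small_inlet_gpm=0.0))  # 30.0
    # Large inlet pump idle: 0 + 5 - 70, so the tank drains
    print(net_fill_rate_gpm(False))                      # -65.0
```

The level rises whenever the large inlet pump runs, with or without the small pump, and falls when it is idle. This matches the description of a tank that overflows only at times.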
In this example, the causes of an overflow are determined as follows (yellow shaded blocks in Figure 2):
- Outlet pump flow rate of 70 GPM is too low,
- No upper level control.
The formal logic reflecting the relationships between the conditions is shown in the form of a cause and effect map in Figure 2. In this case, no detailed explanation is required due to its simplicity, but in a more complex case, the narrative should contain a discussion on conditions and provide evidence, while the map should allow clear visualization of the conditions and their relationship.
Figure 2: Tank overflow cause and effect map
The conditions in Figure 2 qualify as causes because correcting either or both of them, for instance, increasing the outlet pump flow rate to 105 GPM, will definitely prevent overflowing. On the other hand, the small inlet pump (grey shaded block) will only increase the likelihood of overflowing when working together with the large inlet pump. Reducing the small inlet pump flow rate won’t guarantee elimination of the overflowing. Thus, it constitutes a contributing factor. The formal logic of the cause and effect map clearly reflects this fact, again, by employing a combination of “AND” and “OR” gates. As in the previous example, the formal logic demonstrates that correcting either one or both causes will eliminate tank overflow, whereas the small inlet pump flow reduction by itself can’t eliminate the failure and will only reduce the likelihood of overflowing.
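The distinction can be expressed as a simple test: correct one condition at a time and check whether overflow is still possible. The Python sketch below does this for the tank example. The model and all names are assumptions for illustration: it considers the worst case in which the large inlet pump runs continuously, and it assumes that an upper level control, if installed, would prevent overflow. It tests only elimination of the failure and does not model likelihood.

```python
def overflow_possible(
    outlet_gpm: float = 70.0,
    small_inlet_gpm: float = 5.0,
    level_control: bool = False,
    large_inlet_gpm: float = 100.0,
) -> bool:
    """Worst case: large inlet pump runs continuously (assumed model)."""
    net_fill = large_inlet_gpm + small_inlet_gpm - outlet_gpm
    return net_fill > 0 and not level_control

# Each corrective action changes exactly one condition (hypothetical labels).
CORRECTIONS = {
    "outlet pump flow raised to 105 GPM": {"outlet_gpm": 105.0},
    "upper level control installed": {"level_control": True},
    "small inlet pump flow reduced to 0 GPM": {"small_inlet_gpm": 0.0},
}

def classify(corrections: dict) -> dict:
    """Label a condition a cause if correcting it alone eliminates overflow."""
    return {
        name: "cause" if not overflow_possible(**change) else "contributing factor"
        for name, change in corrections.items()
    }

if __name__ == "__main__":
    for name, label in classify(CORRECTIONS).items():
        print(f"{name}: {label}")
```

In this model, the two outlet-side corrections each remove the overflow on their own, while shutting off the small inlet pump still leaves a 30 GPM surplus. That is why the small inlet pump is treated as a contributing factor.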
It is important to note that distinguishing between a cause and a contributing factor, as defined in this article, is a valuable concept from a corrective action perspective. The analyst must demonstrate which condition will definitively or most likely prevent the failure as opposed to which may only somewhat reduce the likelihood or consequences of a failure. The difference between a cause and a contributing factor should be properly reflected with formal logic in a cause and effect map.
Disclaimer: The views contained herein are the author’s and are not attributable to the author’s employer.
David Gluzman has over 25 years of experience in all areas of Reliability Maintenance Engineering, working in various industries. He is a Certified Reliability Engineer from ASQ, Certified Vibration Analyst Category IV from Vibration Institute, and holder of other CBM certifications. He is also an author of multiple publications in various trade magazines.