Tools for Improving Maintenance Strategies and Failure Analysis Processes

Predictive tools.

Reliability Centred Maintenance. RCM

Reliability Centred Maintenance is defined by John Moubray as "a process used to determine what must be done to ensure that the physical asset continues to fulfil its intended functions in its present operating context" (1993, pg. 7). RCM was born in the US airline industry in the early 1970s in response to the statutory maintenance requirements that had to be applied to larger aircraft such as the Boeing 747. It was determined that the cost of applying the standards to these aircraft would make them uneconomical to operate (Smith and Hinchcliffe, 2004). The basis of RCM is to ensure equipment maintains its function, and the process requires that the following seven questions be answered (Moubray, 1993).

1. What is the function of the equipment and what are the required performance standards?
2. In what ways can it fail to perform its function?
3. What could cause each functional failure?
4. What happens when the failure occurs?
5. In what way does the failure matter?
6. What can be done to prevent the failure?
7. What has to be done if the failure can't be prevented?

Smith and Mobley (2008) highlight the following types of asset management strategies that may be developed from an RCM process.

1. Condition-based tasks. E.g. Oil is sampled from a transformer and the results of the analysis determine if further maintenance is required.
2. Scheduled restoration. E.g. A sheave bank running in a corrosive environment that requires overhaul at fixed intervals.
3. Scheduled discard. E.g. The replacement of oil in a combustion engine.
4. Failure-finding task. E.g. Calibration of instrumentation. The fault may not be discovered until the calibration is done.
5. One-time change. Typically a one-off redesign.

RCM in its purest form is a resource-hungry process that should only be applied to the most critical assets. If performed properly and coupled with an assessment of historical failures, the process will produce efficient and effective maintenance strategies, but at the expense of a significant amount of time for plant staff and the project analyst.

Failure Modes and Effects Analysis. FMEA.

A Failure Modes and Effects Analysis is an integral part of the RCM process and deals with questions 2, 3 and 4 of the seven RCM questions listed above. Teng and Ho (1996) define FMEA as a technique that identifies the potential failure modes of a device or product, determines the effects of these failures and assesses the criticality of each failure. The Teng and Ho model is shown in Figure 1.

Fig 1. FMEA flowchart.
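
One common way to carry out the criticality assessment step of an FMEA is a risk priority number (RPN), the product of severity, occurrence and detection ratings. The sketch below assumes that scoring scheme, which is not taken from the Teng and Ho model, and the failure modes and 1 to 10 ratings are hypothetical values chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (almost certain)
    detection: int   # 1 (easily detected) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        # Risk priority number: higher values call for attention first.
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical ratings, not measured data.
modes = [
    FailureMode("Brush wear", severity=6, occurrence=5, detection=3),
    FailureMode("Bearing seizure", severity=9, occurrence=3, detection=4),
    FailureMode("Blocked ventilation", severity=7, occurrence=4, detection=2),
]

for mode in rank_by_rpn(modes):
    print(f"{mode.description}: RPN = {mode.rpn}")
```

Ranking by RPN is only a screening aid. The ratings themselves still have to come from the people who know the equipment and its failure history.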

An FMEA completed on DC machines in an Australian steel mill found the following to be the most likely causes of DC machine failure:

1. Contamination of the motor by dust, dirt, fumes, etc.
2. Inadequate maintenance practices (internal and contract).
3. Inadequate brush tension.
4. Over-tensioning of belts or shaft misalignment.
5. Overheating due to ineffective ventilation.
6. Neutral axis and compounding issues.
7. Overloading.
8. Inadequate lubrication (too much or not enough).
9. Incorrect or ineffective protection devices.

These findings were used to improve the existing PMs with excellent results. Over a three-year period there was a 70% reduction in the number of DC motors that failed in service.

Planned Maintenance Optimisation. PMO

Planned Maintenance Optimisation is a process where existing PM inspections and failure history are used to form the basis of a new set of strategies. This can provide a similar output to classical RCM in far less time. Because unknown failure modes are not addressed in the first instance, the process allows for input of potential failure modes after the initial assessment. This couples the PMO top-down approach with the RCM bottom-up approach and in many cases will be the best option for mature businesses with existing PM systems and access to failure history. New businesses with no existing systems or failure history will need to apply more classical methods such as RCM or a knowledge-based process.

Event trees and Fault trees.

Event and fault trees are not aimed at determining root cause, but are meant to determine the probability of an event occurring. From the probability rating you can then determine which parts of a system require attention.

Fig 2. Example of an event tree.

Fig 3. Example of a fault tree.
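
As a rough illustration of how a fault tree turns basic event probabilities into the probability of a top event, the sketch below combines inputs through AND and OR gates. It assumes the basic events are statistically independent, and the events and probabilities are hypothetical, not drawn from any plant data.

```python
from math import prod


def and_gate(probabilities):
    """Output occurs only if every input event occurs (independent events)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output occurs if at least one input event occurs (independent events)."""
    return 1 - prod(1 - p for p in probabilities)


# Hypothetical annual probabilities for a motor that overheats when
# ventilation is lost AND the thermal protection fails to trip.
p_filter_blocked = 0.10
p_fan_failed = 0.02
p_protection_fails = 0.05

p_ventilation_lost = or_gate([p_filter_blocked, p_fan_failed])
p_overheat = and_gate([p_ventilation_lost, p_protection_fails])

print(f"P(ventilation lost) = {p_ventilation_lost:.4f}")
print(f"P(motor overheats)  = {p_overheat:.4f}")
```

Comparing the contribution of each branch shows which part of the system deserves attention first. In this made-up case the blocked filter dominates the loss of ventilation.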

Problem solving tools.

Root Cause Analysis. RCA


Smith and Mobley (2008, pg. 79) define Root Cause Analysis as "The systematic evaluation of problems to find the basic causes that, when corrected, prevent or significantly reduce the likelihood of recurrence."

Latino (2006, pg. 3) promotes the steps to "Root Cause Analysis" as being:

1. Identification of the actual problem.
2. Identify the cause and effects that combine to cause the undesirable event.
3. Data collection to support the cause and effect relationship.
4. Identification of physical, human and latent (System) causes that are associated with the undesirable event.
5. Development of corrective actions to prevent the re-occurrence of the problem.
6. Communication of the lessons learned to relevant areas in the organisation.

Latino discusses many other forms of problem analysis, which he classifies as "shallow analysis" because not all of the steps listed above are completed. Figure 4 compares suggested shallow analysis processes with RCA to highlight the differences.

Fig 4. Comparison of RCA to "shallow analysis" processes.

5-Why analysis.

The 5-why process is an integral part of "Kaizen" in the Toyota Production System (Liker, 2004) and the Lean manufacturing philosophy. The process is based on the assumption that if you ask "why" five times about a specific issue you will determine the root cause of the problem. Actions are then put in place to eliminate the root cause.

Latino and Latino (2006) suggest that the five "whys" should be changed to "how could", as "why" can imply that there is only one answer. "How could" suggests that there could be numerous reasons why the problem occurred. Latino and Latino also suggest that the 5-why approach is often used by people in isolation and is rarely backed up with evidence. This could lead to answers that do not address the root causes.

Practical Problem Solving (PPS)

Practical Problem Solving is an extension of the 5-why process in that it adds steps to either side of it. The process is defined by Liker (2004) as follows:

1. Initial problem perception.
2. Clarify the problem.
3. Locate the point of cause of the problem.
4. Use 5-why to find the root cause from the direct causes.
5. Determine countermeasures to eliminate the problem.
6. Evaluate whether the countermeasures were effective.
7. Standardise the process.

PPS is used on a daily basis in businesses that apply Lean manufacturing philosophies. It is a simple process that delivers actions that lead to continuous improvement in all aspects of manufacturing.

Cause and effect diagrams. (Ishikawa)

The cause and effect diagram was developed in the 1960s and was the brainchild of Kaoru Ishikawa. The diagram is fundamentally a brainstorming tool that clusters possible causes of a problem under broad headings. These causes are then assessed to determine the most likely causes of the problem so that solutions can be developed. The cause and effect diagram is useful for determining "what could" have caused a problem. This tool can be considered "shallow analysis" and should not be seen as a complete tool for use on complex problems.

Fig 5. Cause and effect diagram.


SCRA (Symptom, Cause, Remedy, Action).

The SCRA system contains four steps for improvement and introduces a number of tools that can be used at each step of the process, as detailed below:

Symptom. Define the problem, measure the problem, prove the need, set the goal.

Tools that can be used: Run charts, surveys, interviews, flow charts, Pareto charts, check sheets, histograms, box plots.

Cause. Collect the data, analyse it, define and test possible causes, determine the key root causes, determine the improvement available.

Tools that can be used: Check sheets, histograms, cause and effect diagrams, 5-why, brainstorming etc.

Remedy. Formulate and evaluate solutions and choose the best solution.

Tools that can be used: Brainstorming, cause and effect, driver tree, FMEA, force field analysis, evaluation matrix, cost benefit analysis etc.

Action. Plan, do, check, adjust and hold the gains made.

Tools that can be used: Action plan, implementation monitoring chart, KPIs, checklists, SOPs, control charts, audits etc.

The SCRA methodology is a conglomerate of a number of different but well-known improvement tools collated under a single heading.

Six Sigma.

Six Sigma is a set of practices that was developed by Motorola in the 1980s and is closely linked to the TQM philosophy of involving everyone in the process of reducing variation and eliminating defects (Arnheiter and Maleyeff, 2005). The Six Sigma process steps are called "DMAIC" and, in relation to equipment reliability, could be applied to maintenance in the following ways:

1. Define. Select and define appropriate projects that align with the needs of the business. This may be determining what equipment requires strategy re-development based on poor reliability.
2. Measure process variables, such as the MTBF and re-occurring failure modes of a piece of equipment.
3. Analyse the data gathered using graphical techniques to understand the causes of the failures.
4. Improve the asset's reliability by applying continuous improvement techniques such as RCM and RCA.
5. Control the improvements by implementing a good work management system and embedding follow up reporting in the system (Smith & Mobley 2008; Senapati, 2004).
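
The Measure step above can be sketched in a few lines. The example assumes the usual definition of MTBF as operating time divided by the number of failures, and the failure log and operating hours are hypothetical.

```python
from collections import Counter


def mtbf(operating_hours, failure_count):
    """Mean time between failures: operating time divided by failure count."""
    if failure_count == 0:
        raise ValueError("MTBF is undefined when no failures were recorded")
    return operating_hours / failure_count


# Hypothetical failure log for one asset over 6,000 operating hours.
failure_modes = ["brush wear", "bearing", "brush wear", "overheating", "brush wear"]

print(f"MTBF = {mtbf(6000, len(failure_modes)):.0f} hours")

# Recurring failure modes: those recorded more than once.
recurring = {mode: n for mode, n in Counter(failure_modes).items() if n > 1}
print(recurring)
```

Tracking both figures over time gives the Control step something concrete to report against.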

Like SCRA, the Six Sigma process utilises different tools, many of which have been included in this review.

Pareto analysis.

The Pareto principle, also known as the 80/20 rule, highlights that some things are more important than others (Latino and Latino, 2006). In relation to maintenance, an example of the rule could state: "80% of plant downtime is attributable to 20% of the installed equipment". The significance of the 80/20 rule is that if the top 20% of losses can be identified and then eliminated, improvements will be made in the shortest timeframe. This is an extremely powerful tool and arguably the most common method used to determine where improvements need to be focussed. It is a must in the toolkit of maintenance and reliability professionals.
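
A minimal sketch of a Pareto analysis follows: rank the contributors by loss, accumulate their share of the total, and pick out the few that account for roughly 80% of it. The asset names, downtime hours and the 0.8 threshold are illustrative assumptions.

```python
def pareto_table(losses):
    """Rank items by loss and attach each item's cumulative share of the total."""
    total = sum(losses.values())
    ranked = sorted(losses.items(), key=lambda item: item[1], reverse=True)
    table, running = [], 0.0
    for name, value in ranked:
        running += value
        table.append((name, value, running / total))
    return table


def vital_few(losses, threshold=0.8):
    """Return the top-ranked items needed to reach the cumulative threshold."""
    selected = []
    for name, _, cumulative in pareto_table(losses):
        selected.append(name)
        if cumulative >= threshold:
            break
    return selected


# Hypothetical downtime hours per asset.
downtime_hours = {
    "Mill drive": 120, "Pay-off reel": 45, "Cooling pump": 15,
    "Conveyor": 10, "Shear": 6, "Lube unit": 4,
}

for name, hours, cumulative in pareto_table(downtime_hours):
    print(f"{name:<14}{hours:>5} h  {cumulative:6.1%}")
print("Focus on:", vital_few(downtime_hours))
```

In this made-up data set two of the six assets account for more than 80% of the downtime, which is where improvement effort would be focussed first.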

Failure reporting.

Formal failure reports are the traditional way of presenting investigations into failures and are generally an after-the-event communication exercise more than a tool to determine the failure cause. Typical headings used within these reports are:

1. Project Title
2. Equipment hierarchical location.
3. Work order no.
4. Problem Statement.
5. Potential costs associated with the issue.
6. Observations.
7. Process followed. This includes collating data and analysing it to diagnose the cause of the problem.
8. Findings.
9. Are existing strategies in place to address the findings?
10. Conclusion.
11. Actions.
12. Information for further reference.

In Conclusion.

All of the tools discussed have their place in predicting or analysing failures, and this review is by no means comprehensive. Numerous other problem-solving tools such as brainstorming, checklists, flowcharts, 4Why2How etc. exist, and this review process has shown that it is worthwhile having many of these tools available, as some suit certain situations better than others.

Of all the tools mentioned, it is the author's view that the Toyota-based Practical Problem Solving tool will deliver the most consistent flow of maintenance improvements.

References.


Arnheiter, E.D. & Maleyeff, J. 2005, "The integration of lean management and Six Sigma", The TQM Magazine, Vol. 17 No. 1, pp. 5-18, © Emerald Group Publishing Limited, 0954-478X.

Liker, J.K. 2004, "The Toyota Way: 14 Management Principles from the World's Greatest Manufacturer", McGraw-Hill, New York.

Latino, R.J. & Latino, K.C. 2006, "Root Cause Analysis: Improving Performance for Bottom-Line Results", Third edition, CRC Press, Taylor and Francis Group, FL, USA.

Moubray, J. 1991, "Reliability-centred Maintenance", Butterworth-Heinemann Ltd, Oxford.

Smith, A.M. & Hinchcliffe, G.R. 2004, "RCM: Gateway to World Class Maintenance", Elsevier Butterworth-Heinemann, MA, USA.

Smith, R. & Mobley, R.K. 2008, "Rules of Thumb for Maintenance and Reliability Engineers", Butterworth-Heinemann, MA, USA.

Teng, Gary S. & Ho, Michael 1996, "Failure mode and effects analysis: An integrated approach for product design and process control", International Journal of Quality & Reliability Management, Vol. 13 No. 5, pp. 8-26, © MCB University Press, 0265-671X.

Mark Brunner is Reliability and Systems Superintendent - Wire for OneSteel Rod Bar and Wire in Newcastle, NSW, Australia.
