Selecting the Correct Maintenance Strategy

by Mike Johnston

Determining the proper maintenance strategy for a site’s assets can be a daunting undertaking. There’s a fine line between profitability and reliability, and a facility’s strategy usually favors one or the other. When the strategy is weighted toward running equipment past its design capabilities, it can lead to frequent, unplanned interventions and the associated costs of labor, material and lost production. When the pendulum swings too far toward overmaintaining an asset, availability can be seriously hindered, which also hurts profitability. It’s important to find the “sweet spot” between these two approaches, where the amount of maintenance is appropriate and still drives profitability. The question is how to find it.

Equipment in the refining and petrochemical industry (e.g., pumps, vessels and exchangers) is clearly different from the equipment in an automotive assembly line facility or in the food and bottling sector, and that difference will affect the maintenance strategy decision. However, some equipment care methods are universal. This article breaks down the process of analyzing, developing, agreeing on and deploying a new or augmented maintenance strategy into five phases.

Phase 1

Establish a Team

To lay the groundwork for a cost-effective maintenance strategy, a team must be established if one does not already exist. A dedicated crew composed of members of the reliability, maintenance, production and engineering groups needs to be convened. Having people from various departments brings a variety of perspectives to the table and helps cover all bases.

This process cannot be regarded as a short-term, get-it-done-quick event. Each team member must commit to his or her role in the maintenance plan of action. To ensure an accurate and effective strategy is put into place, the multidisciplined team of stakeholders needs a certain level of autonomy from management. With more freedom, team members will be better equipped to think freely and execute the work properly.

Phase 2

Critical Analysis

First, the team must establish an asset’s importance in the production chain by defining whether it is: 1) critical, 2) vital, or 3) secondary to production. This is a crucial step and will undoubtedly be the team’s first challenge in reaching group consensus. Maintenance strategy team members may have conflicting opinions on what constitutes the most critical assets, depending on their perspective and respective positions within the organization. Fortunately, there are tools to help the team move beyond opinions and make objective decisions.

Historical repair data from the site’s computerized maintenance management system (CMMS) and overall equipment effectiveness (OEE) information, if available, will help drive the analysis and assist in resolving divergent opinions. Based on mean time between failures (MTBF) and failure rate, the most pressing issues can be identified and agreed upon. Whether these particular assets are critical or not will be determined as the exercise progresses. Note: Assets that have a spare are generally not defined as critical and could be removed from the initial list and examined later.
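As a rough illustration of how repair history can feed this step, the sketch below computes MTBF per asset as operating hours divided by the number of recorded failures, then sorts the assets so the most frequent failers come first. The record layout, asset tags and hours are hypothetical and are not taken from any particular CMMS.

```python
from collections import Counter

def mtbf_by_asset(failure_events, operating_hours):
    """Return MTBF in hours per asset: operating hours / number of failures.

    failure_events: iterable of asset tags, one entry per recorded failure
                    (a hypothetical export of CMMS work-order history).
    operating_hours: dict mapping asset tag -> hours run in the same period.
    Assets with no recorded failures are omitted, since MTBF is undefined
    for them over this period.
    """
    counts = Counter(failure_events)
    return {
        asset: operating_hours[asset] / n
        for asset, n in counts.items()
        if asset in operating_hours
    }

def worst_first(mtbf):
    """Sort asset tags so the shortest MTBF (most frequent failures) is first."""
    return sorted(mtbf, key=mtbf.get)

# Hypothetical data for one year of operation.
events = ["P-101", "P-101", "P-101", "P-101", "E-200", "C-300", "C-300"]
hours = {"P-101": 8000, "E-200": 8000, "C-300": 6000}

mtbf = mtbf_by_asset(events, hours)
ranking = worst_first(mtbf)
print(ranking)
```

A list like this only shows where failures are most frequent. Whether those assets are critical is still decided by the team in the ranking exercise.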

The criticality matrix (Figure 1) is a particularly useful tool to evaluate, categorize and prioritize an asset’s necessity. In the Figure 1 example, anything ranked 15 or higher would be categorized as critical. The middle rankings of 5 to 9 could be deemed vital, with the lower levels assigned as secondary.

Figure 1: Critical risk matrix
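The banding described for Figure 1 can be sketched in a few lines. This is a minimal example that assumes a 5 x 5 matrix in which the score is likelihood multiplied by consequence. That layout is an assumption, since the figure itself is not reproduced here. The text does not say how scores of 10 to 14 are treated, so the sketch flags them for a team decision rather than guessing.

```python
def criticality_score(likelihood, consequence):
    """Score an asset on an assumed 5 x 5 matrix (each axis rated 1 to 5)."""
    for value in (likelihood, consequence):
        if not 1 <= value <= 5:
            raise ValueError("ratings must be between 1 and 5")
    return likelihood * consequence

def classify(score):
    """Map a score to the bands named in the text."""
    if score >= 15:
        return "critical"
    if 5 <= score <= 9:
        return "vital"
    if score < 5:
        return "secondary"
    # Scores of 10 to 14 are not assigned a band in the text.
    return "needs team decision"

# Hypothetical ratings for four assets: (likelihood, consequence).
ratings = {"P-101": (5, 4), "E-200": (3, 3), "C-300": (2, 2), "K-400": (3, 4)}
bands = {tag: classify(criticality_score(l, c)) for tag, (l, c) in ratings.items()}
print(bands)
```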

This matrix includes both the cost of maintenance and the cost associated with lost production. Although vital, lost production revenue is frequently overlooked when evaluating the urgency of maintenance work. This monetary value could be much greater than the cost of maintenance labor and material and, therefore, should not be omitted from the ranking. The potential cost of lost productivity alone may place an asset in a higher bracket than initially classified. If a site already has a matrix, upgrading it may yield more accurate results. However, it is important to note that starting from scratch will add time to the work and delay defining the strategy to deploy.
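A short worked example shows why lost production belongs in the ranking. The function below is only a sketch, and every figure in it is hypothetical. It adds repair cost (labor plus material) to the revenue lost while the asset is down.

```python
def failure_cost(labor, material, downtime_hours, lost_revenue_per_hour):
    """Return (repair cost, lost production, total) for one failure event."""
    repair = labor + material
    lost_production = downtime_hours * lost_revenue_per_hour
    return repair, lost_production, repair + lost_production

# Hypothetical failure: 1,200 labor, 800 material, 6 hours down at 5,000 per hour.
repair, lost, total = failure_cost(1200, 800, 6, 5000)
print(repair, lost, total)
```

With these assumed numbers, lost production is fifteen times the repair cost, which is the kind of gap that can move an asset into a higher bracket.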

Phase 3

Analysis of Current Strategy Versus Preventive Maintenance/Repair Data

Once the team has agreed on the preliminary criticality ranking, the next step is to evaluate the existing maintenance strategy for each asset, if one exists. In this initial review, gaps between the repair history and the maintenance strategy are noted for additional investigation and future improvement. From this point, the team should decide how to group equipment for analysis. There are three ways to do this: 1) compile a comprehensive list of all data on all the assets at the site before moving to the next phase; 2) address each asset individually, conduct the gap analysis, define and deploy an altered strategy as a pilot and monitor the results; or 3) conduct the analysis in clusters, grouping together equipment from different departments.

Then, different processes can be applied to cross-check matrix results and confirm the initial rankings. For example, a failure mode and effects analysis (FMEA) provides a qualitative analysis to determine system reliability. A root cause analysis (RCA) or 5 Whys analysis also can be utilized by the team to help determine cause and effect relationships. Employing a range of tools can help identify potential failures, consequences, or circumstances not considered during the criticality ranking.

Consider the following challenges a refinery may face:

At this juncture, any existing maintenance strategy and any current controls for prevention and detection should be reevaluated. The historical and potential failures should be listed for the components, with their respective controls and strategy, to identify what is and isn’t working. Once this list is compiled, the team can progress to Phase 4.
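One way to organize the list described above is a small record per failure mode. The sketch below is an assumption about how a team might structure it, not a prescribed format, and the field names and sample entries are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FailureRecord:
    component: str
    failure_mode: str
    historical: bool           # True if it has occurred, False if only potential
    prevention: Optional[str]  # current preventive control, if any
    detection: Optional[str]   # current detection control, if any
    effective: bool            # team's judgment of whether the controls work

def find_gaps(records):
    """Return records with no controls at all or with controls judged ineffective."""
    return [
        r for r in records
        if not r.effective or (r.prevention is None and r.detection is None)
    ]

# Hypothetical entries for illustration only.
records = [
    FailureRecord("pump bearing", "overheating", True,
                  "monthly greasing", None, False),
    FailureRecord("gearbox", "oil degradation", False,
                  "calendar-based oil change", None, True),
    FailureRecord("coupling", "misalignment", False, None, None, False),
]

gaps = find_gaps(records)
print([r.component for r in gaps])
```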

Phase 4

Define/Create the New Strategy

Using the list created in Phase 3, the team embarks on what is possibly the most arduous task of the entire exercise: developing and agreeing on a new or altered maintenance strategy. At this point, the crew must decide what activities might increase reliability, productivity and overall equipment effectiveness (OEE) and reduce failures. There are five major avenues that can be explored and applied to arrive at a suitable maintenance strategy for a given piece of equipment and its components.


Preventive Maintenance (PM): Regularly performed standard repair, replacement, inspection, cleaning and lubrication.


Predictive Maintenance (PdM): Employing condition-based monitoring technologies, such as vibration analysis, thermography, tribology, acoustic analysis, wear particle analysis, or x-rays.


Proactive Maintenance: Applying the results of the data derived from PdM to preemptively drive the work at the opportune time, with FMEA/RCA conducted after any subsequent failure to determine the cause and undertake corrective action to reduce or eliminate possible recurrence.


Redesign/Enhance: Occasionally, there is a component of an asset that does not lend itself to maintenance easily, or at all. This may be an inconveniently located bearing or a component that is not sufficient to meet the demands placed on it. In this case, either an enhancement or a redesign of the item could be implemented, possibly working with the original equipment manufacturer.


Run to Failure: Do nothing, wait for failure to occur and then correct it.

The team must determine which of these methodologies, or which combination of them, should be used for the job. They will have to consider a variety of factors. For example, if a company needs new tools or the implementation is difficult, employees may need additional training, which costs valuable time. A strategy’s cost-benefit, return on investment and length of time between application and anticipated improvement will all influence the decision. Ideally, a site can increase productivity and reliability without any production downtime.
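The cost-benefit and return-on-investment comparison can be sketched with simple arithmetic. The two functions below use undiscounted figures, which is a simplifying assumption, and all amounts are hypothetical.

```python
def simple_roi(annual_benefit, annual_cost, upfront_cost, years):
    """Net gain over the period divided by the upfront investment (undiscounted)."""
    net_gain = (annual_benefit - annual_cost) * years - upfront_cost
    return net_gain / upfront_cost

def payback_years(annual_benefit, annual_cost, upfront_cost):
    """Years until the upfront cost is recovered, or None if it never is."""
    net_annual = annual_benefit - annual_cost
    if net_annual <= 0:
        return None
    return upfront_cost / net_annual

# Hypothetical PdM rollout: 20,000 for tools and training, 5,000 per year to run,
# 15,000 per year in avoided repairs and lost production.
roi = simple_roi(annual_benefit=15000, annual_cost=5000, upfront_cost=20000, years=3)
payback = payback_years(annual_benefit=15000, annual_cost=5000, upfront_cost=20000)
print(roi, payback)
```

Under these assumed figures the investment is recovered in two years, which is the "length of time between application and anticipated improvement" the team would weigh against other options.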

Possible solutions to Phase 3 challenges:

Asset A: In the example of bearing failures, FMEA or RCA can determine the cause of premature malfunction. The reason could be anything, including poor design; incorrect, insufficient, or over lubrication; inappropriate lubricant; improper installation or misalignment; the wrong application of parts; or operating the unit past its designed envelope. Once the root cause has been identified, the appropriate strategy, redesign, or enhancement should be put in place. In this case, the solution could be improving a bearing pedestal that lacks the rigidity needed to dampen an inherent natural harmonic frequency.

Asset B: The gearbox lubricant oil replacement dispute is a prime candidate for a combination of tribology and vibration analysis. The strategy could be to replace the oil at the next calendar driven cycle, or even immediately, and then begin conducting oil analysis, starting with, perhaps, a quarterly frequency. A vibration analysis could be conducted to check for internal wear or damage. The vendor that supplies the oil for the gearbox may conduct an oil analysis as part of its services. Depending on the cost of the oil, the labor to replace it and the lost production while the unit is down for the oil change, this may be a very viable alternative to an automatic biannual oil replacement.

Asset C: Rather than disassembling a coupling for which no issues have been reported, a monthly vibration check could be performed on the motor and driven unit bearings to identify any misalignment via the axial readings of the vibration signature. An annual lubrication may still be required, but it would need less downtime, and replacing the lubricant would only require the labor of one oiler, not two millwrights.

Asset D: The MCC in the example should be inspected using thermography, a predictive maintenance technique. These checks can be performed with little to no effect on regular production, while providing a more accurate representation of the condition of the wiring, terminals and any other components contained in the MCC. Remedial action should only take place if the thermography indicates potential problems.

The least frequently used maintenance strategy is the run to failure option. It is generally applied when the costs of labor and material do not warrant any other strategy. For example, a 1/4 HP motor with sealed bearings on a conveyor segment would run to failure. This motor would likely be one the site has in its stores or an item that local vendors have in stock. Very little is gained from performing maintenance on such a low priority component.
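The run to failure decision comes down to comparing what maintenance would cost with what a failure costs. The sketch below is a deliberately simple version. It assumes maintenance would fully prevent the failure, which is optimistic, and all numbers are hypothetical.

```python
def prefer_run_to_failure(annual_pm_cost, replacement_cost,
                          downtime_cost, expected_life_years):
    """Return True when maintaining the item costs at least as much per year
    as simply replacing it when it fails."""
    annual_failure_cost = (replacement_cost + downtime_cost) / expected_life_years
    return annual_pm_cost >= annual_failure_cost

# Hypothetical small motor: 150 to replace from stores, 100 of downtime,
# expected to last 5 years, versus 160 per year of routine checks.
decision = prefer_run_to_failure(
    annual_pm_cost=160, replacement_cost=150,
    downtime_cost=100, expected_life_years=5,
)
print(decision)
```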

Before a new maintenance strategy is implemented for a piece of equipment, the asset should be cleaned, lubricated and rebuilt so it goes into the next step performing at its optimal design capacity. Otherwise, the unit will, by and large, continue to require attention through unplanned outages and will gain little in reliability. For example, performing tribology on a leaking gearbox or conducting a vibration analysis on bearings that are already worn out and nearing failure would be a waste of time.

Regardless of which strategy is selected, not all activities need to be performed by the maintenance staff. Simple, autonomous PM tasks, such as cleaning, checks, inspection and lubrication, can be passed to the operator staff with minimal training. This approach frees up the trained maintenance staff to concentrate on more critical activities that may require a broader, more in-depth skill set.

Phase 5


Once the new strategy is in place, it must be monitored for effectiveness and completeness. If technological solutions, such as PdM, were implemented, a baseline must be acquired with the initial inspection. This provides the reference point to measure against going forward. If it appears the new strategy is not meeting expectations, additional analysis needs to be performed to determine what is lacking in the current approach. This is the check-act portion of the Deming Cycle (Figure 2), which is frequently overlooked or disregarded. It ensures a dynamic, sensible approach is employed to extend the life of the various components that make up an asset. The team may need to go back to Phase 3 and work through the process again to identify missing or ineffective methods and close any remaining gaps.

Figure 2: Deming Cycle of continuous improvement
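Measuring against the baseline can be as simple as tracking how far each later reading has drifted from the initial inspection. The sketch below assumes a single overall vibration value per inspection and a 25 percent alert limit. Both are hypothetical choices for illustration, not recommended settings.

```python
def deviation_from_baseline(baseline, reading):
    """Fractional change of a reading relative to the baseline value."""
    return (reading - baseline) / baseline

def needs_review(baseline, readings, limit=0.25):
    """Return the indices of readings that exceed the baseline by more than limit."""
    return [
        i for i, reading in enumerate(readings)
        if deviation_from_baseline(baseline, reading) > limit
    ]

# Hypothetical overall vibration readings (mm/s) from four monthly checks,
# compared with a baseline of 2.0 mm/s taken at the initial inspection.
flagged = needs_review(2.0, [2.1, 2.2, 2.6, 3.0])
print(flagged)
```

Readings that are flagged feed the check-act step: the team investigates and, if needed, returns to Phase 3.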


There is no one-size-fits-all strategy for effective maintenance. A certain level of flexibility is necessary to roll with the punches and embrace the appropriate strategy. Continuous improvement is an ongoing process. It requires monitoring, record keeping and updating throughout the lifecycle of an asset and its components. Team members, circumstances and situations within a company may change over time, and the maintenance strategy needs to keep up. Maintaining a proper, continual maintenance approach must be a high priority to maximize productivity and equipment performance. Employing a maintenance strategy that ensures the right work is performed at the right time with a minimally invasive methodology will help drive improvements in reliability, thereby increasing productivity and extending the lifecycle of an asset.