
Operational equipment reliability, and the resulting plant uptime, are inversely linked to the number of risks you allow your equipment and machinery to suffer. This inverse connection between equipment risk and reliability is not obvious, but it becomes clear when the risk equation is broken down into its fundamental elements.

We start by examining the most commonly used form of the risk equation:
Risk ($/yr) = Consequence of Occurrence ($) x Frequency of Occurrence (/yr)

The equation says that risk is equal to the cost of a failure event multiplied by the frequency of the event. 

The Frequency of Occurrence can be divided further, so the full form of the risk equation becomes:
Risk ($/yr) = Consequence ($) x [No. of Opportunities to Fail (/yr) x Chance of a Failure]
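As a quick check of the arithmetic, the two forms of the risk equation can be written as a pair of small Python functions. This is a minimal sketch: the function names and the example figures (a $50,000 failure, 200 opportunities a year, a 1% chance of failure per opportunity) are hypothetical, chosen only to show that Opportunities to Fail x Chance of a Failure is simply the Frequency of Occurrence.

```python
def risk_basic(consequence: float, frequency_per_yr: float) -> float:
    # Risk ($/yr) = Consequence of Occurrence ($) x Frequency of Occurrence (/yr)
    return consequence * frequency_per_yr


def risk_full(consequence: float, opportunities_per_yr: float,
              chance_of_failure: float) -> float:
    # Risk ($/yr) = Consequence ($) x [Opportunities to Fail (/yr) x Chance of a Failure]
    if not 0.0 <= chance_of_failure <= 1.0:
        raise ValueError("chance_of_failure must lie between 0 and 1")
    # The bracketed term is the Frequency of Occurrence
    frequency_per_yr = opportunities_per_yr * chance_of_failure
    return risk_basic(consequence, frequency_per_yr)


# Hypothetical figures: 200 opportunities x 1% chance = 2 failures/yr, each costing $50,000
print(f"${risk_full(50_000, 200, 0.01):,.0f} per year")
```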

The Number of Opportunities to Fail is how many times a year a situation arises that could lead to a failure event. The Chance of a Failure is the odds that a failure will happen once there is an opportunity. Throw the two dice in Figure 1: every throw is an opportunity to roll a one on each die, but the odds that it will actually happen on the next throw are 1 in 36.

Figure 1

The Chance of Failure is one (1) if failure will definitely occur every time the opportunity arises, and zero (0) if failure will never occur when the situation arises. Chance usually takes a value between 0 and 1 because most things can go wrong to some degree without being certain to. The chance of both dice showing one is 1/36, or 0.0278: poor odds to bet on.
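The dice odds can be confirmed by brute force. The sketch below enumerates all 36 equally likely outcomes of one throw, counts the single outcome where both dice show one, and then multiplies that chance by a hypothetical 360 throws (the opportunities) to get the expected number of double ones.

```python
from fractions import Fraction
from itertools import product

# Every equally likely outcome of one throw of two six-sided dice
outcomes = list(product(range(1, 7), repeat=2))
# Outcomes in which both dice show one
double_ones = [o for o in outcomes if o == (1, 1)]

chance = Fraction(len(double_ones), len(outcomes))
print(chance, round(float(chance), 4))  # 1/36 0.0278

throws = 360  # hypothetical number of opportunities
print(throws * chance)  # 10 double ones expected
```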

For operating plant and equipment, the Chance of a Failure becomes the Chance of Equipment Failure, which is the opposite of Equipment Reliability (the chance of not failing, i.e., the chance of success).

The reliability equation for equipment is:
Equipment Reliability = 1 - Chance of Equipment Failure
With a little manipulation, this becomes:
Chance of Equipment Failure = 1 - Equipment Reliability
Substituting this into the full risk equation gives:
Risk ($/yr) = Consequence ($) x [No. of Opportunities to Fail (/yr) x {1 - Equipment Reliability}]
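Written as code, the substitution is a single step: compute the Chance of Equipment Failure from reliability, then use it in place of the Chance of a Failure. The function names below are hypothetical, and reliability is assumed here to be the chance of success per opportunity to fail, a value between 0 and 1.

```python
def chance_of_equipment_failure(reliability: float) -> float:
    # Chance of Equipment Failure = 1 - Equipment Reliability
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return 1.0 - reliability


def risk_per_year(consequence: float, opportunities_per_yr: float,
                  reliability: float) -> float:
    # Risk ($/yr) = Consequence ($) x [Opportunities (/yr) x {1 - Equipment Reliability}]
    return consequence * opportunities_per_yr * chance_of_equipment_failure(reliability)


# Hypothetical: $50,000 per failure, 200 opportunities a year, 99% reliable per opportunity
print(f"${risk_per_year(50_000, 200, 0.99):,.0f} per year")
```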

The full risk equation gives us considerable insight into how we can maximize production equipment uptime. There is a direct inverse connection between equipment risk and equipment reliability. When equipment reliability is perfect (Reliability = 1), the risk is zero; and if there are no opportunities to fail (Opportunities = 0), there is also no risk. If you want high equipment reliability, you must remove the possibility of failure events arising in your machines and equipment.
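To see how strongly risk responds to reliability, the sketch below evaluates the full risk equation for a few reliability values, using hypothetical figures ($50,000 per failure, 200 opportunities a year). Under these assumptions, each extra "9" of reliability cuts the annual risk tenfold, and both boundary cases from the text (Reliability = 1, Opportunities = 0) give zero risk.

```python
def annual_risk(consequence: float, opportunities_per_yr: float,
                reliability: float) -> float:
    # Risk ($/yr) = Consequence ($) x Opportunities (/yr) x (1 - Equipment Reliability)
    return consequence * opportunities_per_yr * (1.0 - reliability)


CONSEQUENCE = 50_000  # $ per failure event (hypothetical)
OPPORTUNITIES = 200   # opportunities to fail per year (hypothetical)

for reliability in (0.9, 0.99, 0.999, 1.0):
    risk = annual_risk(CONSEQUENCE, OPPORTUNITIES, reliability)
    print(f"Reliability {reliability}: risk ${risk:,.0f}/yr")

# Boundary case from the text: no opportunities to fail means no risk
print(annual_risk(CONSEQUENCE, 0, 0.9))
```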

Now that the connection between high equipment risk and low reliability is clear, we can make better operational and maintenance strategy choices.

Table 1

Impact of Equipment Risk on Maintenance Strategy

Risk is reduced by minimizing the consequence of an event or by reducing the frequency of an event. Which focus you choose as your key operational risk management strategy will be a major factor in your future production success. Table 1 shows a range of common maintenance and reliability strategies, divided into chance reduction strategies and consequence reduction strategies.

Consequence reduction strategies limit cost escalation by reacting quickly to developing failures. These strategies allow failure to start and then manage the problem so that the least time, money, and effort is lost. They tolerate failure and loss as routine and accept that it is only a matter of time before problems severely affect an operation.

Companies that use consequence reduction strategies minimize their losses by learning to fix problems and breakdowns fast, by doing lots of predictive maintenance to find embryonic failures, or both. They hold many spare parts in store as insurance, set up caches of parts beside machines, train their repair people to fix things quickly, improve maintainability so repairs are faster, and have dedicated condition-monitoring groups inspecting equipment for problems.

Minimizing risk by reducing its consequences means accepting failure as normal. In an organization that mainly uses consequence reduction, people wait for evidence of failure and then act. Reducing only the consequences of risk still makes work for everyone, and that work never ends, because people and resources go into fixing failures instead of removing their causes so that fewer opportunities for failure arise. In this way, a reactive culture becomes instilled in the organization.

Figure 2

The risk matrix in Figure 2 shows that reducing the consequences of an incident reduces risk because less money is lost: you move to the left on the matrix. That is the purpose of emergency plans, fire brigades, and ambulances. If we react quickly, correctly, and early enough, the losses can be minimized.

The use of consequence reduction techniques on your equipment is an important risk control principle to contain costs, but it will not improve your reliability. Those activities that reduce failure consequence improve availability but do not improve reliability. You save some maintenance costs by preventing breakdowns, but there will be much frantic activity and "fire-fighting." For reliability improvement, you must reduce the frequency of failure; you must remove the chance of failure happening.

The alternative equipment risk management strategy is to use chance reduction techniques. Fewer failure incidents occur because chance reduction stops failure opportunities from starting. The risk matrix shows that chance reduction strategies lead to fewer failure events; reliability improves because you reduce the frequency of failure, and the number of incidents falls over time. If failures drop from once a quarter to once a year, then to once every two years, then to once every five years, you have created reliability. On the risk matrix, reliability improvement moves you down the table.
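The failure frequencies in this paragraph translate directly into risk reduction if the consequence per event stays the same, since risk is proportional to frequency. The sketch below uses exact fractions for the frequencies quoted in the text; holding the consequence constant is an assumption made for the illustration.

```python
from fractions import Fraction

# Failure frequencies quoted in the text, in failures per year
frequencies = {
    "once a quarter": Fraction(4),
    "once a year": Fraction(1),
    "once every two years": Fraction(1, 2),
    "once every five years": Fraction(1, 5),
}

baseline = frequencies["once a quarter"]
for label, freq in frequencies.items():
    # With the consequence per event unchanged, risk scales directly with frequency
    reduction = baseline / freq
    print(f"{label:>22}: {float(freq):.1f} failures/yr, risk cut {reduction}x")
```

Going from once a quarter to once every five years cuts the failure frequency, and therefore the risk, by a factor of 20.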

Chance reduction strategies focus on identifying potential problems and making business system changes to prevent or remove the prospect of failure. The chance reduction strategies view failure as avoidable and preventable. These methodologies rely heavily on improving business processes rather than improving failure detection methods. They expend time, money, and effort to identify and stop problems so that the chance of failure is minimized.

The maintenance activities that pay off the most are those that reduce the frequency of failure events. Stop an equipment risk incident from happening, and the equipment failure event cannot occur. If a maintenance activity does not reduce equipment risk, it is a waste of time, money, and effort. When you reduce failure frequency, you automatically increase equipment reliability. With high reliability come high availability, high throughput, and low maintenance costs.

You cannot expect to move more than one cell to the left on the risk matrix by using consequence reduction strategies. Your costs might halve, or even drop to a quarter, if you get good at spotting and managing impending failures. With frequency reduction strategies, however, you can easily move down many cells, reducing risk by up to hundreds of times. Consequence reduction strategies cannot achieve that degree of improvement. Chance reduction techniques should be your prime means of equipment risk control, because they give you both large maintenance cost reductions and far higher equipment reliability.
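A rough comparison of the two routes, under hypothetical figures ($100,000 per failure, four failures a year): consequence reduction at its best cuts the cost per event to a quarter, as the text suggests, while frequency reduction takes failures down to one every 50 years, an illustrative target rather than a figure from the text. Only the ratios matter here.

```python
def risk(consequence: float, frequency_per_yr: float) -> float:
    # Risk ($/yr) = Consequence ($) x Frequency of Occurrence (/yr)
    return consequence * frequency_per_yr


# Hypothetical starting point: $100,000 per failure, 4 failures a year
base = risk(100_000, 4)

# Consequence reduction: cost per event drops to a quarter, frequency unchanged
consequence_best = risk(100_000 / 4, 4)

# Frequency reduction: from 4 failures/yr to one every 50 years (hypothetical target)
frequency_best = risk(100_000, 1 / 50)

print(f"Consequence reduction cuts risk {base / consequence_best:.0f}x")  # a 4x reduction
print(f"Frequency reduction cuts risk {base / frequency_best:.0f}x")      # a 200x reduction
```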

Both equipment risk reduction philosophies are necessary for optimal protection, but a business with a chance reduction focus proactively prevents defects, whereas one with a consequence reduction focus finds and fixes failures early. Organizations that primarily apply chance reduction strategies have truly set up their business to ensure ever-fewer failures; as a result, they get outstanding equipment reliability and reap the business performance benefits that world-class reliability brings.

It is in your organization's best interest to focus strongly on chance reduction strategies; doing so will consistently generate the most profit for the least amount of work. Consequence reduction strategies are still important and necessary: once a failure sequence has started, you must find it quickly, address it, and minimize its effects so you lose the least amount of money. But consequence reduction will not take your organization to world-class success and profit, because it consumes resources. Only chance reduction strategies reduce the need for resources, because they proactively eliminate failure incidents through defect elimination and failure prevention, removing the opportunity for failures to start.

Mike Sondalini

Mike Sondalini has been in engineering and maintenance since 1974. Mike's career extends across original equipment manufacturing, beverage production, steel fabrication, industrial chemical manufacturing, quality management, project management, industrial asset management, and industrial training. His specialty is helping capital equipment-intensive companies build sound business risk management practices, introduce world-class lean practices, develop ultra-high reliable enterprise asset management systems, and instill the precision maintenance skills needed to continually improve plant uptime.
