
by Will McGinnis

Attempting to contextualize generally obscure statistical predictions about machines is an all too common practice. While statistics and predictive analytics are not new concepts, they have not yet been applied deeply enough to mechanical processes to be embedded in the vernacular. With a focus on the actionability of analysis, similes help to convey the message. This article presents one of them.

Machines As Patients

In terms of failure, the primary difference between biological and mechanical systems is, for the most part, that small variations strengthen the biological and weaken the mechanical. A biological system dropped into an environment for which it is ill suited will eventually adapt to it, while a mechanical system will simply fail. Despite this fundamental difference, the two can be approached the same way on the doctor's table.


People and machines both get sick. The chronic propensity to fail can be seen in both terminally ill patients and mis-sized or improperly maintained machinery. Both cases are characterized by a likelihood of future failure that is independent of short-term variation in environment, behavior or process.


Likewise, both people and machines can be injured. With a person, this could be a broken arm. In most cases, it is a single, unexpected event that is unlikely to be repeated unless the environment consistently reproduces those events (e.g., a professional skateboarder). The mechanical equivalent would be an operator's error. A broken part due to operator error is not necessarily an indication of a long-term misapplication of the machine, but rather an isolated event caused by outside factors. The only ways to avoid injuries are to change the environment or improve the process (e.g., quit skateboarding or train more to fall less).

Sickness vs. Injury

With these similes, it can be pretty simple to visualize and understand the cases of mechanical failure. Sickness is environmentally independent and chronic, while injuries are heavily influenced by the environment and are acute. A machine can be both sick and injured, or just one of the two.

Measurements for both machine sickness (HealthScore) and injury (Warnings) can be seen in Table 1.

Table 1

There are four assets with the four boundary cases of condition:

  1. Total Health (004)
  2. Injured, but well (001)
  3. Intact, but sick (002)
  4. Sick and Injured (003)

The goal is a machine with both a high HealthScore and no warnings (predicted injuries). The opposite of this ideal is a sick and injured machine, with both a low HealthScore, indicating chronic risk of failure, and a warning, indicating imminent likelihood of injury.

In between these two extremes are machines with a warning, but a high HealthScore, indicating an environmental risk but no systemic misapplication, and machines with no warning but low HealthScore, indicating a systemic misapplication that is being well accounted for by the environment (i.e., users and maintainers). These are both interesting cases because they can be easily rationalized as good results, but while they are far better than sick and injured, they can be greatly improved by changing the environment or system, respectively.
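The four boundary cases can be sketched as a simple two-way classification. This is only an illustration: the 0-100 HealthScore scale, the cutoff of 70, the `Asset` class and the sample values are all assumptions made for the sketch, not figures taken from Table 1 or from any particular product.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the article does not state where "healthy" begins.
HEALTHY_THRESHOLD = 70.0


@dataclass(frozen=True)
class Asset:
    asset_id: str
    health_score: float  # assumed 0-100 scale, higher is healthier
    has_warning: bool    # True when an injury (acute failure) is predicted


def classify(asset, threshold=HEALTHY_THRESHOLD):
    """Map an asset to one of the four boundary cases of condition."""
    sick = asset.health_score < threshold
    if sick and asset.has_warning:
        return "Sick and injured"
    if sick:
        return "Intact, but sick"
    if asset.has_warning:
        return "Injured, but well"
    return "Total health"


# Illustrative values only; these are not the figures from Table 1.
assets = [
    Asset("001", 92.0, True),
    Asset("002", 35.0, False),
    Asset("003", 28.0, True),
    Asset("004", 95.0, False),
]

labels = {a.asset_id: classify(a) for a in assets}
for asset_id, label in labels.items():
    print(asset_id, label)
```

Keeping the two measurements separate, rather than blending them into one number, is what lets the two in-between cases stay visible instead of averaging out to "acceptable."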

Figure 1 shows an example of how a population of assets might look visualized in this way.

Plants As Populations

After establishing this idea of machines as patients (i.e., biological-esque entities), the natural extension is to look at a plant as a population of such entities. Scholar Nassim Taleb describes all things as being along a scale from fragile to anti-fragile.

If something is fragile, then a small, random variation makes it weaker. Conversely, if something is anti-fragile, then small, random variations make it stronger. Right in the middle of the two is robust or resilient, where variations do not affect the item.

Individual machines are generally fragile by this definition. Likewise, individual people are, in many cases, fragile. A population of people, however, can be extremely anti-fragile, adapting over time to changes in the environment and gradually improving quality and longevity of life. The goal of a plant should be a population of machines that resembles a population of people or animals more so than a house of cards, where a collection of fragile items is more fragile than its constituent parts.


The first key concept that contributes to the anti-fragility of biological populations is selection. Those animals or bacteria that are least fit for the environment do not continue on into the next generation. Likewise, to increase the anti-fragility of a plant, the fitness (health) of the machines needs to not only be tracked, but used to inform the replacement of machines. Machines that are chronically unfit (unhealthy) must be replaced with machines that are less predisposed to this condition. Where the machines themselves cannot be changed to be more fit for the environment, the environment must be changed.

Continuous, iterative improvement of both the conditions surrounding the machines in a plant and the fitness of the machines to that environment is critical in building the capacity for the plant to not only withstand unexpected variations, but to benefit from them.


The other critical contribution to a population's anti-fragility is compartmentalization. The lower the effect that one failure has on other members of the population, the more anti-fragile that population will be.

Taleb uses the examples of airlines and banks. If a plane accidentally crashes somewhere in the world, other planes are no more likely to crash because of it. In fact, the aviation industry will learn from the accident and improve its processes so future planes will be less likely to crash, the very definition of anti-fragility. A bank, on the other hand, is more likely to collapse if one of its peers collapses. The financial crisis of 2007 is testament to this fact; the interconnectedness of the global economies (lack of compartmentalization) makes the population of banks fragile.
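The difference between the two populations can be sketched as a reachability question: when one member fails, how many others are dragged down with it? The sketch below is a minimal illustration under assumed inputs. The function `affected_by` and both example populations are hypothetical, and a real analysis would also weigh how likely and how severe each knock-on failure is.

```python
from collections import deque


def affected_by(failure, dependents):
    """Return every member reachable from `failure` through dependency links.

    `dependents` maps each member to the members that rely on it directly.
    """
    seen = {failure}
    queue = deque([failure])
    while queue:
        node = queue.popleft()
        for nxt in dependents.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(failure)  # the failed member itself is not "collateral"
    return seen


# Hypothetical populations of four members each.
coupled = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
compartmentalized = {"A": [], "B": [], "C": [], "D": []}

coupled_hit = sorted(affected_by("A", coupled))
isolated_hit = sorted(affected_by("A", compartmentalized))
print(coupled_hit)   # ['B', 'C', 'D']
print(isolated_hit)  # []
```

The smaller the set returned for each possible failure, the more compartmentalized the population is.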

For a plant, this means the goal should be to minimize the impact of failures on downstream processes as much as possible.

Having described the idea of a plant as a population of entities with individual variations in health and injury, what can you actually do in your plant?

You want to have less downtime, lower maintenance costs and fewer accidents, and make more money. In a world of certain variation, you want your plant to have lower risk and to better weather storms. To do this, you must follow these steps:

  1. Compartmentalize failures,
  2. Predict machine injuries,
  3. Replace sick machines,
  4. Repeat.

It is an iterative process that reflects the constantly changing environment in which workers and their machines operate. By focusing on Steps 2 and 3, you can leverage preexisting data sources in your plant and use the power of predictive analytics to predict acute failures and quantify long-term machine health.
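One pass through Steps 2 and 3 might be sketched as a triage routine. This is an assumption-laden illustration rather than a prescribed method: the tuple layout, the `triage` function and the replacement cutoff of 50 are all hypothetical, and the scores are made up.

```python
def triage(assets, replace_below=50.0):
    """Split assets into those needing attention now and replacement candidates.

    `assets` is a list of (asset_id, health_score, has_warning) tuples.
    `replace_below` is a hypothetical HealthScore cutoff for chronic sickness.
    """
    # Step 2: assets with a predicted injury are inspected first.
    inspect_now = [asset_id for asset_id, _, warning in assets if warning]
    # Step 3: chronically sick assets are replacement candidates, worst first.
    sick = sorted((a for a in assets if a[1] < replace_below), key=lambda a: a[1])
    replace_candidates = [asset_id for asset_id, _, _ in sick]
    return {"inspect_now": inspect_now, "replace_candidates": replace_candidates}


# Illustrative values only.
plan = triage([
    ("001", 92.0, True),
    ("002", 35.0, False),
    ("003", 28.0, True),
    ("004", 95.0, False),
])
print(plan)
```

Step 4 amounts to rerunning the routine as new data arrives, since both the warnings and the scores shift with the environment.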

In closing, here is one more simile. If your plant is a population of sick and injured patients, how are you maintaining it? With predictive analytics, you have a means by which you can perform triage in order to build continuous improvement.
