
Spare Parts Inventory: An Exercise in Risk Management

What do car insurance, computer file backups, and a spare parts inventory have in common? Each is an example of risk management. According to Wikipedia, risk management is "the identification, assessment, and prioritization of risks followed by coordinated and economical application of resources to minimize, monitor, and control the probability and/or impact of events."

Just as car insurance is a way to minimize the financial consequence of a car crash, backing up your files is a way to minimize the consequence of a computer crash, and a spare parts inventory is a way to minimize the consequence of an equipment crash. It’s all about risk management.

Risk Management Functions

There are four basic functions to manage risk: risk identification, risk quantification, risk probability, and risk response. The first three (identification, quantification, and probability) are sometimes grouped together as Risk Analysis or Risk Assessment. With these functions completed, the ongoing task is to exercise risk vigilance. Risk vigilance is simply the recognition of the risk conditions, the ongoing response to the risk conditions on the ground, and implementation of the appropriate risk responses. Vigilance requires you to identify an appropriate trigger, and that trigger defines the parameters for your vigilance. We say appropriate risk responses because different risks require different approaches. For example, you might have a smoke detector to monitor for the risk of fire and machine guards to manage the risk of injury.

Here is an example of the application of this thinking:

  • Risk identification: Failure of the bearings on a turbine is a risk.
  • Risk quantification: In the event of a failure, will anyone get hurt, and how badly? How much money will the event cost per hour or day of downtime? What if the repair takes a few weeks or months instead of a few hours or days?
  • Risk probability: What is the chance this risk will happen? Has it happened before? Does the manufacturer warn us about the risk? Do we have statistics about MTBF (Mean Time Between Failures) for the bearings in question?
  • Risk response: Can we eliminate or anticipate the failure? What parts will be needed? How costly is the kit? Can the risk be transferred to someone else (by using supply contracts or buying insurance)? Does the waiting time for the part introduce any unanticipated risk?
  • Risk vigilance: How do we organize our team and maintenance strategy so that an event becomes apparent quickly enough that we have time to respond? Beyond day-to-day vigilance, this function also covers responding to changes in the character of the risks over the life span of the plant.

Risk Management Process

Now that we have identified the key functions, let’s put them into the context of the whole process. Each of the following steps is important, and you must apply them in sequence. If you don’t have the data for a step, that is a sign that your risk management is underdeveloped and a trigger to collect the required data.

Before proceeding, consider the value of the spares to which you apply this process. All risk management reviews require time and attention, and for low-value spares holdings it may be a better choice simply to stock the required spares. We suggest that, at a minimum, this assessment be carried out for all spares requiring an investment of more than $1,000.

Step 1: Identify the specific spare part that you will be considering for this process.

Step 2: Determine the criticality of the systems, machines, and processes in your plant. Usually we segment criticality into breakdowns that can shut the whole plant or stop distribution, breakdowns that take out a single line, and breakdowns that reduce output. You can also develop other criticality levels to suit your layout and plant design.

Step 3: Identify any significant safety and environmental risks. This should include risks from simple slips, trips and falls, all the way up to a safety or environmental catastrophe.

Step 4: Interview other stakeholders (such as operations, engineering, and supply chain) to identify the risks they see, and weigh those risks. This would include impacts on other operations downstream, disruptions and delays to the supply chain, etc.

Step 5: Convert the consequence of the risk into the potential financial impact based on the downtime effect and other losses.
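Step 5 is essentially arithmetic. The sketch below is a minimal illustration, assuming the financial impact is the downtime hours caused by the missing part multiplied by an hourly cost of downtime, plus any other losses you can put a figure on; the function names and all figures are hypothetical. The second function maps the result onto the Finance/Business Impact bands of Table 1, treating $1M itself as the top of score 4.

```python
def financial_impact(downtime_hours: float, cost_per_hour: float,
                     other_losses: float = 0.0) -> float:
    """Estimated loss if the spare is not available (hypothetical model)."""
    return downtime_hours * cost_per_hour + other_losses


def finance_consequence_score(impact: float) -> int:
    """Map a dollar impact onto the Finance/Business Impact bands of Table 1."""
    if impact < 1_000:
        return 1          # $0 - $999
    if impact < 10_000:
        return 2          # $1,000 - $9,999
    if impact < 100_000:
        return 3          # $10,000 - $99,999
    if impact <= 1_000_000:
        return 4          # $100,000 - $1M
    return 5              # more than $1M


# Hypothetical example: 72 extra hours with a line down at $2,500/hour,
# plus $5,000 of scrapped product.
impact = financial_impact(72, 2_500, other_losses=5_000)
print(impact, finance_consequence_score(impact))  # 185000 4
```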

Step 6: Using the matrix in Table 1, identify for each category the score for the consequence in the event that the spare part is not available when required. Note that this does not automatically assume that the part should be in stock. Purchasing a spare should be considered only as a last resort; first consider the following:

  • Can you repair the failed item in a suitable time frame?
  • Can you use an alternative item?
  • Can you delay replacement until the vendor delivers the spare?
  • Can you control the plant/process without the part for the lead time for delivery?

If the answer to any of the above questions is yes, follow that process first and document any required controlling measures.

If more than one category applies, use the highest Consequence Score.
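Step 6 can be sketched in a few lines: screen for an alternative first, and only if none exists take the highest category score. The Alternatives class and its field names are hypothetical stand-ins for the four screening questions, and the category scores are assumed to have been read from Table 1.

```python
from dataclasses import dataclass


@dataclass
class Alternatives:
    """Answers to the Step 6 screening questions (hypothetical structure)."""
    can_repair_in_time: bool = False
    alternative_item: bool = False
    can_delay_until_delivery: bool = False
    can_run_without_part: bool = False

    def any_available(self) -> bool:
        return any((self.can_repair_in_time, self.alternative_item,
                    self.can_delay_until_delivery, self.can_run_without_part))


def consequence_score(category_scores: dict[str, int]) -> int:
    """If more than one category applies, use the highest score (1-5)."""
    for name, score in category_scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{name}: score {score} is outside 1-5")
    return max(category_scores.values())


answers = Alternatives()  # every screening question answered "no"
if answers.any_available():
    print("Follow the alternative first and document the controls")
else:
    print(consequence_score({"safety": 2, "environment": 1,
                             "quality": 3, "finance": 4}))  # 4
```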

Step 7: Using the matrix in Table 2 identify the probability score based on the expected frequency or potential for failure. Please note that by its nature this will be an estimate based on your current maintenance/engineering understanding.
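A sketch of the Table 2 lookup follows. Scores 4 and 5 both fall within 12 months, and Table 2 separates them by certainty ("could fail" versus "has failed or will fail"), so this hypothetical function takes that judgment as flags rather than deriving it. It also assumes that exactly 3 years scores 3 and exactly 5 years scores 2, since the table's bands share their endpoints.

```python
def probability_score(years_to_failure: float, already_failed: bool = False,
                      expected_within_year: bool = False) -> int:
    """Map an estimated time to failure onto the bands of Table 2."""
    if already_failed or expected_within_year:
        return 5  # has failed or will fail in the next 12 months
    if years_to_failure < 1:
        return 4  # could fail in the next 12 months
    if years_to_failure <= 3:
        return 3  # 1-3 years to failure
    if years_to_failure <= 5:
        return 2  # 3-5 years to failure
    return 1      # more than 5 years to failure


print(probability_score(2.0))  # 3
print(probability_score(0.5))  # 4
```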

Step 8: Use the decision matrix in Table 3 to determine the required course of action.

In most cases this matrix will indicate whether you should or should not purchase the spare. What this process cannot do is tell you how many to purchase. To apply this risk management approach to different spare parts holding levels, re-run the process for each holding level. Remember, though, that when justifying more than one spare, the time frame for failure for each subsequent spare is limited to the lead time for re-stocking, because that is the period of risk exposure.
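To see why the re-stocking window matters, here is a minimal sketch that assumes a constant failure rate, so the time to failure is exponentially distributed with mean equal to the MTBF. That model is an assumption swapped in for illustration, not part of the process above, and the MTBF, remaining life, and lead time are hypothetical. It compares the chance of a failure over the remaining plant life (the window for the first spare) with the chance of a failure during one re-stocking lead time (the window for a second spare).

```python
import math


def prob_failure_within(window_years: float, mtbf_years: float) -> float:
    """P(at least one failure in the window), assuming a constant failure rate."""
    return 1.0 - math.exp(-window_years / mtbf_years)


mtbf = 8.0             # hypothetical MTBF of the part, in years
remaining_life = 15.0  # hypothetical remaining life of the plant, in years
lead_time = 6 / 52     # hypothetical re-stocking lead time: 6 weeks

first = prob_failure_within(remaining_life, mtbf)   # window for spare no. 1
second = prob_failure_within(lead_time, mtbf)       # window for spare no. 2
print(f"first spare:  {first:.1%}")
print(f"second spare: {second:.1%}")
```

With these made-up numbers the first spare has roughly an 85% chance of being needed and the second only about 1.4%, which would normally move the second spare into a much lower probability band in Table 2.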

In 7 of the 25 combinations in Table 3 (a little over one quarter of situations), the matrix does not make a definite suggestion and leaves the decision to a judgment call. These are the borderline cases where local, company-specific knowledge will inform the answer.

Risk Management Options

In all cases of risk management there are four options for managing the risk. As you evaluate each risk, adopt a management strategy based on the option chosen for that risk or class of risks. The options are presented here in the order in which they should be considered.

  1. Avoid the risk - One way to avoid risk is to redesign the work. In many circumstances this might involve re-engineering, choosing long-lived assets, or even replacing the asset. The best way to avoid the risk of an iatrogenic failure (a failure caused by the mechanic or electrician) is to design the system not to break down! Of course that is tough, but improvements in reliability based on equipment design are made every day. If you can’t eliminate the risk, the next step is to mitigate it.
  2. Mitigate the risk - Mitigation involves reducing the probability of the risk happening (such as by using existing technology instead of new technology), reducing the consequence of the risk, or some combination of both. For example, in the aircraft industry the risk of incorrect repairs has both safety and economic consequences. The industry mitigates this risk through rigorous repair procedures, certification of operators and mechanics, and close-in inspection. While these actions mitigate the risk, they do not eliminate it. In an industrial situation, one way to minimize the consequence of a breakdown risk is to have backup systems in place.
  3. Insure the risk - Insurance is a form of risk mitigation in that it minimizes the consequence of the risk. It is included here as a separate option because the key is to shift the financial impact of the risk from you to the insurer. Here are some common types of insurance:
    1. Fire insurance for fires
    2. Liability insurance for accidents to visitors
    3. Workmen’s compensation insurance for employee injuries
    4. Business continuity insurance to cover catastrophic interruptions to business activity.
  4. Accept the risk - You decide that the risk probability or consequence is sufficiently low that you can handle it without help or additional systems. Sometimes this is referred to as ‘self insurance’. An example of this is companies with large vehicle fleets that don’t take out external insurance. They accept that they will need to repair/replace vehicles involved in an accident on the basis that in the long run this is less expensive (because of the large fleet) than the insurance.
Table 1: Risk Consequence Matrix
Consequence score (1 to 5) for non-availability of the spare part when required:

Safety (potential for injury should the plant operate without the spare part)
  • 1: No safety concerns
  • 2: Manageable safety issues
  • 3: Use of temporary safety procedures
  • 5: Genuine potential for injury

Environment
  • 1: Minor leakages. Relatively easy to clean. Not noticeable to the public/media.
  • 2: Significant leakages. Some clean-up costs and operational inconvenience.
  • 3: Significant pollution. Significant clean-up costs. High likelihood of EPA notification. May attract public/media attention.
  • 4: Major environmental event. Significant clean-up costs. High likelihood of EPA fine or action. Will attract public/media attention.
  • 5: Major environmental event. Major release of pollutants. Public/media concern. Company reputation damaged.

Quality
  • 1: Minor product defect. Minor process control adjustments required.
  • 2: Significant product defect. Significant process changes required.
  • 3: Serious product defect. Defect localized to batch or product.
  • 4: Major product defect. Scrapping of large batches of product.
  • 5: Total product defect. Total product recall.

Finance/Business Impact
  • 1: $0 – $999
  • 2: $1,000 – $9,999
  • 3: $10,000 – $99,999
  • 4: $100,000 – $1M
  • 5: More than $1M
Table 2: Probability Score Matrix
  • 1: More than 5 years to failure
  • 2: 3 – 5 years to failure
  • 3: 1 – 3 years to failure
  • 4: Could fail in the next 12 months
  • 5: Has failed or will fail in the next 12 months
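Table 3: Decision Matrix (rows are the probability score, columns the consequence score; "SnrMgr Decision" leaves the call to a senior manager's judgment)

Probability score | Consequence 1 | Consequence 2 | Consequence 3 | Consequence 4 | Consequence 5
1 | Do not Purchase | Do not Purchase | Do not Purchase | SnrMgr Decision | SnrMgr Decision
2 | Do not Purchase | Do not Purchase | Do not Purchase | SnrMgr Decision | Purchase Spare
3 | Do not Purchase | Do not Purchase | SnrMgr Decision | Purchase Spare | Purchase Spare
4 | SnrMgr Decision | SnrMgr Decision | Purchase Spare | Purchase Spare | Purchase Spare
5 | SnrMgr Decision | Purchase Spare | Purchase Spare | Purchase Spare | Purchase Spare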
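Step 8 is a table lookup. This sketch transcribes Table 3 into a nested list indexed by probability score (rows) and consequence score (columns); the constant and function names are illustrative. Counting the SnrMgr cells also confirms how many combinations are left to a judgment call.

```python
# Table 3 transcribed: row = probability score 1-5, column = consequence score 1-5.
N, M, P = "Do not purchase", "SnrMgr decision", "Purchase spare"

DECISION_MATRIX = [
    [N, N, N, M, M],  # probability 1
    [N, N, N, M, P],  # probability 2
    [N, N, M, P, P],  # probability 3
    [M, M, P, P, P],  # probability 4
    [M, P, P, P, P],  # probability 5
]


def decide(probability: int, consequence: int) -> str:
    """Look up the course of action for a pair of 1-5 scores."""
    if not (1 <= probability <= 5 and 1 <= consequence <= 5):
        raise ValueError("scores must be between 1 and 5")
    return DECISION_MATRIX[probability - 1][consequence - 1]


judgment_calls = sum(row.count(M) for row in DECISION_MATRIX)
print(decide(3, 4))                   # Purchase spare
print(judgment_calls, "of 25 cells")  # 7 of 25 cells
```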



What is the risk in risk management?

In this part we will look closely at the concept of risk and managing risk in a maintenance spares storeroom.

Understanding the Consequences of a Breakdown

There are several consequences of a breakdown, and they are usually classified as safety, environmental, and/or economic. Some breakdowns, like the methyl isocyanate (MIC) leak in Bhopal, India, that killed upwards of 2,500 civilians and injured over 100,000 others, are completely unacceptable at any cost. Others, such as the battery fires experienced during the rollout of the Boeing 787, are unacceptable from both a safety and a loss-of-asset point of view.

In almost all cases, after the safety and environmental consequences are evaluated and eliminated as far as possible, the remaining consequences are really variations on an economic theme. Here are some examples of the cost of downtime:

  • Power plant: $160,000 per hour
  • Oil refinery (400,000 barrels per day): $100,000 per hour (refiner’s margin only)
  • Automobile assembly line: $500,000 per hour
  • Cigarette manufacturing: $240,000 per hour

From a risk management point of view, the evaluation of consequence goes like this: with the right parts in stock (or otherwise available), it may take (for example) 2 days to put a failed power plant back online. Without the part available, it takes 4 weeks (lead time) plus 2 days (for the repair). If the kit of parts for the repair costs $100,000, is it worth stocking? The answer really depends on the probability of the failure. Would you pay $100,000 as an insurance premium against that particular failure? Is it worth it even if we never have the failure?
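The power plant example can be turned into a break-even check. This sketch treats the $100,000 kit as an insurance premium, as the paragraph does, and assumes the plant would otherwise run 24 hours a day at the $160,000-per-hour downtime cost listed above and that stocking the kit saves the whole 4-week lead time. It ignores holding costs, obsolescence, and damage, and the function name is hypothetical.

```python
def break_even_probability(kit_cost: float, cost_per_hour: float,
                           hours_saved: float) -> float:
    """Failure probability at which the expected avoided loss equals the kit cost."""
    return kit_cost / (cost_per_hour * hours_saved)


hours_saved = 4 * 7 * 24  # 4 weeks of lead time avoided, assuming 24-hour operation
p = break_even_probability(100_000, 160_000, hours_saved)
print(f"{hours_saved} h saved, break-even probability {p:.3%}")
# 672 h saved, break-even probability 0.093%
```

Under these assumptions the 4 weeks of avoided downtime are worth about $107.5 million, so stocking the kit pays off if the chance of needing it is above roughly 0.09%.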

Having the right parts helps keep the consequences of a breakdown at a manageable level. This is just like the thinking you use for insurance: you pay a premium to a company so that it shoulders the consequences you are unwilling to take. Of course, if you are willing to take the risk, you don’t insure it! These questions and considerations are at the heart of risk management.

Investment vs. Risk Management

Despite the fact that an inventory of spare parts costs a good amount of money, that accountants classify your inventory as an asset, and that some spare parts have been increasing in cost faster than inflation, spare parts are still not a great investment. Some parts become obsolete before we can use them, some get damaged, and some even go bad. To add insult to injury, we can buy a part today and not collect its return on investment (ROI), which comes only when we use it, for many years. With other investments we like to see an ROI of 30-50%, starting immediately.

So why would any business that is supposed to operate at a profit ever want to make that investment? The answer is risk management, and to understand that better we need to look at the real function of maintenance spare parts and understand why we might inventory them in the first place.

When we dissect how spare parts are used, it is clear that there are only two kinds of reasons to hold a supply of them. But more on that in a minute.

First, here is one thing that you need to know before we go ahead: Having the part available does not necessarily mean that you are stocking it. In all the scenarios that follow having the part available is the issue, not owning the part. It is critical that this is understood.

There are a couple of well used strategies that you can use to have a part available without having it in stock:

  • The most common internal method is for a plant to share the part with other plants (belonging to the same company) that have similar equipment. A less common variation is for factories belonging to different companies to share some parts. This is more common in regions with a concentration of a single industry, such as mining in Western Australia or carpet manufacturing in Dalton, Georgia.
  • The most common external method is to make an agreement with a vendor to supply a part within X number of hours. Variations include consignment stock where the part is kept at your warehouse but is owned by the vendor until you use it.

Big and Little Reasons to Hold a Supply of Spare Parts

The little reasons

Sometimes vendor packaging creates inventory – meaning that you can’t always buy the exact quantity you need. You want six, it sells in eights, and bingo you’ve got inventory! Sometimes we can obtain parts more cheaply if we buy them in economic quantities – just be sure that you use them all. Sometimes we need to purchase in advance because maintenance workers are more productive if they have all the parts required before starting work.

The big reason

Machines occasionally fail. In spite of intense scrutiny during PM (preventive maintenance) inspections, we miss the symptoms and the asset fails. In some cases the consequence is small and easily manageable (both practically and financially): we might have a warehouse full of product, a full distribution chain, or a sister plant that can take up the slack, or we are not sold out of capacity and can make up the production. But if the consequences of waiting to put the asset back into service are dire, expensive, harmful, disruptive to a customer, or dangerous, then we must do what is in our power to manage that potential risk. Often that means ensuring access to spare parts so that we don’t have to wait an extended length of time before starting the repair.

This means that the real reason you need to stock a part in inventory is that you can’t reliably get it within your planning horizon. The planning horizon might be zero for breakdowns, a month with condition monitoring, or six months for a major PM, but in every case you hold stock only because you can’t get the part within the time your planning allows. Of course, if you do no planning, then you need to hold lots of stock!
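The planning-horizon argument reduces to one comparison: you need your own stock only when no source you can rely on delivers within the horizon. The sketch below is a minimal illustration under that assumption; SupplySource, its fields, and the delivery times are hypothetical, standing in for options such as a sister plant or a vendor supply agreement described earlier.

```python
from dataclasses import dataclass


@dataclass
class SupplySource:
    """One way of getting the part (hypothetical structure)."""
    name: str
    delivery_hours: float
    reliable: bool = True  # can you count on this source every time?


def must_hold_own_stock(sources: list[SupplySource],
                        planning_horizon_hours: float) -> bool:
    """True if no reliable source can deliver within the planning horizon."""
    return not any(s.reliable and s.delivery_hours <= planning_horizon_hours
                   for s in sources)


sources = [
    SupplySource("vendor standard order", 4 * 7 * 24),
    SupplySource("sister plant", 36),
    SupplySource("vendor 24-hour supply agreement", 24),
]
print(must_hold_own_stock(sources, planning_horizon_hours=0))        # True
print(must_hold_own_stock(sources, planning_horizon_hours=30 * 24))  # False
```

A breakdown (horizon zero) forces stock unless something like consignment stock is already on site, while a month of warning from condition monitoring lets the sister plant or the vendor agreement cover the need.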

The Impact of Your Maintenance Policy

It is important to understand that the maintenance policy chosen for each class of assets drives parts usage. The maintenance policy is the strategy chosen to deal with the service and repair requirements of the various assets. Strategies might include using a contractor to take care of the asset completely (such as your elevators and HVAC), replacing components but not attempting rebuilds in-house (such as transmissions in a heavy-duty truck shop), or doing all minor work in-house while the vendor/contractor gets any heavy work (such as a car rental company).

Each maintenance policy determines the need to carry parts in your own inventory (assuming that the previous risk assessment indicated that you should stock parts). Table 4 shows some examples.

Table 4: The Parts Stocking Effect of Different Maintenance Strategies
Description of strategy | Example | Effect on parts stocking
Contractor takes care of asset completely | Fire safety, escalators, complex and sensitive equipment like turbines and generators | No parts or few parts (see note 1)
Replace whole components but do not attempt repair or rebuilds | Circuit boards inside machines, truck repair, gear boxes, motors | No parts; just stock completed units or make a deal with a rebuilder to supply requirements within 24 hours (see note 2)
All minor work done in-house and the vendor/contractor gets any heavy work | Car rental, satellite facilities where there is not a full local crew | Minor wear parts like filters, belts, hoses, etc.
All work done in-house | Typical factory maintenance department on important assets | Lots of stock

Notes:

  1. You might hold some parts as an insurance policy against the contractor making a mistake, but if the contractor provides all service (such as a contract with Siemens on a turbine) you might not have the expertise to choose the right parts. In that case, part of your contract is for Siemens to stock certain parts at your location or nearby.
  2. While it is true that in a factory there are plenty of motors, cylinders, gear boxes, the number of SKUs is smaller if we stock just the finished units rather than the parts to rebuild or repair all those items.

Eliminate the Condition and You Eliminate the Risk

It is good practice (and required by law in the US) to inspect slings, chains, and other lifting gear every day. This practice minimizes the probability of a failure. Another practice is good rigging technique, which examines the center of gravity, weight, and material of the load being lifted and rigs each lift properly. A third practice is clearing lift paths so that if the lifting gear does fail, no one will get hurt.

Each of these practices mitigates a specific element of the overall risk but doesn’t eliminate it. With the lowered level of risk, the process owner can, in good conscience, accept the small probability of a failure. If the job can be done efficiently without lifts at all, then the risk has been eliminated.

The rule of risk management is: if it is possible to eliminate the condition, then the related risk is also eliminated. This approach applies particularly to safety and environmental hazards. When you eliminate a risk, of course, be sure you are not introducing a risk that is worse.

What if the consequences of the part falling are truly catastrophic? What if the lift involves a giant tank of poisonous gas or a nuclear core? I’ll bet that the lift planner will go through additional steps to lower the likelihood of failure by another factor of 100!

Emotionally Driven Inventory is Not the Solution

There is such a thing as too much risk coverage. For example, you can have insurance that covers all medical costs from the first dollar. People might congratulate you on your choice, but in fact this is over-insurance, because it is always cheaper to cover the small risks yourself (by not insuring them) than to cover everything. The portion you cover yourself is known as the excess (or deductible) on your insurance, and it is an example of ‘self insurance’.

With spare parts inventory, having every single part in adequate quantities to ensure no possible stockouts ever is overly expensive, takes up too much room, and is inefficient. To have the right amount of stock we must understand the consequences of not having the part, but also the probability that we’ll need it. As mentioned previously, the probability of needing a second spare is limited to the probability of a failure during the lead time in which you can restock the first spare, not the probability of failure during the remaining life of the equipment. Understanding this simple step of logic could save your company from holding thousands (or even millions) of dollars in unnecessary spare parts.
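One common way to put numbers on this trade-off, swapped in here rather than taken from the process in this article, is to assume failures arrive as a Poisson process (a constant failure rate). If several identical parts are installed, the number of failures during one re-stocking lead time is then Poisson-distributed with mean installed × lead time ÷ MTBF, and holding s spares leads to a stockout only if more than s failures fall inside that window. The model ignores repairable spares, and the fleet size, MTBF, and lead time below are hypothetical.

```python
import math


def stockout_probability(spares: int, installed: int, mtbf_years: float,
                         lead_time_years: float) -> float:
    """P(more than `spares` failures during one re-stocking lead time)."""
    mean = installed * lead_time_years / mtbf_years  # expected failures in window
    covered = sum(math.exp(-mean) * mean**k / math.factorial(k)
                  for k in range(spares + 1))       # P(failures <= spares)
    return 1.0 - covered


# Hypothetical: 12 identical gearboxes, 10-year MTBF, 8-week lead time.
for s in range(4):
    p = stockout_probability(s, installed=12, mtbf_years=10,
                             lead_time_years=8 / 52)
    print(f"{s} spares: {p:.4%}")
```

With these made-up figures the first spare removes most of the stockout risk and each further spare buys a much smaller reduction, which is the diminishing return that makes "no possible stockouts ever" so expensive.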

Managing Breakdown Risk

Since a breakdown can disable a whole plant or other asset, we need tools to detect when assets are going to fail (to give us the most time possible) and techniques to extend the useful life of the asset.

It turns out we already have some powerful tools to manage breakdown risk.

Our first line of defense is our quality operators and drivers. These people are the DEW line for your equipment (the Distant Early Warning Line was a Cold War chain of Arctic radar stations built to detect Soviet bombers coming over the pole). Properly trained and motivated, they can report abnormal sounds, vibrations, and operations. They can also perform essential safety checks and basic maintenance to ensure long equipment life.

Our second line of defense is our quality maintenance and PM system. By doing basic maintenance (such as TLC: Tighten, Lubricate, Clean) we know the asset will last longer. Often we can make an asset last long enough for our needs. Skilled mechanics have decades of experience looking at equipment and catching subtle signs of impending failures. Their inspections tell us what is happening and, more importantly, what will happen to the asset.

The third line of defense is a well-designed PdM (Predictive Maintenance) system. This includes all kinds of instruments, gauges, sensors, computers and other high tech gear that allow us to see inside the equipment. The computers can talk to us about what is happening, the scanners can see heat, hear high frequency squealing or feel subtle vibration.

The fourth line of defense is the skills of the mechanics coupled with the right tools and the spare parts.

Think of Your Inventory as a Kind of Insurance Policy

When you look at your spare parts inventory, imagine that you are looking at shelves of very specific insurance policies. Each inventory item that you purchase is a way of mitigating the consequence of a failure of that part in operation. Now ask yourself: how often does your boss come to your office and insist that you had better use 10% of that insurance? Of course, that never happens.

If you successfully perform risk management on insurance policies you can achieve two outcomes:

  1. The least cost for insurance by having as little as is reasonably possible, and
  2. Minimal consequences for the organization should any risks that the insurance covers come to pass.

With insurance you want to cover only what you need, and can afford, and nothing more. It’s the same with your spare parts inventories.

Of course, with your spare parts it is important to cover the risk but it is also important to be able to justify everything on the shelf – holding only what you need, and can justify, and nothing more. That is the key to effective spare parts risk management.
