The Errors in Availability

04 April 2011

Much has been written about measuring availability and about its uses in managing the maintenance function. It is widely regarded as one of the central measures of maintenance effectiveness and is used in various other "advanced" performance measurement techniques.

These include the famed O.E.E measure and the relatively new area of availability modelling, both of which are considered powerful continuous improvement tools. This rests on the underlying belief that availability is an accurate measure of maintenance effectiveness. However, this thinking is somewhat flawed, and recognising the flaws greatly changes our understanding of the true usefulness of this measure.

Availability is generally accepted as the amount of time that a piece of equipment can be used out of the total time that it is required.

Example 1

The example that we will use throughout this paper is that of a 600 l/minute pump in a hydrochloric acid circuit. To keep things simple, we will say that the period we are measuring is 10 hours and that the pump was not available, due to failure, for 1 hour of that period.

Therefore the availability of our pump will be:

(10 hours - 1 hour) / 10 hours = 0.9

So the availability, as per our definition above, is 90%.
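
The availability figure for Example 1 can be reproduced with a short Python sketch. The function and parameter names here are illustrative assumptions, not part of any standard or of the original paper.

```python
def availability(required_hours: float, downtime_hours: float) -> float:
    """Fraction of the required time during which the equipment could be used."""
    if required_hours <= 0:
        raise ValueError("required_hours must be positive")
    if not 0 <= downtime_hours <= required_hours:
        raise ValueError("downtime_hours must lie between 0 and required_hours")
    return (required_hours - downtime_hours) / required_hours


# Example 1: a 10 hour period with 1 hour lost to failure.
pump_availability = availability(required_hours=10, downtime_hours=1)
print(f"{pump_availability:.0%}")  # prints 90%
```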

Whether this is a good result or not depends on a variety of issues. In some cases it will be exceptional, while in other cases it may be considered a woeful level of availability.

Error 1: Availability of what?

I first became aware of this issue while reading John Moubray's book Reliability-centred Maintenance 2, chapter 14, where it is explained with great effect. However for the sake of space I will use the simpler example of the pump given in the example above.

Something that we have learned from the work done in the area of RCM is that "equipment only fails when it is unable to fulfil its functions." And while this is a point that requires some explanation to those unfamiliar with the concept, it doesn't fall within the scope of this paper to explain it.

Therefore the primary function of our pump may be:

To pump hydrochloric acid from point A to point B at a rate of >= 600 l/minute.

In a standard measurement regime we often find that if the pump is not pumping at all we then consider it to be unavailable. However what happens if it is pumping at a rate of <600 l/minute? In some cases this would be measured as a period of unavailability, in other cases it would be measured as a period of partial or restricted availability. In yet other cases this would be registered as an available period. This is where we begin to experience some difficulties with this measure.

Another function of our pump, a secondary function, may be that of containing product. In this case the product is, as we have mentioned, hydrochloric acid. So what happens if we have a leak?

It is possible for us to have a leak and still achieve our required flow rate of >= 600 l/minute. As such the pump would be measured as available. The problem with this is that the pump is not doing what we, the users, require of it. Therefore it is not truly able to do the work we require to the level that we require of it.

So it becomes obvious that availability, as it is commonly used, is a measure of the primary function only, and at times it doesn't even include partial failures of the primary function. Yet a failure to comply with any of our functions means reduced performance, and each such failure may carry consequences that affect safety, the environment, maintenance expenditure or a combination of these.
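
To illustrate how much the answer depends on what we choose to count, here is a minimal Python sketch that scores the same 10 hour period under three different definitions of "available". The hourly log, the `HourlyReading` type and the `fraction` helper are all invented for illustration; only the 600 l/minute threshold comes from the example above.

```python
from dataclasses import dataclass

REQUIRED_FLOW_L_PER_MIN = 600  # primary function threshold from the text


@dataclass
class HourlyReading:
    flow_l_per_min: float  # measured flow during the hour
    leaking: bool          # True if acid was escaping (containment function failed)


def fraction(readings, is_available):
    """Share of readings for which is_available(reading) is true."""
    return sum(1 for r in readings if is_available(r)) / len(readings)


# Invented 10 hour log: 1 hour stopped, 2 hours at reduced flow,
# 3 hours at full flow but leaking, 4 hours fully functional.
log = (
    [HourlyReading(0, False)]
    + [HourlyReading(450, False)] * 2
    + [HourlyReading(620, True)] * 3
    + [HourlyReading(620, False)] * 4
)

pumping_at_all = fraction(log, lambda r: r.flow_l_per_min > 0)
primary_only = fraction(log, lambda r: r.flow_l_per_min >= REQUIRED_FLOW_L_PER_MIN)
all_functions = fraction(
    log,
    lambda r: r.flow_l_per_min >= REQUIRED_FLOW_L_PER_MIN and not r.leaking,
)

print(f"Pumping at all:        {pumping_at_all:.0%}")
print(f"Primary function met:  {primary_only:.0%}")
print(f"All functions met:     {all_functions:.0%}")
```

With this invented log the three definitions give 90%, 70% and 40% for the same pump over the same period.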

Error 2: Is availability a good measure of maintenance effectiveness?

A definition of effectiveness could be:

The ability of an item to do what is required of it.

One of the ways of measuring this is, as stated previously, the amount of time that an item is able to do what we want it to do, out of the time we require it to be done (availability).

Another measure of effectiveness is the number of times that a piece of equipment is unable to do what is required of it. This is usually expressed as the failure rate or as the Mean Time Between Failures (MTBF), and is sometimes referred to as the reliability of a piece of equipment. MTBF is usually calculated as the Total Time Required divided by the number of failures. There are other measures of maintenance effectiveness, but we will concentrate on these two for the moment.

Example 2:

In Example 1 we have already determined that our pump was available for 90% of the time that it was needed. This level may or may not be reasonable. However, the situation may arise where our 1 hour of downtime is caused by 20 distinct failures, each of which lasted 3 minutes (20 x 3 minutes = 1 hour). In this case our MTBF would be:

MTBF = 10 hours (Total Time) / 20 failures (Number of failures)

Giving us a Mean Time Between Failures of 0.5 hours.

What is this telling us? This is an average measure only; as such it is not a measure of the life of the equipment. It is telling us that, on average over the period, we could expect the pump to run trouble free for only 30 minutes at a time.
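
The MTBF arithmetic for Example 2 can be sketched in Python as follows. The `mtbf` function follows this paper's definition (total time required divided by the number of failures); be aware that some references divide operating time, rather than total time, by the number of failures. The names used are illustrative assumptions.

```python
def mtbf(total_time_hours: float, number_of_failures: int) -> float:
    """Mean Time Between Failures: total time required / number of failures."""
    if number_of_failures <= 0:
        raise ValueError("MTBF is undefined when there are no failures")
    return total_time_hours / number_of_failures


# Example 2: 20 failures of 3 minutes each in a 10 hour period.
failures = 20
downtime_hours = failures * 3 / 60       # 60 minutes of downtime = 1.0 hour
mtbf_hours = mtbf(total_time_hours=10, number_of_failures=failures)

print(f"Downtime: {downtime_hours} h")              # prints Downtime: 1.0 h
print(f"MTBF: {mtbf_hours} h ({mtbf_hours * 60:.0f} minutes)")
```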

Example 3:

Alternatively, we may have a situation whereby the pump was able to do what we required for 6 hours, an availability of 60%, but the 4 hours of downtime were caused by only one failure. Therefore the MTBF is, for the period measured, a reasonable 10 hours (using the calculation above).

This is a pivotal point in understanding maintenance effectiveness and its measurement. In Example 2 we can see that the availability is reasonably high, giving an indication of relatively good performance, yet the failure rate shows a piece of equipment performing very poorly. In Example 3, by contrast, we can see a pump with poor availability, suggesting poor performance, while the MTBF is rather high.
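
The contrast between Examples 2 and 3 is easiest to see when both measures are derived from the same record of failure events. The sketch below assumes a simple list of failure durations in minutes; the `summarise` function and its output keys are hypothetical names chosen for illustration.

```python
def summarise(required_hours, failure_durations_min):
    """Return availability and MTBF (paper's definition) from a list of failures."""
    downtime_hours = sum(failure_durations_min) / 60
    failures = len(failure_durations_min)
    return {
        "availability": (required_hours - downtime_hours) / required_hours,
        # MTBF is left as None when no failure occurred in the period.
        "mtbf_hours": required_hours / failures if failures else None,
    }


example_2 = summarise(10, [3] * 20)   # twenty 3 minute stoppages
example_3 = summarise(10, [240])      # one 4 hour stoppage

for name, result in (("Example 2", example_2), ("Example 3", example_3)):
    print(f"{name}: availability {result['availability']:.0%}, "
          f"MTBF {result['mtbf_hours']} h")
```

Ranked on availability alone, Example 2 looks better; ranked on MTBF alone, Example 3 does. Neither figure by itself describes how the pump behaved.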

This phenomenon is continually misunderstood throughout the profession of maintenance management. Quite often I find myself on a site, possibly during an audit of the maintenance performance, where I find that the availability of the equipment is high. Yet the operations manager realises that they have a problem. They realise that even though the availability is high they are not producing what they believe they could, and also they seem to be having a high level of breakdowns. When the MTBF measure has been implemented it is often with surprising and not overly flattering results.

This tells us the following: although availability is a measure of maintenance effectiveness, it is only a small part of the whole picture. Any measure of equipment effectiveness needs to be based on both availability and MTBF as a minimum. There are other measures, as alluded to previously; however, we won't go into these as a part of this paper.

The Impact on O.E.E (Overall Equipment Effectiveness)

There are a great many issues around the measurement of O.E.E. Unfortunately, as has happened with several maintenance measures over the years, it has been blindly accepted without question by industry in general. Here we will deal only with the effects of availability as it is applied in this equation.

O.E.E is an attempt to measure, as is stated in its name, the overall or total effectiveness of a piece of equipment. This has been taken to mean a measure that takes into account the availability, performance and yield of a particular piece of equipment. It is usually expressed as:

Availability x Performance x Yield (Quality)
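
As a minimal sketch of this product in Python: the availability value below is taken from Example 1, while the performance and yield figures are invented purely to make the arithmetic concrete. The function name `oee` is an assumption for illustration.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """Overall Equipment Effectiveness as the product of its three factors."""
    for name, value in (("availability", availability),
                        ("performance", performance),
                        ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return availability * performance * quality


# Availability is taken from Example 1; the other two factors are invented.
pump_oee = oee(availability=0.90, performance=0.95, quality=0.98)
print(f"O.E.E = {pump_oee:.1%}")
```

Note that the number of failures does not appear anywhere in this calculation. The 20 short failures of Example 2 and a single 1 hour failure would feed in the same 90% availability and so produce the same O.E.E figure.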

However after reviewing the examples and detail expressed previously, the amount of confidence we can have in availability as a part of this equation is greatly diminished. Firstly instead of measuring the overall effectiveness it is, at best, measuring the effectiveness of only a part of the primary function of a piece of equipment.

Also, we now realise that without the inclusion of, at least, failure rate or MTBF, it is inaccurate as a way of viewing effectiveness. So, among the many other failings of O.E.E, it can be clearly seen that it has a dubious ability to truly measure effectiveness.

It can also now be seen that any other methods that depend greatly on availability at their core, such as availability modelling, also have dubious claims to be able to do anything more than measure the effectiveness of only a part of the primary function.

Conclusion

In conclusion availability is a good measure of effectiveness. However it is not the only measure and is not a very good indication when taken in isolation. Although there are various forms of measuring effectiveness, a combination of both availability and MTBF will provide a more accurate form of measurement.

A further, and even more accurate, measure is that of Mean Time Between Faults (as opposed to failures). This measures the average time between required maintenance interventions on the equipment. It therefore takes into account smaller repairs and noted failures that, while still allowing the equipment to fulfil its primary function, affect its ability to fully comply with all of the functions it may have (as we saw with the failure of the secondary function of our pump previously).

As a final note, there is a need for maintenance professionals to fully understand the measures that they are using. This means understanding what they are indicating, what the effects of that are and, most importantly, what they are not indicating.
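
A hedged sketch of the difference, in Python: the formula for Mean Time Between Faults is assumed here by analogy with the MTBF calculation used earlier (total time required divided by the number of events), and the counts of failures and minor interventions are invented for illustration.

```python
def mean_time_between(total_time_hours: float, event_count: int) -> float:
    """Average time between events: total time required / number of events."""
    if event_count <= 0:
        raise ValueError("undefined when there are no events")
    return total_time_hours / event_count


TOTAL_TIME_HOURS = 10

# Invented counts for one 10 hour period.
functional_failures = 1   # pump unable to deliver the required flow
minor_interventions = 3   # e.g. leak repairs made while flow was maintained

mtbf_hours = mean_time_between(TOTAL_TIME_HOURS, functional_failures)
mtb_fault_hours = mean_time_between(
    TOTAL_TIME_HOURS, functional_failures + minor_interventions
)

print(f"Mean Time Between Failures: {mtbf_hours} h")
print(f"Mean Time Between Faults:   {mtb_fault_hours} h")
```

With these assumed counts the MTBF is 10 hours, while the time between maintenance interventions of any kind is only 2.5 hours.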

Daryl Mather