Reliabilityweb

Why Managers Don’t Endorse Reliability Initiatives

In the DuPont benchmark study, the most reliable performance in the world came from plants practicing Total Productive Maintenance. It was also the least expensive to achieve and sustain. The assumption that high reliability is expensive comes from the fact that the vast majority of efforts to gain high reliability are misguided. The conventional wisdom is that maintenance best practices are planning, scheduling, preventive maintenance, optimized procurement, and predictive maintenance. While these practices do make maintenance much more efficient and effective, they do not address the most important question in reliability: why is the equipment failing in the first place? Other best practices, such as reliability centered maintenance, are a step in the right direction but still do not address the largest root cause. In the ABC's of Failure (TMG News, April 2008), we concluded that approximately 84% of the defects that lead to failures are in fact created randomly by careless work practices throughout the entire organization.

For those who have not seen our earlier article on the ABC's of Failure, we concluded that 4% of the defects are due to aging of equipment and 12% are due to basic wear and tear, which leaves 84% due to careless work processes. Someone who starts a reliability initiative based on the conventional wisdom might expect to improve maintenance practices by implementing more preventive maintenance. However, by scheduling more frequent preventive tasks on equipment that already receives some preventive maintenance, or by expanding the preventive maintenance program to equipment that was not included before, we can only succeed in removing the defects that are created by the passage of time. That includes only the aging and basic wear and tear defects, and they represent only 16% of the defects. People get very frustrated when they go beyond that 16%, because it becomes apparent that the probability of adding a defect while overdoing the preventive maintenance is higher than the probability of removing one. This also becomes very expensive and wasteful, because work is being done to change parts that are in fact not defective. During 27 years of experience at DuPont, we went through about seven cycles of increasing preventive maintenance to the point of frustration and then abandoning most of the preventive work. Although important, preventive maintenance can't solve all of our problems, and we are wrong to expect it to be more than it is.
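The arithmetic behind that 16% ceiling can be written out as a short sketch. The percentages are the ones quoted above; the dictionary, the function name `preventive_ceiling`, and the grouping of aging and wear as the "time-based" categories are illustrative choices made here, not part of the original study.

```python
# Defect shares quoted from the ABC's of Failure article (percent of all defects).
DEFECT_SHARES = {
    "aging": 4,
    "basic wear and tear": 12,
    "careless work processes": 84,
}

# Assumption for this sketch: only these categories depend on the passage of
# time, so only they can be removed by time-based preventive maintenance.
TIME_BASED = ("aging", "basic wear and tear")


def preventive_ceiling(shares, time_based=TIME_BASED):
    """Return the largest percentage of defects time-based PM can address."""
    return sum(shares[name] for name in time_based)


ceiling = preventive_ceiling(DEFECT_SHARES)
remainder = sum(DEFECT_SHARES.values()) - ceiling

print(f"Addressable by preventive maintenance: {ceiling}%")
print(f"Out of reach of preventive maintenance: {remainder}%")
```

However much preventive work is added, the first number does not grow; the extra tasks only land on parts that had no time-based defect to remove.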

The other best practice that everyone recognizes as having merit is predictive maintenance. Here it is recognized that failures are not predictable from elapsed time alone but depend on how long it takes for a defect to propagate to a failure event. Success also depends on how good the technology is at detecting that defect before it becomes a failure. Reliability programs for predictive maintenance concentrate on getting the requisite variety of detection technologies to find defects soon enough to allow for orderly planning and scheduling. The computer model of the DuPont benchmark facilities found that the number of inspections required to ensure that more than 90% of the defects would be detected before the failure event occurred was so high that 97% of the time an inspection did not detect a defect at all. The difficulty with this approach is that it is demoralizing to sustain that kind of diligence over long periods of time, except where the consequences of failure are catastrophic. Nuclear power plants are a great example of facilities that warrant this level of diligence, and a good place to see how effective predictive maintenance can be.
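To put the 97% figure in working terms, here is a minimal sketch of what a 3% find rate means for the people doing the inspections. The 3% rate is simply the complement of the 97% quoted above; the function name `inspection_yield` is hypothetical, and the calculation treats every inspection as having the same chance of finding a defect, which is a simplification.

```python
def inspection_yield(find_rate):
    """Return (share of inspections that find nothing, inspections per defect found).

    find_rate is the fraction of inspections that detect a defect.
    """
    if not 0 < find_rate <= 1:
        raise ValueError("find_rate must be greater than 0 and at most 1")
    return 1 - find_rate, 1 / find_rate


# Complement of the 97% "nothing found" figure from the benchmark model.
FIND_RATE = 0.03

empty_share, inspections_per_find = inspection_yield(FIND_RATE)

print(f"{empty_share:.0%} of inspections find nothing")
print(f"Roughly {inspections_per_find:.0f} inspections per defect found")
```

At that rate an inspector goes through some thirty clean inspections for each defect found, which is the diligence the paragraph above describes as hard to sustain.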

The problem in other facilities is that the cost of this kind of diligence is not competitive, because the processes and equipment are much more complex than the simple process of boiling water to generate electricity. The experience with predictive maintenance at DuPont was similar to the experience with preventive maintenance. We started many predictive maintenance initiatives and succeeded until the inspections had become routine, but then someone looked at the results and decided that inspections were being overdone because we found defects only 3% of the time. This led to abandoning many predictive maintenance technologies. I once admired the fact that a mechanic at DuPont knew ten different technologies for predictive maintenance. I asked him how he had learned so many. He said that he had been doing predictive maintenance for fifteen years. I replied, "But this initiative is only one year old." He said, "Yes, but this is my ninth initiative." A few years later, we declared victory, dissolved the corporate maintenance leadership team that was leading the predictive maintenance initiative, and completed the cycle once again.

So why did these initiatives consistently fail? The problem was not that they pursued the wrong best practices; it was that they failed to attack the larger problem of the randomness in the failure rates. When 84% of the failures are caused by a random lack of discipline in operating, maintaining, designing, procuring, and/or improving equipment, there is no efficient way to deal with the defects that these careless work habits generate. To cope with these defects, many companies add spare equipment. This just adds to the amount of equipment that has to be procured and maintained, and therefore increases expense. One of the worst ways to do this is to keep a piece of equipment that has been replaced by a new one. We have seen many sites where the old piece of equipment is kept as a spare. In this case the maintenance cost is very high compared to maintaining the new piece of equipment, and the old piece is simply there to use when the new one is out for repairs. At DuPont, the plant that had the best pump life had zero spares. This decision caused them to treat the pumps like the precious assets they were.

As many of you have seen before, we use the stable domains to depict how reliability is generated by the behavior of the people. Below is the diagram in a simple form to illustrate another dimension to the picture.

Nature of Behavior

People generally agree with this way of looking at how operations and maintenance of a facility can be classified in one of these domains. Over the last fifteen years, we have endeavored to show why the successful sites have skipped going to the planned domain. That domain is inherently unstable due to the randomness of the defects that exist and the other factors mentioned above. Although the planned domain makes the work more efficient, it does nothing to reduce the amount of work that must be done. In order to see more clearly why this happens, it is better to look at these domains from a different perspective. This other dimension is the amount of activity and therefore cost that is required to attain and sustain each domain. The figure below shows that view.

Activities and Costs

This diagram is representative of the improvement realized at the Lima refinery. The number of work orders was reduced by 67% over an eight-year period. This transformation, however, did not go through the planned domain to get to the precision domain. As the diagram shows, the extra work required in the planned domain would have made it much more probable that they returned to the reactive domain than that they progressed to the precision domain. If they had undertaken the extra work to get to the planned domain, it would have been logical to assume that another increase in cost and work would be needed to get to the precision domain. Fortunately for us, we had seen the data from the DuPont benchmark plants in Japan that had won the Total Productive Maintenance awards. These plants showed us that the amount of work, and therefore the cost, of maintaining a highly reliable plant was in fact even less than the work and cost of remaining in the reactive domain. In the 3D view these points have been combined to show that the precision domain has both the highest uptime and the lowest cost.

Activities and Costs

For equipment uptime an even better view is to look at this diagram as the 3D bar chart below.

Uptime

Article submitted by Winston Ledet, Co-Author, Don't Just Fix It, Improve It! A Journey to the Precision Domain