The Case Against Streamlined Reliability Centered Maintenance

In 1978, a report¹ was prepared for the US Department of Defense describing the then-current state of the process. The report was written by F Stanley Nowlan and Howard Heap of United Airlines. It was entitled "Reliability-centered Maintenance", or RCM. It formed the basis of the maintenance strategy formulation process called MSG3². MSG3 was first promulgated in 1980, and in slightly modified form, it is used to this day by the international commercial aviation industry. In the early 1980s, RCM as described by Nowlan and Heap also began to be used in industries other than aviation.

It soon became apparent that no other comparable technique exists for identifying the true, safe minimum of what must be done to preserve the functions of physical assets. As a result, RCM has now been used by thousands of organizations spanning nearly every major field of organized human endeavour. It is becoming as fundamental to the practice of physical asset management as double-entry bookkeeping is to financial asset management.

The growing popularity of RCM has led to the development of numerous derivatives. As we see later in this paper, some of these derivatives are refinements and enhancements of Nowlan and Heap's original RCM process. However, less rigorous derivatives have also emerged, most of which are attempts to ‘streamline’ the maintenance strategy formulation process. This paper reviews some of the most common forms of streamlining. It concludes by suggesting that from the viewpoints of risk and the defensibility of the output, there is simply no place for shortcuts in the formulation of maintenance strategies.

In order to place a review of these techniques in context, the next two sections of this paper consider the recently-published SAE RCM Standard³ and recent developments in the regulatory world.

2 The SAE RCM Standard

As mentioned above, various derivatives of Nowlan and Heap's RCM process have emerged since their report was published in 1978. Many of these derivatives retain the key elements of the original process. However the widespread use of the term "RCM" led to the emergence of a number of processes that differ significantly from the original, but that their proponents also call "RCM". Many of these other processes either omit key steps of the process described by Nowlan and Heap, or change their sequence, or both. Consequently, despite claims to the contrary made by the proponents of these processes, the output differs markedly from what would be obtained by conducting a full, rigorous RCM analysis.

A growing awareness of these differences led to an increasing demand for a standard that set out the criteria any process must comply with in order to be called "RCM". Such a standard was published by the Society of Automotive Engineers (SAE) in 1999. An article⁴ by Dana Netherton, Chairman of the SAE RCM Committee, described the evolution of RCM between 1978 and 1990, and then went on to describe the evolution of the SAE Standard as quoted in the italicised paragraphs below:

The Need for a Standard: the 1990s

Since the early 1990's, a great many more organisations have developed variations of the RCM process. Some, such as the US Naval Air Command with its ‘Guidelines for the Naval Aviation Reliability Centered Maintenance Process (NAVAIR 00-25-403)⁵' and the British Royal Navy with its RCM-oriented Naval Engineering Standard (NES45)⁶, have remained true to the process originally expounded by Nowlan and Heap. However, as the RCM bandwagon has started rolling, a whole new collection of processes has emerged that are called "RCM" by their proponents, but that often bear little or no resemblance to the original meticulously researched, highly structured and thoroughly proven process developed by Nowlan and Heap. As a result, if an organisation said that it wanted help in using or learning how to use RCM, it could not be sure what process would be offered.

Indeed, when the US Navy recently asked for equipment vendors to use RCM when building a new ship class, one US company offered a process closely related to the 1970 MSG-2 process. It defended its offering by noting that its process used a decision-logic diagram. Since RCM also uses a decision-logic diagram, the company argued, its process was an RCM process.

The US Navy had no answer to this argument, because in 1994 William Perry, the US Secretary of Defense, had established a new policy about US military standards and specifications, which said that the US military would no longer require industrial vendors to use the military's ‘standard' or ‘specific' processes. Instead it would set performance requirements, and would allow vendors to use any processes that would provide equipment that would meet these requirements.

At a stroke, this voided the US military standards and specifications that defined "RCM". The US Air Force standard was cancelled in 1995. The US Navy has been unable to invoke its standards and specifications with equipment vendors (though it continues to use them for its internal work) - and it was unable to invoke them with the US company that wished to use MSG-2.

This development happened to coincide with the sudden interest in RCM in the industrial world. During the 1990s, magazines and conferences devoted to equipment maintenance have multiplied, and magazine articles and conference papers about RCM became more and more numerous. These have shown that very different processes are being given the same name, "RCM". So both the US military and commercial industry saw a need to define what an RCM process is.

In his 1994 memorandum, Perry said, "I encourage the Under Secretary of Defense (Acquisition and Technology) to form partnerships with industry associations to develop non-government standards for replacement of military standards where practicable." Indeed, the Technical Standards Board of the SAE has had a long and close relationship with the standards community in the US military, and has been working for several years to help develop commercial standards to replace military standards and specifications, when needed and when none already existed.

So in 1996 the SAE began working on an RCM-related standard, when it invited a group of representatives from the US Navy aviation and ship RCM communities to help it develop a standard for Scheduled Maintenance Programs. These US Navy representatives had already been meeting for about a year in an effort to develop a US Navy RCM process that might be common between the aviation and ship communities, so they had already done a considerable amount of work when they began to meet under SAE sponsorship. In late 1997, having gained members from commercial industry, the group realised that it was better to focus entirely on RCM. In 1998, the group found the best approach for its standard, and in 1999 it completed its draft of the standard, and the SAE approved it and published it.

After a brief discussion about the practical difficulties associated with attempting to develop a universal standard of this nature, Netherton went on to say:

The standard now approved by the SAE does not present a standard process. Its title is, "Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes (SAE JA1011)." This standard presents criteria against which a process may be compared. If the process meets the criteria, it may confidently be called an "RCM process." If it does not, it should not. (This does not necessarily mean that processes that do not comply with the SAE RCM standard are not valid processes for maintenance strategy formulation. It simply means that the term "RCM" should not be applied to them.)

The italicised paragraph below quotes Section 5 of the Standard, which summarises the key attributes of any RCM process as follows.

Reliability-Centered Maintenance (RCM)-Any RCM process shall ensure that all of the following seven questions are answered satisfactorily and are answered in the sequence shown below:

a. What are the functions and associated desired standards of performance of the asset in its present operating context (functions)?

b. In what ways can it fail to fulfil its functions (functional failures)?

c. What causes each functional failure (failure modes)?

d. What happens when each failure occurs (failure effects)?

e. In what way does each failure matter (failure consequences)?

f. What should be done to predict or prevent each failure (proactive tasks and task intervals)?

g. What should be done if a suitable proactive task cannot be found (default actions)?

To answer each of the above questions "satisfactorily", the following information shall be gathered, and the following decisions shall be made. All information and decisions shall be documented in a way which makes the information and the decisions fully available to and acceptable to the owner or user of the asset.

Subsequent sections of the Standard list the issues that any true RCM process must address in order to answer each of the seven questions "satisfactorily". However, the key words in Section 5 of the Standard are in the first sentence. They are: ‘any’, ‘all’ and ‘in the sequence shown below’. They mean that if a process does not answer all the questions in the sequence shown (or does not answer them satisfactorily in compliance with the rest of the Standard), then that process is not RCM.
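The ‘all’ and ‘in the sequence shown’ conditions can be illustrated with a minimal sketch. The names `Question` and `answers_all_in_sequence` are assumptions introduced here for illustration and do not appear in the Standard. The sketch checks only completeness and order; it says nothing about whether each question has been answered "satisfactorily".

```python
from enum import IntEnum


class Question(IntEnum):
    """The seven questions of Section 5, in the order the Standard lists them."""
    FUNCTIONS = 1
    FUNCTIONAL_FAILURES = 2
    FAILURE_MODES = 3
    FAILURE_EFFECTS = 4
    FAILURE_CONSEQUENCES = 5
    PROACTIVE_TASKS = 6
    DEFAULT_ACTIONS = 7


def answers_all_in_sequence(steps):
    """Return True only if `steps` covers all seven questions in the listed order.

    `steps` is the order in which a candidate process addresses the questions.
    This is a hypothetical helper: it does not assess the quality of the answers.
    """
    return list(steps) == list(Question)


# A process that follows the full sequence passes the check.
full_process = list(Question)

# A process that begins at failure modes and never defines functions,
# functional failures or failure effects does not.
shortened_process = [
    Question.FAILURE_MODES,
    Question.FAILURE_CONSEQUENCES,
    Question.PROACTIVE_TASKS,
    Question.DEFAULT_ACTIONS,
]

print(answers_all_in_sequence(full_process))
print(answers_all_in_sequence(shortened_process))
```

A process that answers all seven questions but in a different order also fails this check, which mirrors the weight the Standard places on sequence.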

None of the streamlined "RCM" processes complies fully with the requirements of Section 5 of the SAE Standard. The implications of this point are discussed in more detail later.

3 Regulatory Issues

The reaction of society as a whole to equipment failures is changing at warp speed as we move into the 21st century. However, this change has attracted surprisingly little comment within the physical asset management community, so it is worth reviewing some of the highlights.

The changes began with sweeping legislation governing industrial safety, mainly in the 1970s. Among the best known examples of such legislation are the Occupational Safety and Health Act of 1970 in the United States and the Health and Safety at Work Act of 1974 in the United Kingdom. These Acts are fairly general in nature, and similar laws have been passed in nearly all the major industrialised countries. Their intent is to ensure that employers provide a generally safe working environment for their employees.

These Acts were followed by a series of more specific safety-oriented laws and regulations such as OSHA Regulation Nº 1910.119: "Process Safety Management of Highly Hazardous Chemicals" in the United States and the "Control of Substances Hazardous to Health Regulations" in the United Kingdom. Both of these regulations were first promulgated in the early to mid-1990's. They are noteworthy examples of a then-new requirement for the users of hazardous materials to perform formal analyses or assessments of the associated systems, and to document the analyses for subsequent inspection if necessary by regulators.

These two sets of developments represent a steady increase in legal requirements to exercise - and to be able to show that we are exercising - responsible custodianship of the assets under our control. They have placed a significant burden on the managers of the assets concerned. However, they reflect the steadily rising expectations of society in terms of industrial safety and we have no choice but to comply as best we can.

It would be nice if it all ended there, but unfortunately this tide has not stopped rising. The late 1990s have seen even more changes, this time concerning the sanctions that society now wishes to impose if things go wrong. Until the mid-1990s, if a failure occurred whose consequences were serious enough to warrant criminal proceedings, these proceedings usually ended at worst with a substantial fine imposed on the organisation found to be at fault, and the matter - at least from the criminal point of view - usually ended there. (Occasionally, the organisation's permit to operate was withdrawn, as in the case of the ValuJet airline after the crash in Florida on 11 May 1996. This effectively put the airline out of business in its then-current form.)

However, following recent disasters, a movement is now developing not only to punish the organisations concerned, but also to impose criminal sanctions on individual managers. In other words, under certain circumstances, individual managers can be sent to prison in connection with equipment failures that have sufficiently nasty consequences.

Following the Paddington rail crash⁷ that occurred in 1999, the Law Commission of the United Kingdom has proposed that the laws governing involuntary manslaughter be revised to cover three new categories of crime, one of which is to be called ‘corporate killing’⁸. Depending on the circumstances, any one of these categories may be invoked in the event of an industrial accident that results in the death of a person. Penalties range from permanent disqualification from acting in a management role in any undertaking carrying on a business or activity in the UK, to life imprisonment.

In the United States, following the outcry about the accidents involving tire tread separation on SUVs, section 30170 of the "Motor Vehicle and Motor Vehicle Defect Notification Act" was revised in October 2000 to include prison sentences of up to 15 years for "directors, officers or agents" of vehicle manufacturers who commit specified offences in connection with vehicles that fail in a way that causes death or bodily injury.

There is considerable controversy about the reasonableness of these initiatives, and even some doubt about their ultimate enforceability. However, from the point of view of people involved in the management of physical assets, the issue is not what is reasonable, but that we are increasingly being held personally accountable for actions that we take on behalf of our employers. Not only that, but if we are called to account in the event of a serious incident, it will be in circumstances that could culminate in jail sentences.

Perhaps the most startling legislative developments of all were triggered by an industrial accident that occurred in Australia. Following the Longford disaster⁹ in September 1998 in the state of Victoria, the Victorian State Parliament on 13 November 1998 added a new section to the State of Victoria Evidence Act of 1958 which reads as follows:

"19D. Legal professional privilege

(1) Despite anything to the contrary in this Division, if a person is required by a commission to answer a question or produce a document or thing, the person is not excused from complying with the requirement on the ground that the answer to the question would disclose, or the document contains, or the thing discloses, matter in respect of which the person could claim legal professional privilege.

(2) The commissioner may require the person to comply with the requirement at a hearing of the commission from which the public, or specified persons, are excluded in accordance with section 19B."

In essence, this amendment suspended attorney/client confidentiality for the purposes of the Longford - and subsequent - official inquiries.

Not only this, but state governments of Victoria and Queensland are considering legislation to deal with "Industrial Manslaughter (Vic)" and "Corporate Culpability (Qld)" as both governments believe that their current legislation does not deal adequately with industrial incidents causing death or serious injury. Victoria is leading the way after the Longford incident. These proposed laws go further than the laws in the UK and the USA, in that the concept of "aggregation of negligence" is introduced. This allows the aggregation of actions and omissions of a group of employees and managers to establish that an organisation is negligent. Both governments have made it clear that if managers and/or a management system fails to prevent workplace death or serious injury, then the responsible manager and/or management team is likely to face criminal prosecution. If the legislation proceeds, penalties of over $500,000 and 7 years imprisonment are proposed.

The message to us all is that society is getting so sick of industrial accidents with serious consequences that not only is it seeking to call individuals as well as corporations to account, but that it is prepared to alter well-established principles of jurisprudence to do so. Under these circumstances, everyone involved in the management of physical assets needs to take greater care than ever to ensure that every step they take in executing their official duties is beyond reproach. It is becoming professionally suicidal to do otherwise.

4 Streamlined RCM

The author and his associates have helped companies to apply true RCM on more than 1,500 sites spanning 44 countries and nearly every form of organised human endeavour. We have found that when true RCM has been correctly applied by well-trained individuals working on clearly defined and properly managed projects, the analyses have usually paid for themselves in between two weeks and two months. This is a very rapid payback indeed.

However, despite this rapid payback, some individuals and organisations have expended a great deal of energy on attempts to reduce the time and resources needed to apply the RCM process. The results of these attempts are generally known as ‘streamlined’ RCM techniques.

This section of this paper outlines the main features of some of the most widely touted ‘streamlined’ approaches to RCM. In all cases, the proponents of these techniques claim that their principal advantage is that they achieve similar results to something which they call ‘classical’ RCM, but that they do so in much less time and at much lower cost. However, not only is this claim questionable, but all of the streamlined techniques have other drawbacks, some quite serious. These drawbacks are also highlighted in the following paragraphs.

4.1 Retroactive RCM approaches

The most popular method of ‘streamlining’ the RCM process starts not by defining the functions of the asset (as specified in the SAE Standard), but with the existing maintenance tasks. Users of this approach try to identify the failure mode that each task is supposed to be preventing, and then work forward again through the last three steps of the RCM decision process to re-examine the consequences of each failure and (hopefully) to identify a more cost-effective failure management policy. This approach is what is most often meant when the term ‘streamlined RCM’¹⁰ is used. It is also known as "backfit" RCM¹¹ or "RCM in reverse".

Retroactive approaches are superficially very appealing, so much so that the author tried them himself on numerous occasions when he was new to RCM. However, in reality they are also among the most dangerous of the streamlined methodologies, for the following reasons:

- they assume that existing maintenance programs cover just about all the failure modes that are reasonably likely to require some sort of preventive maintenance. In the case of every maintenance program that the author has encountered to date, this assumption is simply not valid. If RCM is applied correctly, it transpires that nowhere near all of the failure modes that actually require PM are covered by existing maintenance tasks. As a result, a considerable number of tasks have to be added. Most of the tasks that are added apply to protective devices, as discussed below. (Other tasks are eliminated because they are found to be unnecessary, or the type of task is changed, or the frequency is changed. The nett effect is usually a reduction in perceived overall PM workloads, typically by between 40% and 70%.)

- when applying retroactive RCM, it is often very difficult to identify exactly what failure cause motivated the selection of a particular task, so much so that either inordinate amounts of time are wasted trying to establish the real connection, or sweeping assumptions are made that very often prove to be wrong. These two problems alone make this approach an extremely shaky foundation upon which to build a maintenance program.

- in re-assessing the consequences of each failure mode, it is still necessary to ask whether "the loss of function caused by the failure mode will become evident to the operating crew under normal circumstances". This question can only be answered by establishing what function is actually lost when the failure occurs. This in turn means that the people doing the analysis have to start identifying functions anyway, but they are now trying to do so on an ad hoc basis halfway through the analysis (and they are not usually trained in how to identify functions correctly in the first place because this approach considers the function identification step to be unnecessary). If they do not, they start making even more sweeping - and hence often incorrect - assumptions that add to the shakiness of the results.

- retroactive approaches are especially weak on specifying appropriate maintenance for protective devices. As stated by the author in his book¹²: "at the time of writing, many existing maintenance programs provide for fewer than one third of protective devices to receive any attention at all (and then usually at inappropriate intervals). The people who operate and maintain the plant covered by these programs are aware that another third of these devices exist but pay them no attention, while it is not unusual to find that no-one even knows that the final third exist. This lack of awareness and attention means that most of the protective devices in industry - our last line of protection when things go wrong - are maintained poorly or not at all." So if one uses a retroactive approach to RCM, in most cases a great many protective devices will continue to receive no attention in the future because no tasks were specified for them in the past. Given the enormity of the risks associated with unmaintained protective devices, this weakness of retroactive RCM alone makes it completely indefensible. (Some variants of this approach try to get around this problem by specifying that protective systems should be analysed separately, often outside the RCM framework. This gives rise to the absurd situation that two analytical processes have to be applied in order to compensate for the deficiencies created by attempts to streamline one of them.)

- more so than any of the other streamlined versions of RCM, retroactive approaches focus on maintenance workload reduction rather than plant performance improvement (which is the primary goal of function-oriented true RCM). Since the returns generated by using RCM purely as a tool to reduce maintenance costs are usually lower - sometimes one or two orders of magnitude lower - than the returns generated by using it to improve reliability, the use of the ostensibly cheaper retroactive approach becomes self-defeating on economic grounds, in that it virtually guarantees much lower returns than true RCM.
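The first weakness listed above, that a retroactive analysis can only ever revisit failure modes which already have a task attached, can be sketched as a simple set difference. Everything in the sketch is a hypothetical illustration: the function name `uncovered_failure_modes`, the task names and the failure modes are invented for the example and are not drawn from any real analysis.

```python
def uncovered_failure_modes(all_failure_modes, existing_tasks):
    """Return the failure modes that no existing maintenance task addresses.

    `all_failure_modes` is the list a function-first analysis would produce.
    `existing_tasks` maps each current task to the failure modes it is
    believed to address. Both inputs are assumptions made for illustration.
    """
    covered = {mode for modes in existing_tasks.values() for mode in modes}
    return sorted(set(all_failure_modes) - covered)


# Hypothetical existing program: two tasks, each tied to one failure mode.
existing_tasks = {
    "lubricate pump bearing": ["bearing seizes"],
    "inspect coupling": ["coupling shears"],
}

# Hypothetical result of starting from functions: one more failure mode
# appears, this time on a protective device that has no task at all.
function_first_failure_modes = [
    "bearing seizes",
    "coupling shears",
    "low-level trip switch fails to operate",
]

print(uncovered_failure_modes(function_first_failure_modes, existing_tasks))
```

A retroactive analysis starts from the keys of `existing_tasks`, so the failure mode returned here would never enter it.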

4.2 Use of generic RCM analyses

A fairly widely-used shortcut in the application of RCM entails applying an analysis performed on one system to technically identical systems. In fact, one or two organizations even sell such generic analyses, on the grounds that it is cheaper to buy an analysis that has already been performed by someone else than it is to perform your own. The following paragraphs explain why generic analyses should be treated with great caution:

• operating context: In reality, technically identical systems often require completely different maintenance programs if the operating context is different. For example, consider three pumps A, B and C that are technically identical (same make, model, drives, pipework, valvegear, switchgear, and pumping the same liquid against the same head). The generic mindset suggests that a maintenance program developed for one pump should apply to the other two.

However, Pump A stands alone, so if it fails, operations will be affected sooner or later. As a result the users and/or maintainers of Pump A are likely to make some effort to anticipate or prevent its failure. (How hard they try will be governed both by the effect on operations and by the severity and frequency of the failures of the pump.)

However, if pump B fails, the operators simply switch to pump C, so the only consequence of the failure of pump B is that it must be repaired. As a result, it is likely that the operators of B would at least consider letting it run to failure (especially if the failure of B does not cause significant secondary damage.) On the other hand, if pump C fails while pump B is still working (for instance if someone cannibalizes a part from C), it is likely that the operators will not even know that C has failed unless or until B also fails. To guard against this possibility, a sensible maintenance strategy might be to run C from time to time to find out whether it has failed. This example shows how three identical assets can have three totally different maintenance policies because the operating context is different in each case. In the case of the pumps, a generic program would only have specified one policy for all three pumps.

Apart from redundancy, many other factors affect the operating context and hence affect the maintenance programs that could be applied to technically identical assets. These include whether the asset is part of a peak load or base load operation, cyclic fluctuations in market demand and/or raw material supplies, the availability of spares, quality and other performance standards that apply to the asset, the skills of the operators and maintainers, and so on.

• maintenance tasks: different organizations - or even different parts of the same organization - seldom employ people with identical skillsets. This means that people working on one asset may prefer to use one type of proactive technology (say high-tech condition monitoring), while another group working on an identical asset may be more comfortable using another (say a combination of performance monitoring and the human senses). It is surprising how often this difference does not matter, as long as the techniques chosen are cost-effective. In fact, many maintenance organizations are starting to realize that there is often more to be gained from ensuring that the people doing the work are comfortable with what they are doing than it is to compel everyone to do the same thing. (The validity of different tasks is also affected by the operating context of each asset. For instance, think how background noise levels affect checks for noise.) Because generic analyses necessarily incorporate a "one size fits all" approach to maintenance tasks, they do not cater for these differences and hence have a significantly reduced chance of acceptance by the people who have to do the tasks.

These two points mean that special care must be taken to ensure that the operating context, functions and desired standards of performance, failure modes, failure consequences and the skills of the operators and maintainers are all effectively identical before applying a maintenance policy designed for one asset to another. They also mean that an RCM analysis performed on one system should never be applied to another without any further thought just because the two systems happen to be technically identical.
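The three-pump example in this section can be sketched in a few lines. The class `Pump`, its fields and the function `suggested_policy` are assumptions introduced for illustration. The logic is deliberately simplified to redundancy alone, whereas a real analysis would also weigh failure consequences, secondary damage and the other contextual factors listed above.

```python
from dataclasses import dataclass


@dataclass
class Pump:
    """A technically identical pump described only by its operating context."""
    name: str
    has_standby: bool  # another pump takes over if this one fails
    is_standby: bool   # this pump normally sits idle as the backup


def suggested_policy(pump):
    """Return a simplified policy based on redundancy alone (illustrative only)."""
    if pump.is_standby:
        # Its failure would stay hidden until the duty pump also failed.
        return "run from time to time to find out whether it has failed"
    if pump.has_standby:
        # Failure only means a repair, so run-to-failure is worth considering.
        return "consider run-to-failure"
    # A stand-alone pump affects operations when it fails.
    return "anticipate or prevent failure"


pumps = [
    Pump("A", has_standby=False, is_standby=False),
    Pump("B", has_standby=True, is_standby=False),
    Pump("C", has_standby=False, is_standby=True),
]

for pump in pumps:
    print(pump.name, "->", suggested_policy(pump))
```

The make and model of the pump never enter the function. Only the operating context does, which is why a single generic program cannot produce all three policies.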

4.3 Use of generic lists of failure modes

‘Generic’ lists of failure modes are lists of failure modes - or sometimes entire FMEAs - prepared by third parties. They may cover entire systems, but more often cover individual assets or even single components. These generic lists are touted as another method of speeding up or ‘streamlining’ this part of the maintenance program development process. In fact, they should also be approached with great caution, for all the reasons discussed in the previous section of this paper, and for the following additional reasons:

• the level of analysis may be inappropriate: It is possible to ‘drill down’ almost any number of levels when seeking to identify failure modes (or causes of failure). The point at which this process should stop is the level at which it is possible to identify an appropriate failure management policy, and this can vary enormously depending once again on the operating context of the system. In other words, when establishing causes of failure for technically identical assets, it may be appropriate in one context to ask ‘why’ it fails once, and in another it may be necessary to ask ‘why’ seven or eight times. However, if a generic list is used, this decision will already have been made in advance of the RCM analysis. For instance, all the failure modes in the generic list may have been identified as a result of asking ‘why’ four or five times, when all that may be needed is level 1. This means that far from streamlining the process, the generic list would condemn the user to analysing far more failure modes than necessary. Conversely, the generic list may focus on level 3 or 4 in a situation where some of the failure modes really ought to be analysed at level 5 or 6. This would result in an analysis that is too superficial and possibly dangerous.

• the operating context may be different: The operating context of your asset may have features which make it susceptible to failure modes that do not appear in the generic list. Conversely, some of the modes in the generic list might be extremely improbable (if not impossible) in your context.

• performance standards may differ: your asset may operate to standards of performance which mean that your whole definition of failure may be completely different from that used to develop the generic FMEA.

These three points mean that if a generic list of failure modes is used at all, it should only ever be used to supplement a context-specific FMEA, and never used on its own as a definitive list.

4.4 Skipping elements of the RCM process

Another common way in which the RCM process is "streamlined" is by skipping various elements of the process altogether. The step most often omitted is the definition of functions. Proponents of this methodology start immediately by listing the failure modes that might affect each asset, rather than by defining the functions of the asset under consideration. They do so either because they claim that, especially in the case of "non-safety-critical" plant, identifying functions does not contribute enough relative to the amount of time it takes¹³, or because they simply appear not to be aware that defining all the functions and the associated desired standards of performance of the assets under review is an integral part of the RCM process¹⁴.

In fact, it is generally accepted by all the proponents of true RCM that in terms of improved plant performance, by far the greatest benefits of true RCM flow from the extent to which the function definition step transforms general levels of understanding of how the equipment is supposed to work. So cutting out this step costs far more in terms of benefits forgone than it saves in reduced analysis time.

From a purely technical point of view, the identification of functions and associated desired standards of performance also makes it far easier to identify the surprisingly common situations (failure modes) where the asset is simply incapable of doing what the user wants it to do, and therefore fails too soon or too often. For this reason, eliminating the function definition step further reduces the power of the process.

The comments in the second sub-paragraph in section 4.1 above also apply here.

4.5 Analyse only "critical" functions or "critical" failures

The SAE Standard stipulates inter alia that a true RCM analysis should define all functions, and that all reasonably likely failure modes should be subjected to the formal consequence evaluation and task selection steps. The shortcuts embodied in some of the streamlined RCM processes try to analyse ‘critical’ functions only, or to subject only ‘critical’ failure modes to detailed analysis. These approaches have two main flaws, as follows:

- the process of dismissing functions and/or failure modes as being ‘non-critical' necessarily entails making assumptions about what a more detailed analysis might reveal. In the personal experience of the author, such assumptions are frequently wrong. It is surprising how often apparently innocuous functions or failure modes are found on closer examination to embody elements that are highly critical in terms of safety and/or environmental integrity. As a result, the practice of prematurely dismissing functions or failure modes results in much riskier analyses, but because the analysis is incomplete, no-one knows where or what these risks are.

- many of the streamlined processes that adopt this approach incorporate elaborate additional steps designed to ‘help' identify what functions and/or failure modes are critical or non-critical. In a great many cases, applying these additional steps takes longer and costs more than it would take to conduct a rigorous analysis of every function and every reasonably likely failure mode using true RCM, yet the output is considerably less robust.

4.6 Analyse only "critical" equipment

An approach to maintenance strategy formulation that is often presented as a ‘streamlined' form of RCM suggests that the RCM process should be applied to ‘critical' equipment only. This issue does not fall within the ambit of the SAE Standard, because the Standard does not deal with the selection of equipment for analysis. It defines RCM as a process that can be applied to any asset, and it assumes that decisions about what equipment is to be analysed and about system boundaries have already been made when the time comes to apply the RCM process defined in the Standard. There were two reasons why the equipment selection process was omitted from the Standard:

- different industries use widely differing criteria to judge what is ‘critical'. For instance, the ability of assets to produce products within given quality limits is a major issue in manufacturing operations, and hence features prominently in assessments of criticality. However, this issue barely figures at all with respect to equipment used by military undertakings. This means that there is an equally wide range of techniques used to assess criticality - so wide that it is impossible to encompass this issue in one universal standard.

- there is a growing school of thought (with which the author of this paper has some sympathy) that there is no such thing as an item of plant - at least in an industrial context - that is ‘non-critical' or ‘non-significant' to the extent that it does not justify analysis using RCM. Two of the main reasons for believing that systems or items of plant should not be dismissed as ‘non-critical' prior to rigorous analysis are exactly the same as the reasons given in section 4.5 above for not dismissing functions and failure modes in the same way. (In fact, many organisations that choose to start with a formal, across-the-board equipment criticality assessment seem to spend as much time deciding what assessment methodology they will use and then applying it as they would have spent using true RCM to analyse all the equipment in their facility.)

Much more could be said both in favour of and against the idea of using equipment criticality assessments as a means of deciding whether to perform rigorous analyses using techniques such as RCM. However, since criticality assessment techniques are not an integral part of the RCM process, such a discussion is beyond the scope of this paper. Suffice it to say that it is incorrect to present such techniques as streamlined forms of RCM because they do not form part of the RCM process as defined by the SAE Standard.

5 Conclusion

In nearly all cases, the proponents of the streamlined approaches to RCM outlined in Section 4 claim that these approaches can produce much the same results as true RCM in about a half to a third of the time. However, the above discussion indicates that not only do they not produce the same results as true RCM, but that they contain logical or procedural flaws which increase risk to an extent that overwhelms any small advantage they might offer in reduced application costs. It also transpires that many of these ‘streamlined' techniques actually take longer and cost more to apply than true RCM, so even this small advantage is lost. As a result, the business case for applying streamlined RCM is suspect at best.

However, a rather more serious point needs to be borne in mind when considering these techniques. The very word ‘streamline' suggests that something is being omitted, and Section 4 of this paper indicates that this is indeed so for the streamlined techniques described. In other words, there is to a greater or lesser extent a degree of sub-optimisation embodied in all of these techniques.

Leaving things out inevitably increases risk. More specifically, it increases the probability that an unanticipated failure, possibly one with very serious consequences, could occur. If this does happen, as suggested in Section 3, managers of the organisation involved are increasingly likely to find themselves called personally to account. If the worst comes to the worst, they will not only have to explain, often in an emotionally-charged courtroom confronted by bitterly hostile legal Rottweilers, what went wrong and why. They will also have to explain why they deliberately chose a sub-optimal decision-making process to establish their asset management strategies in the first place, rather than using one which complies fully with a Standard set by an internationally-recognised standards-setting organisation. It would not be me that they would have to convince, not their peers and not their managers, but a judge and jury.

One rationale often advanced for using the streamlined methods is that it is better to do something than to do nothing. However, this rationale misses the point that all the analytical processes described above, streamlined or otherwise, require their users to document the analyses. This means that a clear audit trail exists showing all the key information and decisions underlying the asset management strategy, in most cases where none has existed before. If a sub-optimal approach is used to formulate these strategies, the existence of written records makes every shortcut much clearer to any investigators than they would otherwise have been. (This in turn may suggest that perhaps we should simply forget about all of these formal analytical processes. Unfortunately, the demand for documented analyses embodied in the second wave of safety legislation described in Section 3 of this paper does not allow us this option.)

A further rationale for streamlining says something like "we have been using this approach for a few years now and we haven't had any accidents, so it must be all right." This rationale betrays a complete misunderstanding of the basic principles of risk. Specifically, no analytical methodology can completely eliminate risk. However, the difference between using a more rigorous methodology and a less rigorous methodology may be the difference between a probability of a catastrophic event of one in a million versus one in ten thousand. In both cases, the event may happen next year or it may not happen for thousands of years, but in the second case, it is a hundred times more likely. If such an event were to happen, the user of true RCM would be able to claim that he or she exercised prudent, responsible custodianship by applying a rigorous process that complies with an internationally recognised standard, and as such would be in a highly defensible position. Under the same circumstances, the user of streamlined RCM is on much, much shakier ground.
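The arithmetic behind this comparison can be sketched in a few lines of Python. This is an illustrative sketch only: the two probabilities are the figures quoted above, but treating them as independent annual probabilities, the 30-year horizon and the function names are assumptions made here for illustration, not part of any RCM standard.

```python
from fractions import Fraction


def relative_likelihood(p_less_rigorous, p_more_rigorous):
    """How many times more likely the event is under the less rigorous method."""
    return p_less_rigorous / p_more_rigorous


def prob_at_least_once(p_per_year, years):
    """Probability of at least one event over a number of years.

    Assumes (hypothetically) that each year is independent and carries
    the same probability p_per_year.
    """
    return 1 - (1 - p_per_year) ** years


# Figures quoted in the text; interpreting them as annual is an assumption.
p_rigorous = Fraction(1, 1_000_000)   # one in a million
p_streamlined = Fraction(1, 10_000)   # one in ten thousand

ratio = relative_likelihood(p_streamlined, p_rigorous)
print(f"Streamlined is {ratio} times more likely to see the event")

# Hypothetical 30-year asset life, chosen only for illustration.
horizon_years = 30
for label, p in (("rigorous", p_rigorous), ("streamlined", p_streamlined)):
    cumulative = float(prob_at_least_once(p, horizon_years))
    print(f"{label}: P(at least one event in {horizon_years} years) = {cumulative:.6f}")
```

Under either methodology, an accident-free record of a few years is the most likely outcome, which is why such a record says very little about which level of risk the organisation is actually carrying.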

6 Footnote

An interesting footnote to the whole debate about streamlined RCM concerns what exactly it is that is ostensibly being streamlined. Nearly all the advocates of streamlined processes compare their offerings to something they call ‘classical' RCM. However, closer study of what they mean by ‘classical' RCM reveals that it is often a monstrously complicated process or collection of processes that bears little or no resemblance to RCM as defined in the SAE standard. In these cases, it is hardly surprising that streamlined RCM is cheaper and quicker than these so-called ‘classical' fantasies. In reality, if true RCM is applied as explained in the first paragraph of section 4 of this paper, it is nearly always quicker and cheaper than the streamlined versions, in addition to being far more defensible and producing far greater returns.

References:

1 Nowlan FS and Heap H: "Reliability-centered Maintenance". Springfield, Virginia. National Technical Information Service, United States Department of Commerce

2 Maintenance Steering Group - 3 Task Force: "Maintenance Program Development Document MSG-3". Washington DC: Air Transport Association (ATA) of America. 1993

3 International Society of Automotive Engineers: "JA1011 - Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes". Warrendale, Pennsylvania, USA: SAE Publications

4 Netherton D: "SAE's New Standard for RCM". Maintenance (UK) 15 (1) 3 - 7, 2000

5 US Naval Air Systems Command: "NAVAIR 00-25-403: Guidelines for the Naval Aviation Reliability Centered Maintenance Process". Philadelphia, Pennsylvania. US Department of Defense Publications

6 RCM Implementation Team, Royal Navy: "NES 45 Naval Engineering Standard 45, Requirements for the Application of Reliability-Centred Maintenance Techniques to HM Ships, Royal Fleet Auxiliaries and other Naval Auxiliary Vessels". Foxhill, Bath, United Kingdom. UK Ministry of Defence Publications

7 UK Health & Safety Executive: "Train Accident at Ladbroke Grove Junction, 5 October 1999: Third HSE Interim Report". www.hse.gov.uk/railway/paddrail/interim3/htm

8 Bartram P: "What Price a Life?" Financial Director (UK), 2 August 2000

9 Various: "The Longford Royal Commission": www.theage.com.au/special/gas/index.html

10 Bookless C & Sharkey M: "Streamlined RCM in the Nuclear Industry". Maintenance (UK) 14 (1) 27 - 30, 2000

11 Jacobs KS: "Reducing Maintenance Workload Through Reliability-Centered Maintenance Processes": ASNE Fleet Maintenance Symposium. October 1997. San Diego, California

12 Moubray JM: "Reliability-centered Maintenance": New York, New York USA: Industrial Press

13 Dixey M & Gallimore J: "Fast Track RCM - Getting Results from RCM". Maintenance (UK) 15 (1) 8 - 11, 2000

14 Mundy S D: "Completing the Reliability Centered Maintenance Loop at a New Process Facility". Reliability (USA) 7 (3) 30 - 33, 2000
