Author’s Note: While putting the finishing touches on this article, I happened upon Bill Berneski’s Dec10/Jan11 Uptime article, “Deriving Task Periodicities within Reliability-Centered Maintenance.” Mr. Berneski’s article discussed mathematical formulae that could be used to compute intervals for scheduled restoration/discard, on-condition and failure-finding tasks. The intent of this article is to strengthen the case for using a mathematical approach to derive the on-condition task interval for higher risk (or cost) failure modes. This mathematical method addresses the risk directly, by way of the reliability-centered maintenance (RCM) methodology, to assist organizations in meeting ISO 31000 objectives for managing their assets.
The highly structured RCM process is a proven approach for determining what must be done to any physical asset to ensure it continues to do what you want it to do in the present operating context. The first step of the task selection process starts by assessing the effects of the failure mode and classifying them into one of four broad categories of consequences. The next step identifies a proactive task that reduces the consequences of failure to the extent that it is technically feasible. The criteria used to judge whether an on-condition maintenance task is technically feasible are fairly consistent across the various compliant RCM processes. Specifically, RCM2™ uses the following yardsticks:
- It is possible to define a clear, potential failure condition.
- The P-F interval is reasonably consistent.
- It is practical to monitor the item at intervals less than the P-F interval.
- The net P-F interval is long enough to be of some use.
On-condition task intervals are based on the expected P-F interval. (See Section 7.7 of John Moubray's book, Reliability-Centered Maintenance II, if you're having difficulty determining a P-F interval in the absence of empirical data. Mr. Moubray suggests a “rational approach” method for estimating P-F intervals on the basis of judgment and experience.) In order to detect the potential failure before it becomes a functional failure, the task interval must be shorter than the P-F interval. Conventional RCM wisdom suggests that it is usually sufficient to select a task interval equal to half of the P-F interval.
So the questions are:
What might represent an “unusual” situation in which the one-half P-F interval guidance does not apply?
Should the task periodicity be a smaller fraction of the P-F interval? And if so, how much smaller?
In addressing these questions, one can conclude that adjustments to the task interval should be based on these considerations:
- The actual failure may present a risk to the organization because of safety, environmental, or high operational consequences.
- The specific on-condition task may not identify the onset of failure with a sufficiently high degree of confidence because of some uncertainty or inconsistency with the inspection method.
The Naval Air Systems Command has an RCM program for its in-service aircraft and support equipment. Its manual, Guidelines for the Naval Aviation Reliability-Centered Maintenance Process (NAVAIR 00-25-403), states that on-condition task intervals for failure modes related to safety and the environment can be calculated using these two equations:
Equation (1) I = P-F/n
Where:
I = inspection interval
P-F = potential failure interval
n = number of inspections in the P-F interval
Assigning an acceptable probability of failure to detect a potential failure yields a second equation that can be used to determine n.
Equation (2) n = ln(Pacc)/ln(1-θ)
Where:
n = number of inspections in the P-F interval
θ = probability of detecting a potential failure with one occurrence of the proposed on-condition task, assuming the potential failure occurs
Pacc = acceptable probability of failure
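For readers who prefer code to a spreadsheet, Equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the function names `inspections_needed` and `inspection_interval` and the input checks are assumptions made here, not part of NAVAIR 00-25-403.

```python
import math


def inspections_needed(p_acc: float, theta: float) -> float:
    """Equation (2): n = ln(Pacc) / ln(1 - theta).

    p_acc: acceptable probability of failure (0 < p_acc < 1)
    theta: probability that one inspection detects an existing
           potential failure (0 < theta < 1)
    """
    # Hypothetical guard rails: the logarithms are undefined or
    # meaningless outside the open interval (0, 1).
    if not 0.0 < p_acc < 1.0:
        raise ValueError("p_acc must be strictly between 0 and 1")
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must be strictly between 0 and 1")
    return math.log(p_acc) / math.log(1.0 - theta)


def inspection_interval(pf_interval: float, p_acc: float, theta: float) -> float:
    """Equation (1): I = P-F / n, in the same time unit as pf_interval."""
    return pf_interval / inspections_needed(p_acc, theta)
```

The result is returned as a real number rather than being rounded to a whole number of inspections, which matches how the equations are written.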
The basis for this method and the derivation of the equations are contained in NAVAIR 00-25-403, Appendix B. It would be a good idea to review the explanation of the on-condition task interval determination methodology in Appendix B, Section 1.2.1. SAE JA1011 states: “Any mathematical and statistical formulae that are used in the application of the process (especially those used to compute the intervals of any tasks) shall be logically supportable, and shall be available to and approved by the owner or user of the asset.” Being able to explain the math goes a long way in obtaining owner buy-in of the methodology.
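As an aid to explaining the math, here is a sketch of the usual reasoning behind Equation (2). It is not a reproduction of Appendix B, and it assumes that each of the n inspections within the P-F interval is an independent trial with the same detection probability θ.

```latex
\begin{align*}
P(\text{all } n \text{ inspections miss the potential failure}) &= (1-\theta)^{n} \\
(1-\theta)^{n} &\le P_{acc} \\
n \,\ln(1-\theta) &\le \ln(P_{acc}) \\
n &\ge \frac{\ln(P_{acc})}{\ln(1-\theta)}
  \qquad \text{(the inequality flips because } \ln(1-\theta) < 0\text{)}
\end{align*}
```

Equation (2) is the boundary case of the last line, the smallest n that keeps the chance of missing the potential failure on every inspection at or below Pacc.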
Let’s examine the NAVAIR equations to better understand how the variables Pacc and θ affect the outcome of the on-condition task interval determination by using the following example. Assume you have a structure that is held together with 12 bolts that, by design, are not visible to the operator of the asset during normal operations. If any four of the 12 bolts become completely disassembled, it is thought the structure will collapse, with the possibility of operator death or serious injury. Because of the way the structure is used, the RCM review team decides the P-F interval is two years from the time the first bolt starts to loosen. The asset owner has stated that Pacc (the acceptable probability the structure may collapse) in any given year will be assigned a value of 0.00001. (Note: An annual probability of failure of 0.00001 suggests that this failure mode is not reasonably expected to ever occur over a 50-year period.) The proposed on-condition task involves visually assessing whether any one of the 12 bolts is loosening. Because identification of the potential failure is dependent on the inspector's judgment, θ will be assigned a value of 0.90 by the RCM review team. The 0.90 value should be considered reasonable for a technique based on the human senses. When you plug the variables into Equation (2), n works out to five inspections within the P-F interval, and Equation (1) then gives an inspection interval (I) of 0.4 years.
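Written out step by step with the values from the bolt example (logarithms rounded to four significant figures), the arithmetic is:

```latex
\begin{align*}
n &= \frac{\ln(P_{acc})}{\ln(1-\theta)}
   = \frac{\ln(0.00001)}{\ln(1-0.90)}
   \approx \frac{-11.51}{-2.303}
   = 5 \\[4pt]
I &= \frac{\text{P-F}}{n}
   = \frac{2\ \text{years}}{5}
   = 0.4\ \text{years} \approx 4.8\ \text{months}
\end{align*}
```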
Because the values assigned to variables used in this analysis (Pacc, θ) might be considered somewhat subjective, the group is interested in considering alternate scenarios to see if a reasonable inspection interval can be bounded. With the equations built into a spreadsheet, alternate scenarios are easy to evaluate.
From the original scenario, increase the probability of detecting potential failure to 0.95 and the resulting inspection interval becomes approximately 0.5 years.
From the original scenario, change the acceptable probability of failure to 0.0001 and the resulting inspection interval also becomes approximately 0.5 years.
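The same what-if comparison can be run outside a spreadsheet. The sketch below evaluates the original scenario and the two alternates; the scenario labels and the `inspection_interval` helper are names chosen here for illustration.

```python
import math


def inspection_interval(pf_interval, p_acc, theta):
    """Equations (1) and (2) combined: I = P-F / (ln(Pacc) / ln(1 - theta))."""
    n = math.log(p_acc) / math.log(1.0 - theta)
    return pf_interval / n


PF_YEARS = 2.0  # P-F interval from the bolt example

# (Pacc, theta) pairs for each scenario discussed in the text
scenarios = {
    "original": (0.00001, 0.90),
    "higher detection probability": (0.00001, 0.95),
    "relaxed acceptable probability": (0.0001, 0.90),
}

results = {
    name: inspection_interval(PF_YEARS, p_acc, theta)
    for name, (p_acc, theta) in scenarios.items()
}

for name, years in results.items():
    print(f"{name}: {years:.2f} years ({years * 12:.1f} months)")
```

Run as written, this reports roughly 0.40, 0.52 and 0.50 years respectively, in line with the figures quoted above.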
Note that the original assumptions result in an on-condition task interval of one fifth of the P-F interval, not one half. It’s also interesting to note that relaxing some of the assumptions, as was done with the alternate scenarios, didn’t change the resulting task interval appreciably. Hopefully, this example sufficiently illustrates the point that failure to consider the implications of the functional failure risk and the capability of the inspection in detecting the onset of failure could result in the review team settling on a 12-month task periodicity instead of the more technically defensible five to six months. Use of the equations rigorously accounts for the risk the failure mode presents to the organization and the effectiveness of the inspection method.
Figure 1 shows relationships between the probability of detecting a potential failure (θ, assuming the P-F condition exists) and the calculated number of inspections in the P-F interval (n), when varying θ and Pacc. The purpose of presenting this graph is to help explain some things that may not be instinctive.
Figure 1: Calculated number of inspections versus probability of detecting the P-F
Firstly, notice that when the failure mode consequences are relatively benign (Pacc = 0.1), the calculated n takes a long time to approach two, which corresponds to a task interval of one half the P-F interval. Secondly, when the Pacc is 0.001 or lower, θ becomes a bigger player in the task interval calculation than is probably intuitive to most of us. That’s because n rises at an accelerating rate as θ is incrementally decreased. This steepening relationship becomes much more apparent if you were to continue decreasing θ to 0.50. The graph was not expanded to include θ less than 0.75 because it’s hard to imagine that someone would specify an on-condition task that is expected to identify an existing P-F condition with less than 0.75 certainty. Thirdly, observe that none of the lines meet the Y-axis. That’s because the equation mathematically falls apart when θ is equal to one, since ln(1-θ) is then undefined. As Mr. Berneski remarks in his Uptime article: “Therefore, we cannot calculate a 100 percent confidence in our inspection detecting P-F, which is in agreement with practical experience.”
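The relationships described for Figure 1 can be approximated numerically with Equation (2). In the sketch below, the particular θ and Pacc values are assumptions chosen for illustration; they are not necessarily the curves plotted in the original figure.

```python
import math


def inspections_needed(p_acc, theta):
    """Equation (2): n = ln(Pacc) / ln(1 - theta)."""
    return math.log(p_acc) / math.log(1.0 - theta)


# Illustrative grid (assumed values, not taken from the figure)
thetas = [0.75, 0.80, 0.85, 0.90, 0.95, 0.99]
p_accs = [0.1, 0.01, 0.001, 0.0001, 0.00001]

# Header row: one column per acceptable probability of failure
print("theta " + "".join(f"{p:>10g}" for p in p_accs))

# One row per detection probability, showing n for each Pacc
for theta in thetas:
    row = "".join(f"{inspections_needed(p, theta):>10.2f}" for p in p_accs)
    print(f"{theta:<6.2f}{row}")
```

Reading down any column shows n growing as θ falls, and the growth is most pronounced in the columns with the smallest Pacc.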
Summary
To detect a potential failure before it becomes a functional failure, the task interval must be shorter than the P-F interval. For low-risk failure modes, it is entirely appropriate to select a task interval equal to half of the P-F interval. However, there may be occasions when it is appropriate to select a task interval that is a smaller fraction of the P-F interval. For high-stakes failure modes (e.g., safety, environmental and even high-impact operational consequences), due consideration of the failure's risk to the organization and the degree of uncertainty in identifying the onset of failure is wise. In those situations, Equations (1) and (2) provide a simple and straightforward, risk-based methodology for determining defensible on-condition task intervals from three inputs: the P-F interval, an acceptable probability of failure, and the likely probability of detecting the potential failure condition.