Understanding the Rate of Change Dangers with Alarms

Many oil analysis training courses are on the market today, and any course of reputable quality will discuss strategies for setting wear debris alarms. The primary strategies include:

  • OEM recommended absolute values,
  • Statistically derived,
  • Rate of change.

Absolute Values

The beauty of OEM recommended absolute values is that they give a starting point for wear alarms. This is especially useful when an end user is just getting started on the road to oil analysis. However, the key phrase here is "starting point." It has been noted and proven in multiple studies that two like machines running the same process, under the same load, can, and most likely do, have different levels of wear over similar time periods. Herein lies the problem with absolute wear alarms.

Let's look at a case involving a process critical gearbox at an industrial location that utilized OEM recommended absolute alarm values. This particular gearbox had historically shown 0ppm of iron in its regular oil samples. During one particular sample, the iron value came back at 6ppm. In many instances, 6ppm in an industrial gearbox is considered to be just noise, but in this particular instance there was major cause for alarm. The analyst reviewed the iron data and cross-correlated it with other oil sample data, such as PQ index (Figure 1) and particle count data (Figure 2), all of which showed significant increases. The analyst then called for additional testing and warned of an impending failure.


Unfortunately, the customer in this instance relied on OEM data. The OEM indicated that this level of iron should not be considered a problem by any means. In fact, the OEM quoted an AGMA standard as supporting documentation for not alarming on iron until it rose above 50ppm. The AGMA guideline used did not apply to the type of gearbox discussed in this example. Rather, it was for a gearbox used in a completely different application. Five days after the initial warning and request for additional testing, the gearbox suffered a catastrophic failure in which the gearbox casing split in two.

Now we can understand why using OEM defined alarms should be considered just as a starting point.

Statistically Derived Alarms

The next alarm level, and one that is highly supported and recommended in many oil analysis training courses, is the statistically derived alarm. This term is loosely defined as simply utilizing population standard deviation to determine where the caution and critical point should be with respect to wear debris alarms.

In calculating the statistically derived alarm, one must take the average of the selected dataset then calculate the standard deviation. From this point, the end user can establish alarm points. The initial, or caution alarm, is generally set at one or two standard deviations above the average with a critical point being two or three standard deviations above average.

The choice should really be that of the end user. However, we advocate initially alarming wear debris at the average plus two standard deviations, with a critical value of the average plus three standard deviations. This approach allows for a very tight focus on the top 5 percent of problem machines. It is especially useful in the early stages of an oil analysis program because it helps reduce the work order overload that naturally occurs.

It is worth noting, however, that wear debris distributions do not generally follow the normal distribution bell curve seen in Figure 3. Most real wear distributions are closer to a log-normal or truncated log-normal curve. In these cases, two standard deviations do not necessarily equal the 95th percentile. Figure 4 shows a typical wear distribution curve.
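
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the function name, the multipliers' parameter names, and the iron readings are all hypothetical, and it uses the population standard deviation as the definition above suggests.

```python
import statistics


def statistical_alarms(values, caution_k=2.0, critical_k=3.0):
    """Return (mean, caution, critical) alarm limits for a set of readings.

    Limits are the mean plus caution_k and critical_k population
    standard deviations. The default multipliers of 2 and 3 follow the
    recommendation in the text; both are adjustable.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return mean, mean + caution_k * sd, mean + critical_k * sd


# Hypothetical iron readings (ppm) for a population of like machines.
iron_ppm = [4, 5, 6, 5, 7, 6, 8, 5, 6, 30]

mean, caution, critical = statistical_alarms(iron_ppm)
print(f"mean={mean:.1f} ppm, caution={caution:.1f} ppm, critical={critical:.1f} ppm")

# Flag the readings that exceed the caution limit.
flagged = [v for v in iron_ppm if v > caution]
print("above caution:", flagged)
```

Because real wear data tends to be skewed, it may be worth comparing these limits against the percentiles actually observed in the dataset before adopting them.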

Rate of Change Alarms

Rate of change alarms have long been considered the most precise method of setting alarms. The idea behind this method is to track the wear generation rate. The accepted thinking is that if we can monitor changes in the rate of wear, we can better estimate machine condition and identify a potential failure very high up on the P-F curve. Generally, the goal is to normalize the data to a specific rate, such as wear per 100 hours of operation.

In order for this method to have a solid level of accuracy, the sample run time must be fairly consistent. This very basic method, while a decent starting point, can quite easily result in a false-positive situation, particularly when the run time on the oil is significantly lower than the normal sampling run time. If we were to use the simple data shown in Table 1, one could conclude that the sample showing a generation rate of 67ppm per 100 hours of operation would indicate a severe wear condition. This would likely result in some level of inspection when, in actuality, absolutely nothing could be wrong with the component.
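
A short sketch shows how the basic calculation inflates the rate when the oil run time is short. The sample values here are hypothetical and are not the data from Table 1. They were chosen only so that one short-interval sample lands near the 67ppm per 100 hours figure mentioned above.

```python
def naive_rate_per_100h(fe_ppm, oil_hours):
    """Basic generation rate: measured ppm normalized to 100 oil hours.

    Implicitly assumes the oil started at 0 ppm after the last change.
    """
    if oil_hours <= 0:
        raise ValueError("oil_hours must be positive")
    return fe_ppm / oil_hours * 100.0


# Hypothetical samples as (oil hours, iron ppm).
samples = [(250, 20), (245, 19), (15, 10)]

for oil_hours, fe_ppm in samples:
    rate = naive_rate_per_100h(fe_ppm, oil_hours)
    print(f"{oil_hours:>4} h, {fe_ppm:>3} ppm -> {rate:5.1f} ppm per 100 h")
```

The third sample has the lowest iron reading of the three, yet it produces by far the highest rate simply because it was taken 15 hours after the oil change.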



Internal studies were done at Fluid Life, an oil analysis laboratory with facilities in the United States and Canada, to determine the impact of using different sampling run times when calculating the wear generation rate. The study was performed on a vast number of equipment makes and models. The results were the same regardless of the component type, the make, or the model.

It is often stated that as much as 10 percent to 30 percent of a sump volume can be left behind during an oil change. This is attributed to oil remaining behind on the moving components. By using simple math, one often assumes that this means that only 10 percent to 30 percent of residual debris would be left behind as well. That is not the case. According to a portion of the study, which included 3,772 samples, 49 percent of the iron was left behind after an oil change, on average. Looking at a completely different make and model of component with a total of 687 samples, the study showed 58 percent of wear debris remaining after an oil change, on average.
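
To put those percentages in concrete terms, the sketch below estimates the iron level in the fresh oil immediately after a change. It assumes the carryover fraction applies directly to the ppm reading taken just before the change, and the 40ppm starting value is hypothetical.

```python
def residual_start_ppm(pre_change_ppm, carryover_fraction):
    """Estimate the ppm present right after an oil change.

    carryover_fraction is the share of wear debris left behind (0 to 1).
    """
    if not 0.0 <= carryover_fraction <= 1.0:
        raise ValueError("carryover_fraction must be between 0 and 1")
    return pre_change_ppm * carryover_fraction


pre_change_iron = 40.0  # hypothetical reading just before the oil change

scenarios = [
    ("assumed from sump volume, low", 0.10),
    ("assumed from sump volume, high", 0.30),
    ("study average, 3,772 samples", 0.49),
    ("study average, 687 samples", 0.58),
]

for label, fraction in scenarios:
    start = residual_start_ppm(pre_change_iron, fraction)
    print(f"{label:<32} {fraction:>4.0%} -> starts near {start:4.1f} ppm")
```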

This tells us that we simply can't assume the wear metals start at 0ppm. We also cannot assume that the majority of wear debris is removed during an oil change. In fact, if we do continue to utilize the "as preached" way of calculating generation rate, we are setting up for failure. The Fluid Life study indicates that when comparing the "naïve" rate to the actual wear rate, there is a 273 percent increased chance of calling a component in a state of failure when all may very well be normal.

In Figure 5, the Actual Fe line is the average Fe in ppm for all samples in each oil hour "bucket." The oil hour buckets are grouped in 25 hour increments, such that the "0" bucket includes all samples from 0 hours up to and including 24 hours. The 25 hour bucket includes the samples from 25 hours up to and including 49 hours, and so on.
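
The bucketing can be sketched as follows. The function names and sample values are hypothetical, and the sketch assumes each sample is a pair of oil hours and Fe in ppm.

```python
from collections import defaultdict

BUCKET_WIDTH = 25  # hours, per the grouping described above


def bucket_start(oil_hours, width=BUCKET_WIDTH):
    """Return the start of the bucket a sample falls into (0, 25, 50, ...)."""
    if oil_hours < 0:
        raise ValueError("oil_hours must not be negative")
    return int(oil_hours // width) * width


def average_fe_by_bucket(samples, width=BUCKET_WIDTH):
    """Average Fe (ppm) for each oil hour bucket, keyed by bucket start."""
    grouped = defaultdict(list)
    for oil_hours, fe_ppm in samples:
        grouped[bucket_start(oil_hours, width)].append(fe_ppm)
    return {start: sum(v) / len(v) for start, v in sorted(grouped.items())}


# Hypothetical samples as (oil hours, Fe ppm).
samples = [(5, 6.0), (24, 7.0), (25, 8.0), (49, 9.0), (60, 10.0)]

averages = average_fe_by_bucket(samples)
print(averages)
```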

The naïve generation rate is the average Fe in each bucket divided by the bucket center value and then multiplied by 100 hours, giving a rate in ppm per 100 hours. For example, the 0 bucket works out to 6.59/12.5*100, or about 52.7ppm per 100 hours.

The "corrected" generation rate is the average Fe in each bucket, minus the estimated y-intercept of the fitted line, divided by the bucket start and multiplied by 100 hours.
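
The two calculations can be compared side by side in a minimal sketch. Several things here are assumptions rather than details of the Fluid Life study: the fitted line is an ordinary least-squares fit of average Fe against bucket center, the bucket averages other than the 6.59 value quoted above are hypothetical, and the corrected rate is divided by the bucket center rather than the bucket start so that the 0 bucket does not divide by zero.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def naive_rate(fe_ppm, hours):
    """ppm per 100 hours, assuming the oil started at 0 ppm."""
    return fe_ppm / hours * 100.0


def corrected_rate(fe_ppm, hours, intercept):
    """ppm per 100 hours after subtracting the estimated starting ppm."""
    return (fe_ppm - intercept) / hours * 100.0


# Bucket centers in hours and the average Fe per bucket in ppm.
# Only the first average (6.59) comes from the text; the rest are hypothetical.
bucket_centers = [12.5, 37.5, 62.5, 87.5]
average_fe = [6.59, 7.6, 8.4, 9.5]

slope, intercept = fit_line(bucket_centers, average_fe)
print(f"estimated intercept: {intercept:.2f} ppm")

for center, fe in zip(bucket_centers, average_fe):
    naive = naive_rate(fe, center)
    corrected = corrected_rate(fe, center, intercept)
    print(f"{center:5.1f} h  naive={naive:5.1f}  corrected={corrected:4.1f}")
```

With data of this shape, the naïve rate falls sharply as oil hours increase while the corrected rate stays roughly level, which is the pattern the intercept correction is meant to expose.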

This method gives a better estimate of the generation rate using only the data for a single sample plus knowledge of the intercept. As can be seen in Figure 5, the naïve generation rate shows substantially higher readings when the oil hours are lower than normal, particularly below 100 hours. It also suggests, counterintuitively, that the generation rate steadily decreases the longer the oil is left in service.

In Conclusion

The proper setting of wear debris alarms can have a make-or-break effect on the overall effectiveness of an oil analysis program. While the goal of predictive maintenance is to identify a potential failure high up on the P-F curve, without a full understanding of alarms it is likely that one could falsely identify a failure. When that happens, the site suffers much the same damage as it would from a missed failure, and the credibility of the entire oil analysis program comes into question.

Jeff Keen is a Professional Computer Engineer. He serves as Vice President of Information Technology and Research and Development with Fluid Life. With 21 years of experience, Jeff designs and develops systems to manage and evaluate oil analysis results and related information.



Matt Spurlock is a Certified Lubrication Specialist and a Certified Maintenance & Reliability Professional. He serves as a Senior Reliability Specialist and Instructor with Fluid Life. With over 20 years in the field, Matt specializes in in-depth oil analysis data evaluation and lubrication program optimization for customers across all industries. www.fluidlife.com
