Case: Weibull-based Method for Failure Mode Characterization & Remaining Life Expectancy Estimation

30 July 2012

Figure 1 - Process Equipment System

Chronologically, this heat exchanger was opened twice for cleaning over a total period of 16 months, counted from the first evaluated functional degradation. Counting that first degradation period, this gives three degradation periods over the whole term, as shown in Figure 2.

Figure 2 - Historical trend of the internal bypass valve position (electronic limit at 12%)

2. Objectives

2.1. To characterize the failure mode according to the internal bypass valve behavior.

2.2. To estimate the remaining life expectancy, given the progress of the failure mechanism, from the behavior of another related process variable.

3. Development

3.1. Objective 2.1.

3.1.1. The first step, according to the proposed approach, was to collect the historical real-time positions of the internal bypass valve (one discrete value per hour) for three different periods (see the example in Figure 3).

Figure 3 - Example of discrete values for the internal bypass valve position

3.1.2. The second step was to build histograms of valve position for the three sets of historical data. The idea was that, under supposedly "stable" surrounding process conditions, each time the bypass valve adopted a new position it did so to compensate for a variation in the heat exchanging characteristic. In a Weibull approach, such a change is treated as a failure (no new position means a good and stable condition, hence no failure).

The number of times the bypass valve held a single position (that is, the number of hours) was taken to represent the survival period for this occurrence of the failure mode (this sample).

Every other position within the 0% to 100% range that was not adopted during the considered period was taken as a suspension.

Under stable heat exchanging conditions, the histogram should show a concentrated group of tall bars around the typical control position.

As the heat exchanging characteristic varies, the histogram should show a set of dispersed, shorter bars.

Figures 4, 5 and 6 show the specific histograms.

Figure 4 - Histogram of positions for the internal bypass valve on period #1

Figure 5 - Histogram of positions for the internal bypass valve on period #2

Figure 6 - Histogram of positions for the internal bypass valve on period #3

3.1.3. The third step was to arrange the discrete positioning data by period in order to calculate the point coordinates for the Weibull plots.

A prior task was to state the criteria for identifying failures and suspensions in the whole data set. Two aspects were taken into account:

The maximum and minimum values for the valve positioning to be considered.
The minimum amount of occurrences for each period to be considered, as a stated percentage of the maximum frequency at the histogram.

Every histogram bar satisfying these criteria was considered a failure, and every other one a suspension.
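The filtering criteria above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `classify_bins`, its parameters, the whole-percent bin width and the sample readings are all hypothetical and do not come from the case data.

```python
from collections import Counter

def classify_bins(positions, lo, hi, min_fraction):
    """Build a histogram of hourly readings and label each bin.

    positions    : hourly valve positions, assumed rounded to whole percent
    lo, hi       : position limits to be considered (inclusive)
    min_fraction : minimum bar height as a fraction of the tallest bar
    Returns a list of (position, hours, is_failure) for every bin 0..100.
    """
    counts = Counter(positions)
    threshold = min_fraction * max(counts.values())
    bins = []
    for pos in range(0, 101):
        hours = counts.get(pos, 0)
        # A bin is a failure only if it was adopted, lies inside the limits
        # and is tall enough; every other bin is a suspension.
        is_failure = hours > 0 and lo <= pos <= hi and hours >= threshold
        bins.append((pos, hours, is_failure))
    return bins

# Hypothetical hourly readings, for illustration only.
readings = [40, 40, 40, 41, 41, 42, 40, 55, 41, 40]

raw = classify_bins(readings, lo=0, hi=100, min_fraction=0.0)
raw_failures = [(p, h) for p, h, f in raw if f]
raw_suspensions = [(p, h) for p, h, f in raw if not f]

filtered = classify_bins(readings, lo=0, hi=100, min_fraction=0.5)
filtered_failures = [(p, h) for p, h, f in filtered if f]
```

With `min_fraction=0.0` every adopted position counts as a failure; raising the fraction or narrowing the limits moves the shorter or out-of-range bars into the suspension set.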

From this, the whole data set for each period was arranged and the point coordinates were calculated using the Auth/Johnson adjusted rank¹ and Benard's median rank² expressions.
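As a minimal sketch of these two expressions, the adjusted rank of each failure can be computed from its reverse rank and the previous adjusted rank, and Benard's approximation then turns it into a median rank. The function names, the handling of ties (plain sort by time) and the sample data are assumptions for illustration, not the case data.

```python
def adjusted_ranks(items):
    """items: list of (time, is_failure) pairs, failures and suspensions mixed.

    Returns [(time, adjusted_rank)] for the failures only, using
    adjusted = (reverse_rank * previous_adjusted + (N + 1)) / (reverse_rank + 1).
    """
    n = len(items)
    ordered = sorted(items, key=lambda item: item[0])
    previous = 0.0
    result = []
    for index, (time, is_failure) in enumerate(ordered):
        reverse_rank = n - index
        if is_failure:
            previous = (reverse_rank * previous + (n + 1)) / (reverse_rank + 1)
            result.append((time, previous))
    return result

def benard_median_rank(adjusted_rank, n):
    """Benard's approximation of the median rank: (i - 0.3) / (N + 0.4)."""
    return (adjusted_rank - 0.3) / (n + 0.4)

# Hypothetical sample: three failures and two suspensions.
sample = [(10, True), (20, False), (30, True), (40, True), (50, False)]
ranks = adjusted_ranks(sample)
points = [(t, benard_median_rank(r, len(sample))) for t, r in ranks]
```

When there are no suspensions the adjusted ranks reduce to 1, 2, 3 and so on, which is a quick sanity check on the recursion.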

Two different sets of failure/suspensions selection parameters were used to evaluate sensitivity on results.

3.1.3.1. In the first trial (named raw f/s filtering criteria), no filtering parameters were stated other than the histogram bars height greater than zero. Then, the following was obtained:

The critical correlation coefficient³ and the critical coefficient of determination⁴ (r²) were used to verify the goodness of fit of the regression lines calculated for each data set by the least squares method. Figures 7, 8 and 9 show the regression lines.
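A minimal sketch of such a least squares fit on Weibull coordinates follows. It assumes the regression is done with y = ln(-ln(1 - F)) on x = ln(t); the source does not state the regression direction, and the function name and sample values are hypothetical. The critical r and r² values come from the handbook and are not reproduced here.

```python
import math

def weibull_rank_regression(times, median_ranks):
    """Fit a straight line on Weibull paper by ordinary least squares.

    x = ln(t), y = ln(-ln(1 - F)); the slope is Beta and the
    intercept equals -Beta * ln(Eta). Returns (beta, eta, r_squared).
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - f)) for f in median_ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    eta = math.exp(-intercept / beta)
    r_squared = sxy ** 2 / (sxx * syy)
    return beta, eta, r_squared

# Hypothetical points lying exactly on a Weibull line (Beta 1.5, Eta 200).
demo_times = [50.0, 100.0, 200.0, 400.0]
demo_ranks = [1.0 - math.exp(-((t / 200.0) ** 1.5)) for t in demo_times]
beta, eta, r_squared = weibull_rank_regression(demo_times, demo_ranks)
```

The resulting r² would then be compared with the critical value for the number of failures in the data set to decide whether the fit is acceptable.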

Figure 7 - Regression line for data set #1 (period #1) - raw f/s filtering criteria

Figure 8 - Regression line for data set #2 (period #2) - raw f/s filtering criteria

Figure 9 - Regression line for data set #3 (period #3) - raw f/s filtering criteria

The early conclusions regarding objective 2.1 were:

The three periods showed Beta values <1 (with very small differences), consistently implying run-in conditions on the bathtub curve⁵. These results are consistent with the real behavior, as degradation restarted immediately after each restoration (see Figure 2).
The Eta values increased consistently from period to period, representing a deceleration of the deterioration process. These results are also consistent with reality, as the characteristic life increased after each restoration (see Figure 2).
The point distributions on the regression line graphs (concavities in Figures 7, 8 and 9) may imply the presence of more than one failure mode, with the number of modes increasing after each restoration.

3.1.3.2. In the second trial (named good fit f/s filtering criteria), the valve position limits for each period were set according to each observed trend, and the minimum number of occurrences was set at 5% of the maximum. Then, the following was obtained:

Figures 10, 11 and 12 show the regression lines for each data set.

Figure 10 - Regression line for data set #1 (period #1) - good fit f/s filtering criteria

Figure 11 - Regression line for data set #2 (period #2) - good fit f/s filtering criteria

Figure 12 - Regression line for data set #3 (period #3) - good fit f/s filtering criteria

Under these other conditions, the early conclusions regarding objective 2.1 were:

The three periods showed Beta values >1, consistently implying a wear-out period on the bathtub curve. Beta increased slightly from period #1 to period #3, suggesting slightly increasing deterioration after each restoration.
The Eta values showed very little variation and no clear pattern (no interpretation).
The point distributions on the regression line graphs (Figures 10, 11 and 12) may imply the presence of more than one failure mode.

3.1.3.3. Comparing the two previous items (3.1.3.1 and 3.1.3.2), the 3.1.3.1 filtering criteria do not consistently produce good fits for the regression lines, while the 3.1.3.2 criteria do. Even so, the 3.1.3.1 results better represent the real behavior. Based on engineering judgment, the conclusion was therefore to adopt the 3.1.3.1 criteria, which discard no sample, at least for histogram shapes like those shown in Figures 4, 5 and 6 (dispersed data). In these cases, the poor goodness of fit could mean that the Weibull distribution was probably not the best one to represent the analyzed physical phenomenon (the log-normal distribution might fit better⁶).

3.2. Objective 2.2.

3.2.1. As shown in Figure 2, the internal bypass valve ended at its limit, so its position no longer represents the physical degradation process. In order to estimate the remaining life expectancy for this system, another boundary condition was chosen as the limit: the output temperature, now expected to increase slowly but consistently (as no final control element is available in the present condition of the heat exchanger).

Again, the same histogram based method was used for this variable, so the same three steps were followed. For the first one (real time data collection), an example is shown in Figures 13 and 14.

Figure 13 - Example of discrete values for the output temperature

Figure 14 - Graphical trend for the output temperature

3.2.2. For the second step (histogram), the idea was again that each time the output temperature adopted a new value, it was due to some change in the heat exchanging condition, an event treated as a failure (in the absence of the final control element) in a Weibull approach (no new value means a good and stable condition, hence no failure).

The number of times the output temperature held a single value (that is, the number of hours) was taken to represent the survival period for this occurrence of the failure mode (failure mode sample).

Every other value within the 0 to 450 °C range that was not adopted during the considered period was taken as a suspension. The specific histogram is shown in Figure 15.

Figure 15 - Histogram of output temperatures

3.2.3. For the third step (failures/suspensions), the filtering criteria were:

The maximum and minimum values for the temperature to be considered.
The minimum number of occurrences to be considered, as a stated percentage of the maximum frequency in the histogram.

Three different sets of failure/suspensions selection parameters were used, although only the selected one according to engineering judgment is shown herein.

3.2.4. Setting the temperature limits for the period according to the observed trend (specifically >370 °C) and the minimum number of occurrences at 10% of the maximum, the following parameters were obtained:

Figure 16 shows the regression line.

Figure 16 - Regression line for the data set

Under this data filtering condition, the early conclusions regarding objective 2.2 were:

The period showed a Beta value >1, implying a wear-out period on the bathtub curve. This was found consistent with the real situation, as the temperature evolution is expected to result from the unavailability of the final control element, a condition reached after the internal bypass valve hit full stroke (last period, Figure 2).
No comment on the calculated Eta value.
The point distribution on the regression line graph (Figure 16) may imply the presence of more than one failure mode.
The goodness of fit for the regression line is quite good. For this other histogram shape (concentrated data), and considering the previous comment on the Beta value, the applied filtering criteria seem to be adequate (the other two sets were identical, but with the minimum number of occurrences set at 1% and 25% of the maximum).

Now, regarding objective 2.2, a temperature limit of 448 °C and a target date were set, and the corresponding "R" (Reliability) value was calculated for the proposed scenario in Table 1.

Table 1 - Proposed scenario

As can be seen, a 76.7% probability of success was estimated for a mission defined as reaching the future date of June 15, 2012 with an output temperature no greater than 448 °C.

This estimation is reasonably consistent with a linear extrapolation forecast for this variable.
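For reference, the "R" value of a two-parameter Weibull distribution is R(t) = exp(-(t/Eta)^Beta). The sketch below only illustrates that formula; the Beta value and mission time used here are hypothetical placeholders, not the fitted parameters of this case, and the function name is an assumption.

```python
import math

def weibull_reliability(t, beta, eta):
    """Two-parameter Weibull reliability: probability of surviving beyond t."""
    return math.exp(-((t / eta) ** beta))

# Hypothetical inputs, for illustration only (not the case parameters).
hypothetical_beta = 2.0
hypothetical_mission = 100.0  # same time units as Eta

r_low_eta = weibull_reliability(hypothetical_mission, hypothetical_beta, eta=150.0)
r_high_eta = weibull_reliability(hypothetical_mission, hypothetical_beta, eta=400.0)
```

For a fixed Beta and mission time, a larger Eta always gives a larger R, which is why an uncertainty range on Eta translates directly into a range on the estimated reliability.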

Additionally, the 5% and 95% confidence bounds were estimated (Beta-binomial approach⁷) for each of the 13 samples filtered as failures from the total of 450 in the histogram (see Figure 17). This estimation indicated an Eta uncertainty between 157.9 hours and 391.9 hours, so the corresponding range for "R" (76.7%) was between 54.5% and 83.7% (this last calculation is for reference purposes only).

Figure 17 - Beta binomial confidence bounds for the failures population
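The 5% and 95% ranks behind such bounds can be sketched by solving the cumulative binomial equation numerically. This is a simplified illustration that assumes integer order numbers (no suspensions); with suspensions the adjusted ranks are generally non-integer and would need interpolation or another treatment, which is not shown. Function names are hypothetical.

```python
import math

def binomial_tail(p, j, n):
    """Probability that at least j of n items have failed, each with probability p."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(j, n + 1))

def rank_at_confidence(j, n, confidence, tol=1e-10):
    """Find p such that binomial_tail(p, j, n) equals the confidence level.

    confidence=0.05 gives the 5% rank, 0.50 the exact median rank and
    0.95 the 95% rank of the j-th ordered failure out of n.
    The tail probability increases with p, so bisection is sufficient.
    """
    low, high = 0.0, 1.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if binomial_tail(mid, j, n) < confidence:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

# Bounds for each ordered failure in a hypothetical complete sample of 5.
bounds = [(j, rank_at_confidence(j, 5, 0.05), rank_at_confidence(j, 5, 0.95))
          for j in range(1, 6)]
```

Plotting the 5% and 95% ranks against the same failure times as the median ranks gives the band around the fitted line.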

4. Conclusions

4.1. The histogram-based method presented here for historical process data analysis, combined with the Weibull distribution, appears to be a viable approach both for characterizing failure modes from a black box perspective and for predicting remaining life expectancy.

4.2. This approach, particularly regarding remaining life expectancy prediction, is quite different from a linear regression-based extrapolation. As could be seen, it considers not a trend but the stability characteristic of a deterioration process in progress, which makes this method an indirect determination approach. Additionally, every forecast made by calculating "R" for many proposed future scenarios has proven to be conservative when compared against linear extrapolations.

4.3. Particular attention needs to be paid, however, to the basic data set filtering criteria (failures/suspensions), according to the histogram pattern. As a first approximation, the more dispersed the bars, the more data should be considered as failures in order to obtain parameters that represent reality reasonably well. The Weibull parameters estimated with the described methodology are quite sensitive to the data set considered, particularly to extreme values. This was also found to be consistent with the references⁸, particularly regarding the continued need for engineering judgment.

Additionally, extensive practical experience was found to be relevant for building the proper engineering criteria to deal with data beyond any strict statistical perspective.

References

Abernethy, Dr. Robert B. The New Weibull Handbook - Fourth Edition, Houston, TX: Dr. Robert Abernethy, 2003.

1. 2.9 - Suspended Test Items - Pages 2-6.

2. 2.10 - Benard's Approximation - Pages 2-7.

3. Figure 3.4 - Critical Correlation Coefficient, r and Critical Coefficient of Determination, r² - Pages 3-4.

4. Figure 3.4 - Critical Correlation Coefficient, r and Critical Coefficient of Determination, r² - Pages 3-4.

5. Figure 2.6 - The Bathtub Curve for a Good Component - Pages 2-8.

6. 3.6 - Curved Weibulls and The Log Normal Distribution - Pages 3-10.

7. 7.3.1 - Beta-Binomial Bounds - Pages 7-2.

8. 3.4 - Suspect Outliers - Pages 3-5.

Jorge Kalocai, Engineer in Electronics, CMRP, TÜV FS Exp, Certified RCM and PMO Facilitator and RCA Analyst, has eight years of applied experience in Asset Management, plus twenty-three years in Instrumentation, Automation and Control in industry. A Tutor for the Reliability Engineering Program since 2008 and Professor for the Risk Engineering Program since 2012, both at Austral University (Argentina), he is currently Reliability Supervisor at Profertil SA (Nitrogen Fertilizer Complex, Argentina).
