Reliabilityweb Risky business: Problem areas uncovered by the transition to risk based asset management

On 14 July 2005 Ofgem, the regulator of the gas and electricity utilities in Great Britain, released a letter titled "Refocusing Ofgem's Asset Risk Management (ARM) Activity". The letter referred to a voluntary comparison process against the principles contained within PAS 55 as a tool that "promotes requirements, which allows operators to demonstrate effective asset management".

Recent history has also included a report commissioned by the Office of the PPP Arbiter (OPPPA) to review good practice in asset management evaluation and to draw on it to develop an Asset Management Evaluation Framework, with PAS 55 as one of the key evaluation tools. PAS 55 has also been used in recent evaluations of signalling management prepared for the Office of Rail Regulation. It has also made its US debut and is being implemented at a major electrical utility there.

Regardless of the nuances between the various benchmarking tools, and the differing approaches they take, it is now clear that asset-intensive industries have made a fundamental shift towards a risk-based approach to managing physical assets. Today's risk management approach differs from earlier approaches in several key ways. For example, consequence is now as important an element of asset-condition management as probability: the focus is on risk, not on the likelihood of failure alone.

This is clearly a welcome transition for those of us working in modern asset management, and it gives the discipline a strong basis for moving towards even greater economic and risk-management efficiency. However, it also presents us with some unique problems.

Modelling future asset performance requires a good grasp of the two fundamental elements of risk: the consequence of failure and the probability of failure. Whatever method is used, consequence can be determined relatively straightforwardly. There are ongoing debates about how to do this and how to express consequences in relative terms, but these are matters of detail. The underlying concept is widely understood and can be applied, albeit with some pain along the way.
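
As a minimal sketch of why adding consequence changes decisions, the Python below assumes risk is scored as annual probability of failure multiplied by the cost of one failure. This is one common convention, not necessarily the one any particular framework uses, and the failure modes and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """A hypothetical failure mode; all figures are illustrative assumptions."""
    name: str
    annual_probability: float  # assumed probability of failure per year
    consequence_cost: float    # assumed cost of one failure, arbitrary currency units

    @property
    def risk(self) -> float:
        # Expected annual cost: probability of failure multiplied by its consequence.
        return self.annual_probability * self.consequence_cost


modes = [
    FailureMode("pump seal leak", 0.30, 2_000),
    FailureMode("transformer winding fault", 0.01, 500_000),
    FailureMode("valve actuator sticking", 0.10, 20_000),
]

# Ranking by likelihood alone versus ranking by risk (probability x consequence).
by_probability = sorted(modes, key=lambda m: m.annual_probability, reverse=True)
by_risk = sorted(modes, key=lambda m: m.risk, reverse=True)

print("Ranked by probability only:", [m.name for m in by_probability])
print("Ranked by risk:            ", [m.name for m in by_risk])
```

Ranked by probability alone, the frequent but cheap seal leak comes first; ranked by risk, the rare but costly transformer fault does.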

Things become significantly more difficult when we try to model the probability of failure. The theories underlying maintenance and reliability are based on probability theory and on the properties of distribution functions that have been found to occur frequently and to be useful in predicting survival characteristics. Applying them requires a range of inputs, including condition, usage and, most importantly, the failure data itself.
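
The Weibull distribution is one of the distribution functions commonly used for this purpose. As a sketch, with a shape β and scale η chosen purely for illustration (they are not taken from any real asset), the survival function is R(t) = exp(-(t/η)^β) and the hazard rate is h(t) = (β/η)(t/η)^(β-1). The function names are hypothetical helpers written for this example.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Probability that an item survives beyond age t (Weibull survival function)."""
    return math.exp(-((t / eta) ** beta))


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate at age t for an item that has survived to t."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def weibull_b_life(p: float, beta: float, eta: float) -> float:
    """Age by which a fraction p of the population is expected to fail (B10 life for p = 0.1)."""
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)


# Illustrative parameters only: beta > 1 indicates wear-out; eta is in years.
BETA, ETA = 2.5, 20.0

for age in (5, 10, 20):
    print(f"age {age:>2}: R = {weibull_reliability(age, BETA, ETA):.3f}, "
          f"h = {weibull_hazard(age, BETA, ETA):.4f} per year")
print(f"B10 life: {weibull_b_life(0.10, BETA, ETA):.1f} years")
```

Estimating β and η in the first place is where the difficulty lies: a credible fit needs failure times, which is exactly the scarce input this article is concerned with.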

Resnikoff, in his early work on reliability, stated that historical analyses of data are rarely successful. Although this has changed a little since the statement was first made, it still captures the challenge of modern asset management.

Defining "critical" is often contentious, so for the purposes of this paper critical failures are those that cause the asset to perform below an acceptable level.

Acceptable and unacceptable levels of performance

Non-critical failures are those with only low or negligible cost consequences. They are acceptable and can be allowed to occur, so a policy based on data capture and later analysis can be used effectively. Over time, enough information will accumulate for asset owners and policy designers to determine the correct maintenance policy with a high degree of confidence.

Critical failures are, by their very nature, serious. When they occur, they are often designed out, a replacement asset is installed, or some other initiative is put in place to ensure that they do not recur. As a result, the volume of data available for analysis is often small, and the ability of statistical analysis to deliver results with a high level of confidence is questionable at best.
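
To illustrate how few failures translate into low confidence, the sketch below computes the standard two-sided chi-square confidence interval for a constant failure rate from r failures observed over T unit-years of operation. It assumes failures arrive as a Poisson process (a constant failure rate), which many critical failure modes will not satisfy. The helper functions and failure counts are hypothetical; to stay within the standard library, the chi-square quantile is found by bisection using the closed-form CDF that holds for even degrees of freedom.

```python
import math


def chi2_cdf_even(x: float, dof: int) -> float:
    """CDF of a chi-square distribution with a positive even number of degrees of freedom."""
    if dof % 2 != 0 or dof <= 0:
        raise ValueError("dof must be a positive even integer")
    half, k = x / 2.0, dof // 2
    term, total = 1.0, 1.0  # running terms of sum_{i<k} (x/2)^i / i!
    for i in range(1, k):
        term *= half / i
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(p: float, dof: int) -> float:
    """Invert chi2_cdf_even by bisection (adequate for illustration)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, dof) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def failure_rate_interval(failures: int, exposure: float, confidence: float = 0.90):
    """Two-sided interval for a constant failure rate (failures per unit-year), time-truncated data."""
    alpha = 1.0 - confidence
    lower = 0.0 if failures == 0 else chi2_quantile_even(alpha / 2, 2 * failures) / (2 * exposure)
    upper = chi2_quantile_even(1 - alpha / 2, 2 * failures + 2) / (2 * exposure)
    return lower, upper


# Hypothetical cases: same observed rate (0.02 per unit-year), very different amounts of data.
for failures, exposure in [(2, 100.0), (20, 1_000.0), (200, 10_000.0)]:
    lo, hi = failure_rate_interval(failures, exposure)
    print(f"{failures:>3} failures: estimate {failures / exposure:.3f}, "
          f"90% interval {lo:.4f} to {hi:.4f} (upper/lower = {hi / lo:.1f})")
```

For the same observed rate, the ratio of the upper to the lower bound falls from well over ten with two failures to under 1.5 with two hundred.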

Figure: Information vs. data (pie chart)

In the author's experience, most companies begin reliability initiatives with, conservatively estimated, around 30% empirical data and 70% end-user knowledge. While this still leads to improvement, it falls well short of the high-confidence, risk-based decisions required in today's asset management environment, particularly where getting it wrong has a large economic impact or could significantly affect safety.

This is the central point of contention in risk modelling. Companies on their own rarely have the volume of failure data required to perform accurate probabilistic analyses. Even if their failure-capture technologies and processes deliver failure data of the right quality (and many organisations have overcome this hurdle), they still need a large number of asset failures before they can produce high-confidence probability models.

One of our goals as asset managers, whether through operational or capital maintenance, is to reduce the number of critical failures. Part of our goal, therefore, is to reduce the amount of failure information available for analysis, not increase it!

For simple assets with a dominant failure mechanism, such as erosion, corrosion, evaporation or oxidation, techniques such as age exploration, inspection and usage monitoring can be put into practice, and modern technology has made them relatively practical and economical. However, where assets are subject to random failures or human error, or their condition cannot be gauged with standard monitoring techniques, failure data is a critical element of high-confidence decision making.

It is one thing to predict the failure of, say, a transformer from measurable indicators of the onset of failure. It is another thing entirely to forecast accurately the likely failure rate of a failure mode known to be random.
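
The difference can be made concrete with the conditional probability that an item which has survived to age a fails within the next year, 1 - R(a + 1)/R(a). Assuming Weibull behaviour with illustrative parameters, a shape of β = 1 (the exponential case, a constant failure rate) gives the same answer at every age, so age carries no warning for a truly random mode, while β > 1 (wear-out) gives a rising probability. The function name and parameters are hypothetical.

```python
import math


def conditional_failure_probability(age: float, horizon: float, beta: float, eta: float) -> float:
    """P(failure before age + horizon | survived to age) under a Weibull(beta, eta) life model."""
    def survival(t: float) -> float:
        return math.exp(-((t / eta) ** beta))
    return 1.0 - survival(age + horizon) / survival(age)


ETA = 20.0  # illustrative characteristic life in years
for label, beta in [("random (beta = 1)", 1.0), ("wear-out (beta = 3)", 3.0)]:
    probs = [conditional_failure_probability(a, 1.0, beta, ETA) for a in (0, 10, 30)]
    print(label, ["%.3f" % p for p in probs])
```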

This is slightly alarmist. There are modern methods for making decisions from small samples of dubious quality, rather than "crashing a few more assets". Random number generation, sampling and other mathematical procedures go some way towards bridging the gap between the data we have and the data we need. Human error forecasting methods such as HEART (Human Error Assessment and Reduction Technique) and THERP (Technique for Human Error Rate Prediction) also contribute to a more accurate model. Yet truly high-confidence decisions require us to base our judgment on real historical data.
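
One example of such a sampling method is the percentile bootstrap, chosen here for illustration rather than as the author's method: a small sample of times between failures is resampled with replacement many times to show how much the estimated mean could plausibly vary. The failure times and the function name are hypothetical. The bootstrap cannot add information the sample lacks; it only makes the uncertainty visible.

```python
import random
import statistics

# Hypothetical times between failures (months) for one failure mode: a small sample.
observed = [7.0, 19.0, 3.0, 31.0, 12.0, 5.0]


def bootstrap_mean_interval(data, n_resamples=10_000, confidence=0.90, seed=1):
    """Percentile bootstrap interval for the mean of a small sample."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


low, high = bootstrap_mean_interval(observed)
print(f"Sample mean: {statistics.fmean(observed):.1f} months")
print(f"90% bootstrap interval for the mean: {low:.1f} to {high:.1f} months")
```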

The scope and size of the challenge is therefore clear: a single company will generally not have the quantity of failure data needed to make high-confidence asset management decisions without first having experienced significant unacceptable events. In the medium term, risk-based asset management will focus on the hunt for relevant, high-quality data produced by assets of comparable design operating in similar conditions.

Collaborative efforts to do this are just beginning in some industries, are mature in others and have not even been contemplated in still others. If companies want to speed up the journey to competitive advantage, finding ways to capture, mine and apply failure and performance data from as-yet-unexploited collaborative databases will need to be a key strategy in their drive towards high-confidence, risk-based decisions.
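
One way pooled industry data could be combined with a company's own sparse records is a conjugate gamma-Poisson (Bayesian) update, named here as an illustrative technique rather than one the article prescribes. The sketch assumes a constant failure rate, builds a prior from a hypothetical collaborative database, down-weights it by an assumed "relevance" factor reflecting how comparable the pooled assets are, and updates it with the company's own hypothetical failure count. The class and function names are hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class GammaRate:
    """Gamma belief about a constant failure rate: shape ~ failures, rate ~ exposure (unit-years)."""
    shape: float
    rate: float

    def mean(self) -> float:
        return self.shape / self.rate

    def sd(self) -> float:
        return math.sqrt(self.shape) / self.rate

    def update(self, failures: int, exposure: float) -> "GammaRate":
        # Conjugate gamma-Poisson update: add observed failures and exposure.
        return GammaRate(self.shape + failures, self.rate + exposure)


def prior_from_pool(pool_failures: int, pool_exposure: float, relevance: float) -> GammaRate:
    """Prior from pooled data, down-weighted by an assumed relevance factor in (0, 1]."""
    if pool_failures <= 0 or pool_exposure <= 0 or not 0 < relevance <= 1:
        raise ValueError("need positive pooled failures/exposure and relevance in (0, 1]")
    return GammaRate(relevance * pool_failures, relevance * pool_exposure)


own_failures, own_exposure = 1, 120.0                 # hypothetical: 1 failure in 120 unit-years
prior = prior_from_pool(150, 6_000.0, relevance=0.2)  # hypothetical pooled industry data
posterior = prior.update(own_failures, own_exposure)

print(f"Own data only: {own_failures / own_exposure:.4f} failures per unit-year")
print(f"Pooled prior:  {prior.mean():.4f} +/- {prior.sd():.4f}")
print(f"Posterior:     {posterior.mean():.4f} +/- {posterior.sd():.4f}")
```

The relevance factor is where judgement about comparable designs and operating conditions would enter; as it approaches zero, the company's own sparse data dominates the estimate.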

Good luck!

Bibliography

Mathematical Aspects of Reliability-Centered Maintenance, H. L. Resnikoff, National Technical Information Service, US Department of Commerce, Springfield

Captured by Data, Daryl Mather

MD212, Ofwat, 2 February 2006, www.ofwat.gov.uk

Refocusing Ofgem's Asset Risk Management (ARM) Activity, Ofgem, 14 July 2005, www.ofgem.gov.uk

Independent Assessment of SICA using PAS 55 as a guide, Lloyd's Register Rail, prepared for the Office of Rail Regulation, July 2005