Some 30-odd years passed between the first detailed description of modern asset reliability principles by F. Stanley Nowlan and Howard F. Heap in the mid-1960s and their latest evolution by John Moubray in the 1990s, yet North American industry has adopted few of these principles rigorously into its asset management strategies. This article is intended to help explain why adoption has been so hard to come by over these past 30 years and what it will take for the adoption of these reliability principles to occur.
In North America, organizations have continually passed through the endless cycle of reliability despair due to a lack of understanding of the fundamental principles of strategic reliability that Nowlan & Heap and John Moubray have outlined. Organizations frequently pass from good reliability to poor performance and back again, never understanding why this cycle occurs or how to break it. The most important question for today’s reliability leaders is: Why have we failed to sustain the reliability efforts of the past despite their clear and tangible results? Until we can answer this question, we cannot break the cycle of despair.
The answer must be a lack of knowledge and understanding of these reliability principles across the entire organization attempting to establish reliable asset performance. What else can rationally explain why someone would ignore something that is true, unless that person did not really know the truth in the first place? Basically, it boils down to this: Most leaders within North American industry today do not understand (or even know about) these principles of reliability, how the principles were derived and how to implement them into a strategic plan. Most of North America’s industrial leadership is ignorant of what is required to achieve meaningful reliable asset performance. This ignorance is not necessarily of their making. Asset reliability is not taught in business academia and only occasionally taught in colleges of engineering. Often, today’s organizational leaders have never been exposed to these reliability principles in a direct and fundamental way. Consequently, today’s organizational leaders have learned false principles that they believe to be true, and giving up those false ideas is difficult.
So what are these principles of reliability? Basically, the principles set up the cultural aspects of how an organization should view reliability. Culture is a shared system of beliefs that an organization uses to solve its problems because they have worked well enough in the past to be taught to others as the correct way to act. An example of this is when you hear someone say, “That’s not how we do it here.” The individual making the statement is reinforcing the company’s system of beliefs on how to act. The person is reinforcing the culture. The principles of reliability set out a new paradigm of reliability, a new way to think and act. Adoption of these principles is not easy, but the steps are basic. The first step is learning these principles.
By my count, there are NINE PRINCIPLES OF RELIABILITY that must be understood in order to achieve a full understanding of reliable asset performance. They are as follows:
Eighty percent (±) of all equipment failures occur randomly with respect to time. For these failures, the age of an asset does not increase the conditional probability of failure.
Indications of pending functional failure (the failed states) follow a predictable degradation curve known as the P-F curve. As an asset moves towards functional failure, the asset will give off detectable signals at definable time intervals, thus giving an organization time to react proactively. The time between the point where a potential failure first becomes detectable (P) and the point of functional failure (F) is the P-F interval.
The human senses are capable of detecting 80 percent (±) of failed states. A dependence or focus on technological devices is unwarranted. Most failed states can easily be detected using sight, hearing, touch, smell and taste.
Those working closest to a problem are the best equipped to solve the problem. They are the subject matter experts (SMEs).
There is no need to collect data first in order to achieve asset reliability. A strong reliability program prevents the very failures that would generate the data we think we need to collect. The data we think we need already exists; it is in the minds of the SMEs.
There must be an understanding of the meaning of failure consequences (safety, environmental, production and nonoperational). Further, there must be an understanding that 30 percent of the failure consequences are hidden under normal operating conditions.
Risk is inherent in everything we do. It is not possible to eliminate risk, only to identify what level of risk is acceptable (tolerable). Failing to define a tolerable level of risk leads the organization to a default position of ignoring risk altogether. We must be brave enough to talk about risk in terms of acceptable injuries per year and how that leads us to an acceptable risk calculation for every task one may be asked to perform. For example: If an organization sets as its goal to injure no more than five workers over a one-year period and there are 1,000 employees, then the acceptable injury rate is one injury per 200 years worked (5:1,000 or 1:200 employee-years), since in a calendar year 1,000 employees will work 1,000 years. Once the acceptable injury rate is known (1:200 employee-years), the next step is to determine the number of tasks/events within an organization that could injure someone. A detailed discussion of this type of determination is not practical for this article; however, it might be as simple as taking an organization’s log of safety incidents (e.g., near misses, need for first aid, OSHA recordable) and determining the average number per year. This would give an organization a good idea of the number of tasks/events per year that could lead to an injury. To illustrate:
Let’s assume a 1:200 employee-year injury rate is tolerable.
Let’s assume that from our safety logs we have determined that we have 240 tasks/events per year that could lead to an injury.
We know we have 1,000 employees.
1:(200 employee-years / 1,000 employees) = one injury for every 0.2 calendar years, or one injury about every 2-1/2 months.
1:(0.2 × 240 tasks or events) yields 1:48 years for each individual task or event.
Multiplying the 1:48 years by a safety factor (SF) accounts for errors in determining the number of events per year (240).
Let’s assume an SF of two.
1:(48 × 2) = 1:96 years.
This means that for any of the tasks/events that could result in an injury, the probability of occurrence must be below 1:96 years.
No doubt this is a tough conversation to have, but to ignore this discussion is to leave risk at an undefined level, which cannot be tolerated.
Assets can only perform as well as they are designed, installed, operated and maintained. We must understand what our assets can do versus what we want them to do.
Failure modes must be sorted into three categories according to how the failure occurs:
a. Suddenly
b. Over a period of time
c. Hidden
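The tolerable-risk arithmetic worked through under the risk principle above (injury target, headcount, number of injury-capable tasks/events, safety factor) can be sketched in a few lines of Python. This is only an illustrative sketch, not a prescribed method: the function and field names are hypothetical, it assumes every employee works a full year, and it assumes the tolerable injury frequency is spread evenly across all tasks/events.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class TolerableRisk:
    """Intermediate results of the calculation (all names are illustrative)."""
    employee_years_per_injury: Fraction   # site-wide, in employee-years
    calendar_years_per_injury: Fraction   # site-wide, in calendar years
    years_per_injury_per_event: Fraction  # one task/event, before the safety factor
    years_per_injury_with_sf: Fraction    # one task/event, after the safety factor


def tolerable_risk(max_injuries_per_year: int, employees: int,
                   events_per_year: int, safety_factor: int) -> TolerableRisk:
    """Turn a yearly injury target into a tolerable interval per task/event.

    Assumes each employee works one employee-year per calendar year and that
    risk is shared equally among the tasks/events that could injure someone.
    """
    if min(max_injuries_per_year, employees, events_per_year, safety_factor) <= 0:
        raise ValueError("all inputs must be positive")

    # Acceptable injury rate: one injury per this many employee-years.
    employee_years_per_injury = Fraction(employees, max_injuries_per_year)
    # The whole workforce accumulates `employees` employee-years per calendar year.
    calendar_years_per_injury = employee_years_per_injury / employees
    # Spread the site-wide allowance across every injury-capable task/event.
    per_event = calendar_years_per_injury * events_per_year
    # The safety factor covers errors in counting the tasks/events.
    with_sf = per_event * safety_factor

    return TolerableRisk(employee_years_per_injury, calendar_years_per_injury,
                         per_event, with_sf)


if __name__ == "__main__":
    # Inputs taken from the illustration: 5 injuries, 1,000 employees,
    # 240 tasks/events per year, safety factor of two.
    risk = tolerable_risk(5, 1000, 240, 2)
    print(f"Site-wide: 1 injury per {risk.employee_years_per_injury} employee-years")
    print(f"Site-wide: 1 injury per {float(risk.calendar_years_per_injury)} calendar years")
    print(f"Per task/event: 1 injury per {risk.years_per_injury_per_event} years")
    print(f"Per task/event with SF: 1 injury per {risk.years_per_injury_with_sf} years")
```

Exact fractions are used instead of floating point so that each intermediate ratio stays readable as "1 in N". Changing any one input, such as the number of logged tasks/events, shows directly how sensitive the per-task threshold is to that estimate.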
Until all nine principles of reliability are understood by an organization, it cannot move on to the second step.
The next or second step is for an organization to become open to these principles. Often, this is driven by a need for change, either from external pressures of competition or internal pressures to improve performance, like safety, environmental, quality, or production. But this is a conundrum – what comes first, the understanding of the principles or the openness to learn the principles? The answer is both and is dependent on the individual and/or organization. In today’s organizations, as stated previously, the majority of leadership is ignorant of these principles, so the openness to new principles is limited. Change for these leaders is only driven by pressure, external or internal. However, people attending school, whether professional or trade, are open to new ideas and will accept these principles without pressure to change because for them, there is no change. The key to the second step is understanding who needs to change. If it is leadership, then they will first need a compelling reason. If it is employees new to an organization or still attending school, they will accept the principles without the need for a compelling reason.
The third step is a commitment by the organization that the principles of reliability should be used to govern the way an organization makes its decisions about asset management. When questions arise, organizations should look to these principles to solve their problems and improve asset performance. Far too often, organizations fall into the trap of tactical application of reliability elements rather than strategic application of the principles of reliability. Organizations are tempted to focus on an element of reliability, such as planning and scheduling, equipment history, predictive tools and many others, rather than focusing keen attention on the principles of reliability. It is the tactical application of these elements without a strategic understanding of the principles that has led us to the endless reliability cycle of despair. It is more important to do the right job than to do the job right. Organizations are often overly focused on performing work correctly, rather than understanding if the work needs to be done in the first place. Without first mastering the principles, one cannot move on to the application, and if questions or challenges arise, one must be ready to fall back on the principles for guidance. The principles of reliability are the foundation of all elements of a reliability asset management strategy.
The last step puts the focus on today’s reliability leaders. Our task is to educate ourselves on what the principles of reliability are; then we must educate everyone else. The work begun by Nowlan & Heap and carried forward by John Moubray is now in our hands. It is the duty of today’s reliability leaders to move the work through to its next phase – the education of business and engineering professionals, as well as the trade/craftspeople, on these principles of reliability. Not until everyone from the shop floor to the boardroom understands these principles of reliability can the work of the next phase begin – the work of establishing proactive asset management plans with lasting effects.
Jay Shellogg is a Civil Engineer with 4 years consulting engineering experience and 14 years of experience at a large pulp & paper mill. His work for the first 5 years in the pulp & paper industry was as a Sr. Environmental Engineer and the last 9 years spent in maintenance as a Sr. Maintenance Engineer, then as the Reliability Maintenance Superintendent, and today as a Maintenance Superintendent over general services. In late 2005, Jay was tasked with the project lead for budgeting and implementation of a reliability solution at his mill.