When failures occur, they are usually first recognized at the equipment level by operating or production personnel. If the work request is entered into the system promptly when a malfunction takes place, accurate time periods are recorded automatically. The same is true of the time it takes to restore the equipment after the malfunction. These time periods are needed later to determine the MTBF (mean time between failures) and MTTR (mean time to repair).

When did the failure occur?

In SAP, the system proposes the malfunction start and end dates / times of the notification (work request) based on the creation and completion dates / times. These proposed values can be updated by the user as needed.
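To illustrate why accurate malfunction start and end times matter, here is a minimal Python sketch of how MTBF and MTTR might be derived from them. It assumes one common convention (MTBF as uptime divided by the number of failures, MTTR as downtime divided by the number of failures); other definitions exist. The function name `mtbf_mttr_hours` and the sample dates are hypothetical and do not come from SAP or any other EAM / CMMS.

```python
from datetime import datetime

def mtbf_mttr_hours(malfunctions, period_start, period_end):
    """Return (MTBF, MTTR) in hours for one equipment item.

    malfunctions: list of (malfunction_start, malfunction_end) datetimes.
    Assumed convention: MTBF = uptime / failures, MTTR = downtime / failures.
    """
    if not malfunctions:
        return None, None  # no failures recorded, so neither metric is defined
    downtime = sum((end - start).total_seconds() for start, end in malfunctions) / 3600.0
    observed = (period_end - period_start).total_seconds() / 3600.0
    failures = len(malfunctions)
    return (observed - downtime) / failures, downtime / failures

# Hypothetical history: two malfunctions in a 30-day observation window.
history = [
    (datetime(2009, 1, 5, 8, 0), datetime(2009, 1, 5, 12, 0)),    # 4 hours down
    (datetime(2009, 1, 20, 10, 0), datetime(2009, 1, 20, 18, 0)),  # 8 hours down
]
mtbf, mttr = mtbf_mttr_hours(history, datetime(2009, 1, 1), datetime(2009, 1, 31))
print(mtbf, mttr)  # 354.0 6.0
```

If a work request is entered hours after the malfunction actually began and the proposed start time is not corrected, the downtime in this calculation shrinks and both metrics are distorted.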

Where did the failure occur?

Documenting where a failure occurred is done by entering the work request in the system against the correct technical object (location or equipment record). System users often find it easier to write the work request to some location that is easy to remember rather than search for the correct place in the system. It is better to use the lowest appropriate level in the hierarchy, such as the record for the actual maintainable item, rather than the higher-level location or subsystem. Operators must also indicate in some fashion that a failure exists: by using a specific work request / order type, by setting an indicator on the work request / order, or by entering additional failure information on the work request / order. Having this information in the system makes it possible to distinguish component replacements that were due to failure-related events from non-failure replacements. Reliability analysis can be adversely affected when a data point is treated as a failure event when it should have been treated as something else, such as a component replacement made for reasons other than failure.

It may be best to use a specific work request type or order type to report equipment failures or malfunctions. This will facilitate future analysis of the work history data; otherwise, it will be more difficult to understand the costs and labor needed to make repairs. These work requests / orders can be segregated from other activities such as preventive maintenance (PM), predictive maintenance (PdM), routine maintenance, improvements, etc. In some systems, there is a unique indicator to show that a failure exists. For example, in SAP, the work request (notification) has a "breakdown indicator." In fact, the notification in SAP is used to store all the technical history for the repair, while the maintenance work order is the controlling document used to plan, estimate and execute the work.
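As a rough illustration of how a dedicated order type or failure indicator simplifies later analysis, the Python sketch below separates failure-related orders from other work and totals their cost and labor. The `WorkOrder` fields, the order type codes ("FAIL", "PM", "PdM") and the sample figures are all hypothetical; they are not SAP field names or real data.

```python
from dataclasses import dataclass

@dataclass
class WorkOrder:
    order_id: str
    order_type: str      # hypothetical codes: "FAIL", "PM", "PdM", "ROUTINE"
    breakdown: bool      # hypothetical flag, similar in spirit to a breakdown indicator
    cost: float
    labor_hours: float

def summarize_failures(orders):
    """Total the cost and labor of orders marked as failures by type or indicator."""
    failure_orders = [o for o in orders if o.breakdown or o.order_type == "FAIL"]
    return {
        "count": len(failure_orders),
        "cost": sum(o.cost for o in failure_orders),
        "labor_hours": sum(o.labor_hours for o in failure_orders),
    }

# Hypothetical work history for one pump.
orders = [
    WorkOrder("1001", "FAIL", True, 1200.0, 8.0),      # breakdown repair
    WorkOrder("1002", "PM", False, 300.0, 2.0),        # preventive maintenance
    WorkOrder("1003", "ROUTINE", True, 450.0, 3.0),    # failure flagged by indicator only
]
print(summarize_failures(orders))
# {'count': 2, 'cost': 1650.0, 'labor_hours': 11.0}
```

Without the order type or the indicator, the preventive maintenance order could not be separated from the two repairs, and failure counts and repair costs would be overstated.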

How was the failure discovered?

A failure event is discovered in some fashion, and this is the "method of detection." The failure may have been detected because it affected production, or it may have been observed during normal rounds, found during routine tests, or discovered in some other way, such as a chance observation. This information is crucial for determining whether the existing maintenance strategies are effective or whether new strategies are needed.

What is the symptom?

This first level of failure is called the "failure mode" in ISO 14224. It is the visible symptom at the equipment asset level. When this observation takes place, a work request or work order is usually entered in the EAM / CMMS. As when a parent takes a child to the clinic, all that is known at this time is the symptom. No detailed reporting or analysis occurs at this point. For example, an operator may report that the pump failed to start. The operator does this by describing the problem and selecting the correct failure mode code in the system. Since the symptoms are somewhat generic, the same list of codes can typically be used for all equipment types.

What was the effect of the failure?

Usually, the person reporting the failure has some idea of the effect of the failure on the organization. There may have been no effect, or the failure may have affected the environment, safety or production. This information is not only useful in prioritizing any subsequent mitigating actions but is also an important aid in compliance reporting to third parties such as environmental or safety regulatory agencies. If possible, indicate the degree of failure or functional loss. The equipment experienced a malfunction, but to what extent? Was it a "complete" failure, such as when the pump fails to start? Was it a "partial" failure, such as when the pump cannot maintain the desired flow rate? Was it a "potential" or "latent" failure? While these may not seem like actual failures because the underlying conditions have not yet triggered an active fault, it is likely that they will do so at a future point. Categorizing these events correctly supports the development of better mitigation strategies.

Keep the guidelines simple, yet effective. When something malfunctions:

• Write the work request / order to the equipment asset
• Indicate that a failure exists
• Select the correct code for the failure mode
• Note who found the failure and how the failure was discovered
• Describe the problem
• Optionally, indicate the operational effect and the degree-of-failure

Do not require system users to enter additional failure information, such as the maintainable item or failure mechanism, when creating the work request / order. At this point, they know only the observable symptom. Forcing them to supply more frustrates them and results in bogus entries made just to get past the required fields. This additional information should be entered later by those making the repairs, when it will be more accurate and more meaningful for later analyses.
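The guidelines above could be sketched as a simple record layout, shown here in Python. Only the fields the reporter can reasonably know are checked at creation; the maintainable item and failure mechanism stay optional so they can be filled in later by those making the repairs. All names here (`FailureReport`, `missing_at_creation`, the field names, and the sample values such as "PUMP-101" and "FTS") are hypothetical illustrations, not fields of any particular EAM / CMMS.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class DegreeOfFailure(Enum):
    COMPLETE = "complete"
    PARTIAL = "partial"
    POTENTIAL = "potential"

@dataclass
class FailureReport:
    # Known to the reporter when the work request / order is created
    technical_object: str = ""
    is_failure: bool = False
    failure_mode: str = ""
    detection_method: str = ""
    reported_by: str = ""
    description: str = ""
    # Optional at creation
    operational_effect: Optional[str] = None
    degree: Optional[DegreeOfFailure] = None
    # Entered later by those making the repairs
    maintainable_item: Optional[str] = None
    failure_mechanism: Optional[str] = None

# Text fields checked at creation; is_failure is a flag, so it is not in this list.
REQUIRED_AT_CREATION = (
    "technical_object", "failure_mode", "detection_method", "reported_by", "description",
)

def missing_at_creation(report: FailureReport) -> List[str]:
    """List the creation-time fields that are still empty."""
    return [name for name in REQUIRED_AT_CREATION if not getattr(report, name)]

report = FailureReport(
    technical_object="PUMP-101",
    is_failure=True,
    failure_mode="FTS",            # illustrative code for "failed to start"
    reported_by="operator",
    description="Pump failed to start",
    degree=DegreeOfFailure.COMPLETE,
)
print(missing_at_creation(report))  # ['detection_method']
```

Keeping the repair-time fields out of the creation check reflects the point made above: a required field the reporter cannot answer tends to collect placeholder values rather than usable data.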

*This article is an excerpt from the whitepaper "Understanding the Basics of Failure and Event Coding for EAM and CMMS" published in the March 2009 issue of APM Advisor (www.apmadvisor.com)

Tip provided by Ralph Hanneman, CMRP, Senior Consultant, Meridium, Inc. Ralph began his career in the Navy, where he obtained considerable experience in the maintenance and operation of all facets of shipboard electrical systems, as well as in the Navy's preventive maintenance methodology. He has nearly 20 years of maintenance management and project engineering experience in pulp & paper, automotive parts manufacturing and heavy construction (tunnel boring). Ralph has substantial experience in many facets of SAP, gained during five years of ERP implementation projects in Plant Maintenance roles. He is experienced in SAP's BI solution, in particular Plant Maintenance reporting. Since joining Meridium in early 2007, Ralph has been involved in pilots and enterprise implementations of Meridium with PEMEX, BMA Coal, Hydro One, Hess, Rio Tinto, Bruce Power, Samarco Mineração and Flint Hills Resources in consulting and project management roles. He has conducted failure code development workshops for clients using SAP and Oracle eAM.