reliability engineering

Failure Analysis Fundamentals

Performing failure analysis on machinery can sometimes seem easy as the fault is staring at you in the frequency domain. But remember, this data has been massaged, averaged, windowed, and

...

FRACAS – Unleashing the Power of the EAM as a Reliability Improvement Tool

Note: Originally presented by Bill Keeter, Allied Reliability, at EAM-2006 The Enterprise Asset Management Summit in Las Vegas

A strong Failure Reporting, Analysis, and Corrective Action System (FRACAS) is the backbone of a good asset performance improvement effort. The FRACAS provides the business elements required to close the loop on Root Cause Failure Analysis (RCFA) and Reliability Centered Maintenance (RCM) efforts. The FRACAS changes RCFA from what are often one-shot exercises into a managed program for systematically improving equipment and process performance. This chapter describes the basics of implementing the FRACAS and how to use it to ensure implementation of RCFA recommendations.

What is guaranteed Maintainability?

It might seem trivial, but the best way to improve reliability is to choose equipment that doesn't break down! At the very least, choose designs that, when they do fail, are easy, inexpensive and quick to fix. With the right choices in the beginning, maintenance departments can guarantee maintainability. The term "guaranteed maintainability" was coined by Atlanta-based consultant Ed Feldman.

Reliability in a Safety Sensitive Environment

Originally presented at IMC-2006 - The 21st International Maintenance Conference

In an airline environment, there is only one acceptable standard - perfection.

Safety is everything, and since reliability is a large factor in safety, it gets a lot of attention.

An airline looks at reliability at every level of the operation, from performance of an aircraft to performance of the individual piece-parts of that aircraft. Airlines look at the impact of everything that touches an aircraft. Everything is calculated and recalculated to determine the impact to the operation. Small changes can impact the operation in a large way, and those impacts have to be predicted and dealt with.

The Principles Driving Safety & Reliability: A Look at the History of DuPont

During my 27 years with DuPont, the safety culture was apparent. It was a part of everyone's job every day. As a result of a benchmarking study in the late 1980s and the creation of a System Dynamics model to explain the benchmark results, it became clear that safety and reliability operate on the same principles. Both are significantly affected by defects, and both require a commitment from everyone in the organization for improvements to be achieved.

The Ups and Downs of Reliability Engineering and CMMS Implementation at Lone Star Steel

Creating a structured reliability engineering department in a facility that has never had one is challenging enough. If you simultaneously implement a new computerized maintenance management software (CMMS) program, the hurdles get higher. The key to success is to have the right management support, good communication and a clear vision of what the future should be. This paper will discuss some of the triumphs and pitfalls that we have encountered on our unending journey through a complex culture change.

Risk Management in Military Aviation

Risk management in military aviation has been a formal discipline since the 1960s. The risk standard issued by the Department of Defense in 1969 was entitled "DoD Standard Practice for System Safety", MIL-STD-882. Across the Air Force, the examples set forth in this standard have been used as though they were a required set of probabilities rather than examples. The semi-quantitative approach used today is further devalued by managers' arbitrary use of the Hazard Risk Matrix levels to mandate action. This paper examines the alternatives available today and recommends incorporation of a quantitative approach for more fidelity in risk management at all levels of management.

Editor's Note: Although this paper is aimed at military aviation, the information on risk assessment and management is applicable to most industries as well.

How Sterling Steel Increases Equipment Availability and Throughput

Sterling Steel produces 450,000 tons of wire rod for its parent company, Leggett & Platt. The long products mini mill utilizes a 415-ton Electric Arc Furnace, two Ladle Metallurgy Facilities, an eight-strand Billet Caster and a single-strand Rod Mill to produce the wire rod for Leggett & Platt's Wire Mills.

Highly Accelerated Stress Screen - HASS

Highly accelerated stress screen (HASS) uses the same stresses as HALT, but at lower stress levels. Compared to HALT testing, temperature and voltage extremes may be reduced by 10-15% and vibration levels by 50%, depending upon the design. All the stresses may still be above rated product specifications, because the motivation is to produce test results quickly for verifying product compliance.

The Reliability Engineering Toolbox

Failure Forecast

Failure forecasting is a projection of failures into the future based on assumed or documented failure details.
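One way such a projection can be sketched is shown below. It is an illustration only and rests on assumptions the entry does not state: the "failure details" are taken to be a Weibull model with shape `beta` and characteristic life `eta`, each unit's current age is known, and failed units are not replaced within the forecast horizon. The function names and the fleet ages are hypothetical.

```python
import math

def weibull_survival(age, beta, eta):
    """Probability that a unit survives beyond `age` under a Weibull model."""
    return math.exp(-((age / eta) ** beta))

def forecast_failures(ages, horizon, beta, eta):
    """Expected number of failures among surviving units over the next `horizon`.

    Each unit contributes its conditional probability of failing in
    (age, age + horizon], given that it has survived to `age`.
    Assumes no replacement of failed units (no renewal).
    """
    total = 0.0
    for age in ages:
        surviving_now = weibull_survival(age, beta, eta)
        surviving_later = weibull_survival(age + horizon, beta, eta)
        total += 1.0 - surviving_later / surviving_now
    return total

# Hypothetical fleet ages in run hours, for illustration only.
fleet_ages = [500.0, 1200.0, 2000.0, 2600.0]
expected = forecast_failures(fleet_ages, horizon=1000.0, beta=2.0, eta=3000.0)
print(f"Expected failures in the next 1000 hours: {expected:.2f}")
```

With `beta` greater than 1 (wear-out), older units contribute more to the forecast than younger ones, which is why the ages of the surviving units matter as much as the model parameters.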

Life Units

A measure of use duration applicable to an item. For example, the life units may be starts-stops, run hours, hot-cold cycles, distances traveled, emergency starts, shelf life, or other measurements which motivate failures.

Environmental Stress Screening (ESS)

A series of screens conducted under environmental stresses to disclose weak parts and workmanship defects which require correction. This requires an understanding of both burn-in testing and ESS; both techniques identify weak points and eliminate them by motivating early failures. Burn-in is usually a long process of operating under load(s) at a fixed temperature (in short, a special case of ESS), or it can be operated at varying loads and accelerated temperatures to achieve a shorter burn-in period. ESS is a scientifically planned and conducted test, usually run under accelerated loads, which produces the same test/use results in a shorter period of time by increasing the stress on the components or assemblies. The objective of these screens is to produce a failure-free product when released into operations. ESS is not intended as a test to validate compliance to a design; it is intended to force latent defects to surface before the end user finds them in day-to-day usage.

The Reliability Engineering Toolbox

Pareto Distribution

Vilfredo Pareto was an Italian economist in the late 1800s who described the unequal distribution of wealth in the world.
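In reliability work the same unequal distribution is commonly applied to problems rather than wealth: a few causes usually account for most of the loss. The sketch below ranks causes by their contribution and reports the cumulative share. The function name and the downtime figures are hypothetical and serve only as an illustration.

```python
def pareto_rank(losses):
    """Return (cause, loss, cumulative_share) rows, largest loss first."""
    total = sum(losses.values())
    ranked = sorted(losses.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0.0
    for cause, loss in ranked:
        running += loss
        rows.append((cause, loss, running / total))
    return rows

# Hypothetical downtime hours by cause, for illustration only.
downtime_hours = {"bearing": 120, "seal": 45, "coupling": 15, "motor": 10, "other": 10}

for cause, hours, cumulative in pareto_rank(downtime_hours):
    print(f"{cause:10s} {hours:5d} h  cumulative {cumulative:6.1%}")
```

In this made-up data set, the top two causes account for 82.5% of the downtime, which is where improvement effort would be directed first.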

The Reliability Engineering Toolbox

Failure

Failure is the loss of function when you needed the function to occur.

Data

Data is the informational energy which runs the reliability improvement machine. Data is acquired at great cost, so it needs to be retained and used to prevent future failure events. Proper use of data provides an understanding of failure mechanisms and prevents recurrence of bad events which cause safety or high-cost failures.

Reliability data requires a definition of failure. Failures can be catastrophic failures or slow degradation; you decide by defining the failures. The units of measure for the data must be the units of the degradation: sometimes hours, sometimes miles, and so forth; in short, whatever motivates the failure. Reliability always ceases with a failure or with a removal from service in some aged condition, and the latter generates a category of data called suspended or censored data.

Data is information in the form of facts, figures, or engineering databases which is obtained from engineering tests, experiments, or actual operating conditions. Reliability data is often incomplete, as the exact times to failure are rarely known or recorded with much precision, so that only partial information is available for analysis. Reliability data comes in two forms: 1) age-to-failure data, and 2) censored/suspended data, such as occurs when unfailed items are removed from service or when they fail due to a different failure mode than the one being studied. Suspensions are useful information and part of the data set. Some data is better than no data for resolving reliability issues.
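A short sketch can show why suspensions belong in the data set. It assumes a constant failure rate (an exponential model, which is an assumption made here and not something the entry states), for which the usual estimate of mean time between failures is the total accumulated age of all units divided by the number of failures. The function names and the records are hypothetical.

```python
def mtbf_with_suspensions(records):
    """Estimate MTBF from (age, failed) records under an assumed exponential model.

    Suspended (unfailed) units add to the accumulated age but not to the
    failure count.
    """
    total_age = sum(age for age, _ in records)
    failures = sum(1 for _, failed in records if failed)
    if failures == 0:
        raise ValueError("at least one failure is needed for a point estimate")
    return total_age / failures

def mtbf_ignoring_suspensions(records):
    """Naive estimate that wrongly discards suspended units."""
    failed_ages = [age for age, failed in records if failed]
    return sum(failed_ages) / len(failed_ages)

# Hypothetical records: (age in hours, True if failed, False if suspended).
records = [(1200.0, True), (800.0, False), (1500.0, True), (500.0, False)]

print(mtbf_with_suspensions(records))      # 4000 hours / 2 failures = 2000.0
print(mtbf_ignoring_suspensions(records))  # 2700 hours / 2 failures = 1350.0
```

Discarding the two suspended units understates the estimate, because the hours those units ran without failing are evidence of reliability.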

FMEA is part of the Reliability Strategy Development toolbox

Failure Mode and Effect Analysis - FMEA

Failure mode and effect analysis (FMEA) is the study of potential failures that might occur in any part of a system, to determine the probable effect of each failure on all other parts of the system and on probable operational success.
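A minimal sketch of an FMEA worksheet follows. The entry does not describe a scoring method; the risk priority number (severity times occurrence times detection, each rated 1 to 10) is a widely used convention that is assumed here. The class name, fields, and example rows are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    component: str
    mode: str
    effect: str
    severity: int    # 1 (negligible) to 10 (catastrophic), assumed scale
    occurrence: int  # 1 (remote) to 10 (frequent), assumed scale
    detection: int   # 1 (certain to detect) to 10 (undetectable), assumed scale

    @property
    def rpn(self):
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection

def rank_by_rpn(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)

# Hypothetical worksheet rows, for illustration only.
worksheet = [
    FailureMode("pump", "seal leak", "loss of containment", 8, 4, 3),
    FailureMode("pump", "bearing seizure", "loss of flow", 7, 3, 5),
    FailureMode("motor", "winding short", "pump stops", 6, 2, 2),
]

for fm in rank_by_rpn(worksheet):
    print(f"{fm.component:6s} {fm.mode:16s} RPN={fm.rpn}")
```

The ranking is only a prioritization aid; the effect column still needs to be read, since a high-severity mode can deserve attention even when its RPN is modest.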

The Reliability Engineering Toolbox

Quality Function Deployment

Quality Function Deployment, or QFD, is a bad translation of a good reliability technique for getting the voice of the customer into the design process, so that the product delivered is the product the customer desires.

Total Productive Maintenance

Total productive maintenance (TPM) is a corporate-wide effort involving all employees to use equipment to its maximum limit, employing an equipment-oriented management concept to reduce failures and increase productive utilization of equipment and processes. TPM programs are teamwork programs and require a corporate culture of teamwork devoid of "us vs. them" issues. All employees are expected to accept ownership of the equipment and processes and to do many small things all the time to ensure high levels of availability by eliminating failures in the early stages with low-cost actions. The employees approach the process equipment as owners rather than renters.

The Reliability Engineering Toolbox: Poisson Distribution

Poisson Distribution

Poisson distributions are discrete distributions that describe the simplest statistical process: counted events which occur randomly in time at a stable average rate of occurrence.
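As a small illustration, the probability of counting exactly k events in an interval follows from the average rate alone. The sketch below uses the standard Poisson probability mass function; the function names and the example rate of two failures per year are assumptions chosen for illustration.

```python
import math

def poisson_pmf(k, rate, duration):
    """Probability of exactly k events in `duration` at a stable average `rate`."""
    mean = rate * duration
    return math.exp(-mean) * mean ** k / math.factorial(k)

def prob_at_most(k_max, rate, duration):
    """Probability of k_max or fewer events in `duration`."""
    return sum(poisson_pmf(k, rate, duration) for k in range(k_max + 1))

# Hypothetical example: an average of 2 failures per year, over 1 year.
rate_per_year = 2.0
print(f"P(0 failures)  = {poisson_pmf(0, rate_per_year, 1.0):.3f}")
print(f"P(<=3 failures) = {prob_at_most(3, rate_per_year, 1.0):.3f}")
```

A calculation like `prob_at_most` is one way to reason about spare parts: it gives the chance that a stock of `k_max` spares covers the failures expected in the interval, under the assumption that the rate really is stable.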

The Reliability Engineering Toolbox

FRACAS

A failure reporting, analysis, and corrective action system (FRACAS) is an organized database that aids in solving reliability problems using a common-sense approach of systematically and permanently removing failure mechanisms.
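The idea of an organized database that closes the loop can be sketched with two record types. This is an assumed, minimal data model for illustration, not the layout of any particular EAM or CMMS product; all class names, fields, and example records are hypothetical. A report counts as closed only when a root cause is recorded and every corrective action has been implemented.

```python
from dataclasses import dataclass, field

@dataclass
class CorrectiveAction:
    description: str
    implemented: bool = False

@dataclass
class FailureReport:
    asset: str
    failure_mode: str
    root_cause: str = ""
    actions: list = field(default_factory=list)

    def is_closed(self):
        """Closed means: root cause known and all corrective actions implemented."""
        return (
            bool(self.root_cause)
            and bool(self.actions)
            and all(action.implemented for action in self.actions)
        )

def open_reports(reports):
    """Return the reports whose loop has not yet been closed."""
    return [report for report in reports if not report.is_closed()]

# Hypothetical records, for illustration only.
reports = [
    FailureReport("P-101", "seal leak", "misalignment",
                  [CorrectiveAction("laser-align pump and motor", implemented=True)]),
    FailureReport("C-204", "bearing failure", "lubrication starvation",
                  [CorrectiveAction("revise lubrication route", implemented=False)]),
    FailureReport("F-310", "belt slip"),  # reported, not yet analyzed
]

for report in open_reports(reports):
    print(f"OPEN: {report.asset} ({report.failure_mode})")
```

Listing the open reports on a regular basis is what turns individual analyses into a managed program, since a recommendation that is never implemented leaves the failure mechanism in place.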