Performing failure analysis on machinery can sometimes seem easy, as the fault is staring at you in the frequency domain. But remember, this data has been massaged, averaged, windowed, and
A strong Failure Reporting, Analysis, and Corrective Action System (FRACAS) is the backbone of a good asset performance improvement effort. The FRACAS provides the business elements required to close the loop on Root Cause Failure Analysis (RCFA) and Reliability Centered Maintenance (RCM) efforts. It turns RCFA from what is often a one-shot exercise into a managed program for systematically improving equipment and process performance. This chapter describes the basics of implementing a FRACAS and how to use it to ensure that RCFA recommendations are implemented.
It might seem trivial, but the best way to improve reliability is to choose equipment that doesn't break down! At the very least, choose designs that are easy, inexpensive, and quick to fix when they do fail. With the right choices at the outset, maintenance departments can guarantee maintainability. The term "guaranteed maintainability" was coined by Atlanta-based consultant Ed Feldman.
In an airline environment, there is only one acceptable standard: perfection.
Safety is everything, and since reliability is a large factor in safety, it gets a lot of attention.
An airline looks at reliability at every level of the operation, from the performance of an aircraft to the performance of the individual piece-parts of that aircraft. Airlines look at the impact of everything that touches an aircraft. Everything is calculated and recalculated to determine its impact on the operation. Small changes can have a large impact on the operation, and those impacts have to be predicted and dealt with.
During my 27 years with DuPont, the safety culture was apparent. It was a part of everyone's job every day. As a result of a benchmarking study in the late 1980s, and the creation of a System Dynamics model to explain the benchmark results, it became clear that safety and reliability operate on the same principles. Both are significantly affected by defects, and both require a commitment from everyone in the organization for improvements to be achieved.
Creating a structured reliability engineering department in a facility that has never had one is challenging enough. If you simultaneously implement a new computerized maintenance management system (CMMS), the hurdles get higher. The keys to success are the right management support, good communication, and a clear vision of what the future should be. This paper discusses some of the triumphs and pitfalls we have encountered on our unending journey through a complex culture change.
Risk management in military aviation has been a formal discipline since the 1960s. The risk standard issued by the Department of Defense in 1969 was entitled "DoD Standard Practice for System Safety", MIL-STD-882. Air Force-wide, the examples set forth in this standard have been used as though they were a required set of probabilities rather than examples. The semi-quantitative approach used today is further devalued by managers' arbitrary use of the Hazard Risk Matrix levels to mandate action. This paper examines the alternatives available today and recommends incorporating a quantitative approach for greater fidelity in risk management at all levels of management.
Editor's Note: Although this paper is aimed at military aviation, the information on risk assessment and management applies to most other industries as well.
Sterling Steel produces 450,000 tons of wire rod for its parent company, Leggett & Platt. The long-products mini mill uses a 415-ton Electric Arc Furnace, two Ladle Metallurgy Facilities, an eight-strand Billet Caster, and a single-strand Rod Mill to produce the wire rod for Leggett & Platt's Wire Mills.
A highly accelerated stress screen (HASS) uses the same stresses as a highly accelerated life test (HALT), but at lower stress levels. Compared with HALT, temperature and voltage extremes may be reduced by 10-15% and vibration levels by 50%, depending on the design. Even so, all the stresses may remain above the product's rated specifications, because the aim is to produce test results quickly for verifying product compliance.
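As a rough illustration of that arithmetic, the Python sketch below derives HASS levels from HALT limits. It assumes that "reducing an extreme" means shrinking the excursion from a nominal value (ambient temperature, nominal supply voltage, zero vibration) by the stated fraction; the function name `derate`, the 25 °C ambient, the 12 V nominal, and all HALT limit values are hypothetical, not taken from the text.

```python
# Hypothetical sketch: scale HALT stress extremes down to HASS levels.
# Assumption: a reduction of fraction f shrinks the excursion from a nominal
# value (ambient temperature, nominal voltage, zero vibration) by f.


def derate(extreme: float, nominal: float, fraction: float) -> float:
    """Move a HALT extreme toward its nominal value by the given fraction."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return nominal + (extreme - nominal) * (1.0 - fraction)


# Hypothetical HALT operating limits for one product design.
AMBIENT_C = 25.0
NOMINAL_V = 12.0
halt = {"hot_c": 125.0, "cold_c": -55.0, "high_v": 15.0, "low_v": 9.0, "vib_grms": 30.0}

hass = {
    "hot_c": derate(halt["hot_c"], AMBIENT_C, 0.10),    # temperature: 10-15% lower
    "cold_c": derate(halt["cold_c"], AMBIENT_C, 0.10),
    "high_v": derate(halt["high_v"], NOMINAL_V, 0.10),  # voltage: 10-15% lower
    "low_v": derate(halt["low_v"], NOMINAL_V, 0.10),
    "vib_grms": derate(halt["vib_grms"], 0.0, 0.50),    # vibration: 50% lower
}

for name, level in hass.items():
    print(f"{name}: HALT {halt[name]:g} -> HASS {level:g}")
```

Passing 0.15 instead of 0.10 gives the other end of the quoted 10-15% range; the actual reductions depend on the design, as the text notes.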
A measure of use duration applicable to an item. For example, the life units may be start-stops, run hours, hot-cold cycles, distances traveled, starts or emergency starts, shelf life, and other measurements that motivate failures.
A series of screens is conducted under environmental stresses to disclose weak parts and workmanship defects that require correction. This requires an understanding of burn-in testing and environmental stress screening (ESS), both of which identify weak points and eliminate them by motivating early failures. Burn-in is usually a long process of operating under load(s) at a fixed temperature (in short, a special case of ESS), although it can also be run at varying loads and accelerated temperatures to shorten the burn-in period. ESS, by contrast, is a scientifically planned and conducted test, usually run under accelerated loads, that increases the stress on components or assemblies to produce the same test/use results in a shorter period of time. The objective of these screens is to release a failure-free product into operation. ESS is not intended as a test to validate compliance with a design; it is intended to force latent defects to surface as failures before the end user finds them in day-to-day usage.
Data is the informational energy that runs the reliability improvement machine. Data is acquired at great cost, so it needs to be retained and used to prevent future failure events. Proper use of data provides an understanding of failure mechanisms and prevents the recurrence of bad events that cause safety problems or high-cost failures. Reliability data requires a definition of failure. Failures can be catastrophic or a slow degradation; you decide which by defining the failure. The data must be measured in units of the degradation: sometimes hours, sometimes miles, and so forth; in short, whatever motivates the failure. Every item's reliability record ends either with a failure or with a removal from service at some age; a removal without failure generates a category of data called suspended or censored data. Data is information in the form of facts, figures, or engineering databases obtained from engineering tests, experiments, or actual operating conditions. Reliability data is often incomplete, because exact times to failure are rarely known or recorded with much precision, so only partial information is available for analysis. Reliability data comes in two forms: 1) age-to-failure data, and 2) censored/suspended data, such as occurs when unfailed items are removed from service or when they fail by a different failure mode from the one being studied. Censored data is useful information and part of the data set. Some data is better than no data for resolving reliability issues.
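To show why suspensions belong in the data set, the Python sketch below estimates a failure rate from a mix of failures and suspensions. It assumes a constant failure rate (an exponential model), for which the maximum-likelihood estimate is the number of failures divided by the total operating time of all items, failed and suspended alike. The function name and the sample ages are hypothetical.

```python
# Hypothetical sketch: failure rate from failures plus suspensions,
# assuming a constant failure rate (exponential model).


def exponential_estimate(ages: list[float], failed: list[bool]) -> tuple[float, float]:
    """Return (failure_rate, mean_life).

    ages   -- operating age of each item when it failed or was removed
    failed -- True for a failure, False for a suspension (censored item)
    """
    if len(ages) != len(failed):
        raise ValueError("ages and failed must have the same length")
    failures = sum(failed)      # each True counts as 1
    total_time = sum(ages)      # suspended items contribute their age too
    if failures == 0:
        raise ValueError("no failures: the rate cannot be estimated this way")
    rate = failures / total_time
    return rate, 1.0 / rate


# Hypothetical pump ages in run hours: two failures, two suspensions.
ages = [100.0, 150.0, 200.0, 250.0]
failed = [True, False, True, False]

rate, mean_life = exponential_estimate(ages, failed)
print(f"with suspensions: mean life = {mean_life:.0f} h")

# Dropping the suspensions discards 400 h of survival and biases the answer low.
_, biased = exponential_estimate([100.0, 200.0], [True, True])
print(f"failures only:    mean life = {biased:.0f} h")
```

Weibull analysis is more common when the failure rate is not constant, but the principle is the same: suspended items contribute their survival time to the estimate.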
Failure mode and effect analysis (FMEA) is the study of potential failures that might occur in any part of a system, to determine the probable effect of each failure on all other parts of the system and on the probable success of operations.
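One piece of that study, tracing how a single failure propagates to the other parts of a system, can be sketched as a graph search. The Python example below assumes a hypothetical dependency map in which each part lists the parts that depend on it; a breadth-first search from a failed part collects everything that failure could affect. It covers only effect propagation, not the judgment of how probable each effect is.

```python
from collections import deque

# Hypothetical system: each part maps to the parts that depend on it.
SYSTEM = {
    "bearing": ["motor"],
    "motor": ["pump"],
    "seal": ["pump"],
    "pump": ["cooling_loop"],
    "cooling_loop": [],
}


def failure_effects(dependents: dict[str, list[str]], failed_part: str) -> set[str]:
    """Return every part that a failure of failed_part can reach."""
    affected: set[str] = set()
    queue = deque([failed_part])
    while queue:
        part = queue.popleft()
        for downstream in dependents.get(part, []):
            # Skip the failed part itself and anything already recorded.
            if downstream != failed_part and downstream not in affected:
                affected.add(downstream)
                queue.append(downstream)
    return affected


# Walk every part in turn, as an FMEA worksheet would.
for part in SYSTEM:
    effects = sorted(failure_effects(SYSTEM, part)) or ["no downstream effect"]
    print(f"{part} failure affects: {', '.join(effects)}")
```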
Quality Function Deployment (QFD) is a poorly translated name for a good reliability technique: getting the voice of the customer into the design process, so that the product delivered is the product the customer wants.
Total productive maintenance (TPM) is a corporate-wide effort involving all employees to use equipment to its fullest, employing an equipment-oriented management concept to reduce failures and increase the productive utilization of equipment and processes. TPM programs are teamwork programs and require a corporate culture of teamwork free of us-vs.-them issues. All employees are expected to accept ownership of the equipment and processes and to do many small things all the time to ensure high availability, eliminating failures in their early stages with low-cost actions. Employees approach the process equipment as owners rather than renters.
The Poisson distribution is a discrete distribution that describes the simplest stochastic process: events that occur randomly in time at a stable average rate. It gives the probability of observing a given count of events in an interval.
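For a stable average rate λ, the probability of exactly k events in an interval of length t is P(k) = (λt)^k e^(-λt) / k!. The Python sketch below evaluates that formula; the function name and the example rate and mission length are hypothetical.

```python
import math


def poisson_pmf(k: int, rate: float, t: float) -> float:
    """Probability of exactly k events in an interval of length t.

    rate -- stable average number of events per unit time (lambda)
    """
    if k < 0:
        raise ValueError("k must be a non-negative integer")
    mu = rate * t  # expected number of events in the interval
    return mu**k * math.exp(-mu) / math.factorial(k)


# Hypothetical example: 0.001 failures per run hour over a 1000 h mission.
rate, mission = 0.001, 1000.0
for k in range(4):
    print(f"P({k} failures) = {poisson_pmf(k, rate, mission):.4f}")

p_at_most_two = sum(poisson_pmf(k, rate, mission) for k in range(3))
print(f"P(at most 2 failures) = {p_at_most_two:.4f}")
```

The formula holds only while the rate is stable; if the failure rate is trending up or down, the Poisson model no longer applies.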
A failure reporting, analysis, and corrective action system (FRACAS) is an organized database that aids in solving reliability problems with a common-sense approach: systematically and permanently removing failure mechanisms.
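The loop a FRACAS is meant to close, from failure report through analysis to corrective action, can be sketched as a record that moves through ordered states. In the Python example below, the class, field, and state names (including the final verification step) are hypothetical, and a real FRACAS database would also track dates, owners, costs, and recurrence. The point is only that a report cannot be closed until a root cause and a corrective action have been recorded.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    REPORTED = "reported"
    ANALYZED = "analyzed"
    ACTION_ASSIGNED = "action assigned"
    CLOSED = "closed"


@dataclass
class FailureReport:
    report_id: str
    asset: str
    symptom: str
    root_cause: str = ""
    corrective_action: str = ""
    status: Status = Status.REPORTED

    def analyze(self, root_cause: str) -> None:
        if self.status is not Status.REPORTED:
            raise ValueError(f"{self.report_id} has already been analyzed")
        self.root_cause = root_cause
        self.status = Status.ANALYZED

    def assign_action(self, action: str) -> None:
        if self.status is not Status.ANALYZED:
            raise ValueError(f"{self.report_id} needs a root cause first")
        self.corrective_action = action
        self.status = Status.ACTION_ASSIGNED

    def close(self) -> None:
        # Closing stands in for verifying that the corrective action worked.
        if self.status is not Status.ACTION_ASSIGNED:
            raise ValueError(f"{self.report_id} has no corrective action to verify")
        self.status = Status.CLOSED


def open_reports(reports: list[FailureReport]) -> list[str]:
    """IDs of reports whose loop has not yet been closed."""
    return [r.report_id for r in reports if r.status is not Status.CLOSED]


# Hypothetical usage: one report worked to closure, one still open.
fr1 = FailureReport("FR-001", "Pump P-101", "seal leak")
fr1.analyze("seal faces damaged by dry running")
fr1.assign_action("add low-flow trip")
fr1.close()
fr2 = FailureReport("FR-002", "Fan F-7", "high vibration")
print(open_reports([fr1, fr2]))
```

Reviewing the open-report list regularly is what keeps RCFA recommendations from stalling as one-shot exercises.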