reliability engineering

The Reliability Engineering Toolbox

Weibull Database

The smartest way to maintain a reliability database is in Weibull format, and Weibull databases are available.

Lognormal

Lognormal distributions are continuous life functions that have long tails to the right (positive skewness) in time or usage. A lognormal distribution plotted on semi-log paper appears as a normal curve.
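
As a rough illustration of the statement above, the sketch below evaluates a lognormal probability of failure by taking the natural log of time and passing it to a normal CDF. The parameter names `mu` and `sigma` (mean and standard deviation of ln(time)) and the numeric values are assumptions chosen only for illustration.

```python
import math

def lognormal_cdf(t, mu, sigma):
    """Probability of failure by time t for a lognormal life distribution.

    mu and sigma are the mean and standard deviation of ln(t).
    """
    if t <= 0:
        return 0.0
    # On a log time axis the distribution is normal, so standardize ln(t).
    z = (math.log(t) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical parameters: median life of 1000 hours.
mu = math.log(1000.0)
sigma = 0.8

median_life = math.exp(mu)
mean_life = math.exp(mu + sigma ** 2 / 2.0)
```

With these assumed values the mean life exceeds the median life, which is the long right tail the definition describes.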

Overall equipment effectiveness (OEE)

Overall equipment effectiveness (OEE) is a manufacturing index to reduce complexity of discrete systems for problem solving and benchmarking.
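
The entry above does not give a formula. A common convention, assumed here, is that OEE is the product of availability, performance, and quality, each expressed as a fraction between 0 and 1. The function name and sample figures below are hypothetical.

```python
def oee(availability, performance, quality):
    """Overall equipment effectiveness as a product of three fractions (0 to 1)."""
    for name, value in (("availability", availability),
                        ("performance", performance),
                        ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality

# Hypothetical plant figures: 90% availability, 95% performance, 99% quality.
example_oee = oee(0.90, 0.95, 0.99)
```

Reducing three factors to one index is what makes the number convenient for benchmarking, though the individual factors are still needed for problem solving.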

Reliability-Centered Maintenance

Reliability-centered maintenance (RCM) is a systematic planning process used to determine the maintenance requirements for a system. RCM assumes the system has an inherent reliability; maintenance requirements are imposed on this baseline of inherent safety and inherent reliability, which can be no better than what was designed into the system.

Reliability

Reliability is the probability that a device, system, or process will perform its prescribed duty without failure for a given time when operated correctly in a specified environment.

Cost Of Unreliability

The cost of unreliability is a big picture view of system failure costs, described in annual terms, for a manufacturing plant as if the key elements were reduced to a series block diagram for simplicity. It looks at the production system and reduces the complexity to a simple series system where failure of a single item/equipment/system/processing-complex causes the loss of productive output along with the total cost incurred for the failure. If the system IS sold out, then the cost of unreliability must include all appropriate business costs such as lost gross margin plus repair costs, scrap incurred, etc. If the system is NOT sold out, and make-up time is available in the financial year, then lost gross margin for the failure cannot be counted. The cost of unreliability is a management concern connected to management's two favorite metrics: time and money.
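
The sold-out rule above can be sketched as a small calculation. This is a minimal sketch, not a complete cost model: the function names, the event fields, and the dollar figures are hypothetical, and treating a plant that is not sold out but has no make-up time as losing gross margin is an assumption, since the text does not cover that case.

```python
def failure_cost(downtime_hours, repair_cost, scrap_cost,
                 gross_margin_per_hour, sold_out, makeup_time_available=False):
    """Cost of one failure in a simple series system."""
    cost = repair_cost + scrap_cost
    # Lost gross margin counts when the plant is sold out. It is excluded
    # when the plant is not sold out and make-up time is available.
    # Assumption: with no make-up time, the lost margin still counts.
    if sold_out or not makeup_time_available:
        cost += downtime_hours * gross_margin_per_hour
    return cost

def annual_cost_of_unreliability(events, gross_margin_per_hour,
                                 sold_out, makeup_time_available=False):
    """Sum the failure costs for one year of failure events."""
    return sum(
        failure_cost(e["downtime_hours"], e["repair_cost"], e["scrap_cost"],
                     gross_margin_per_hour, sold_out, makeup_time_available)
        for e in events
    )

# Hypothetical year of failures for two blocks in the series.
events = [
    {"downtime_hours": 10, "repair_cost": 5000, "scrap_cost": 1000},
    {"downtime_hours": 4, "repair_cost": 2000, "scrap_cost": 0},
]
cost_sold_out = annual_cost_of_unreliability(events, 2000, sold_out=True)
cost_with_makeup = annual_cost_of_unreliability(
    events, 2000, sold_out=False, makeup_time_available=True)
```

The gap between the two totals is the lost gross margin, which shows how strongly the sold-out condition drives the annual figure.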

Dependability

The International Electrotechnical Commission (IEC) defines dependability as follows: "Dependability describes the availability performance and its influencing factors: reliability performance, maintainability performance and maintenance support performance." MIL-HDBK-338 defines dependability differently, as a measure of the degree to which an item is operable and capable of performing its required function at any (random) time during a specified mission profile, given that the item is available at mission start. (Item state during a mission includes the combined effects of the mission-related system R&M parameters but excludes non-mission time; see availability.) Dependability is related to reliability, with the intention that dependability be a more general concept than the measurable issues of reliability, maintainability, and maintenance.

Failure Forecast

Failure forecasting is a projection of failures into the future based on assumed or documented failure details.
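
One way such a projection might be computed is sketched below. It assumes a two-parameter Weibull life model with shape `beta` and characteristic life `eta`, which the entry above does not specify, and it ignores replacements of failed units. The fleet ages and parameter values are hypothetical.

```python
import math

def conditional_failure_probability(age, horizon, beta, eta):
    """Chance that a unit now at `age` fails within the next `horizon` life units."""
    # Weibull reliability is R(t) = exp(-(t / eta) ** beta), so
    # P(fail within horizon | survived to age) = 1 - R(age + horizon) / R(age).
    return 1.0 - math.exp((age / eta) ** beta - ((age + horizon) / eta) ** beta)

def forecast_failures(ages, horizon, beta, eta):
    """Expected number of failures across the units, without renewal."""
    return sum(conditional_failure_probability(a, horizon, beta, eta) for a in ages)

# Hypothetical fleet of three units with their current ages in hours.
fleet_ages = [200.0, 800.0, 1500.0]
expected = forecast_failures(fleet_ages, horizon=500.0, beta=2.0, eta=2000.0)
```

With a shape factor above 1 (wear-out), the oldest unit contributes the most to the forecast; with a shape factor of 1, age makes no difference.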

Failure Rates

Failure rates, in the simplest form, are Σ(number of failures)/Σ(time in use), that is, the reciprocal of mean times to/between failure.
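
As a small worked example, the sketch below computes a failure rate as total failures divided by total time in use, so that it is the reciprocal of the mean time between failures. The function name and the operating figures are hypothetical.

```python
def failure_rate(times_in_use, failure_counts):
    """Pooled failure rate: total failures divided by total time in use."""
    total_time = sum(times_in_use)
    total_failures = sum(failure_counts)
    if total_time <= 0:
        raise ValueError("total time in use must be positive")
    return total_failures / total_time

# Hypothetical data for three pumps: run hours and failures per pump.
run_hours = [4000.0, 3500.0, 2500.0]
failures = [2, 1, 1]

rate = failure_rate(run_hours, failures)   # failures per hour
mtbf = 1.0 / rate                          # hours per failure
```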

Life Units

A measure of use duration applicable to an item. For example, the life units may be starts-stops, run hours, hot-cold cycles, distances traveled, emergency starts or starts, shelf life, and other measurements which motivate failures.

Environmental Stress Screening (ESS)

A series of screens is conducted under environmental stresses to disclose weak parts and workmanship defects that require correction. This requires an understanding of both burn-in testing and ESS; both techniques identify weak points and eliminate them by motivating early failures. Burn-in is usually a long process of operating under load(s) at a fixed temperature (in short, a special case of ESS), or it can be run at varying loads and accelerated temperatures to achieve a shorter burn-in period. ESS is a scientifically planned and conducted test, usually run under accelerated loads, that produces the same test/use results in a shorter period of time by increasing the stress on the components or assemblies. The objective of these screens is to produce a failure-free product when released into operations. ESS is not intended as a test to validate compliance with a design; it is intended to force latent defects to become evident before the end user finds them in day-to-day usage.

Pareto Distribution

Vilfredo Pareto, an Italian economist in the late 1800s, described the unequal distribution of wealth in the world.

Failure

Failure is the loss of function when you needed the function to occur.

Data

Data is the informational energy that runs the reliability improvement machine. Data is acquired at great cost, so it needs to be retained and used to prevent future failure events. Proper use of data provides an understanding of failure mechanisms and prevents recurrence of bad events that cause safety or high-cost failures. Reliability data requires a definition of failure. Failures can be catastrophic failures or slow degradation; you decide by defining the failure. The units of measure for the data must be the units of the degradation: sometimes hours, sometimes miles, and so forth; in short, whatever motivates the failure. Reliability always ceases with a failure or with a removal from service in some aged condition; a removal generates a category of data called suspended or censored data. Data is information in the form of facts, figures, or engineering databases obtained from engineering tests, experiments, or actual operating conditions. Reliability data is often incomplete: exact times to failure are rarely known or recorded with much precision, so only partial information is available for analysis. Reliability data comes in two forms: 1) age-to-failure data, and 2) censored/suspended data, which occurs when unfailed items are removed from service or when items fail by a failure mode other than the one under study. Suspensions are useful information and part of the data set. Some data is better than no data for resolving reliability issues.
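
The two forms of reliability data described above can be held in one simple record structure. This is only a sketch; the class name, field names, and sample records are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LifeRecord:
    age: float        # accumulated life units (hours, miles, cycles, ...)
    failed: bool      # True = failed by the mode under study; False = suspension
    note: str = ""

# Hypothetical records for a study of bearing failures.
records = [
    LifeRecord(1200.0, True, "bearing failure"),
    LifeRecord(1500.0, False, "removed from service unfailed"),
    LifeRecord(900.0, False, "failed by a different mode"),
    LifeRecord(2100.0, True, "bearing failure"),
]

failures = [r for r in records if r.failed]
suspensions = [r for r in records if not r.failed]

# Suspensions still contribute accumulated age to the data set.
total_age = sum(r.age for r in records)
```

Keeping suspensions in the same list as failures avoids discarding the exposure they represent.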

FMEA is part of the Reliability Strategy Development toolbox

Failure Mode and Effect Analysis - FMEA

Failure mode and effect analysis (FMEA) is the study of potential failures that might occur in any part of a system to determine the probable effect of each failure on all other parts of the system and on probable operational success.

Quality Function Deployment

Quality Function Deployment or QFD is a bad translation of a good reliability technique for getting the voice of the customer into the design process so the product delivered is the product the customer desires.

Total Productive Maintenance

Total productive maintenance (TPM) is a corporate-wide effort involving all employees to use equipment to its maximum limit, employing an equipment-oriented management concept to reduce failures and increase utilization of equipment and processes in a productive manner. TPM programs are teamwork programs and require a corporate culture of teamwork devoid of us-vs.-them issues. All employees are expected to accept ownership of the equipment and processes and to do many small things all the time to ensure high levels of availability by eliminating failures in the early stages with low-cost actions. The employees approach the process equipment as owners rather than renters.

Poisson Distribution

Poisson distributions are discrete distributions that describe the simplest statistical process: counted events that occur randomly in time at a stable average rate.
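
For a stable average rate, the probability of counting exactly k events in an interval follows the standard Poisson formula, sketched below. The function names and the example rate are hypothetical.

```python
import math

def poisson_pmf(k, rate, duration):
    """Probability of exactly k events in `duration` at a stable average `rate`."""
    mean = rate * duration   # expected number of events in the interval
    return math.exp(-mean) * mean ** k / math.factorial(k)

def prob_at_most(n, rate, duration):
    """Probability of n or fewer events in the interval."""
    return sum(poisson_pmf(k, rate, duration) for k in range(n + 1))

# Hypothetical case: 0.002 failures per hour over 1000 hours (2 expected).
p_none = poisson_pmf(0, 0.002, 1000.0)
p_two_or_fewer = prob_at_most(2, 0.002, 1000.0)
```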

FRACAS

A failure reporting and corrective action system (FRACAS) is an organized database that aids in solving reliability problems through a common-sense approach of systematically and permanently removing failure mechanisms.

Mean Time

A density figure-of-merit metric often referred to as the average or expected value. In the simplest form it is the arithmetic ratio Σ(time)/Σ(events); in complicated situations it is a statistical metric. It applies to mean life (ML), mean down time (MDT), mean maintenance time (MMT), mean time between failures (MTBF for repairable items), mean time to failure (MTTF for replacement items), mean time between maintenance (MTBM), mean time between maintenance scheduled (MTBMs), mean maintenance time unscheduled (MMTu), mean maintenance time scheduled (MMTs), mean time between overhauls (MTBO), mean time between unscheduled removals (MTBRu), mean time to restore (MTR), mean time between downing events (MTBDE), and so forth. The units will be time/metric, e.g., hours/failure. The reciprocal of the metric provides an incident rate, e.g., failures/hour.
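
In its simplest arithmetic form the same ratio serves for any of the metrics listed above; only the time and event totals change. The sketch below applies it to MTBF and MDT using hypothetical yearly totals.

```python
def mean_time(total_time, events):
    """Simple arithmetic mean time: total time divided by number of events."""
    if events <= 0:
        raise ValueError("at least one event is needed to compute a mean time")
    return total_time / events

# Hypothetical yearly totals for one repairable machine.
uptime_hours = 8400.0
downtime_hours = 36.0
failure_count = 4

mtbf = mean_time(uptime_hours, failure_count)    # hours per failure
mdt = mean_time(downtime_hours, failure_count)   # hours per downing event
incident_rate = 1.0 / mtbf                       # failures per hour
```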