Most business decisions carry considerable uncertainty, which implies at least two possible outcomes for any course of action. Making decisions in the face of uncertainty requires the cost of taking action and its probability, along with the cost of not taking action and the probability of the adverse event occurring. In most cases the probabilities are not well known (perhaps to one significant digit) and the costs are not well known (perhaps to the nearest $10,000). The quantitative assessment is called risk assessment. The task is to take these poorly defined inputs and devise a strategy that minimizes the business's exposure to risk. The graphical representation of the methodology is the decision tree, which leads to the expected values for the take/not-take-action decisions.
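The expected-value comparison at the heart of a decision tree can be sketched in a few lines. All probabilities and dollar figures below are illustrative assumptions, not data from the text.

```python
# Hypothetical decision-tree leaf comparison: expected cost of acting vs. not acting.

def expected_cost(branches):
    """Expected value of a decision node: sum of probability * cost over outcomes."""
    return sum(p * cost for p, cost in branches)

# Take action: pay $50,000 for the fix now; assume a 10% chance the $200,000
# failure happens anyway.
act = expected_cost([(1.0, 50_000), (0.10, 200_000)])

# Do nothing: assume a 40% chance of the $200,000 failure, 60% chance of no cost.
no_act = expected_cost([(0.40, 200_000), (0.60, 0)])

print(f"act: ${act:,.0f}  no action: ${no_act:,.0f}")
print("choose:", "take action" if act < no_act else "do not act")
```

With these assumed numbers, acting ($70,000 expected) beats inaction ($80,000 expected), even though the probabilities are only known to one significant digit.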
The measure of the ability of an item to be retained in, or restored to, a specified condition when maintenance is performed by personnel having specified skill levels, using prescribed procedures and resources.
Reliability audits verify that your reliability program is effective and find areas of weakness for corrective action. They are inquiries by factual examination of elements of the system against written, objective performance criteria, beginning with an assessment of how management is involved and whether it is effective in building a productive reliability program.
The cost of unreliability is a big-picture view of system failure costs, described in annual terms for a manufacturing plant, as if the key elements were reduced to a series block diagram for simplicity. It looks at the production system and reduces the complexity to a simple series system, where failure of a single item/equipment/system/processing-complex causes the loss of productive output along with the total cost incurred for the failure. If the system IS sold out, then the cost of unreliability must include all appropriate business costs such as lost gross margin plus repair costs, scrap incurred, etc. If the system is NOT sold out, and make-up time is available in the financial year, then lost gross margin for the failure cannot be counted. The cost of unreliability is a management concern connected to management's two favorite metrics: time and money.
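The sold-out/not-sold-out distinction can be made concrete with a small annual roll-up. Failure counts, downtime hours, and dollar figures below are illustrative assumptions.

```python
# Hypothetical annual cost-of-unreliability roll-up for a simple series system.

HOURS_DOWN_PER_FAILURE = 8.0
GROSS_MARGIN_PER_HOUR = 12_000.0   # counts only when the plant is sold out

failure_modes = [
    # (name, failures/year, repair $ per event, scrap $ per event)
    ("pump",       3, 15_000,  2_000),
    ("compressor", 1, 90_000, 10_000),
]

def annual_cost(sold_out: bool) -> float:
    total = 0.0
    for name, n, repair, scrap in failure_modes:
        total += n * (repair + scrap)         # always incurred
        if sold_out:                          # no make-up time: margin is lost
            total += n * HOURS_DOWN_PER_FAILURE * GROSS_MARGIN_PER_HOUR
    return total

print(f"sold out:     ${annual_cost(sold_out=True):,.0f}")   # $535,000
print(f"not sold out: ${annual_cost(sold_out=False):,.0f}")  # $151,000
```

The same failures cost $151,000/year when make-up time exists, but $535,000/year when every down hour forfeits gross margin, which is why the sold-out question comes first.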
Life cycle cost (LCC) is the sum of all costs associated with the acquisition and ownership of a system over its full life. The usual figure of merit is net present value (NPV). Projects with large positive NPVs are considered most favorable. However, in many cost-driven cases, decisions are made for the least negative NPV. In all cases, the default position for accounting is to know the NPV of making no change, and this is usually the last alternative considered by most people associated with change.
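Comparing a "do nothing" baseline against an alternative on NPV takes only a discount function. The cash flows and 10% discount rate below are illustrative assumptions.

```python
# Minimal NPV comparison for LCC alternatives (illustrative cash flows, all costs).

def npv(rate, cash_flows):
    """Net present value; cash_flows[0] is year 0 (acquisition), later are annual."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

rate = 0.10
do_nothing  = npv(rate, [0, -50_000, -50_000, -50_000])        # keep paying upkeep
alternative = npv(rate, [-80_000, -10_000, -10_000, -10_000])  # invest, cheaper upkeep

# Both NPVs are negative (pure cost cases); pick the least negative one.
best = max([("do nothing", do_nothing), ("alternative", alternative)],
           key=lambda kv: kv[1])
print(f"do nothing: {do_nothing:,.0f}  alternative: {alternative:,.0f}  -> {best[0]}")
```

Here the upgrade's NPV (about −$104,900) is less negative than the baseline's (about −$124,300), so the change wins even though no alternative is "profitable."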
For inexpensive components and inexpensive tests, simultaneous tests involve many components under test loads/conditions at the same time for the purpose of quickly acquiring data and producing test analysis as the failures occur. In simultaneous testing the suspensions (censored data) become important details in the statistical analysis. Most simultaneous tests are accelerated to generate the data in a short period of time, although this carries the risk of introducing unexpected failure modes (which can also be useful information for anticipating field failures).
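The simplest illustration of why suspensions matter is an exponential life model, where the maximum-likelihood failure rate is failures divided by total accumulated unit-hours, and the suspensions contribute run time without contributing failures. The hours below are illustrative assumptions.

```python
# Simultaneous-test bookkeeping sketch: suspensions add exposure time, not failures.
# Assumes an exponential (constant failure rate) life model.

failures    = [120.0, 340.0, 410.0]          # hours at failure (illustrative)
suspensions = [500.0, 500.0, 500.0, 500.0]   # unfailed units when the test ended

total_time = sum(failures) + sum(suspensions)   # total unit-hours on test
lam = len(failures) / total_time                # MLE failure rate, failures/unit-hour
mtbf = 1 / lam

print(f"lambda = {lam:.5f} per hour, MTBF = {mtbf:.0f} hours")
```

Dropping the suspensions would overstate the failure rate by ignoring 2,000 trouble-free unit-hours, which is exactly the mistake censored-data analysis prevents.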
Fault tree analysis (FTA) is a top-down process: define the top-level problem, then use a deductive approach with parallel and series combinations of possible malfunctions to find the root of the problem and correct it before the failure occurs. The reliability tool can be used as a qualitative or quantitative method.
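In the quantitative form, the parallel and series combinations become AND and OR gates on basic-event probabilities. A minimal sketch, assuming the basic events are independent and using illustrative probabilities:

```python
# Quantitative fault-tree gates for independent basic events.

def and_gate(*probs):
    """All inputs must fail: multiply probabilities."""
    out = 1.0
    for p in probs:
        out *= p
    return out

def or_gate(*probs):
    """Any input failing suffices: exact union, 1 - prod(1 - p_i)."""
    out = 1.0
    for p in probs:
        out *= (1.0 - p)
    return 1.0 - out

# Hypothetical top event: pump fails to deliver = motor fault OR both
# redundant seals leaking (illustrative probabilities).
p_top = or_gate(0.01, and_gate(0.05, 0.05))
print(f"P(top event) = {p_top:.6f}")
```

The redundancy under the AND gate pulls the seal contribution down to 0.0025, leaving the single-point motor fault dominant, which is the kind of insight FTA is built to surface.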
A strategic job for preparing plans to reduce failures and the cost of failures as a preventive measure against the cost of unreliability. Acquires failure data and analyzes the data to quantify the financial impact and prepare long-term solutions that prevent recurrences and improve reliability and uptime. Determines the cost advantages, proposes alternatives for solving the problem, and recommends the alternative with the lowest long-term cost of ownership. The purpose of these actions is to prevent failures.
Reliability growth models are important management concepts for making reliability visual with simple displays. The simple log-log plots of cumulative failures on the Y-axis against cumulative time on the X-axis often make straight lines, where the slope of the trend line tells whether failures are coming faster (b>1), which is undesirable; slower (b<1), which is desirable; or without improvement/deterioration (b=1), which usually drifts toward undesirable results. The reliability growth models are frequently called Crow-AMSAA plots in honor of Larry Crow's proof, documented in MIL-HDBK-189, of why the charts work as described, developed while he worked at AMSAA.
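The slope b can be estimated with an ordinary least-squares fit on the log-log data. The cumulative failure times below are illustrative assumptions (note the lengthening intervals between failures):

```python
import math

# Crow-AMSAA sketch: fit log(cumulative failures) vs. log(cumulative time);
# the fitted slope is the growth parameter b (b > 1 worsening, b < 1 improving).

cum_times = [100, 250, 600, 1400, 3000]   # cumulative hours at each failure
cum_fails = [1, 2, 3, 4, 5]               # cumulative failure count

xs = [math.log(t) for t in cum_times]
ys = [math.log(n) for n in cum_fails]
m = len(xs)
xbar, ybar = sum(xs) / m, sum(ys) / m
b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
     / sum((x - xbar) ** 2 for x in xs))

print(f"b = {b:.2f} ->", "improving" if b < 1 else "deteriorating")
```

Because the gaps between failures stretch from 100 to 1,600 hours, the fitted slope comes out well below 1, the desirable direction.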
For expensive components and expensive tests, sudden death tests involve a few components that tie up a test frame as they are heavily loaded under the same test loads/conditions, with several items being run at the same time. When one of the items fails, the entire test frame is shut down so that you have 1 failure (this is the sudden death!) and several suspensions, because the unfailed units are survivors as the test is halted until the test frame is loaded with new samples for resumption of the life test. Opening the test frame (instead of tying up the frame until all samples have failed) is cost effective. If three units can be tested simultaneously and the test is halted on the first failure, then 12 samples yield only 4 failures and 8 suspensions for preparing the Weibull analysis. Will the 4-failure + 8-suspension data set be different than if all 12 samples had been run to failure? Yes, the results will be different, but they will not be significantly different. So, as with simultaneous testing, the suspensions (censored data) become important details in the statistical analysis. Most sudden death tests are accelerated to generate the data in a short period of time, although this carries the risk of introducing unexpected failure modes (which can also be useful information for anticipating field failures).
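One standard way to fold the 8 suspensions into the Weibull analysis is median rank regression with Johnson's adjusted ranks. The 4 frame-failure times below are illustrative assumptions, with each failure accompanied by its 2 same-time suspensions:

```python
import math

# Sudden-death outcome sketch: 4 failures, 8 suspensions (illustrative hours).
data = [(105, True), (105, False), (105, False),
        (160, True), (160, False), (160, False),
        (230, True), (230, False), (230, False),
        (312, True), (312, False), (312, False)]
data.sort(key=lambda r: (r[0], not r[1]))    # at tied times, failures rank first

N = len(data)
ar = 0.0                                     # running adjusted rank
pts = []
for i, (t, failed) in enumerate(data):
    if failed:
        rr = N - i                           # reverse rank of this position
        ar = (rr * ar + N + 1) / (rr + 1)    # Johnson adjusted rank
        mr = (ar - 0.3) / (N + 0.4)          # Benard median rank approximation
        pts.append((math.log(t), math.log(-math.log(1 - mr))))

# Least-squares line on the Weibull plot: slope = shape beta, intercept gives eta.
n = len(pts)
xb = sum(x for x, _ in pts) / n
yb = sum(y for _, y in pts) / n
beta = (sum((x - xb) * (y - yb) for x, y in pts)
        / sum((x - xb) ** 2 for x, _ in pts))
eta = math.exp(xb - yb / beta)
print(f"beta = {beta:.2f}, eta = {eta:.0f} hours")
```

The suspensions inflate the adjusted ranks of the later failures, pushing the median ranks (and hence the fitted characteristic life) higher than a failures-only fit would give, which is the point of the bookkeeping.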
Software does not wear out, but it does fail, and most failures are due to specification errors and code errors, with only a few errors in copying or use. The only software repair is reprogramming, and adding safety factors is almost impossible. Software reliability improves by finding errors and fixing them, but estimating the number of errors which cause failures is extremely difficult, as many branches of software code may lie dormant and unused until special events occur to make the latent failures obvious. Software failures are not often time related but are more software code page dependent. Software reliability is improved by extensive testing to disclose the failures, then fixing them and repeating the test all over again to validate that the fix did not generate more failures and to continue the search for other latent defects.
Configuration control is involved with the management of change by providing traceability of failures back into the design standard. If the design details are not specified, the design will not contain the requirements, and thus implementation of the project will be hit or miss for achieving the desired end results, beginning with the conceptual design and ending with the operating facility.
All actions necessary, both technical and administrative, for retaining an item in or restoring it to a specified condition so it can perform a required function. The actions include servicing, repair, modification, overhaul, inspection, reclamation, and condition determination.
For reliability success, loads must always be less than strengths. When loads are greater than strengths, failures occur. The issue is determining the probability of load-strength interference, which is the joint probability of loads exceeding strengths. The loads should include expected conditions, plus the foolishness of people who violate rules and overload equipment, plus the vagaries of Mother Nature imposing unexpected static and dynamic loads from hurricanes, tornadoes, earthquakes, wildfires, and so forth.
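When both load and strength can be approximated as normal distributions, the interference probability has a closed form: P(failure) = Φ(−(μ_S − μ_L) / √(σ_S² + σ_L²)). A minimal sketch with illustrative means and standard deviations:

```python
import math

# Normal-normal load/strength interference (illustrative parameters, e.g. in kN).
mu_L, sd_L = 60.0, 8.0     # applied load distribution
mu_S, sd_S = 100.0, 10.0   # component strength distribution

# Safety margin expressed in standard deviations of the combined distribution.
z = (mu_S - mu_L) / math.sqrt(sd_S**2 + sd_L**2)

# Standard normal lower tail via the complementary error function.
p_fail = 0.5 * math.erfc(z / math.sqrt(2))

print(f"z = {z:.2f}, P(failure) = {p_fail:.2e}")
```

Even with the mean strength far above the mean load, the overlap of the distribution tails still yields a failure probability near 1 in 1,000 here, which is why the scatter matters as much as the averages.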
A measure of how well the product performance meets objectives. In short, how well are the outputs actually accomplished against a standard? Capability is frequently the product of efficiency * utilization.