by H. Paul Barringer
Reliability of people, processes/procedures and equipment terminates with a failure. Want higher reliability? Get rid of your failures. But that’s easier said than done. There are many buckets to hold the different types of failures. Each bucket has a different name for the root of the failures. Frequently, the failure buckets are collected into three major categories for simplification: 1) People, 2) Processes + Procedures and 3) Equipment. You must know where to attack problems to resolve the issues for a failure-free environment. Failures don’t correct themselves!
The aircraft crashed after temporary inconsistencies between the airspeed measurements, likely due to the aircraft’s Pitot tubes being obstructed by ice crystals, caused the autopilot to disconnect, after which the crew reacted incorrectly and ultimately led the aircraft to an aerodynamic stall from which they did not recover.
For nuclear reactor systems, you are required to confess your “sins” related to the failures. No one gets fired for confessing failures; however, hiding failure details leads to termination. Here’s the categorization of failures for mature nuclear power production, which has been constant for many years:
|• Procedures + Processes||→||34%|
For boiler and pressure vessel failures, ASME’s National Board published failure statistics [1] for a 10-year interval highlighting ASME’s boiler test code equipment. The statistics show:
|• 23,338 accidents||→||83% human oversight or lack of knowledge|
|• 720 injuries||→||69% human oversight or lack of knowledge|
|• 127 deaths||→||60% human oversight or lack of knowledge|
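For a rough sense of scale, the percentages above can be converted into approximate counts. The sketch below is illustrative arithmetic only: the rounded counts are derived here rather than taken from the National Board report, and the names `STATS` and `human_share` are hypothetical.

```python
# Totals and human-factor shares as quoted in the table above.
STATS = {
    "accidents": (23_338, 0.83),
    "injuries": (720, 0.69),
    "deaths": (127, 0.60),
}


def human_share(total: int, fraction: float) -> int:
    """Approximate count attributed to human oversight or lack of knowledge."""
    return round(total * fraction)


for label, (total, fraction) in STATS.items():
    print(f"{label}: about {human_share(total, fraction):,} of {total:,}")
```

Under that rounding, human oversight or lack of knowledge accounts for roughly 19,400 accidents, about 500 injuries and about 76 deaths over the interval.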
Air France Flight 447 Crashes
The crash occurred off the coast of Brazil on June 1, 2009, on a flight from Rio de Janeiro to Paris, after the aircraft flew through a thunderstorm at an altitude of ~30,000 feet under autopilot control. That breaks a practical flight commandment: “Thou shall not fly through strong thunderstorms unless the enemy is on your tail and both you and your aircraft can sustain ±9 g’s of loading.” Inside the thunderstorm, the Pitot tubes, which sense airspeed, froze from the storm’s moisture intensity. With apparent stall conditions sensed by the airspeed instrumentation, the autopilot disconnected, putting control of the Airbus A330 directly in the hands of the pilots.
Pilot control of the Airbus is by joystick, similar to those used with a video game. When the autopilot disconnected, the Airbus rolled right and the pilot responded by pushing the joystick to the left, but he also pulled the nose of the aircraft up, breaking another practical flight commandment: “Thou shall push the aircraft nose downward to gain airspeed in a stall condition.” The second pilot pushed his joystick downward, as is endlessly taught to every new pilot. Another near-commandment of flight is: “When everything is screwed up and nothing makes sense, try taking your hands and feet off all controls and let the airplane straighten itself out for ±30 seconds.”
On the Airbus, there is no tactile connection between the joysticks, so the second pilot had no knowledge of the first pilot’s fatal and amateurish error of pulling the aircraft nose up until Flight 447 reached an altitude of 38,000 feet. The stalled Airbus lost lift at 38,000 feet. In 3 minutes and 30 seconds, the aircraft pancaked into the sea, resulting in the loss of life of 216 passengers and 12 crew members. The Boeing 777 and 787, like the Airbus A330, are fly-by-wire aircraft; however, the Boeing aircraft have a traditional wheel configuration with tactile feedback, so each pilot knows what the other pilot is doing.
The flight data recorder, or “black box,” was recovered on May 1, 2011, and the contents were downloaded for study by an international team of safety experts. The Bureau d’Enquêtes et d’Analyses pour la Sécurité de l’Aviation Civile (BEA) released the final report on July 5, 2012, which attributed the accident to the following events:
- Temporary inconsistency between airspeed measurements, likely following the obstruction of the Pitot probes by ice crystals that, in particular, caused the autopilot disconnection and the reconfiguration to alternate law [special mandatory operating rules for the Airbus].
- Inappropriate control inputs that destabilized the flight path.
- The lack of any link by the crew between the loss of indicated speeds called out and the appropriate procedure.
- The late identification by the pilot not flying of the deviation from the flight path and the insufficient correction applied by the pilot flying.
- The crew not identifying the approach to stall, their lack of immediate response and the exit from the flight envelope.
- The crew’s failure to diagnose the stall situation and consequently a lack of inputs that would have made it possible to recover from it.
In addition, statistics [2] from the Federal Aviation Administration (FAA) are available for a 10-year time interval for many different classes of aircraft and their operation. However, the conclusions are not so obvious because of the “slicing and dicing” of the data by aircraft category. The FAA model for system safety methodology is similar to five disks on a common shaft spinning at different speeds, each with a hole of the same size at the same radial distance from the axis. When all five holes line up, an accident or incident occurs. Of course, the objective is to prevent the failures from occurring.
- The first disk in the model is the underlying cause framed by management’s actions or inactions that introduce latent errors into the organizational system in areas of planning, organizing, directing, controlling and staffing. Basically, the environment is what is wrong.
- The second disk is the basic cause, where reactions to latent system errors may be appropriate or inappropriate. Together, the first and second disks become preconditions for an accident. An example would be lack of enforcement for breaches of policy or regulation.
- The third disk is the immediate cause of an accident. Individuals commit active errors by just doing their jobs or mechanical systems can break. An example is lifting loads that are too heavy.
- The fourth disk is safety defenses. Organizational oversight and safety programs are the intervention countermeasures, or filters, that defend against errors. Examples are crew rest policies, stabilized approach criteria, “sterile” cockpits below 10,000 feet and checklists.
- The final disk involves consequences. If all defenses work, the result is no accidents. Accidents occur with catastrophic failures. Incidents are minor failures or recorded close calls.
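The five-disk picture can be sketched numerically. The example below is a minimal illustration, assuming each disk's hole is independently “open” with some fixed probability, so the chance of all five lining up is the product of the five values. The probabilities and the names `HOLE_OPEN` and `p_all_aligned` are hypothetical, chosen only for illustration; they are not FAA data.

```python
import math

# Hypothetical chance that each disk's hole is "open" at a given moment:
# underlying cause, basic cause, immediate cause, safety defenses, consequences.
# Illustrative assumptions only, not FAA figures.
HOLE_OPEN = [0.10, 0.20, 0.05, 0.10, 0.50]


def p_all_aligned(probs):
    """Probability that every hole is open at once, assuming independent disks."""
    return math.prod(probs)


baseline = p_all_aligned(HOLE_OPEN)

# Strengthen only the fourth disk (safety defenses) by halving its hole.
stronger_defenses = HOLE_OPEN.copy()
stronger_defenses[3] = 0.05
improved = p_all_aligned(stronger_defenses)

print(f"baseline: {baseline:.6f}")
print(f"with stronger defenses: {improved:.6f}")
```

Under the independence assumption, shrinking any single hole reduces the overall chance in the same proportion, which is one way to see why improving defenses such as checklists pays off even when the other disks are unchanged.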
The top five failure categories in the FAA statistics differ by aircraft class:
- For large turbofan/turbojet/turboprop aircraft, the top five categories are: 1. Controlled flight into terrain; 2. Loss of control in flight; 3. Acts of aggression (sabotage, hijacking, war acts, military acts, etc.); 4. Takeoff procedures; and 5. Unknown reasons.
- For helicopters, the top five categories for failure are: 1. Lack of avoidance of object contact; 2. Improper flight control; 3. Collisions with ground/water; 4. Low rotor RPM; and 5. Engine/turbine failures.
- For small, fixed-gear aircraft, the top five categories are: 1. Visual flight rules problems with clouds, low visibility and night flight; 2. Stalls; 3. Judgment and low-level operation contact with object; 4. Recklessness and acrobatic flight; and 5. Stalls involving reckless low-altitude operations.
This data illustrates the need for strongly improving human performance among professional and private pilots for all three flight categories. If you think working only on the hardware will reduce your failures, you’re riding the wrong horse in the race to success! Improvement opportunities, and usually money-saving opportunities, are with the people and processes + procedures. Attacking the correct root cause of the problem means working with people and procedures + processes; engineers find this effort very difficult because they mainly want to work on things!
Consider this old, simple and well-known failure problem from aviation. The cockpit of airplanes, beyond simple trainers, is filled with instruments and switches. Many of the switches are simple toggle switches with up or down positions indicating their intended actions. Flap switches are down for landing, just as landing gear switches are down for landing. Flaps increase the curvature of the wings for more lift, but they also add considerable drag, so after landing the delicate flap mechanisms are retracted to their stowed (up) position. In the ’40s and ’50s, there were many mix-ups by busy pilots multitasking through the necessary flight duties. The calamity: the landing gear switch, rather than the intended flap switch, was moved from the down position to the up position while on the runway, resulting in destruction of the flap system; damage to the engine, fuselage and propeller; high repair costs; and embarrassment for the pilots. No one can deny an aircraft on its belly is a failure!
How would you classify the major category for landing gear failure and how would you resolve the issue? Today, you rarely hear of pilots retracting landing gear on the runway. The solution was simple: The smooth toggle arms of landing gear switches were changed to a sharp edge as a tactile warning to the pilot that the switch in their fingers had big consequences and was not the correct switch for retracting flaps. The solution to the root of the problem rests with the design engineers, not with busy pilots who, in a fast-moving, real-time, multitasking environment, have no time for thoughtful, contemplate-your-navel conference decisions. Oh, by the way, you do still hear of landing gear never being extended, with the consequent belly landing, because the pilots did not adequately run their landing checklist. Of course, you use written checklists, and the organizational requirement to follow them, for your operators and maintenance folks to prevent failures, right?
1. “2001 Incident Report.” The National Board Bulletin, Summer 2002, pg. 3, http://nationalboard.org/sitedocuments/bulletins/su02.pdf
2. Safety Analysis Team, Report No. SAT-01.1. http://www.faa.gov/aircraft/air_cert/design_approvals/engine_prop/media/SAT_Report.pdf
Paul Barringer is a reliability, manufacturing and engineering consultant. His worldwide consulting practice involves reliability consulting and training with a variety of discrete and continuous process manufacturing companies and service industries. Barringer has more than fifty years of engineering and manufacturing experience in design, production, quality, maintenance and reliability of technical products. His experience includes both technical and bottom-line aspects of operating a business with an understanding of how reliable products and processes contribute to financial business success.