This paper discusses the advantages of combining Root Cause Analysis techniques with the Navy Work Model.
Every incident investigation has three main objectives. First, we need to understand how the organization is impacted by the event. Second, we reveal the causes of the incident and communicate them clearly to management. Our third objective is to implement changes that will effectively eliminate vulnerabilities in the system. Risk reduction is also an appropriate phrase when discussing vulnerabilities in a system.
A study of the crash of Air France Concorde flight #4590 will show how Cause Mapping can aid in the investigation and communication of the incident. When the conversation turns from investigation and communication to the third objective, implementation, we will use the Navy Work Model with its three-pillar concept of training, procedures, and supervision. Use of the Navy Work Model can turn an investigation into action.
The Concorde was designed in the 1960s as a civilian transport that could fly at over Mach 2. In the mid-1970s it was put into service and was flown by British Airways and Air France. Concorde was the world’s longest-serving supersonic transport and carried approximately 100 passengers with a crew of nine. This aircraft catered to an affluent customer base composed mostly of business travelers.
Flight #4590 was a chartered flight. Most of the passengers were German tourists embarking on a cruise in Miami. The flight was routed from Paris to JFK Airport and on to Miami. This flight crashed during take-off in Paris, resulting in 113 fatalities.
Review of the investigation evidence reveals that the Concorde rolled over a piece of debris on the runway (people in the aircraft industry refer to this as foreign object damage, or FOD). A tire was punctured and exploded, and some of the tire debris struck the aircraft, resulting in a hole in the fuel tank and a subsequent fire fed by leaking fuel. The total time that elapsed between the aircraft rolling over the debris and the actual crash was approximately 2 minutes.
At first this incident seems easily defined. Some would say that the obvious “root cause” of this event was the piece of debris on the runway. If this were the only cause of this disaster then we could solve it by simply running a street sweeper up and down the runways. And yet, as we already know, a catastrophic incident is never attributed to one cause but many.
A good way to start untangling the various causes in any incident is to connect the event directly to the goals of an organization. People often disagree about what the causes of an incident might be but when you ask, “Were the safety goals impacted? Was anybody hurt?” the disagreement ends and the investigating team has a common starting place. In this instance, the safety goals were impacted. There were 113 fatalities, 109 in the aircraft and 4 on the ground.
In addition to the safety impact, the company’s other goals were also impacted. The reputations of both Air France and British Airways suffered, and the Concorde fleet was out of service for thirteen months. This was a loss of over $600 million in revenue. The aircraft itself was lost, which cost the company approximately $125 million.
Concisely conveying the impact to the Organization is a key way to quickly focus attention. Below is an example of a one-page summary that greatly assists in summarizing the event.
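As a rough illustration, the impacts described above could be captured in a small structured summary. This is only a sketch: the goal labels ("Customer service", "Production", "Property") and the names `GoalImpact`, `total_cost`, and `render` are assumptions made for the example, and the figures are the approximate ones quoted in this paper.

```python
from dataclasses import dataclass


@dataclass
class GoalImpact:
    goal: str                       # organizational goal affected (label is an assumption)
    impact: str                     # what happened, in plain words
    cost_usd_millions: float = 0.0  # approximate cost, only where the paper gives one


impacts = [
    GoalImpact("Safety", "113 fatalities (109 in the aircraft, 4 on the ground)"),
    GoalImpact("Customer service", "Reputations of Air France and British Airways suffered"),
    GoalImpact("Production", "Concorde fleet out of service for thirteen months", 600.0),
    GoalImpact("Property", "Aircraft lost", 125.0),
]


def total_cost(items):
    """Sum only the costs that were quantified."""
    return sum(item.cost_usd_millions for item in items)


def render(items):
    """Build a short plain-text summary, one line per goal."""
    lines = [f"{item.goal}: {item.impact}" for item in items]
    lines.append(f"Approximate quantified cost: ${total_cost(items):.0f} million")
    return "\n".join(lines)


print(render(impacts))
```

The total covers only the two quantified items; the reputational impact is listed but not priced.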
Once the impact to the goals of the organization is established, it is possible to create a very “high level” or broad cause map. This process consists of asking a series of “why” questions. The impact to the safety goals of Air France occurred. Why? There were 113 fatalities. Why? The Concorde crashed.
More detail, of course, can be added at this point. There were four fatalities in the hotel. Why? The aircraft struck the hotel. Why? In order for the hotel to be struck by the Concorde two things MUST be present. The hotel must be in proximity to the flight path AND the plane has to crash. Without both of those causes being present, the four fatalities do not occur. The Concorde crashed. Why? There was insufficient thrust which resulted in loss of lift. Why? Three engines were needed at full power during a certain phase of take-off AND only two engines were functioning. Both of these aspects MUST be present in order to cause the incident. If Concorde had the ability to take-off using only two engines – the crash would not have happened. If Concorde could have retracted its left side landing gear the aerodynamics would have cleaned up and perhaps there would have been more time for the pilots to respond. Why was there loss of function in two of the engines? The engines flamed out. Why? Excessive fuel choked the engines. Why? The fuel-cell ruptured.
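The AND logic in this chain of "why" questions can be sketched in a few lines of code. The example below is a simplified, hypothetical encoding of the branch just described, not the actual Cause Map: the node names and the `occurs` helper are assumptions, and every effect is treated as requiring all of its listed causes.

```python
# Each effect maps to the causes that must ALL be present (AND logic).
# Node names are paraphrased from the text; the structure is a simplification.
CAUSE_MAP = {
    "four fatalities in hotel": ["aircraft struck hotel"],
    "aircraft struck hotel": ["hotel near flight path", "Concorde crashed"],
    "Concorde crashed": ["insufficient thrust"],
    "insufficient thrust": [
        "three engines needed at full power",
        "only two engines functioning",
    ],
    "only two engines functioning": ["engines flamed out"],
    "engines flamed out": ["excess fuel choked engines"],
    "excess fuel choked engines": ["fuel cell ruptured"],
}


def occurs(effect, removed, cause_map=CAUSE_MAP):
    """Return True if `effect` still happens after the causes in `removed` are eliminated."""
    if effect in removed:
        return False
    causes = cause_map.get(effect)
    if causes is None:
        return True  # leaf cause: assumed present unless explicitly removed
    return all(occurs(cause, removed, cause_map) for cause in causes)


# With nothing removed, the four ground fatalities occur.
print(occurs("four fatalities in hotel", set()))
# Removing either AND-ed cause breaks that branch.
print(occurs("four fatalities in hotel", {"hotel near flight path"}))
print(occurs("Concorde crashed", {"three engines needed at full power"}))
```

Removing any single cause on an AND branch prevents the effect above it, which is why each additional cause found is also an additional place to intervene.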
Combining words and images visually can be a streamlined method of communication. The visual cause map used 34 words to describe the incident. The paragraph used 118 words to convey the same information.
We continue with the Why questions. Why did the fuel cell rupture? A tire exploded and a piece of the tire struck the fuselage.
Why did the tire explode? It ran over a piece of debris on the runway. It has been determined that the debris on the runway was a wear strip from a thrust reverser on a Continental DC-10 that had taken off earlier that day.
Some would assert that this debris strike was “the cause” of the accident and conclude that the investigation can end. In fact, there is a criminal trial pending against Continental Airlines in France which asserts that the debris from the DC-10 is the cause of the loss of Concorde. Legal trials often focus on one thing. People who are trying to find blame focus on one cause. People who solve problems recognize there are many causes and continue the investigation.
Why was there debris on the runway? Debris fell from the DC-10 AND there was ineffective runway inspection. The crew that would normally do the runway inspection was conducting fire safety training on the other side of the airport. It is certainly ironic that safety training became one of the causes of the loss of the Concorde.
The Concorde was delayed one hour from its scheduled departure due to some concern about its thrust reversers. Had it left an hour earlier, it would have taken off before the debris fell from the DC-10. In fact, this particular Concorde was not part of the flight rotation on July 25th. The aircraft had completed a seven-day overhaul and inspection and was designated the “back-up” for that day. There were problems with two other aircraft in rotation, so this particular aircraft was put into service.
The loss of the engines is interesting for another reason as well. Power was lost to two of the engines. Why? There was an engine surge. Why? The fuel was being consumed in the intake of the engine. Why? Fuel was streaming out of the ruptured fuel cell. In some photographs it appears that, immediately following the debris strike, there was an engine fire. The engines were not on fire. The flame front is actually well in front of the engines.
The crew returned the engines to an idle position and attempted a re-start, but there was insufficient time to complete this task. It is possible that if they had left the throttles in the full-power position, the engines would have exited the surge condition and recovered.
Why did the aircraft stall? One of the reasons was that it was in an unusual take-off profile. The center of gravity of the Concorde was unusual. The Concorde carried an enormous amount of fuel (it was essentially a flying fuel tank), so a network of valves and high-pressure pumps moved the fuel around to different fuel cells in order to maintain the center of gravity. The center of gravity was calculated and monitored often by the Flight Engineer and control systems. On this flight, the center of gravity was miscalculated. Why?
One of the reasons the center of gravity was miscalculated was that Concorde was carrying more luggage than it usually did. Remember, this was a holiday flight chartered by a German group who planned to leave Miami for a cruise. Each passenger was packed for an extended vacation. The Captain had to make a decision. He could choose to off-load fuel in order to maintain a strictly balanced center of gravity, or he could continue with the existing flight plan.
The runway distance that had been set up in the original pre-flight plan was shortened. The reason for this is still unclear, but it has been suggested that a government dignitary’s aircraft was on a nearby taxiway. Another flight plan deviation was made without detailed consideration.
If we look at another causal branch, we see that there was loss of lift. Why? The wing’s angle of attack was excessive because the nose tilted up rapidly. Why? There was a rapid change in the center of gravity. Jet fuel was rapidly leaking out of fuel cell #5.
The crew was not able to restart the engines. Why? The engines were damaged AND there was a fire at the intake. Where did the fire come from? In order to have a fire you must have three things: an ignition source, oxygen and fuel. We know the fuel came from the ruptured fuel tank, but what was the ignition source? From an initial viewing of video of the incident, it appears that fuel comes from the tank and the engines ignite the fuel. But jet-grade fuel is formulated to withstand temperatures much higher than those that would be generated by the engines. Investigators have determined that the ignition source for this fire was the power cables on the landing gear.
Concorde’s landing gear had to be raised very quickly in order to improve the aerodynamics of the vehicle. During the raising of the landing gear, electrical sparks occurred in the power cables because of damage from the tire debris. Oxygen, fuel and an ignition source make up the fire triangle. If any one of those elements had been removed, there would have been no fire.
In order for the electrical sparking to occur the power cables must be damaged and the actuator system must be energized.
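The fire triangle and the sparking condition are both AND relationships, and they can be combined in a short sketch. The function and parameter names below are hypothetical, and the model is deliberately simplified: it treats the sparking cables as the only ignition source, following the investigators' finding described above.

```python
FIRE_TRIANGLE = ("ignition source", "oxygen", "fuel")


def sparking_possible(cables_damaged, actuator_energized):
    """Sparking needs damaged power cables AND an energized actuator system."""
    return cables_damaged and actuator_energized


def elements_present(cables_damaged, actuator_energized, fuel_leaking, oxygen=True):
    """Collect which fire-triangle elements exist for a given scenario."""
    present = set()
    if sparking_possible(cables_damaged, actuator_energized):
        present.add("ignition source")
    if oxygen:
        present.add("oxygen")
    if fuel_leaking:
        present.add("fuel")
    return present


def fire_possible(present):
    """A fire needs all three elements; removing any one prevents it."""
    return all(element in present for element in FIRE_TRIANGLE)


# The scenario described in the text: damaged cables, energized gear, leaking fuel.
print(fire_possible(elements_present(True, True, True)))
# Same scenario with the actuator system de-energized: no ignition source.
print(fire_possible(elements_present(True, False, True)))
```

Each `False` input corresponds to a candidate solution: protect the cables, de-energize the system, or contain the fuel.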
Why was the fuel present? Any time something breaks, the strength of the material could not withstand the stress applied to it. The strength of the fuel cell was insufficient to withstand the stress applied to it. When the word insufficient is used, that does not indicate that the fuel cell was not built to specifications but simply that it failed.
What was the nature of the stress applied to the fuel cell? A pressure wave traveled through the fuel cell due to the tire debris slapping the underside of the aircraft. The fuel cell had been designed to withstand a two-pound mass striking it. It is estimated that the tire fragment weighed approximately 8 pounds. The pressure wave phenomenon had never been considered prior to this incident.
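The stress-versus-strength comparison can be made concrete with the two figures given above. This is a sketch only: it uses impacting mass as a stand-in for the applied stress, which ignores impact speed and the pressure wave itself, and the constant and function names are assumptions.

```python
DESIGN_IMPACT_MASS_LB = 2.0       # mass the fuel cell was designed to withstand (from the text)
ESTIMATED_FRAGMENT_MASS_LB = 8.0  # estimated mass of the tire fragment (from the text)


def exceeds_design(applied, design):
    """True when the applied load is greater than the design allowance."""
    return applied > design


def overload_ratio(applied, design):
    """How many times larger the applied load is than the design allowance."""
    return applied / design


print(exceeds_design(ESTIMATED_FRAGMENT_MASS_LB, DESIGN_IMPACT_MASS_LB))
print(overload_ratio(ESTIMATED_FRAGMENT_MASS_LB, DESIGN_IMPACT_MASS_LB))
```

On these numbers the fragment was roughly four times the design case, which is consistent with the point that the cell can be built to specification and still fail.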
The fuel cells were full. If there had been a layer of air on top of the fuel, perhaps it would have absorbed some of the shock of the strike. Why were the fuel tanks full? It was a transatlantic flight.
The left side tire exploded. Why? The stress applied to the tire exceeded the strength of the material. Why? The tire pressure was high and the tire rolled over a piece of debris.
The debris guard was ineffective because it allowed the debris to strike the fuel cell. Also, the tire that was eventually punctured was not aligned properly and was off center, even though the landing gear systems were re-aligned during the overhaul that the aircraft had just completed.
The wear strip fell from the DC-10 onto the runway. Why? The holes on the strip were drilled close together and not in proper alignment. This particular strip was a make-shift repair strip. As the holes wore through, the mechanics drilled new holes into it. Not only were the holes too close together, but they were also drilled to too large a diameter. Evidence suggests that rivets pulled through the holes, causing the wear strip to fall from the DC-10.
At this point, of course, the investigation can continue. With each question asked, we find more causes that contributed to the failure. The Cause Map serves a dual purpose as both a useful investigative tool and an easy form of communication to management. Two of our aforementioned objectives have been met. But how do we take the next step from information and communication to action?
One of the reliability gurus (Paul Barringer) states that human actions represent 80% of the problems we face. If this is true, it stands to reason that human actions can also significantly impact 80% of the solutions. However, engineers & technical people often focus on improving/replacing the widget. The US Navy seems to have realized this paradox. They have developed a system that can ensure the highest reliability under adverse conditions (severe operating environments, high turn-over of personnel, etc.). Use of the Navy Work Model has helped the Navy attain over 5,000 operating years on nuclear power plants with no incidents.
The model is staggeringly simple. Three “pillars” sit at the vertices of a triangle, and each corner represents a support upon which a system can stand. The pillars are training, procedures, and supervision. The basic idea is that if any one of the pillars is weak, the strength of the other two will make up for that deficiency.
The Navy Nuclear Work Model recognizes that all systems will have vulnerabilities. These vulnerabilities can be protected by utilizing people-based resources. Every submarine designer wants systems to be improved, replaced, or upgraded. The reality is that trade-offs must be made, and cost, weight, space, and maintainability are just a few of the key decision factors. When vulnerabilities are discovered, the risk can be mitigated via the Work Model.
If we take the causal branches of the Concorde cause map and assign each event to a pillar in the work model, it becomes clear how the investigation could lead to some immediate and common sense solutions.
Applications to Navy Work Model:
- Captain's decision to proceed with weight miscalculation
- Scheduling of Safety Drills and Runway Inspection Requirements
- Mis-alignment of landing gear wheels (insufficient training?)
- French Air Traffic control adjustments to takeoff profile of Concorde due to government dignitary
- Installation of wear-strip on DC-10 (lack of procedure for installation requirements)
- Non-FAA certified repair (minimal supervision training requirement)
- Alteration of runway length for take off profile selected
- Unforeseen event of engine/surge and recovery at low altitude
- Mis-alignment of landing gear wheels?
- Maintenance practices allowing multiple re-laminations to wheels rather than scrap. No defined conditions for scrap of wheels, no maximum re-lamination requirement
- Mis-calculation of preliminary weight of luggage for original flight profile
- Installation of wear-strip on DC-10 (work practice)
- Mis-alignment of landing gear wheels (insufficient training?)
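One way to start sorting a list like this is to use the pillar hints already written into the items. The sketch below is a hypothetical first-pass helper, not part of the Navy Work Model: it only looks for the words "training", "procedure", and "supervision" in each item, leaves everything else unassigned for the investigation team to place, and uses a subset of the items above.

```python
PILLAR_KEYWORDS = {
    "training": ("training",),
    "procedures": ("procedure",),
    "supervision": ("supervision",),
}


def suggest_pillars(item):
    """Return the pillars whose keyword appears in the item text."""
    text = item.lower()
    return [
        pillar
        for pillar, keywords in PILLAR_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


def group_by_pillar(items):
    """Group items under each suggested pillar; items with no hint stay unassigned."""
    groups = {pillar: [] for pillar in PILLAR_KEYWORDS}
    groups["unassigned"] = []
    for item in items:
        pillars = suggest_pillars(item)
        for pillar in pillars:
            groups[pillar].append(item)
        if not pillars:
            groups["unassigned"].append(item)
    return groups


items = [
    "Scheduling of Safety Drills and Runway Inspection Requirements",
    "Mis-alignment of landing gear wheels (insufficient training?)",
    "Installation of wear-strip on DC-10 (lack of procedure for installation requirements)",
    "Non-FAA certified repair (minimal supervision training requirement)",
]

for pillar, grouped in group_by_pillar(items).items():
    print(pillar, len(grouped))
```

An item can land under more than one pillar, which fits the model: a weakness in one pillar can often be covered by strengthening another.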
Hardware is not always the best or most efficient solution. As Paul Barringer has observed, 80% of an organization’s problems lie in procedures, training, and/or supervision. The Navy Work Model is an action-oriented method for implementing solutions that keeps the humans in the loop.
A copy of the Excel reference materials can be sent to all interested parties.