An expansion joint failure uncovers more than just fatigue—it opens the door to systemic change.
Sometimes, technical problems are symptoms of deeper cultural and operational issues. This article explores how a deep dive into recurring fluid catalytic cracker (FCC) expansion joint failures led to far-reaching insights, not just about metallurgy and maintenance, but about a refinery's decision-making, communications and reliability culture. It’s a case where the root cause wasn’t just in the metal, but in the mindset.
Beginning the Transformation
The journey begins at a 300 thousand barrels per day (300 KBD) refinery that had embarked on a maintenance business process transformation. The elements of reliability being developed and implemented included business criticality assessments, reliability-centered maintenance (RCM) equipment strategies, bad actor lists, and root cause failure analysis (RCFA).
When the assets were ranked, the FCC unit was determined to be one of the most critical, so this is where the reliability program’s implementation began. The heart of the FCC is a reactor in which the incoming feed contacts hot, fluidized catalyst and is cracked into various value streams. Over time, the catalyst becomes fouled by coke and other byproducts and must be regenerated. This is accomplished by circulating it to and from the adjacent regenerator, where air supplied by the main air blower burns the coke off the catalyst.
On the piping between these vessels are large, complex expansion joints. In this particular refinery, they were highly unusual designs, with double bellows, refractory lining, hinges, and pantograph arms. Each one weighed nearly five tons.
The main air blower and the two expansion joints were the most critical assets in the unit. Not surprisingly, they were also the top three bad actors. The prescribed activities consisted of the development of an RCM equipment strategy for these assets and an RCFA for each of the failure events.
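A bad actor list of this kind is usually just a ranking of assets by accumulated cost or production loss. The following is a minimal sketch of that idea, not the refinery's actual method: the record layout, the asset names, and every loss value are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict

# Hypothetical work order records: (asset, production loss in arbitrary units).
# These values are placeholders for illustration, not data from the refinery.
work_orders = [
    ("main air blower", 5),
    ("expansion joint A", 4),
    ("expansion joint B", 3),
    ("feed pump", 1),
    ("expansion joint A", 4),
    ("main air blower", 2),
]


def rank_bad_actors(records, top_n=3):
    """Return the top_n assets by total production loss, worst first."""
    loss_by_asset = defaultdict(float)
    for asset, loss in records:
        loss_by_asset[asset] += loss
    ranked = sorted(loss_by_asset.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


top_three = rank_bad_actors(work_orders)
for asset, total_loss in top_three:
    print(f"{asset}: {total_loss}")
```

In practice, the ranking metric would come from the site's own work order and production loss accounting data, and it might combine repair cost, lost production and safety consequence rather than a single number.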
In developing the RCM equipment strategy for the expansion joint, the reliability lead identified 41 failure modes. Most were the usual material degradation modes, such as corrosion, erosion, acid cracking and refractory failure. Although not a materials or inspection expert, the reliability lead knew that most of the mitigation steps were going to involve some sort of nondestructive examination (NDE) or condition monitoring task. This was going to be fairly straightforward, but because the site was thousands of miles away and the reliability lead’s time there was limited, the priority of these tasks was debated.
Additionally, work order and production loss accounting data showed that one of these expansion joints failed every six to nine months and that the failure mode was the same every time: fatigue cracking of the bellows. The process engineer and the FCC subject matter expert (SME) on the transformation team, as well as refinery personnel, confirmed the failure was the result of frequent thermal cycling of the expansion joint.
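To put the six- to nine-month failure interval in perspective, it can be converted into an annual failure rate. The short sketch below does only that arithmetic. The function names are illustrative, and the five-year horizon is an assumption chosen for the example, not a figure from the refinery.

```python
def failures_per_year(interval_months: float) -> float:
    """Convert a mean interval between failures (months) into failures per year."""
    return 12.0 / interval_months


def expected_failures(interval_months: float, horizon_years: float) -> float:
    """Expected number of failures over a planning horizon, assuming a steady rate."""
    return failures_per_year(interval_months) * horizon_years


low_rate = failures_per_year(9)   # one failure every nine months
high_rate = failures_per_year(6)  # one failure every six months

# Assumed five-year horizon, used here only to illustrate the scale.
low_count = expected_failures(9, 5)
high_count = expected_failures(6, 5)

print(f"{low_rate:.2f} to {high_rate:.2f} failures per year")
print(f"{low_count:.1f} to {high_count:.1f} failures over five years")
```

Under these assumptions, the interval corresponds to roughly 1.3 to 2 expansion joint failures per year.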
Performing the RCFA
The FCC unit was up and down often for a variety of reasons. It was not clear if this alone caused the fatigue cracking or if suboptimal interim repair efforts were also part of the cause. Either way, this was not normal. For this reason, the reliability lead temporarily shelved the development of the RCM equipment strategy and started the RCFA.
As with equipment strategies, the reliability lead typically starts an RCFA with a quick brainstorming session with one or two discipline SMEs and/or key stakeholders. Once a “straw man” of the investigation is developed (incorporating the equipment strategies and a template or two), a larger group of experts and parties to the event can be convened to confirm and expand the analysis.
Care needs to be taken to ensure you are not taking shortcuts, skipping the steps of producing evidence for or against possible root causes, and giving the appearance of asking people to quickly “rubber-stamp” the answers you’ve already come up with. But a well-thought-out draft of the investigation can help novice participants understand the methodology of cause and effect analysis and more quickly provoke the generation of more ideas. Time is saved by more quickly familiarizing the participants with what they are being asked to do.
A typical, formal RCFA begins with a thorough statement of the incident or problem. This includes a detailed statement of the chronology, impact and consequences of the event. In the Apollo Cause and Effect method, this is the primary effect.
Just as it is a best practice to develop an operating context document for an RCM equipment strategy, it is a best practice with an RCFA to invite a process engineer or applicable discipline SME to give an overview of the asset or operating system involved. This is especially helpful with a complex and highly integrated piece of equipment, such as the fluid catalytic cracker. This unit is the heart of many refineries and interacts with many upstream and downstream units. During the preliminary analysis, the FCC SME spent 45 minutes drawing on the whiteboard, and it is highly recommended that the facilitator take a picture of this upon completion.
What the RCFA Revealed
This FCC had clearly been a problem for a long time and a source of frustration for the team tasked with improving the performance of not only these expansion joints, but the whole FCC unit. RCFAs can turn into a dreaded exercise in which those involved capitulate and quickly declare “inattention to detail” as the root cause. This RCFA, by contrast, took four hours because the FCC SME was allowed to apply his knowledge and offer an unbiased ear to grievances. So often, internal politics, personal biases, time constraints, and other conflicts of interest can cause participants to withdraw from the process or otherwise withhold ideas. For a successful RCFA, root causes should be brought up in the spirit of finding a constructive solution instead of finger-pointing.
At the end of the FCC RCFA, about 75 potential root causes had been identified. The analysis went well beyond the cracking of the bellows, the dominant failure mode for this asset, to a bigger question: Why does the unit come up and down so often that it subjects the expansion joint to so many thermal cycles? Opening up such a big, complex issue is often not of much help in solving the immediate problem. Here, it involved supply chain and feed slate considerations, upstream and downstream equipment failures, and even the need for new or revised operating procedures. But the transformation team was also being asked to help change the entire culture of this organization, not just “hack at the leaves” with a couple of equipment strategies.
The cause and effect method is more rigorous and time consuming, but it provides a more defensible analysis. This not only provided a comprehensive set of possible root causes for which evidence would be obtained, but it became an outline or road map for multiyear, continuous improvement for the entire FCC unit. It was up to the refinery executives to prioritize, fund and take action, but a pretty inclusive A-Z list of considerations was in hand.
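One way to picture how a cause and effect analysis becomes a road map is as a tree: the primary effect sits at the root, each cause branches into its own causes, and the leaves are the potential root causes still awaiting evidence. The sketch below is a minimal illustration of that structure, assuming a simple `Cause` record with a hypothetical `evidence` field. It uses only the handful of causes named in this article, not the roughly 75 found in the actual RCFA, and it is not the Apollo method's own notation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cause:
    """One node in a cause and effect tree (hypothetical structure)."""
    description: str
    caused_by: List["Cause"] = field(default_factory=list)
    evidence: str = ""  # left empty until evidence for or against is gathered


def leaf_causes(node: Cause) -> List[Cause]:
    """Collect the causes with no deeper cause recorded: the candidate root causes."""
    if not node.caused_by:
        return [node]
    leaves: List[Cause] = []
    for child in node.caused_by:
        leaves.extend(leaf_causes(child))
    return leaves


# Causes taken from the narrative above; the real analysis was far larger.
primary_effect = Cause("Fatigue cracking of expansion joint bellows", [
    Cause("Frequent thermal cycling", [
        Cause("Unit comes up and down often", [
            Cause("Supply chain and feed slate considerations"),
            Cause("Upstream and downstream equipment failures"),
            Cause("New or revised operating procedures needed"),
        ]),
    ]),
    Cause("Suboptimal interim repairs"),
])

candidates = leaf_causes(primary_effect)
for cause in candidates:
    status = cause.evidence or "evidence still to be gathered"
    print(f"{cause.description}: {status}")
```

Each leaf with an empty `evidence` field is an open item, which is why a tree like this can double as the improvement backlog described above.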