Mark Keneipp, Alcoa,
and Randy Heisler, Life Cycle Engineering
It is not often that one runs across an organization that is able to undertake a significant business transformation, implement the changes successfully, and then sustain the gains for seven years with no end in sight. Here is the compelling story of one organization that has succeeded: Alcoa Warrick Smelter.
The story began in 1997 when Alcoa decided to implement the Toyota Production System globally across all 250 locations. Significant progress was made, but the Alcoa Primary Metals division leaders were not seeing the results that they expected. A business-unit-level internal analysis showed that their assets and reliability processes lacked stability, and this was holding lean manufacturing gains hostage. Stability is a foundational element of the Toyota Production System. As anyone familiar with the Toyota House, or any other house for that matter, knows, a solid foundation is the key to long-term sustainability. (See Figure 1)
In 2002 Vince Adorno, Vice President of Engineering for Alcoa Primary Metals, decided to form a corporate-led team to develop a business case and reliability implementation strategy. External consultants were included in this process to ensure that best practices and reasonable estimates of potential savings were incorporated into their strategy. They also looked at their own pockets of excellence and best practices that were in place in the plants.
During these sessions, Ron Moore of the Ron Moore Group said, “You have way too much ‘maintenance’ in your reliability effort.” The Alcoa team agreed that they were focused on improvements in the maintenance organization and were missing opportunities for improvement by not considering operations’ impact on equipment reliability. Their efforts at that time were being driven by the maintenance and engineering managers at each plant site with little involvement from operations. In addition, a high management turnover rate was hindering a long-term focus on reliability and was making continual training and retraining of key leaders necessary.
Moore explained that the maintenance organization is responsible for many functions, but it has no direct control over the successful outcome of these functions. For this reason, reliability success can only be achieved via an active partnership between maintenance and operations. Moore called this concept “Reliability Based Operations” as illustrated in Figure 2.
Corporate leaders began to develop a strategy that included a model in which plant managers and operational leaders would drive and own the reliability effort. In addition, they wanted everyone to be responsible for reliability, just as they are for safety.
The ownership would be initiated through the development of a solid business case for the reliability improvement effort. This business case would be reinforced with data from existing best practice plants in the Alcoa network as well as benchmark data from other external plants.
The ground rule for the return on investment was that it would not, and should not, come from deferring maintenance. Savings were to come from the Repair and Maintenance (R&M) budget of the facilities, with an understanding that improved stability would also bring production gains through increased throughput and eliminated waste. These production gains were estimated to be much greater than the maintenance savings.
The group estimated that they could lower Repair and Maintenance (R&M) costs by 10% to 20% over a three-year period, and predicted that for every maintenance dollar saved by eliminating defects, they would realize 1.5 to 6 times that in Overall Equipment Effectiveness (OEE) gain. The production strategy was to implement OEE, calculate the value of a one percent increase in OEE, and use this knowledge to eliminate bad actors or defects that would subsequently drive plant improvement.
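The arithmetic behind this business case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say how Alcoa computed OEE, so the sketch uses the commonly cited definition (availability times performance times quality), and every dollar figure and function name below is hypothetical rather than Warrick data. Only the 1.5x to 6x multiplier comes from the group's estimate.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """OEE as the product of three ratios, each between 0 and 1.

    Assumption: the common availability x performance x quality definition;
    the article does not spell out the formula Alcoa used.
    """
    for name, value in (
        ("availability", availability),
        ("performance", performance),
        ("quality", quality),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


def value_of_one_oee_point(annual_value_at_full_oee: float) -> float:
    """Dollar value of a one-percentage-point OEE increase.

    Assumption: output value scales linearly with OEE.
    """
    return annual_value_at_full_oee / 100


def oee_gain_range(maintenance_savings: float, low: float = 1.5, high: float = 6.0) -> tuple:
    """Range of OEE gain implied by the 1.5x to 6x estimate quoted in the article."""
    return maintenance_savings * low, maintenance_savings * high


# Hypothetical inputs, for illustration only.
print(f"OEE: {oee(0.90, 0.95, 0.99):.1%}")
print(f"Value of one OEE point: ${value_of_one_oee_point(50_000_000):,.0f}")
low_gain, high_gain = oee_gain_range(1_000_000)
print(f"OEE gain per $1M of maintenance savings: ${low_gain:,.0f} to ${high_gain:,.0f}")
```

Knowing the value of a single OEE point is what lets a team rank competing defect-elimination projects in dollars rather than by opinion.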
Having developed the business case for improvement, the group’s next challenge was to craft a strategy to educate the organization, determine the existing gaps in reliability best practices, and implement the needed changes. A three-wave process was developed in order to create an orderly and sustainable phase-in of the program. (See Figure 3)
The first wave would focus on educating the site on what reliability excellence means, solidifying the sponsorship and creating the necessary alignment within the management team, union leadership and workforce. The second wave would assess a site’s current performance using a 29-element Reliability Excellence Model, or what Alcoa eventually called “REX.” (See Figure 4)
A master plan would then be created to close the gaps, and a business case developed to show the value of achieving reliability excellence at each site.
The third wave would entail forming a leadership team to manage the changes and focus teams to execute the tasks in the master plan that would close the gaps in best practices. Benefits were to be tracked, and audit processes were to be put in place to ensure sustainability. A pilot location was chosen to test the process. This is where Warrick came into the picture. . . .
Warrick Primary Metals
In the fall of 2003, Royce Haws, the Warrick Smelter Plant Manager, was contacted by Vince Adorno, VP of Engineering and Maintenance. Adorno informed Haws that Warrick had been recommended as the pilot site for the three-wave process. Warrick’s costs were among the highest across the Primary Metals business units. There was also a possibility that the plant would be shut down before its time. Although the challenges were great, the plant leaders agreed to go forward.
To lay a foundation of knowledge for what to expect, and how to manage it, Haws and his maintenance manager, Danny Reyes, began an educational journey to enhance knowledge and understanding of strategies and techniques to create excellence in manufacturing and maintenance reliability.
This introductory education was followed by a reliability best practices assessment that collected data and included interviews with managers, supervisors, crafts and operators. The plant’s score was 441 out of a possible 1000, indicating that Warrick was in a predominantly reactive mode, with most of the focus on being really good at emergency maintenance response. The opportunities for improvement were considerable, but significant culture change would be needed if the site was to achieve the business case they had developed. The plant leadership paid specific attention to the Master Plan in terms of what had to be done to close the gaps in maintenance, operations and culture in order to be successful.
A leadership team was formed to steer what was now called the “REX” effort, including creating governing principles and measures needed to lead the organization into a proactive environment. The team developed and executed a communication plan that included “Town Hall” meetings. At these meetings site leaders communicated to the organization why they were implementing REX. They explained that everyone’s help was needed if they expected to achieve the goals and enjoy long-term job security. Both salary and union personnel were chosen to participate on the focus teams that would design the future state of the business processes, how each part of the organization needed to function and what their roles would be in this new way of operating. The central premise of this new way of thinking was that the operations side of the organization would own the reliability of the assets.
Leading the Change
The plant manager’s commitment was heard clearly at the Town Hall meetings. Haws shared that Warrick’s R&M costs were almost the highest in the Alcoa smelting system and needed to be reduced 15%-20%. “I pointed out that we could do this the stupid way or the smart way,” explains Haws. “The stupid way was to defer maintenance for a few years, avoid the consulting fee, and I could hope for a promotion before top management figured out the cost reductions were not sustainable.”
Haws understood what he called the stupid approach, because that was the plant condition he inherited when he transferred to Warrick five years earlier. He knew the smart way was to approach this new opportunity as a potential transformation for a 43-year-old facility. Warrick did have a “burning platform” or business case to drive change. Less than 5% of the world’s aluminum smelting capacity is in plants that are 50+ years old, and the Warrick Smelter was 43 years old. In the 1970s, there were 33 aluminum smelters operating in the USA; today there are only eight.
Due to the respect that the area managers had for Haws, their plant leader, they got on board, started asking what they could do to help, and started learning more about this new approach for maintaining plant assets. Admittedly, many were concerned this new initiative might become another program of the month.
The leadership team went to work and drafted a mission statement, plant floor communications plans, governing principles and partnership agreements. The team also developed a method and process for capturing production data that would later be used to calculate OEE and Pareto chart equipment bad actors.
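A Pareto ranking of bad actors from captured downtime data might look like the following Python sketch. The article does not describe Warrick's actual data format or cutoff, so the event tuples, asset names, and the 80% threshold here are assumptions chosen purely for illustration.

```python
from collections import defaultdict


def pareto_rows(downtime_events):
    """Rank assets by total downtime, largest first.

    downtime_events: iterable of (asset_name, downtime_hours) pairs.
    Returns a list of (asset, hours, cumulative_share_of_total) tuples.
    """
    totals = defaultdict(float)
    for asset, hours in downtime_events:
        totals[asset] += hours

    grand_total = sum(totals.values())
    if grand_total <= 0:
        return []

    rows = []
    cumulative = 0.0
    for asset, hours in sorted(totals.items(), key=lambda item: item[1], reverse=True):
        cumulative += hours
        rows.append((asset, hours, cumulative / grand_total))
    return rows


def bad_actors(rows, cutoff=0.80):
    """Assets that together account for the cutoff share of downtime.

    The asset that crosses the cutoff is included. The 80% default is an
    assumption, not a figure from the article.
    """
    selected = []
    for asset, _hours, cumulative_share in rows:
        selected.append(asset)
        if cumulative_share >= cutoff:
            break
    return selected


# Hypothetical downtime log, for illustration only.
events = [
    ("crane_A", 40), ("pump_B", 10), ("crane_A", 20),
    ("conveyor_C", 25), ("fan_D", 5),
]
rows = pareto_rows(events)
for asset, hours, share in rows:
    print(f"{asset:<12}{hours:>6.1f} h  cumulative {share:.0%}")
print("Bad actors:", bad_actors(rows))
```

Weighting each event by lost production value instead of hours would tie the ranking more directly to OEE, at the cost of needing a dollar figure per asset.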
Additional metrics were chosen, but most importantly, accountability for these additional metrics had to be determined. Haws decided that production managers would now be accountable for maintenance metrics like PM Compliance, Schedule Compliance and Maintenance Cost. It would now be the maintenance organization’s responsibility to support operations in achieving best practice goals. The maintenance organization would now be accountable for more leading indicators like Percent Planned Work and Schedule Efficiency, Backlog, and PdM Diagnostic versus Corrective Work, to name a few.
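For readers unfamiliar with these measures, the Python sketch below shows one plausible way to compute three of them from a weekly work summary. The article names the metrics but does not define them, so the formulas, field names, and sample numbers are assumptions based on common maintenance practice, not Warrick's actual definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WeeklyWorkSummary:
    """Hypothetical weekly roll-up of work order data."""
    pms_due: int
    pms_completed_on_time: int
    jobs_scheduled: int
    scheduled_jobs_completed: int
    planned_hours: float
    total_hours: float


def _ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def pm_compliance(week: WeeklyWorkSummary) -> Optional[float]:
    """Assumed definition: PMs completed on time divided by PMs due."""
    return _ratio(week.pms_completed_on_time, week.pms_due)


def schedule_compliance(week: WeeklyWorkSummary) -> Optional[float]:
    """Assumed definition: scheduled jobs completed divided by jobs scheduled."""
    return _ratio(week.scheduled_jobs_completed, week.jobs_scheduled)


def percent_planned_work(week: WeeklyWorkSummary) -> Optional[float]:
    """Assumed definition: planned labor hours divided by total labor hours."""
    return _ratio(week.planned_hours, week.total_hours)


# Hypothetical week, for illustration only.
week = WeeklyWorkSummary(
    pms_due=20, pms_completed_on_time=18,
    jobs_scheduled=50, scheduled_jobs_completed=40,
    planned_hours=300.0, total_hours=400.0,
)
print(f"PM compliance:        {pm_compliance(week):.0%}")
print(f"Schedule compliance:  {schedule_compliance(week):.0%}")
print(f"Percent planned work: {percent_planned_work(week):.0%}")
```

Returning None for an empty week, rather than zero, keeps a week with no scheduled work from looking like a week of total non-compliance.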
“Assigning accountability in this way marked the beginning of a significant culture change,” points out Mark Keneipp, who served as Warrick Smelting REX implementation facilitator. “We needed outside resources to teach us change management principles we needed to follow as we led the organization through this cultural change. As both Reliability Excellence and change management experts, Life Cycle Engineering (LCE) helped us navigate through the technical and cultural changes.”
The leadership team asked another key question: “Where do we start?” In a lengthy discussion, the team considered factors like which area of the plant would bring the biggest financial gain and where success was most likely from a cultural and leadership standpoint. The group decided to focus on the aluminum services area. The success in this pilot area would later become the model for the rest of the plant.
Re-engineering Work Processes
The focus teams began re-engineering the work management processes, material management processes and reliability engineering processes. Roles and responsibilities were defined. Newly redesigned processes and roles were presented to the leadership team and approved for training and implementation in the pilot area.
Planning and scheduling meetings were re-engineered so that smaller groups planned and coordinated jobs before the meeting. The “scheduling” meeting could then focus on just that: scheduling, or “when” to do the work. Production managers were now in charge of prioritizing and scheduling the work, with maintenance in an advisory role. A true partnership was beginning to form.
Each week a job from the upcoming week’s schedule was chosen for review the following week in order to critique how well it went so that the group could learn from both successes and failures.
Operators were now paying close attention to how they were operating the assets and began entering requests for work into the Computerized Maintenance Management System. They also started collecting downtime data and working with Reliability Engineers to eliminate recurring failures. Changing the accountability for reliability was now creating a pull for help from Reliability Engineers so that OEE targets could be achieved.
Reliability Engineers corrected equipment hierarchies and assigned criticality codes to the assets. The engineers also performed simplified failure mode and effects analysis on the pilot area assets, which set the stage for PM and PdM optimization.
Significant focus was put on parts and materials by both the Materials Management focus team and the planners. Obsolete parts were dispositioned and a parts kitting area was set up to kit and stage parts for planned work. A color-coded tagging system was put in place to provide a visual recognition of where parts and materials are in the process. (See Figures 5 and 6)
Active Leadership Yielded Significant Results
The REX implementation process also included a “REX Lead Team” responsible for driving the implementation of the Wave 3 Master Plan. The REX Master Plan included almost 140 action items; 20 of these were owned by the REX Lead Team. To this day, seven years later, the REX Lead Team continues to meet monthly to discuss opportunities.
Improvements in productivity and partnerships were quickly observed in the pilot area and subsequently roll-out plans were developed to implement the changes throughout the plant. Operation managers were responsible for implementation in their areas. This signaled to each area that this was not a “maintenance department initiative.”
Progress was slow but the benefits were mounting. When this effort began, maintenance costs were excessive. In 2004, Warrick realized an 11% reduction in R&M costs/MT of aluminum produced and another $2.4 million/year in improvements connected to OEE gains compared to the 2003 REX base. The benefits continued in 2005, with OEE gains coming in at $4.4 million/year and annual maintenance costs down 15% from the 2003 REX base.
As significant as the results were, the journey to improve reliability was not over. The Warrick Primary Metals leadership believed they now needed to optimize the improvements they had made in order to continue their progress.
Optimizing Results to Continue Progress
The year was now 2007 and the journey was not over. Even though significant financial benefits had been achieved, the plant was still far from the initial goal that was set by the plant leadership. Achieving that goal was going to require more changes. Thousands of preventive tasks still needed to be optimized, organization structures needed modifying, and the span of control for planners had to be adjusted. Continued culture change was necessary for their success.
Joe Kuhn, Smelter Maintenance Manager, explains, “We had to go all in. In other words, each of the twenty-nine elements [of the Reliability Excellence Model] had to be focused on and optimized to achieve best practice.” This would require everyone to play a part. There was some temptation to “cherry pick” from the 29 elements, but leadership decided to embrace all the elements. The REX Wave 3 was seen as a holistic model with the elements connected in ways that could not be separated to achieve the desired results.
Maintenance and operations personnel reviewed each PM during PM Kaizen events. Estimates were inaccurate and tasks were outdated. PMs implemented 20 plus years ago were anywhere from 4 to 16 hours long and included time to do an inspection and then fix what was found. This was mostly driven by maintenance not knowing if operations would give up the machine later to do maintenance, so it was thought best to do it all while the machine was down. Machines sat idle for hours while parts and special tools and equipment were found to do the work.
In Kuhn’s words, “We took out stupid.” Improvements were immediately visible. Repairs were no longer made during PMs. Findings were reported, then planned and scheduled. This was a huge eye-opener for the organization. Many tasks were replaced with condition-based tasks. Predictive Maintenance (PdM) was taken to the next level. In 2003, PdM work was only 1.5% of total maintenance hours. Today about 14% of the total hours is diagnostic and corrective follow-up work. Kuhn points out that there is still plenty of opportunity for improvement: “Ideal would be fifteen percent diagnostic, and thirty-five percent of corrective PdM follow-up work.”
Previously the maintenance group would try to optimize a run-to-failure approach. Kuhn shares an example: “We would have oil analysis data indicating that functional failure had begun in a $15,000 gear box but we would attempt to operate it for another six months and hopefully replace it just before it got smoking hot.” Today they have nine hourly technicians performing various PdM diagnostic duties around the smelter. The preventive and predictive maintenance program is now failure-mode based. Further, they act on the data the same day, then plan and schedule follow-up work. Some tasks were also shifted to operators. As a result of this massive effort, 55,000 man-hours were taken out of the PM program. Again, a significant number of these saved hours were shifted into PdM diagnostic efforts and follow-up corrective maintenance.
The goal was to get higher on the Potential Failure curve (PF curve). (See Figure 7)
Since the beginning of the initiative, there had been an attempt to focus the maintenance organization on the three types of maintenance: preventive, emergency and backlog relief. Emergency work had remained high, which continued to divert resources away from PM and backlog reduction work. Leadership decided to centrally locate a crew that would handle emergencies across the plant for both mechanical and electrical work, while the remaining workforce focused on executing PMs and planned backlog. The message that emergency work is bad was communicated through signs throughout the plant.
Kuhn addressed the span of control for Planners. They were now measured on the percentage of planned work that they were actually producing. Planner metrics were revisited because some former expectations were driving the wrong behavior. Previously, hitting the corporate expected numbers was more important than the quality or efficiency of the work that was put on the schedule. Leadership communicated to the organization that it was acceptable for the numbers to be lower, but accurate, so that the barriers to best practice could be removed. Keneipp, Warrick Smelting REX implementation facilitator, reflects, “Everyone’s efforts were directed toward lowering costs and increasing OEE, which were the end results we were trying to achieve. The quality of maintenance work became more important than how fast something got fixed.”
Standing work orders were driven out of the system. The more accurate asset repair data indicated that the average emergency call cost $500. Kuhn communicated this figure to operations leaders who were responsible for maintenance cost. The entire team now realized that prioritizing work properly would save significant money.
The storeroom was also a high cost area, due to inventory inaccuracy and high stock out percentages. Over the years, this condition contributed to “goody piles” around the smelter where craft folks kept spare parts to assure they would be on hand as needed. Another challenge was that 40% of the spare parts in the storeroom were “orphans,” meaning they were not associated with a Bill of Material.
To improve efficiency, vendor stocking programs were put in place with parts being delivered for planned work. The storeroom was set up to house only parts needed for emergencies. Improved reliability on the plant floor was also lowering materials costs. Motor and gearbox failures no longer occurred weekly. Today, about 50% of spare parts come directly from the vendor and spend no time in the storeroom. This is a major benefit of planning and scheduling work four to six weeks in advance. “In reality,” Kuhn points out, “a storeroom is mostly a huge countermeasure for emergency maintenance.”
Management continued to reinforce the message that reliability was critically important. The communication plan included one-on-one conversations with employees and awarding of incentives. For example, the quality of feedback on PMs had suffered over the years. Because follow-up corrective work was rarely scheduled, craft people often didn’t fill out the PM reports; they saw it as a waste of time. To overcome the problem, crafts were recognized for detailed feedback on work orders. Operators were also recognized for accurate and detailed work requests. This type of behavior was rewarded with $25-$100 gift cards. Management put focus on getting work requests entered properly, the findings scheduled and executed, and the results communicated to employees.
Sustaining the Gains
Alcoa Warrick has encountered many bumps in the road on the long journey to reliability excellence, but significant results have been achieved both culturally and financially. The challenge over the last few years has been to sustain those gains. Several focus areas have helped them ensure sustainability:
Harnessing the power of Reliability Engineers: When REX was first implemented it was discovered that existing Reliability Engineers were in fact doing mostly project engineering work. Two Reliability Engineers are now assigned to focus solely on “Top Ten Bad Actors” and root cause failure analysis. Their job is to prevent recurring failures and track the benefit in dollars plant-wide. Operations managers call on them to help make problems go away.
Solid, long-term leadership: This was a key success factor. Haws, the plant manager, remained at the helm and continued to ask his managers how he could help. This kind of support allowed members of the management team to take risks like reorganizing and changing metrics expectations. Active leadership was crucial for building the partnership between maintenance and operations that continues today. As evidence, the Warrick Smelter has a direct salary work force of about 85 people, and 20 of these are Certified Maintenance & Reliability Professionals (CMRPs). Many of these CMRPs are in operations.
Regularly assessing progress: Frequent re-assessments were another key driver for sustainability. LCE was asked to re-assess the Warrick Smelter on an 18-month frequency. Warrick started with a score of 441. Within 18 months the score had improved to 555. Within another 18 months it improved to 603. Eighteen months later the score rose to 719, placing Warrick in the proactive range.
So where does Warrick stand now, seven years into its REX journey?
The asset health of the 50-year-old smelter is greatly improved from 2003 when REX was started. Mark Keneipp points out, “Remember all those $10,000 to $25,000 major components like motors, gear boxes, and pumps that used to fail unexpectedly at odd hours? It rarely happens now, and when it does we perform root cause analysis to reduce the chances of it happening again.” The financial results are impressive.
Maintenance costs per ton have been reduced by 38% (see Figure 8), and 2010’s current OEE gain is $5.8 million over the 2003 REX base. Progress reports are generated quarterly so that comparisons can be made with other Alcoa plants and best practices are shared globally.
The journey has been difficult, but the rewards have been many. Alcoa Warrick has eluded the threat of shutdown and become a model for other plants to follow on the path to excellence.
Mark Keneipp is the Alcoa Business Systems Manager for Warrick Primary Metals. A registered Professional Engineer and CMRP, Mark has over 32 years of experience in the aluminum industry.
As Managing Principal for Life Cycle Engineering (LCE), Randy Heisler specializes in reliability management and maintenance planning. Randy has 25 years of experience in the field. www.LCE.com.