The Critical Role of the Reliability Engineer

04 October 2010

Our manufacturing model is machining, processing and assembly, with over 1000 different work centers. Over the years we were able to use equipment redundancy to mask our downtime issues. Unlike a process plant, this redundancy gives us many options for making or assembling a specific product. As we started to implement Lean Manufacturing throughout our supply chain, we made good progress in reducing our costs and the number of touch points. However, when we started to measure overall equipment effectiveness (OEE) on our machines, we realized we had some major issues with machine downtime. As part of our overall Lean initiative, we decided to create and implement a new maintenance program that would address our machine deficiencies. We researched different approaches to improving our maintenance programs. After some benchmarking and interviews with different consultants, we felt that the Reliability Excellence approach offered by Life Cycle Engineering (LCE) met our requirements, and that their methods and best practices would fit best with our culture and our Lean journey. LCE came to Swagelok in 2007 and assessed where we stood with regard to maintenance - basically, how reactive were we? The audit placed us at the high end of a reactive or emerging maintenance program, with a score of 353 out of a possible 1000.

Progress

The next order of business was the design of the maintenance business processes - Materials Management, Work Management and Reliability Engineering. A fourth team would be focused on integrating these business processes into our current programs and culture (Integration Team). Teams for each process were put together by pulling associates out of their operations group and assigning them to the appropriate team based on their experience. With the help of LCE, new business processes were designed to fit the Swagelok model, taking our culture and values into account along with the shortcomings of our CMMS software (which were many).

Implementation

An Implementation Team was formed, with some members of the business process design teams added to it to ensure continuity. This made certain that we had a good understanding of the business processes in both the present state (pre-Reliability Excellence) and the future state. The plan was to go through the company with an implementation at each site. The project deliverable was that all associates at a site were trained in the new business processes and were using them in their daily line of business. Progress was measured by auditing a large sample of the groups. Implementing all the new business processes was quite a step for us. Prior to this time we had very informal processes for managing maintenance and materials. We had certain techs always working on the same equipment, so we became good at reacting to emergencies. With regard to materials management, each site had someone who knew where most things were, and we managed inventory by never running out of anything.

We then started implementing the new business processes and paid special attention to the integration team. This group supported the change management and the communication of what was happening: a move from a highly reactive approach to a proactive one, with the goal that all maintenance work be planned. The idea that no work is performed without a work order was something we had tried before, but it had never stuck. This time, we knew we had to make sure all the business processes were followed, with no exceptions.

At that time, the program was "owned" by a corporate continuous improvement group, while the maintenance functions, including Planners and Reliability Engineers (REs), reported to production at the site where they worked. After a couple of months, we realized that we needed a centralized maintenance organization to get the alignment and collaboration necessary to be successful. We created a maintenance group that was centrally managed but sat within our operations and reported at director level. This would enable us to make better use of resources and take better control of our MRO spend. Additionally, the implementation project manager would report to this same director, ensuring better communication and prioritization between the project implementation team and the line of business groups who were receiving the new business processes.

Results

During our first implementation, we collected our first metrics and quickly realized how much downtime we had on our machines. It became apparent that if we did not get to the root cause of the problems on the worst machines - or, as we called them, "bad actors" - we could not free up any maintenance resources to work on planned jobs.

In the first six months of 2008, seven machines were consuming the time of 4-5 maintenance technicians working strictly on urgent work orders, with an average monthly unplanned downtime of 17%. The MRO spend on these machines was also very high: approximately 10% of our overall spend for the company was going to this small group of machines. This is when we realized that, in order to meet our operational goals of moving from reactive to planned work and, most importantly, getting unplanned downtime down to 2% or less, we needed to address these machines' issues. To achieve this improvement we assigned one Reliability Engineer to focus on these critical machines. At other sites, we had similar issues where the most complex machines had the highest downtime and MRO spend. We continued with the other two business processes of Materials and Work Management, as we realized that overall long-term success required planning and managing our spare parts. But if we did not quickly get our arms around the unplanned downtime on our key equipment, we would never get to the goal of 80% planned work.
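The two targets named above (unplanned downtime of 2% or less, and 80% planned work) can be tracked as simple ratios. The following is a minimal sketch only: the article does not give the exact definitions used at Swagelok, so it assumes unplanned downtime is unplanned down hours divided by scheduled hours, and planned work is planned labor hours divided by total maintenance labor hours. The function names and sample figures are hypothetical.

```python
def unplanned_downtime_pct(unplanned_down_hours, scheduled_hours):
    """Unplanned downtime as a percentage of scheduled machine hours (assumed definition)."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    return 100.0 * unplanned_down_hours / scheduled_hours


def planned_work_pct(planned_labor_hours, total_labor_hours):
    """Planned maintenance labor as a percentage of all maintenance labor (assumed definition)."""
    if total_labor_hours <= 0:
        raise ValueError("total_labor_hours must be positive")
    return 100.0 * planned_labor_hours / total_labor_hours


# Hypothetical month for one group of machines (illustrative numbers only).
downtime = unplanned_downtime_pct(unplanned_down_hours=68.0, scheduled_hours=400.0)
planned = planned_work_pct(planned_labor_hours=120.0, total_labor_hours=400.0)

# Targets taken from the article: 2% or less unplanned downtime, 80% planned work.
meets_downtime_goal = downtime <= 2.0
meets_planned_goal = planned >= 80.0

print(f"unplanned downtime: {downtime:.1f}% (goal met: {meets_downtime_goal})")
print(f"planned work: {planned:.1f}% (goal met: {meets_planned_goal})")
```

With these sample figures the machines sit at 17% unplanned downtime, the level reported for the seven worst machines, and well short of both goals.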

Our approach was to combine the downtime information from our OEE measurements with the repair information from our CMMS software. We then analyzed the data and initially saw that spindle failure was a major downtime and cost driver on some of the equipment. The REs used a disciplined problem-solving approach that is commonly used at Swagelok for getting to the root cause of a problem. The approach proved to be structured and scalable: it worked at the other sites as well, and overall we had good, solid plans for all of our "bad actors".
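One way to picture combining OEE downtime data with CMMS repair data is a Pareto-style ranking of failure drivers. The sketch below is illustrative and is not the analysis Swagelok actually ran: the record layouts, component names and all numbers are hypothetical, and it assumes each record can be tagged with the component that failed.

```python
from collections import defaultdict

# Hypothetical OEE downtime events: (machine_id, component, hours_down).
downtime_events = [
    ("M1", "spindle", 40.0),
    ("M1", "lubrication", 6.0),
    ("M2", "spindle", 30.0),
    ("M2", "coolant filtration", 10.0),
    ("M3", "hydraulics", 14.0),
]

# Hypothetical CMMS repair orders: (machine_id, component, repair_cost).
repair_orders = [
    ("M1", "spindle", 9000.0),
    ("M2", "spindle", 7000.0),
    ("M2", "coolant filtration", 1500.0),
    ("M3", "hydraulics", 2500.0),
]


def rank_drivers(downtime_events, repair_orders):
    """Total downtime hours and repair cost per component, ranked by downtime.

    Returns a list of (component, hours, cost, cumulative_share_of_hours_pct).
    """
    totals = defaultdict(lambda: {"hours": 0.0, "cost": 0.0})
    for _machine, component, hours in downtime_events:
        totals[component]["hours"] += hours
    for _machine, component, cost in repair_orders:
        totals[component]["cost"] += cost

    total_hours = sum(t["hours"] for t in totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1]["hours"], reverse=True)

    result = []
    cumulative = 0.0
    for component, t in ranked:
        cumulative += t["hours"]
        share = 100.0 * cumulative / total_hours if total_hours else 0.0
        result.append((component, t["hours"], t["cost"], share))
    return result


ranking = rank_drivers(downtime_events, repair_orders)
for component, hours, cost, share in ranking:
    print(f"{component:20s} {hours:6.1f} h  ${cost:9.2f}  cumulative {share:5.1f}%")
```

In this made-up data set the spindle accounts for most of the downtime and the repair cost, which is the kind of signal that would direct root cause work to that component first.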

Action

As we looked across the organization at these "bad actors" we saw similar issues at each site, including problems such as filtration of our coolants, lubrication systems not working, a lack of standard work for rebuilding sub-assemblies such as spindles, and finally no real way of predicting or measuring when machine components needed to be replaced before catastrophic failure. Overall, we did not really have a strong understanding of the way our equipment was designed to run. With the reliability approach, we have seen large reductions in MRO spend - 25% overall! Additionally, we've seen large reductions in unplanned downtime - in some areas by as much as 60%. As we continued upgrades to the equipment, we also improved the maintenance plans and standardized our predictive approaches. The large reduction in MRO spend has more than covered the cost of the project - the business case - and quickly gained the support of senior management.

So why is the Reliability Engineering function so critical, and is it more critical than the other two business processes, Materials Management and Work Management? As I see it, executing Work Management and Materials Management at a high level will make you more efficient, but it will not reduce failures.

For example, one of our machine platforms had a major problem with spindle life. With over 200 spindles in total on this equipment, that was a significant problem. If we had approached the problem without getting to the root cause - a Reliability Engineering approach - and instead just focused on managing the supply of spindles and planning the repairs effectively, we would have seen only a slight reduction in downtime and no effect on the spend on rebuilding the spindles ($300K per year). Using the reliability approach, we got to the root causes, which were:

• The coolant in the machine was not filtered, so chips were prematurely wearing and/or destroying the seals on the spindles

• The spindle rebuild process was just replacing seals and bearings, not inspecting shafts and housings for wear

• We were using the incorrect type of seal for the job

• A key O-ring was not being replaced

With this knowledge the REs put together a solution which was to:

• Repair and re-engineer the maintenance of the filtration system

• Rebuild the spindle to OEM specifications

• Replace the seal with a type more appropriate for the application (this required the spindle housing to be machined)

Once we did all of this, we used vibration analysis to determine which spindles needed to be replaced first, and we then put together a plan to repair all of the spindles. So far, we have not had a newly rebuilt spindle fail in nearly a year. Mean Time Between Failures had been approximately 3 months. Using good materials management best practices, we also now control the spare spindles more efficiently and plan the work better. But again, the spend reduction would have been insignificant without the RE efforts. Our current predictions indicate that over the next three years combined, we will spend less on these spindles than last year's cost of $300K.
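The spend prediction can be restated as an annual figure. This is a small worked sketch, assuming the prediction means the combined three-year spend stays below $300K; the function name is hypothetical.

```python
def implied_savings(previous_annual_spend, multi_year_ceiling, years):
    """Average annual spend ceiling and the minimum percentage reduction it implies."""
    avg_annual_ceiling = multi_year_ceiling / years
    min_reduction_pct = 100.0 * (1.0 - avg_annual_ceiling / previous_annual_spend)
    return avg_annual_ceiling, min_reduction_pct


# Figures from the article: $300K spent last year; under $300K predicted over three years.
avg_ceiling, min_reduction = implied_savings(300_000.0, 300_000.0, 3)

print(f"average annual spend stays below ${avg_ceiling:,.0f}")
print(f"implied reduction is more than {min_reduction:.1f}% per year")
```

Under that reading, average annual spend on these spindles would stay below $100K, a reduction of more than two-thirds from the previous level.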

Another example of where the RE process has paid off came when we found that another group of machines was showing 15% unplanned downtime. The operations group had lost all confidence in these machines and strongly suggested we replace this machine platform. Using the same RE approach, we found the major issues were:

• Hydraulic problems: aeration, overheating and hydraulic oil contamination

• Spindle turret failures

• Issues with the machining process - the process was engineered in a way that it exceeded the machine specifications

The solution was to:

• Implement a good PM program for the machine

• Put good standard work in place for turret rebuilds

• Carry out root cause analysis (RCA) on individual failures, which traced some of them to a weak key-way design

• Have the process engineers change the machining process to eliminate damage to the machine

This corrected the major issues and we are now at just 4% unplanned downtime! Using the other good work management and materials management processes, we have established lower stocking levels for spare parts, and we have stronger, well-written job plans for turret rebuilds, which will enable us to sustain the gains. Without the Reliability Engineering tools we would not have reduced the downtime and spend so significantly.

By Peter Sheard
Director of Manufacturing Support
Swagelok Company
Solon, Ohio

and

Richard M. Jamison
Client Success Champion
Life Cycle Engineering
Charleston, SC