Reliabilityweb The Ups and Downs of Reliability Engineering and CMMS Implementation at Lone Star Steel

First, a little history about Lone Star Steel. We are a 54-year-old steel mill located in the piney woods of Northeast Texas, about 120 miles east of Dallas. The original facility was started by the United States government during World War II in an effort to geographically diversify the nation's steelmaking and coking coal usage away from the northeastern United States.

The original facility consisted of an ore mining operation, a blast furnace and a cast iron pipe facility. With the end of World War II, the facility became a private enterprise. The early 1950s saw the addition of a 4HI Steckel rolling mill for rolling slabs into coils and the installation of two ERW pipe mills. Over the following decades, two electric arc furnaces were added along with heat treating, pipe finishing and specialty tubing facilities. Lone Star Steel was the first domestic tube and pipe producer to receive ISO 9001 certification. Even today, much of the major original equipment is still in use after 50-plus years of service. Spare parts sometimes have to be machined, as in many cases the original equipment manufacturers went out of business years ago.

From a maintenance perspective, Lone Star Steel has historically been in a reactive maintenance mode. Over the years, we became expert "fire fighters", possessing the ability to fix almost anything in rapid fashion in order to restore production. This ability to respond to emergency situations began to earn praise and recognition. All the while we should have been focusing on eliminating the occurrence of the emergencies. But firefighting became our culture.

It is what we knew how to do and do well. To do anything else would require a major culture change within the whole organization. And the more time we spent fighting fires, the less time we had to try to migrate towards increasing our preventive work. But the reality was that without this needed culture change, we would be doomed to ever decreasing production as the lack of preventive (PM) and predictive (PdM) maintenance continued to take larger and larger tolls on our equipment.

Now I don't mean to imply that we were devoid of preventive or predictive maintenance. We had a fairly good lubrication program in place, and we worked hard at keeping equipment adjusted, tightened and cleaned. We were good at checking for wear on mechanical and electrical components during weekly downturns if time permitted. We had a crane crew whose only job was to perform regular inspections and repairs to the cranes plant-wide. We normally took three- to seven-day annual outages in each department to do major repairs and modifications as well as to install and commission capital projects. But the percentage of our time and resources that went to PM and PdM tasks, as opposed to recovering from failures, was small.

Prior to the implementation of the CMMS and Reliability Engineering group, the structure at Lone Star Steel consisted of seven operating departments, with each having a fairly stand-alone maintenance organization. These stand-alone organizations typically consisted of a maintenance superintendent who had one or more mechanical and electrical foremen reporting to him. The day-to-day activities of the bargaining unit work force were then directed by the foremen.

Additionally, there were centralized machine shops, welding shops, carpenter shops, electric shops, instrument shops, millwright shops and fleet maintenance shops that provided service to the entire plant. The entire plant maintenance organization fell under one manager. The organization chart of this group resembled the diagram shown in Figure 1 below. Note that only three operating areas are shown instead of seven for clarity.

Fig 1

Over the years, a couple of the departments' maintenance groups dabbled in some form of CMMS, but most departments did not use one. Where a CMMS was used, there was no structured methodology behind it, and the data that was captured, while useful, was not necessarily comprehensive. Therefore, the data did not always tell the whole story of what was being done to the equipment. In July of 2005, the first ever plant-wide CMMS at Lone Star Steel was launched, and the software chosen was TabWare. To facilitate the implementation of TabWare, an outside consulting group was brought in to assist in training and in constructing the original equipment lists, hierarchies and work order system.

The actual implementation strategy was planned as follows. First, the equipment lists and hierarchies were established in TabWare for each department. This consisted of entering equipment data and descriptions, assigning each piece a unique equipment number and coding the equipment to the correct departmental cost center. Then, the correct parent/child relationships were established in the hierarchy between all associated pieces of equipment. Next, we gathered all of the existing PM documents in the departments that had them, and began writing master preventive maintenance plans into TabWare to mimic what PM tasks were being performed to date. In departments or areas where no CMMS or formal system was in use, we applied the master plans for similar pieces of equipment located elsewhere in the plant in order to ensure that we had basic PM coverage for the majority of our equipment. Initially, we had approximately 300 master plans in TabWare which were executed against thousands of pieces of equipment.
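The equipment list and parent/child hierarchy described above can be pictured with a small sketch. This is not TabWare's data model; the class names, the equipment number format and the cost center codes are all hypothetical and serve only to show how unique numbers, cost centers and parent/child links fit together.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Equipment:
    number: str                    # unique equipment number (format is hypothetical)
    description: str
    cost_center: str               # departmental cost center (code is hypothetical)
    parent: Optional[str] = None   # equipment number of the parent, if any
    children: List[str] = field(default_factory=list)


class EquipmentRegister:
    """Illustrative in-memory register; not the CMMS's actual structure."""

    def __init__(self) -> None:
        self._items: Dict[str, Equipment] = {}

    def add(self, number: str, description: str, cost_center: str,
            parent: Optional[str] = None) -> Equipment:
        # Enforce uniqueness of equipment numbers.
        if number in self._items:
            raise ValueError(f"duplicate equipment number: {number}")
        # A child can only be attached to a parent that already exists.
        if parent is not None and parent not in self._items:
            raise KeyError(f"unknown parent: {parent}")
        item = Equipment(number, description, cost_center, parent)
        self._items[number] = item
        if parent is not None:
            self._items[parent].children.append(number)
        return item

    def path(self, number: str) -> List[str]:
        """Equipment numbers from the top-level parent down to `number`."""
        chain: List[str] = []
        current: Optional[str] = number
        while current is not None:
            chain.append(current)
            current = self._items[current].parent
        return list(reversed(chain))

    def descendants(self, number: str) -> List[str]:
        """All equipment below `number`, depth first."""
        result: List[str] = []
        stack = list(reversed(self._items[number].children))
        while stack:
            current = stack.pop()
            result.append(current)
            stack.extend(reversed(self._items[current].children))
        return result


register = EquipmentRegister()
register.add("RM-001", "Rolling mill", "CC-100")
register.add("RM-001-DRV", "Main drive", "CC-100", parent="RM-001")
register.add("RM-001-DRV-MTR", "Main drive motor", "CC-100", parent="RM-001-DRV")
print(register.path("RM-001-DRV-MTR"))
```

A structure like this lets a master plan written for one piece of equipment be applied to similar pieces elsewhere in the hierarchy, and lets costs roll up to the parent and its cost center.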

These master plans covered all PM task frequencies, including weekly, monthly, quarterly, semi-annual and annual. Care was taken when we launched the PMs to keep the workload level.

We took into consideration the distribution of monthly, quarterly and annual tasks performed on groups of equipment so that they did not all need to be done in the same week. The initial assumption was that all of the existing preventive programs that were being morphed into TabWare master plans were valid in both content and frequency of execution until proven otherwise. In order to keep track of this assumption, all of these initial master plans were assigned numbers beginning with the letter "I" to denote interim. As these plans were reviewed for content and frequency, they became validated master plans and the "I" designation was dropped.
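One way to picture the workload leveling described above is a simple greedy stagger: each recurring task is given a starting week chosen so that the busiest week it lands on is as light as possible. This is only a sketch under stated assumptions, not the method we or TabWare actually used. The task identifiers, the hour estimates, and the approximation of a month as 4 weeks and a quarter as 13 weeks are all illustrative.

```python
from typing import Dict, List, Tuple

# Assumed intervals in weeks; "monthly" as 4 weeks is an approximation.
INTERVAL_WEEKS = {
    "weekly": 1,
    "monthly": 4,
    "quarterly": 13,
    "semi-annual": 26,
    "annual": 52,
}
HORIZON_WEEKS = 52


def level_schedule(
    tasks: List[Tuple[str, str, float]]
) -> Tuple[Dict[str, int], List[float]]:
    """Assign each (task_id, frequency, hours) a start-week offset.

    Returns the chosen offsets and the resulting hours per week.
    """
    load = [0.0] * HORIZON_WEEKS
    offsets: Dict[str, int] = {}
    # Place the largest jobs first so small ones fill the gaps.
    for task_id, frequency, hours in sorted(tasks, key=lambda t: -t[2]):
        interval = INTERVAL_WEEKS[frequency]

        def peak_if_started_at(offset: int) -> float:
            # Heaviest existing week among those this task would occupy.
            return max(load[w] for w in range(offset, HORIZON_WEEKS, interval))

        best = min(range(interval), key=peak_if_started_at)
        for week in range(best, HORIZON_WEEKS, interval):
            load[week] += hours
        offsets[task_id] = best
    return offsets, load


# Four hypothetical monthly PMs of two hours each.
tasks = [
    ("PM-A", "monthly", 2.0),
    ("PM-B", "monthly", 2.0),
    ("PM-C", "monthly", 2.0),
    ("PM-D", "monthly", 2.0),
]
offsets, load = level_schedule(tasks)
print(offsets)    # each task starts in a different week of the 4-week cycle
print(max(load))  # 2.0 hours in the busiest week, instead of 8.0 if all started together
```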

While preparations for the TabWare launch were under way, two new groups were formed: a group of planners and schedulers, and a Reliability Engineering group. The planning and scheduling group was staffed largely by taking one maintenance foreman from each department and making that individual the planner for that department. The planners in turn reported to a Senior Planner, who in turn reported to the Maintenance Manager. The Reliability Engineering group also reported directly to the Maintenance Manager. Additionally, a TabWare administrator was designated to be the gatekeeper of the new system. This person's function was to control and issue clearances for system access and functions and to help generate metrics that were queried from TabWare. The organizational chart then resembled what is shown in Figure 2.

Fig 2

Additionally, to prepare for the launch of the new system, we used various public relations tools to spread the word about the upcoming changes. One tool was a brochure that was handed out at meetings held with the maintenance craft persons prior to implementation. This brochure told in plain language what the changes were going to be and why. The maintenance process was given the acronym P.R.E.D.I.C.T.S. The meaning of the acronym is shown in Figure 3.

Fig 3

The meetings in which these brochures were handed out allowed the craft persons to ask questions about the new system. The meetings were held in the respective departments' lunchrooms and were chaired by either a reliability engineer or the Senior Planner. The superintendent of the maintenance group attending the meeting was also present to reinforce the acceptance of the new program. In retrospect, however, we should have distributed the brochures about a week in advance of the meetings to give people time to think of their questions.

So, after several months of work, we had moved from a plant that had no coordinated CMMS with planning/scheduling and no Reliability Engineering department, to a plant that had both.

That was the easy part. Now the real work began. The basic tools were in place, but the culture remained unchanged. Not every maintenance person in the organization was gung-ho about having a CMMS that would document everything they did and how long they took to do it, and that would spit preventive maintenance work orders out at them like clockwork. Additionally, the CMMS could be queried to look at cost, safety, work order backlog or just about any metric you could imagine. The reality was that our maintenance program had its flaws, and not having a CMMS in place had only kept those flaws from being quite so exposed. And since you can't know how to fix something until you know how it is broken, one advantage of the CMMS is that you begin to see how your maintenance system is broken. You begin to get a clearer picture of what happens if you are understaffed, don't have the needed repair parts or skip a weekly downturn. As work order history was entered into TabWare over the ensuing months, the CMMS began to be a useful tool for the Reliability Engineering group in performing root cause failure analysis (RCFA) and failure modes and effects analysis (FMEA).

By comparing FMEA hypotheses against the tasks listed in the master plans for the equipment in question, iterations could begin on developing the exact best preventive and predictive maintenance tasks needed for those pieces of equipment.

The CMMS was set up so that work orders could be generated in one of three types. First was the Emergency Work Order (EM). These work orders were entered to document the work performed on, and capture the cost to repair, a piece of equipment that had failed. The work order was entered either as the repairs were underway or after the repairs were completed, with priority given to completing the repair as quickly as possible to allow production to resume. Since these work orders were entered in response to a failure, the volume of this type of work order was a measure of how reactive the maintenance system was.

The second type of work order was the Routine Work Order (RT). These were entered upon discovery of a piece of equipment that had symptoms of impending failure. These symptoms might be discovered by the operator of the equipment, a maintenance craft person performing routine inspections, or by use of one of the predictive maintenance tools such as vibration data collection, thermography or ultrasound. These work orders were entered and routed through the department's planner. The planner's function was to plan the weekly downturn workload, make sure the necessary materials were on site and coordinate with departmental maintenance leaders and contractors as needed to get each week's work orders completed.

The third type of work order was the Preventive Work Order (PM). These work orders were automatically generated by the CMMS as dictated by the master plan(s) for that equipment. The Reliability Engineering group had set up all the master plans, including the tasks to be performed and the frequency of execution, and assigned these to specific equipment numbers. The CMMS then automatically generated the PMs at the required frequency. One of the major metrics that was monitored was each area's PM compliance. This measured what percent of the week's PMs had been completed on time versus what percent had been closed without time being charged to the work order or what percent had become delinquent.
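The PM compliance breakdown can be illustrated with a short calculation. This is a sketch only, not the actual TabWare query or report: the record fields, the category names, and the choice to count a PM closed after its due date as delinquent are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class PMWorkOrder:
    number: str                    # hypothetical work order number
    due: date
    closed: Optional[date] = None  # None means still open
    hours_charged: float = 0.0


def classify(order: PMWorkOrder, as_of: date) -> str:
    """Place one PM work order into a compliance category."""
    if order.closed is not None:
        if order.hours_charged <= 0:
            # Closed, but nobody charged time to it.
            return "closed_no_time"
        # Assumption: closing after the due date counts as delinquent.
        return "on_time" if order.closed <= order.due else "delinquent"
    # Still open: delinquent once the due date has passed.
    return "delinquent" if as_of > order.due else "open"


def pm_compliance(orders: List[PMWorkOrder], as_of: date) -> Dict[str, float]:
    """Percentage of PM work orders in each category as of a given date."""
    counts = {"on_time": 0, "closed_no_time": 0, "delinquent": 0, "open": 0}
    for order in orders:
        counts[classify(order, as_of)] += 1
    total = len(orders)
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Made-up week of four PMs, all due on the same Friday.
friday = date(2005, 9, 2)
week = [
    PMWorkOrder("PM-1", friday, closed=date(2005, 9, 1), hours_charged=1.5),
    PMWorkOrder("PM-2", friday, closed=date(2005, 9, 2), hours_charged=0.0),
    PMWorkOrder("PM-3", friday),
    PMWorkOrder("PM-4", friday, closed=date(2005, 9, 6), hours_charged=2.0),
]
print(pm_compliance(week, as_of=date(2005, 9, 7)))
```

Separating "closed without time charged" from "on time" matters because a PM that is closed with no labor against it was probably not actually performed.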

As with most maintenance organizations, especially those trying to exit from being reactive in nature, there was some backlash as the CMMS began issuing preventive maintenance work orders. Prior to the implementation of the CMMS and Reliability function, each maintenance department had fixed things as they had broken or were obviously about to break, with little time for doing preventive functions. Now these same maintenance departments were faced with doing those same repairs along with the addition of weekly, monthly, quarterly, semi-annual and annual PM work orders. Adding to the frustration, the CMMS allowed anyone to easily query how efficiently and effectively all of this work was being done.

The initial Reliability Engineering group consisted of two staff reliability engineers and one consultant engineer who was part of the consultant group assisting in the CMMS implementation. Additionally, we had one inspector on staff that focused primarily on the collection of vibration data. The reliability engineers initially focused on helping to construct the equipment lists and hierarchies and entering PM tasks already in place.

Once the CMMS was up and running, the Reliability Engineering group focused on the following tasks:

• Review and validate the interim master plans for content and frequency

• Create new master plans for equipment that previously had no formal PM program

• Begin performing root cause failure analysis (RCFA) and failure modes and effects analysis (FMEA) on the most critical equipment

• Perform delay analysis to focus efforts on the Top 10 delay causes

We enlisted the help of Reliability Center Inc. to provide on-site RCFA and FMEA training to the reliability engineers and to a group from one of our operating departments where most of our efforts would initially be focused. The training went well and the group began performing FMEA on the most critical equipment in their department. A word of caution here: full-blown RCFA and FMEA are time-consuming and require disciplined structure. There is a lot of data to gather, failed parts to quarantine and analyze, personnel to interview about the failure and a lot of meeting time required. Time was the insurmountable hurdle. We began with twice-weekly meetings to flesh out the initial fault trees. After six weeks, attendance began falling off due to production and maintenance demands. Fewer and fewer team members were given the time to keep the process going. Additionally, the consultant reliability engineer reached a point where he had to accept a new job opportunity. Simultaneously, one of our staff reliability engineers left the company. This left us with one reliability engineer and one inspector. This is the stage where it is easiest to just forsake the whole idea of predictive maintenance.

To Lone Star Steel's credit, we pursued hiring a new reliability engineer and adding an additional inspector. During the time of being understaffed, the reliability effort suffered somewhat due to lack of time and manpower to "run all the traps." But the predictive tools we were using were adding too much to the bottom line to be put on the shelf. It had been calculated that the weekly usage of our infrared camera alone was adding over $4 million in revenue annually by allowing us to detect potential failures in time to allow for planned and scheduled correction before downtime was incurred.

This entire implementation process had been begun with the blessing of and at the directive of upper management. Once all of the transition flux and upheaval began, what part did upper management play? As is often the case, other issues come up in the big picture of business that demand upper management's attention, and that sometimes means that the issues already on the table either have to wait or find a way to resolve themselves. In our case, one issue was that we were in the midst of a record year and strong markets for our products. This required management to place additional focus on business expansion and development. The level of their involvement in the CMMS/Reliability rollout was maintained though, and allowed us to get past some of the biggest hurdles. All of the smaller issues were left to us to figure out. This did two things. First, it forced the different maintenance groups to fight through the transition together, sharing successes and miseries. Second, it kept a level of autonomy in the maintenance group by cultivating ownership of the process. Yes, sometimes those in the "trenches" felt as if we had very little management support. Although a stronger upper management influence might have quelled some of the in-fighting and initial resistance to the new system, it would also have made every aspect seem more mandated and shoved down the maintenance organization's throat. This in turn would reduce the needed buy-in by all levels. Sometimes you just have to try to make people want to take their medicine.

In review of our CMMS and reliability implementation, the following points should be emphasized:

• Training: The need for training cannot be overstated. Training in RCFA/FMEA, vibration analysis, thermography, ultrasonics and the CMMS is absolutely essential. Don't train the employees on the CMMS only to the point that they can meet the demands the new system will place on them. Train them to the point that they can use the system to gain knowledge about their departments and proactively make improvements. This not only promotes buy-in from those who will be participating, but maximizes the payback of the implementation costs by giving participants the tools and knowledge needed to make instant contributions. In our case, we received the previously mentioned RCFA/FMEA training, vibration data collection training and infrared thermography training (Level II certification), and the two staff reliability engineers passed the SMRP certifying exam and received their Certified Maintenance and Reliability Professional (CMRP) certifications.

• Communication: Fully and thoroughly discuss with all interested parties why the implementation is being done, what the scope of the implementation will be and what the changes in the organizational structure will be before the implementation begins. Make sure they understand the benefits of the implementation and how it can make their job easier, put more profit to the bottom line, etc. One universal truth is that each person is most interested in how their own daily tasks will be affected. We held pre-rollout meetings with all of our maintenance groups and passed out brochures that described the upcoming transition and listed the benefits that would be achieved. Most people will not ask questions in a group session for fear of looking "stupid." And sometimes people do not ask questions simply because they do not know enough yet to know what to ask!

When you think you have adequately prepared everyone to accept a major culture change, then you probably have only done about half enough.

• Management support: Management support has to be stated up front and be visible. But this does not mean that there needs to be frequent hand-holding. It simply means that everyone understands and feels comfortable that if an impasse is reached, management is open to and willing to get the issue(s) resolved. Constant intervention by management would diminish the process of employees taking ownership of the changes. But management must be committed to providing the resources necessary to make the implementation successful.

• Don't be afraid to alter course. This means that there might be a dozen paths to the desired results, and the initial path taken may not be the best. Sometimes you don't realize this until you are part way along. Learn from what you've done, and do not be afraid to take a different course if it provides a clearer and more effective way to reach your maintenance and reliability goals. No one methodology will work at every company.

The process of improvement at Lone Star Steel will be ongoing forever. Has it been easy? No. Will it ever be easy? No, but it will get better as time goes on. We began the transition in mid-2005 as a reactive maintenance organization. At that time, the number of emergency (reactive) work orders entered into TabWare was two to three times the number of preventive (proactive) work orders entered. After 11 months of diligent work, the number of work orders entered that prevented failures and breakdowns exceeded the number of work orders entered to repair failures. This trend is shown by the chart in Figure 4, below. This, combined with our use of predictive tools, allowed us to officially claim being proactive in comparison to our previous culture and history. In the months since that historic occurrence, we have continued to widen the gap between proactive and reactive work order volumes. Hopefully, in the near future, we will begin the next challenge: to go from being a predominantly time-based preventive maintenance culture to being a heavily predictive maintenance organization.

Fig 4
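For readers who want to track the same trend in their own CMMS, the comparison behind a chart like Figure 4 amounts to counting work orders by type per month and finding the first month in which proactive orders outnumber reactive ones. The sketch below assumes a simple export of (month, work order type) pairs. Treating only PM orders as proactive and only EM orders as reactive is an assumption that can be changed through the parameters, and the sample entries are invented, not our data.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Optional, Tuple


def monthly_counts(
    work_orders: Iterable[Tuple[str, str]],
    proactive_types: FrozenSet[str] = frozenset({"PM"}),
    reactive_types: FrozenSet[str] = frozenset({"EM"}),
) -> List[Tuple[str, int, int]]:
    """Return (month, reactive_count, proactive_count) sorted by month.

    `work_orders` holds (month as 'YYYY-MM', work order type) pairs.
    """
    reactive: Counter = Counter()
    proactive: Counter = Counter()
    for month, wo_type in work_orders:
        if wo_type in reactive_types:
            reactive[month] += 1
        elif wo_type in proactive_types:
            proactive[month] += 1
    months = sorted(set(reactive) | set(proactive))
    return [(m, reactive[m], proactive[m]) for m in months]


def first_crossover(counts: List[Tuple[str, int, int]]) -> Optional[str]:
    """First month in which proactive orders exceed reactive ones, if any."""
    for month, reactive_count, proactive_count in counts:
        if proactive_count > reactive_count:
            return month
    return None


# Invented sample entries, for illustration only.
sample = [
    ("2005-07", "EM"), ("2005-07", "EM"), ("2005-07", "PM"),
    ("2005-08", "EM"), ("2005-08", "PM"), ("2005-08", "RT"),
    ("2005-09", "EM"), ("2005-09", "PM"), ("2005-09", "PM"),
]
counts = monthly_counts(sample)
print(counts)
print(first_crossover(counts))
```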

By Allen Strickland,
CMRP, Reliability Engineer,
Lone Star Steel Company
