Reliability Engineering Snapshot TM

Illustrated Case Studies in the Maintenance Reliability Engineering World of Failure Analysis, Predictive Maintenance, and Non Destructive Evaluation





Root cause failure analysis, or RCFA, is a multidisciplinary skill that people, both technical and non-technical, use to find the cause of a failure. When a component or device fails, it leaves evidence of the way in which it failed. RCFA is the methodology used to find and extract that evidence. Simply put, the root cause of a failure is analogous to the first domino in a long line of dominoes. It is the action or condition that initiates a series of subsequent events and/or failures that eventually prevents a component or device from performing its intended function. However, something does not have to break physically in order for it to be "a failure." That first domino can be something as simple as someone not following a correct procedure.

Finding the root cause of a failure means that people can address and resolve the issues, thereby eliminating or significantly reducing the chances of a recurrence. Reducing the frequency of a recurring failure, or eliminating it altogether, means minimizing adverse impacts on production and profitability.

RCFA is a cornerstone of Maintenance Reliability Engineering because it answers the "why" part of the question behind a failure. Nobody will dispute the importance of a preventive maintenance (PM) program. It is unquestionably the cornerstone of a good maintenance program. However, many PM inspections and procedures are based solely upon knowing what is going to fail and when it is going to fail. The PM never answers why it is going to fail. In many cases, discovering why something is going to fail eliminates the PM or significantly increases the PM inspection interval.

The easiest way to gauge the performance of an RCFA program is by measuring the operating time of a piece of equipment between failures. If changing the design of a component or an operating procedure through an RCFA recommendation does not increase the time between recurring failures, then the root cause of the problem was never found, and it still exists.
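As a rough illustration of the measure described above, the sketch below compares the average operating time between failures before and after an RCFA recommendation is put in place. The function names and the operating-hour figures are hypothetical and chosen only for illustration; a real program would draw these intervals from maintenance records.

```python
from statistics import mean


def mean_time_between_failures(run_hours):
    """Average operating hours between consecutive failures."""
    if not run_hours:
        raise ValueError("need at least one operating interval")
    return mean(run_hours)


def rcfa_improved(before, after):
    """True if the average time between failures increased after the change."""
    return mean_time_between_failures(after) > mean_time_between_failures(before)


# Hypothetical operating hours between failures (illustrative numbers only).
before_fix = [400, 350, 450]
after_fix = [1200, 1100, 1300]

mtbf_before = mean_time_between_failures(before_fix)
mtbf_after = mean_time_between_failures(after_fix)

if rcfa_improved(before_fix, after_fix):
    print(f"Time between failures rose from {mtbf_before} h to {mtbf_after} h.")
else:
    print("No improvement: the root cause was probably not found.")
```

If the comparison shows no increase, the recommendation addressed a symptom rather than the first domino, and the analysis should be reopened.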

BUSINESS PRESSURES AND RELIABILITY - Reliability of rotating and stationary equipment is a never-ending, relentless battle that does not start and end with one good idea. Instead, it is a continuous, ongoing state of ever-improving solutions and an ever-diligent belief in staying the course. With tightening profit margins and commodity-like markets, it is no longer acceptable to live with what are considered to be "routine" equipment failures. By eliminating or reducing the number of "routine" failures, maintenance reliability can mean using capital for improvements instead of "in-kind" replacements and repairs. Capital improvements that target the results of failure analysis can mean a lower cost to produce and a higher rate of return on investment. Instead of patting ourselves on the back for making the same repairs quicker and safer, we should be asking ourselves, "Why do we continue to make the same repairs?" All of these noble ideas begin with the first and very simple question - "Why did it fail?"

MAINTENANCE BENEFITS - RCFA allows maintenance personnel to plan surgical-style maintenance repairs instead of replacing everything. It defines the correct repair for a given set of circumstances. RCFA has proven repeatedly that finding the root cause of a failure eliminates the problem and improves the component's performance; it is certainly an integral part of the redesign process. In the majority of cases, knowing the root cause of the failure allows the maintenance engineer to install a better-engineered solution. Knowing the root cause also allows one to predict service life based upon past performance and, hence, pick the most cost-effective solution. At least 50% of RCFA recommendations become capital projects because they clearly improve performance as opposed to simply maintaining it. Sometimes RCFA will find deficiencies in maintenance procedures; consequently, those procedures are redefined.

PRODUCTION BENEFITS - RCFA is used to improve equipment availability. Equipment downtime caused by failures reduces equipment availability. The more time the equipment is down, the less available it is to meet production budgets. If equipment is not running, it is not making money for the company. It is common to involve production personnel in the redesign phase of the RCFA. This helps managers meet strategic goals when planning short-term and long-term capital projects against marketing forecasts.
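The link between downtime and availability can be made concrete with a small calculation. The sketch below assumes the common definition of availability as uptime divided by total scheduled time; the function name and the hours used are hypothetical and are not figures from any case study here.

```python
def availability(uptime_hours, downtime_hours):
    """Fraction of scheduled time the equipment was able to run."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total scheduled time must be positive")
    return uptime_hours / total


# Hypothetical 720-hour month (illustrative numbers only).
scheduled_hours = 720.0
downtime_before = 72.0  # failure downtime before the RCFA fix
downtime_after = 18.0   # failure downtime after the RCFA fix

avail_before = availability(scheduled_hours - downtime_before, downtime_before)
avail_after = availability(scheduled_hours - downtime_after, downtime_after)

print(f"Availability before: {avail_before:.1%}")
print(f"Availability after:  {avail_after:.1%}")
```

Every hour of failure downtime removed goes straight back into the time available to meet the production budget.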

PURCHASING BENEFITS - RCFA provides information that is valuable to Purchasing Department personnel when specifying component performance and quality of workmanship. Getting the best price for a component or piece of equipment does not always equate to getting the desired performance. Purchasing agents can use information from a failure analysis when writing bid specifications. Not only will they obtain the best price, but the hardware will also be able to meet the demands of the service.

INSTITUTING AN RCFA PROGRAM - The best people to use are degreed engineers. The root cause of at least 80% of all problems stems from infractions of some engineering principle. The best way to use maintenance engineers is to make RCFA their primary job function. Unfortunately, RCFA is not part of most colleges' curricula, so intensive training is required. Numerous training courses are offered by some of the most knowledgeable people in the business of failure analysis. You can find them on the Internet.


American Society of Metals Presentation May 2000

Failure Prevention through Education: Getting to the Root Cause

"Challenges in Comprehensive Failure Analysis on a Complex System"

ASM Presentation, May 2000 - gremlin trademark in view
If you're observant, you'll see my trademark gremlins in the photo to the left. I was wondering how I was going to open my presentation and introduce these characters. I was the last presenter of the first day. Luckily, one of the presenters before me had remarked that one of the components he had been analyzing had been mysteriously breaking "... as if someone was going around and breaking these things off in the middle of the night ... " Of course, I had to tell him who that "someone" was, namely the gremlins. The audience appreciated this lighter side of failure analysis, because the comment got a good laugh. You had to be there to appreciate it. Approximately 75 people attended the conference.