Reliability-Centered Maintenance (RCM)


All equipment, all machinery, all systems eventually fail. It is in the best interests of owners and operators to use a maintenance strategy that identifies both the probability of failure and the most common failure causes, and that provides the most cost- and time-effective solutions to prevent or rapidly remedy these failures. Fortunately for those of us in manufacturing, such a thorough approach does exist: reliability-centered maintenance (RCM).

Reliability-centered maintenance is a multi-step maintenance process that combines elements of previously existing maintenance strategies, including preventative, predictive, proactive, reactive, and condition-based maintenance. It focuses on identifying which failures are most common and which present the most risk, in order to preserve system function without excessive cost to the system’s owner. A successful RCM analysis will identify the minimum safe maintenance required for a system and the most efficient ways to carry it out.

ProAxion is proud to be a contributor to advancements in the cost- and time-efficiency of the RCM process. We deliver the data for predictive maintenance and make the use of condition monitoring a less expensive, faster and more viable solution for a wide range of systems and equipment.

This article explains the fundamental principles behind RCM, discusses in detail the seven questions and the process steps required for a successful RCM analysis, and identifies the most common maintenance solutions that such an analysis provides.

History of Reliability-Centered Maintenance

Prior to the development of RCM, the most common maintenance techniques utilized were preventative in nature. Equipment and machinery were inspected, cleaned and repaired at set intervals ranging from a few weeks to a few years. Inspections became more frequent as the equipment aged, as it was commonly believed that age was the most important factor in determining risk of failure. Systems which frequently failed were addressed via redundancy – the incorporation of multiple pieces of equipment which performed the same function, so that if one failed the other would continue working.

In the 1970s, the airline industry acknowledged that traditional preventative maintenance was highly expensive in terms of both money and labor hours required. It also, unfortunately, did little to mitigate the high crash rate experienced by jets at the time.

Researchers Stan Nowlan and Howard Heap were commissioned by the United States Department of Defense to develop a maintenance strategy which would replace the increasingly ineffective preventative method. They focused on a method which would move away from the increasingly inaccurate concept that the age of a system is the only factor which determines likelihood of failure. The result was RCM, on which Nowlan and Heap published an authoritative report in 1978.

The Department of Defense was highly approving of the Nowlan-Heap report, and soon implemented RCM in the space, military, and nuclear power industries. Today, RCM is widely used across all types of industry, from airplanes to amusement park rides at Disney World.

Fundamental Principles of Reliability-Centered Maintenance

As defined in the Nowlan-Heap report, the goal of RCM is to optimize the maintenance program in order to preserve the reliability and function of a system. RCM prioritizes safety, identifying failure modes most likely to cause human or environmental risks and remedying them. While it is concerned with creating a cost-effective solution, budget concerns are always considered to come second to safety.

The RCM approach used today is:

  • Function-oriented. It focuses on preserving the function of a system rather than every piece of equipment within the system, and strives to eliminate redundancy wherever possible.

  • System-focused. The health and function of an overall system is more important than the health or function of any individual component within the system. Components whose failures provide the greatest risk to the system itself are prioritized.

  • Reliability-centered. RCM does not treat failure probability as a single static number, but seeks to determine failure probability at various points in the life cycle of a system, with the goal of ultimately extending the overall life cycle.

  • A living system. An important part of RCM is the collection of data, which is then used to further optimize both the design of the system itself and future maintenance strategies. RCM acknowledges that change and growth can occur, and the most effective maintenance method may change as a system improves.
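To illustrate the "reliability-centered" principle above, here is a minimal sketch of a failure rate that changes over a system's life cycle. It uses a Weibull hazard function, which is one common reliability model; the choice of model and all parameter values are assumptions made for illustration and are not taken from this article.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate at age t (t > 0) for a Weibull model.

    shape < 1: failure rate falls with age (early-life failures)
    shape = 1: failure rate is constant (random failures)
    shape > 1: failure rate rises with age (wear-out)
    """
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical parameters: characteristic life (scale) of 10 years.
for shape in (0.5, 1.0, 3.0):
    early = weibull_hazard(1.0, shape, 10.0)
    late = weibull_hazard(10.0, shape, 10.0)
    print(f"shape={shape}: rate at year 1 = {early:.3f}, rate at year 10 = {late:.3f}")
```

Only the wear-out case (shape greater than 1) matches the older assumption that age drives failure; the other two cases show why age-based inspection schedules alone can miss the mark.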

In addition, RCM differentiates itself from other maintenance strategies in that it defines failure as any unsatisfactory condition. This means that the term “failure” does not only refer to a complete shutdown of a system. A system which is under- or over-performing its desired function, or performing an unintended function, is also considered to have failed.

For example, a conveyor belt is a system which delivers goods from one location to another. Obviously, it has failed if it no longer moves at all. However, a conveyor belt which moves too slowly or too quickly has also failed. While at first glance a faster-moving conveyor belt may be seen as an improvement, excessive speeds can damage the goods being transported. If the conveyor belt begins producing a high-pitched noise or a loud rattling as it moves, this is considered to be an unintended function and is also defined as failure.
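The conveyor belt example can be sketched as a simple check that treats any unsatisfactory condition as a failure. This is only an illustration: the class and function names and the speed limits are hypothetical, not values from any real system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpeedSpec:
    """Acceptable belt speed band in m/s (hypothetical values)."""
    min_speed: float
    max_speed: float


def classify_conveyor(speed: float, abnormal_noise: bool, spec: SpeedSpec) -> List[str]:
    """Return every unsatisfactory condition found; an empty list means no failure."""
    failures = []
    if speed <= 0:
        failures.append("total failure: belt not moving")
    elif speed < spec.min_speed:
        failures.append("under-performance: belt too slow")
    elif speed > spec.max_speed:
        failures.append("over-performance: belt too fast")
    if abnormal_noise:
        failures.append("unintended function: abnormal noise")
    return failures


spec = SpeedSpec(min_speed=0.8, max_speed=1.2)
print(classify_conveyor(1.0, False, spec))  # within the band, no noise
print(classify_conveyor(1.5, True, spec))   # too fast and noisy
```

Note that the belt running too fast is reported as a failure in the same way as a belt that has stopped, which is the point of the RCM definition.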

The Seven Questions of RCM

When performing a reliability-centered maintenance analysis on a system, seven questions should first be asked. These questions help the analyst understand the system and its needs, define the goals of the maintenance program, and identify a starting point in the form of the highest-risk failures.

The questions which should be asked of any system on which RCM strategies are performed are:

  1. What is the function of the system? Use declarative statements focusing on active verbs. Example: “To pump water” rather than “To be a water pump.”

  2. How can the system fail? Make sure to include all forms of failure, including under- and over-performance. Examples: “By not pumping water,” “By pumping too much water,” “By pumping too little water.”

  3. In what ways can the system fail? Identify all failure modes which might occur within the system. Examples: “Not enough pressure to pump,” “Water is leaking from the pump.”

  4. Where can the failures occur? For each failure mode, identify which parts or components of the system might cause it to occur. Example: “Water leaking from the pump can be caused by a crack in the pipe or a crack in the tank where water is stored.”

  5. What are the consequences of failure? Is there risk of human injury or death? Will failure cause large-scale environmental or property damage? How expensive will repairing a failure be? What are the losses to downtime or productivity?

  6. How can failure be predicted or prevented? Does historical data exist about this system and its failures? Can you calculate a failure probability? Does it need regular inspection to prevent failure? How would continuous monitoring alert for or predict failure?

  7. What can be done if no proactive task exists? If there is no way to predict or prevent failure, what can be done? Is the system fundamentally flawed? Does it require redesign? Is such a redesign cost-effective?

These questions are then used to carry out the multiple steps of a thorough reliability-centered maintenance process and optimize the maintenance performed on a system.
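One way to keep the answers to the seven questions together is a small worksheet structure. The sketch below is an assumption about how such a record might look; the field names and the inspection task are hypothetical, while the pump entries come from the examples above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RcmWorksheet:
    function: str = ""                                                  # Question 1
    functional_failures: List[str] = field(default_factory=list)        # Question 2
    # Keys are failure modes (Question 3); values are locations (Question 4).
    failure_modes: Dict[str, List[str]] = field(default_factory=dict)
    consequences: Dict[str, str] = field(default_factory=dict)          # Question 5
    proactive_tasks: Dict[str, str] = field(default_factory=dict)       # Question 6
    default_actions: Dict[str, str] = field(default_factory=dict)       # Question 7

    def modes_without_plan(self) -> List[str]:
        """Failure modes with neither a proactive task nor a default action."""
        return [
            mode for mode in self.failure_modes
            if mode not in self.proactive_tasks and mode not in self.default_actions
        ]


pump = RcmWorksheet(
    function="To pump water",
    functional_failures=["Not pumping water", "Pumping too much water", "Pumping too little water"],
    failure_modes={
        "Water is leaking from the pump": ["crack in the pipe", "crack in the storage tank"],
        "Not enough pressure to pump": [],
    },
    # Hypothetical task, added only to show how Question 6 is recorded.
    proactive_tasks={"Water is leaking from the pump": "periodic visual inspection"},
)
print(pump.modes_without_plan())  # ['Not enough pressure to pump']
```

Listing the failure modes that still lack a plan gives the analysis a concrete to-do list before the process steps begin.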

The Reliability-Centered Maintenance Process

The steps of a successful RCM process generally follow the same order as the seven questions discussed above.

First, decide which equipment or systems will undergo RCM analysis. A complete, rigorous RCM process, including a fully detailed failure analysis and the calculation of failure probabilities, can be expensive and time-consuming, especially if little data on the system exists. Therefore, RCM should be applied primarily to systems which perform critical functions (productivity or quality) or which could cause injury or environmental damage if they fail.

It is also possible to carry out a streamlined RCM or intuitive RCM on less critical equipment. This is a shorter version of the standard RCM process which focuses only on addressing the highest-risk failures, rather than every failure which the system can undergo.

Second, after the system has been chosen, its function must be defined. As RCM is focused on preserving the function, it is important to understand exactly what the system does. A single system may have more than one function. For example, the function of a smoke alarm is both “to detect the presence of excessive levels of smoke” and “to warn people in the area of the smoke and possible fire.”

The third step in the process is the identification of a system’s failure modes. These are any ways in which the system can fail. As previously mentioned, an RCM analysis defines failure as any unsatisfactory condition. For example, an airplane is considered to have failed if its engine is not running but also if its air conditioning system is down.

Once all failure modes have been identified, it is best to focus on those which are most critical – the failure of an airplane’s engine is far more important than that of its air conditioning system. While a rigorous RCM approach examines every single failure mode in detail, a streamlined approach prioritizes based on safety risks and possible high repair costs.

Next, each failure mode is analyzed to understand its possible root causes, that is, where in the system it could occur. An airplane’s engine could stop working, for example, if the turbine is damaged but also if the fuel tank is empty or the fuel is contaminated. This process is referred to as a failure mode and effects analysis (FMEA). While FMEA can be time- and labor-intensive, the ultimate result is a thorough understanding of the system and its components. It also prioritizes failures based on their consequences (first safety/health, then economic), so it can save time and money in the long term as a minimum safe maintenance standard is established.
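The prioritization step of an FMEA can be sketched as follows. Scoring severity, occurrence and detection from 1 to 10 and multiplying them into a risk priority number (RPN) is a widely used FMEA convention rather than something this article prescribes, and the scores below are invented for the airplane example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) to 10 (safety-critical)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (easily detected) to 10 (almost undetectable)

    @property
    def rpn(self) -> int:
        """Risk priority number: product of the three scores."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes: List[FailureMode]) -> List[FailureMode]:
    """Rank by severity first (safety before economics), then by RPN, highest first."""
    return sorted(modes, key=lambda m: (m.severity, m.rpn), reverse=True)


modes = [
    FailureMode("Air conditioning down", severity=3, occurrence=5, detection=2),
    FailureMode("Fuel contaminated", severity=10, occurrence=2, detection=6),
    FailureMode("Turbine damaged", severity=10, occurrence=2, detection=4),
]
for mode in prioritize(modes):
    print(f"{mode.name}: severity={mode.severity}, RPN={mode.rpn}")
```

Sorting on severity before RPN keeps a frequent but harmless failure from outranking a rare but dangerous one, which mirrors the safety-first ordering described above.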

Once the most critical failure modes have been identified, a maintenance tactic is selected and implemented for each. RCM usually recommends one of four solutions, which are discussed in greater detail below.

Maintenance Solutions

Upon completion of FMEA tasks, one of four maintenance tactics is selected for each system. Which tactic is best varies from system to system depending on factors such as cost and the potential consequences of failure.

The four maintenance tactics suggested by RCM are:

  1. Condition-Based / Predictive Maintenance. The condition of the system is monitored using testing, real-time observation and the collection of data. Maintenance is carried out when the risk of failure becomes high. This is a commonly used tactic because of the wide variety of testing, observation and data collection methods available. Tests which may be recommended include oil analysis, ultrasonic or infrared monitoring and vibration monitoring. ProAxion® has developed an easy-to-install, cost-effective monitoring technology called Tactix™ which can safely detect changes in vibration or temperature that predict failures far in advance.

  2. Preventative Maintenance. As discussed earlier, this strategy involves inspections, cleaning and replacement of parts at regularly scheduled intervals of time, which often decrease as the system ages. A preventative maintenance strategy runs the risk of performing too-frequent, unnecessary maintenance at high cost to the system’s owner. However, it may still be recommended for systems where condition-based monitoring is difficult or expensive or new systems on which little historical failure data exists.

  3. Redesign of System. If the risk of critical failure (causing injury, death or damage) is too high and/or unacceptable, predictive and/or preventative maintenance may not be sufficient. In these cases, a redesign of the system with the goal of decreasing or eliminating the failure risk may be recommended. Redesigns should only occur when a concrete change which actively decreases failure risk can occur, and not merely to add redundancy to the system.

  4. No Maintenance (Reactive Maintenance / Run-to-Failure). In cases where failure of the system is neither high-risk nor expensive to repair, it may be decided not to perform any proactive maintenance tasks at all. The system is allowed to run until it fails, at which time the failure can be repaired. This solution can actually be quite cost-effective for non-critical systems (for example, a cleaning system which sweeps the floor of an industrial warehouse).
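The choice among the four tactics above can be sketched as a small decision function. The yes/no inputs and the order of the checks are a simplification assumed for illustration; a real RCM decision weighs cost and consequences in far more detail.

```python
from enum import Enum


class Tactic(Enum):
    CONDITION_BASED = "condition-based / predictive maintenance"
    PREVENTATIVE = "preventative maintenance"
    REDESIGN = "redesign of system"
    RUN_TO_FAILURE = "run-to-failure"


def select_tactic(
    safety_critical: bool,
    costly_failure: bool,
    monitoring_feasible: bool,
    maintenance_sufficient: bool = True,
) -> Tactic:
    """Pick a tactic from simplified yes/no answers (hypothetical decision order)."""
    # Unacceptable risk that maintenance cannot control calls for redesign.
    if safety_critical and not maintenance_sufficient:
        return Tactic.REDESIGN
    # Failures that are neither high-risk nor expensive can be left to occur.
    if not safety_critical and not costly_failure:
        return Tactic.RUN_TO_FAILURE
    # Prefer monitoring where it is practical; otherwise fall back to a schedule.
    if monitoring_feasible:
        return Tactic.CONDITION_BASED
    return Tactic.PREVENTATIVE


# A warehouse floor sweeper: not safety-critical and cheap to repair.
print(select_tactic(safety_critical=False, costly_failure=False, monitoring_feasible=True).value)
```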

Whichever solution is recommended by RCM (other than run-to-failure), it should be implemented in a timely manner. RCM approaches are most effective if they are utilized early in the life cycle of a system, so that the greatest amount of data may be collected and understanding of the system can begin early and grow over time.

A successful RCM approach increases both the reliability and availability of the equipment while decreasing maintenance and resource costs. Systems maintained via RCM can experience up to 40% more uptime than those relying simply on preventative maintenance. RCM is truly the most well-rounded, optimal solution for system maintenance.

Contact ProAxion today to learn more about how you can improve your uptime and decrease costs via reliability-centered maintenance.