Using Fault Tree Analysis instead of Failure Mode and Effects Analysis
by Joe Ficalora on 5th February 2010
Introduction
One of the more useful tools in Lean Six Sigma problem investigations is the Failure Mode and Effects Analysis, or FMEA. This tool gathers and organizes team inputs on problem detection, severity and occurrence so that risks to input variables and output variables can be tracked. It prioritizes those risks so that the highest risks are addressed first. When combined with other Lean Sigma tools like the Cause and Effect matrix, it helps the team focus on the input variables most important to the key output variables in the process being studied.
However, sometimes in a Lean Sigma project, a key failure mode may carry such high risk, or be so complex, that it needs further investigation to prevent it from ever reaching a customer. This happens in Lean Sigma applied to product and service design as well as in process improvement. What often happens is that a separate FMEA is begun to “eliminate” this key failure mode completely. This is not always easy or entirely effective for a complex product or service involving software, different business functions, multiple branches, multiple subsystems and hundreds of different paths. The tedium can wear teams down and diminish their efforts over time, and the most critical paths to prevent are often not obvious.
Consider nuclear power plant faults, medical delivery systems, pharmaceutical prescriptions, or air traffic control, to name a few complex products and services. Some failure modes will arise that require 2, 3 or more contributing causes to be present in one form or another. The number of potential failure-cause combinations can be in the hundreds or even thousands. While the primary simple failure mode causes can be identified, analyzed and reduced in occurrence and severity, complex failure combinations involving 4, 5 or even 10 contributing causes are not easy to find by manual inspection. If human life is at stake, or other severe consequences are possible, a more thorough and compelling analysis is needed.
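To get a feel for how quickly these combinations grow, the short sketch below counts how many distinct sets of contributing causes can be formed from a pool of candidate causes. The pool size of 17 and the function name are illustrative assumptions, not figures from a real system.

```python
from math import comb

def count_cause_combinations(n_causes: int, min_size: int, max_size: int) -> int:
    """Count distinct sets of causes whose size is between min_size and max_size."""
    return sum(comb(n_causes, k) for k in range(min_size, max_size + 1))

# Assumed pool of 17 candidate causes (illustrative only).
pairs_and_triples = count_cause_combinations(17, 2, 3)  # sets of 2 or 3 causes
up_to_five = count_cause_combinations(17, 2, 5)         # sets of 2 to 5 causes
```

Even this modest pool yields hundreds of two- and three-cause sets, and thousands once four- and five-cause sets are included, which is why manual inspection becomes impractical.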
Fault Tree Analysis (FTA) is a technique that combines probabilities, fault logic, hierarchical structures, Monte Carlo simulation and graphical displays to provide a more complete analysis than Failure Mode and Effects Analysis alone. It is usually done in software because of the combinatorial methods and simulations involved.
Unlike FMEA, which treats each cause separately and ranks it against all other causes, FTA looks at the hierarchy in the system. Each branch of the system is defined by logical AND, OR, and other logical combinations. By including the combinatorial logic of the failure causes, e.g. Cause 1 AND Cause 2, or Cause 3 OR Cause 4 OR Cause 5, better analysis and priorities may be determined.
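As a rough illustration of how AND and OR logic can be combined in a hierarchy, the sketch below represents a small fault tree as nested tuples, evaluates it for a single scenario, and estimates the top-event probability by Monte Carlo simulation. The tree shape, the cause probabilities, and the function names are all hypothetical, and the causes are assumed to occur independently; dedicated FTA software is considerably more capable.

```python
import random

# A node is either a cause name (str) or a tuple: (gate, child, child, ...).
# This hypothetical tree reads: (Cause1 AND Cause2) OR Cause3 OR Cause4.
TREE = ("OR", ("AND", "Cause1", "Cause2"), "Cause3", "Cause4")

def top_event_occurs(node, state):
    """Evaluate the tree for one scenario; state maps cause name -> bool."""
    if isinstance(node, str):
        return state[node]
    gate, *children = node
    results = [top_event_occurs(child, state) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")

def estimate_top_probability(tree, cause_probs, trials=100_000, seed=0):
    """Monte Carlo estimate, assuming the causes occur independently."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one random scenario: each cause occurs with its own probability.
        state = {name: rng.random() < p for name, p in cause_probs.items()}
        hits += top_event_occurs(tree, state)
    return hits / trials

# Illustrative probabilities only, not taken from any real system.
PROBS = {"Cause1": 0.10, "Cause2": 0.20, "Cause3": 0.01, "Cause4": 0.02}
estimate = estimate_top_probability(TREE, PROBS)
```

Under the independence assumption, the exact value for this tree is 1 - (1 - 0.10 × 0.20)(1 - 0.01)(1 - 0.02), or about 0.049, so the simulated estimate should land close to that figure.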
As an example, suppose three causes are needed to create an event, say a fire. Event A is a fuel spill, Event B is the presence of oxygen, and Event C is a point with a temperature at or above the ignition temperature of the fuel. All three need to be present in order to have a fire, so the three events are joined by a logical AND.
In this simple example, an FMEA and an FTA would both focus on the most probable event, since all three are required to create the fire.
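The AND logic of the fire example can be sketched numerically. If the three events are assumed to be independent, the probability of the fire is the product of the three input probabilities. The probabilities and names below are hypothetical placeholders, not measured values.

```python
from math import prod

def and_gate_probability(input_probs):
    """Probability that all inputs occur together, assuming independence."""
    return prod(input_probs)

# Hypothetical probabilities for the fire example (not measured values).
fire_inputs = {
    "fuel_spill": 0.01,
    "oxygen_present": 1.0,
    "ignition_temperature": 0.05,
}
p_fire = and_gate_probability(fire_inputs.values())

# Driving any single input to zero removes the fire entirely.
no_ignition = dict(fire_inputs, ignition_temperature=0.0)
p_fire_without_ignition = and_gate_probability(no_ignition.values())
```

Because the inputs multiply, eliminating any one of them is enough to prevent the top event, which is what distinguishes an AND gate from the OR case.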
Some failure modes can happen when any one of several possible inputs is present. The logical analog in this case is an OR gate joining these inputs in a Fault Tree. For an OR gate with two inputs, the logic works as shown in this simple table:
| Input 1 | Input 2 | Gate Output |
| --- | --- | --- |
| 0 | 0 | 0 |
| 1 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 1 | 1 |
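The same OR logic can be reproduced in a few lines of code. The sketch below regenerates the truth table and, assuming the inputs occur independently, computes the probability that at least one of them is present. The function names are illustrative.

```python
from itertools import product

def or_gate(*inputs):
    """Output 1 if any input is 1, otherwise 0."""
    return int(any(inputs))

# Rows as (input 1, input 2, gate output) for every input combination.
truth_table = [(a, b, or_gate(a, b)) for a, b in product((0, 1), repeat=2)]

def or_gate_probability(input_probs):
    """Probability that at least one input occurs, assuming independence."""
    none_occur = 1.0
    for p in input_probs:
        none_occur *= 1.0 - p
    return 1.0 - none_occur
```

Unlike the AND case, every input to an OR gate is a path to failure on its own, so removing a single input lowers the output probability but does not eliminate it.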
A common FMEA approach for a complex system is to assign controls 1:1 to each cause, which can be expensive and can still miss the most crucial paths to failure. Consider a product or service FMEA with 17 causes. In the traditional Lean Sigma FMEA, the team would work on all 17 causes, trying to prevent or detect each one that has a high probability. Without the logic included in an FTA, the team’s priorities would be set by the Risk Priority Number (RPN) of individual causes ranked against the top system failure mode.
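For reference, RPN is conventionally computed as severity × occurrence × detection. The sketch below ranks a few causes that way; the 1 to 10 scoring scale, the cause names, and the scores are assumptions made up for illustration.

```python
def risk_priority_number(severity, occurrence, detection):
    """RPN as the product of three scores, each assumed to be on a 1-10 scale."""
    for score in (severity, occurrence, detection):
        if not 1 <= score <= 10:
            raise ValueError("scores are assumed to lie between 1 and 10")
    return severity * occurrence * detection

# Hypothetical causes with (severity, occurrence, detection) scores.
causes = {
    "Cause A": (9, 2, 3),
    "Cause B": (5, 6, 4),
    "Cause C": (7, 3, 8),
}

# Highest RPN first: each cause is ranked on its own, with no gate logic.
ranked = sorted(
    causes,
    key=lambda name: risk_priority_number(*causes[name]),
    reverse=True,
)
```

Note that this ranking scores each cause in isolation; nothing in it records whether a cause must combine with others, through AND or OR logic, before the top failure can occur.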

















