You are looking at a vehicle with a complaint: "reduced range, feels sluggish in the cold." Before you plug in a scan tool, three questions are already forming in your head. How can this system fail? What does the evidence tell me about which fault is active? Which hypothesis should I chase first?
Those three questions are answered, respectively, by fault tree analysis, Bayesian network inference, and ranked hypotheses with prior probabilities. They are not rival tools — they are layers of the same cake. The fault tree gives you the structure of failure. The Bayesian network lets you push evidence through that structure. The ranked hypothesis list is how you decide what to do next, on Tuesday, with a limited budget of time and parts.
The goal of this monograph is not mathematical rigor — there are textbooks for that, listed at the end. The goal is intuition. You should leave this page able to draw all three on the back of a napkin.
A fault tree is a picture of how an undesired event can be caused by more basic events. It is read top-down. At the top sits the thing you do not want — a crash, a fire, a motor that fails to deliver torque. Below it, branching downward through logic gates, sit the combinations of lower-level failures that would produce the top event.
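Ninety percent of useful fault trees use only two gates: AND, where the output event occurs only if all of its inputs occur, and OR, where it occurs if any one of its inputs occurs.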
The fault tree is a deductive tool. You start from the thing you want to prevent and work your way down to the root causes you can actually design against, test for, or monitor in the field. When you combine the leaf probabilities up through the gates (technically: via the minimal cut sets), you get a number for how often the top event occurs, the kind of number that analyses under ISO 26262 and IEC 61025 call for.
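The bottom-up calculation can be sketched in a few lines. This is a minimal illustration, not a production tool: it assumes the basic events are independent and that each one appears only once in the tree, so evaluating gate by gate gives the same answer as working through the cut sets. The event names and probabilities are hypothetical.

```python
from math import prod

def gate_and(probs):
    """AND gate: output occurs only if every input occurs (independent inputs)."""
    return prod(probs)

def gate_or(probs):
    """OR gate: output occurs if at least one input occurs (independent inputs)."""
    return 1.0 - prod(1.0 - p for p in probs)

# Hypothetical leaf probabilities, for illustration only.
p_primary_heater_fails = 1e-2
p_backup_heater_fails = 5e-2
p_cell_degraded = 2e-3

# Pack heating is lost only if the primary AND the backup heater both fail.
p_heating_lost = gate_and([p_primary_heater_fails, p_backup_heater_fails])

# Top event: reduced cold-weather range if cells are degraded OR heating is lost.
p_top = gate_or([p_cell_degraded, p_heating_lost])

print(f"P(heating lost) = {p_heating_lost:.6f}")
print(f"P(top event)    = {p_top:.6f}")
```

When the same basic event feeds more than one gate, this shortcut no longer holds and a cut-set or BDD-based evaluation is needed instead.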
Fault trees are strongest when the system is engineered, the failure modes are catalogued, and you need a defensible number, such as "the probability of dangerous failure per hour is less than 10⁻⁸." They are the backbone of ISO 26262 ASIL decomposition, DO-178C avionics software arguments, and IEC 61508 industrial safety cases.
They are weaker when you need to diagnose what has already gone wrong. A fault tree tells you P(top event) given component failure rates. It does not tell you, once the top event has occurred and you've observed three symptoms, which branch is the culprit. For that, you want to read the tree in the opposite direction — which is exactly what the next section is about.
A Bayesian network is a graph of variables in which each arrow says "this thing influences that thing." Attached to each node is a small table — the conditional probability table, or CPT — which answers the question: "given the state of my parents, how likely is each of my possible states?"
That is the entire structure. Its power lies not in the structure itself but in what you can do with it. Once the graph and the CPTs are specified, you can observe any subset of the variables and ask for the posterior probability of any other subset. The math that makes this go is Bayes' rule, applied repeatedly — but the conceptual move is the one that matters.
The prior probabilities — the unconditional probabilities of the causes, written P(cause) — encode what you believe before looking. The likelihood terms — P(symptom | cause) — come from physics, from field data, or from expert elicitation. Bayes combines them into the posterior: P(cause | symptom) ∝ P(symptom | cause) · P(cause).
When two independent causes can each produce the same effect, observing that effect raises the probability of both. But confirming one cause lowers the probability of the other, because the effect now has a sufficient explanation. This pattern is known as explaining away. A fault tree cannot do this. A Bayesian network does it for free.
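The effect is easy to reproduce numerically. The sketch below builds a three-node network (two independent causes, one shared effect) and computes posteriors by brute-force enumeration of the joint distribution. All priors and CPT entries are made-up illustrative values, and the variable names are assumptions rather than anything taken from a real vehicle model.

```python
from itertools import product

# Hypothetical priors for two independent causes.
P_A = 0.10  # A: battery degraded
P_B = 0.05  # B: heater fault

# Hypothetical CPT: P(reduced range = 1 | A, B).
P_E = {(0, 0): 0.01, (0, 1): 0.80, (1, 0): 0.90, (1, 1): 0.98}

def joint(a, b, e):
    """Joint probability P(A=a, B=b, E=e) from the network factorization."""
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pe = P_E[(a, b)] if e else 1 - P_E[(a, b)]
    return pa * pb * pe

def posterior_a(e=None, b=None):
    """P(A=1 | evidence), summing the joint over states consistent with the evidence."""
    num = den = 0.0
    for a, bb, ee in product((0, 1), repeat=3):
        if e is not None and ee != e:
            continue
        if b is not None and bb != b:
            continue
        p = joint(a, bb, ee)
        den += p
        if a:
            num += p
    return num / den

print(f"P(A)             = {posterior_a():.3f}")
print(f"P(A | E=1)       = {posterior_a(e=1):.3f}")
print(f"P(A | E=1, B=1)  = {posterior_a(e=1, b=1):.3f}")
```

With these numbers, observing the symptom lifts the belief in a degraded battery from 0.10 to roughly 0.67, and then confirming the heater fault drops it back to roughly 0.12. Enumeration is only practical for toy networks; it is used here because it makes the mechanics visible.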
The Bayesian network answered: "given what I have observed, how likely is each cause?" In practice you rarely care about all causes equally — you care about the order. Which suspect do I investigate first? Which fix do I attempt before I start pulling things apart? This is the job of the ranked hypothesis list.
P(H_i | E) ∝ P(E | H_i) · P(H_i)
Read it right to left. P(H_i) is your prior: how common this hypothesis is before you look, in this fleet, in this climate, at this mileage. P(E | H_i) is the likelihood: if this hypothesis were true, how well would it explain the evidence you see? Multiply them, normalize across all hypotheses so they sum to one, and sort descending. That is the ranking.
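The multiply, normalize, and sort recipe can be sketched directly. The hypothesis names, priors, and likelihoods below are hypothetical placeholders; in practice the priors would come from fleet data and the likelihoods from a model or elicitation.

```python
def rank_hypotheses(priors, likelihoods):
    """Return (hypothesis, posterior) pairs sorted from most to least probable."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence is impossible under every hypothesis")
    posteriors = {h: v / total for h, v in unnormalized.items()}
    return sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical priors P(H_i): base rates before looking at the vehicle.
priors = {
    "low_tire_pressure": 0.50,
    "aged_battery": 0.30,
    "heater_fault": 0.15,
    "cell_imbalance": 0.05,
}

# Hypothetical likelihoods P(E | H_i) for the complaint "reduced range in the cold".
likelihoods = {
    "low_tire_pressure": 0.30,
    "aged_battery": 0.60,
    "heater_fault": 0.70,
    "cell_imbalance": 0.95,
}

ranked = rank_hypotheses(priors, likelihoods)
for name, p in ranked:
    print(f"{name:20s} {p:.3f}")
```

Note that the hypothesis with the highest likelihood (cell imbalance) ends up last in this example, because its prior is so small.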
The prior matters enormously. If you forget it, you will diagnose a rare, textbook-beautiful failure mode when the actual answer is "the tire pressure is low." The humbling thing about priors is they are often the best signal you have — base rates from your warranty database are gold, and no clever model will recover what an ignored prior costs you.
A ranked list is honest about uncertainty. It does not pretend to a single answer when the evidence supports three. A mechanic given the top three hypotheses with their posterior weights makes better decisions than one given a single "most likely" with no uncertainty attached. Vehicle Health Management systems in modern fleets increasingly produce ranked posteriors rather than point decisions, precisely because they must compose cleanly with human judgment at the service bay.
The three tools are rarely used in isolation in a serious VHM program.
The fault tree is built during design, as part of the safety case. It defines the failure structure — which components, in which combinations, can produce which hazards. The probabilities on the leaves come from reliability handbooks, test data, and field returns.
That same structure can be re-read as a Bayesian network for diagnostic purposes. The topology is nearly identical; the CPTs replace the simple gate logic, and you add observable symptom nodes. Now the model can ingest evidence. Modern tools will even compile a fault tree directly into a Bayesian network, so your diagnostic model inherits the rigor of your safety analysis.
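As a rough illustration of what "the CPTs replace the simple gate logic" means, the sketch below expands an AND or OR gate into the equivalent deterministic CPT. The function name `gate_cpt` is hypothetical and does not correspond to any particular tool's API.

```python
from itertools import product

def gate_cpt(gate, n_parents):
    """Build a deterministic CPT: parent-state tuple -> P(child = 1)."""
    if gate not in ("AND", "OR"):
        raise ValueError(f"unsupported gate: {gate}")
    combine = all if gate == "AND" else any
    return {
        states: 1.0 if combine(states) else 0.0
        for states in product((0, 1), repeat=n_parents)
    }

and_cpt = gate_cpt("AND", 2)
or_cpt = gate_cpt("OR", 2)

for states in sorted(and_cpt):
    print(states, "AND:", and_cpt[states], "OR:", or_cpt[states])
```

Once the gate is expressed as a table, individual entries can be softened away from 0 and 1 to represent imperfect causation, which is something the original gate cannot express.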
The ranked hypothesis list is the output that the service technician, the OTA update system, or the fleet operations dashboard actually consumes. It is what the Bayesian network's posterior looks like when it hits the real world — sorted, truncated, and often accompanied by suggested next actions.
The classical fault tree assumes independence of the basic events. Common-cause failures — a shared power supply, a shared temperature regime, a shared software fault — break that assumption spectacularly. Explicitly model them as additional nodes, or use extensions like dynamic fault trees. Also: your tree is only as good as your failure-mode catalog. An event you didn't think of has probability zero in your model, and non-zero in reality.
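A small calculation shows how badly the independence assumption can mislead. The sketch assumes two redundant channels behind an AND gate, each with its own independent failure probability plus one shared cause that takes out both at once. The numbers are hypothetical.

```python
def both_fail_naive(p_channel):
    """AND gate treating the two channels as fully independent."""
    return p_channel ** 2

def both_fail_with_common_cause(p_ind, p_cc):
    """Both channels fail if the shared cause occurs, or if both fail on their own."""
    return p_cc + (1 - p_cc) * p_ind ** 2

p_ind = 1e-3  # hypothetical independent failure probability per channel
p_cc = 1e-4   # hypothetical probability of the shared cause

# Marginal failure probability of one channel, as a reliability handbook would report it.
p_channel = p_ind + p_cc - p_ind * p_cc

naive = both_fail_naive(p_channel)
actual = both_fail_with_common_cause(p_ind, p_cc)

print(f"naive  P(both fail) = {naive:.3e}")
print(f"actual P(both fail) = {actual:.3e}")
print(f"underestimate factor = {actual / naive:.0f}x")
```

With these values the naive AND gate underestimates the joint failure probability by nearly two orders of magnitude, even though each channel's individual failure rate is modeled correctly.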
The size of a conditional probability table grows as 2^n in the number of parents n, assuming binary variables. A node with six binary parents needs 2^6 = 64 conditional probabilities, more than you can elicit reliably from a single expert. Mitigations: noisy-OR and noisy-AND parameterizations, canonical models, hierarchical structure, and learning parameters from data when you have it. Exact inference is NP-hard in general; for real networks, use variable elimination or message passing for trees, and approximate methods (loopy belief propagation, MCMC, variational) for the rest.
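To make the noisy-OR saving concrete, the sketch below expands six per-parent link probabilities (plus an optional leak term) into the full 64-row table. The parameter values are hypothetical; the point is that the expert supplies six or seven numbers instead of 64.

```python
from itertools import product
from math import prod

def noisy_or_cpt(link_probs, leak=0.0):
    """Expand noisy-OR parameters into a full CPT: parent states -> P(child = 1).

    link_probs[i] is the probability that parent i alone triggers the child.
    leak is the probability the child occurs with no parent active.
    """
    cpt = {}
    for states in product((0, 1), repeat=len(link_probs)):
        # The child stays off only if the leak and every active parent all fail to trigger it.
        p_off = (1 - leak) * prod(1 - p for p, s in zip(link_probs, states) if s)
        cpt[states] = 1 - p_off
    return cpt

# Hypothetical link strengths for six binary parents.
links = [0.9, 0.7, 0.5, 0.3, 0.2, 0.1]
cpt = noisy_or_cpt(links, leak=0.01)

print("parameters elicited:", len(links) + 1)
print("CPT rows generated: ", len(cpt))
print("P(child | only parent 0 active) =", round(cpt[(1, 0, 0, 0, 0, 0)], 4))
```

Noisy-OR assumes each parent acts on the child independently of the others. Where causes interact, for example two faults that are only harmful together, the full table or a different canonical model is needed.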
A prior is an answer to the question "what do I already know before looking?" That answer is almost never "nothing." Sensible priors come from: your warranty database, field returns, reliability-centered-maintenance analyses, physics of degradation, and structured expert elicitation. Uniform priors are a statement, not a default: they say you believe each hypothesis is equally plausible, which is usually false. A bad prior can dominate the posterior when evidence is weak, so it is worth investing in getting priors right and in testing how sensitive the result is to them.
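One simple way to test sensitivity is to sweep a prior across a plausible range and watch whether the top-ranked hypothesis changes. The sketch below does this for two hypotheses under weak and strong evidence. The hypothesis names and likelihood values are hypothetical.

```python
def top_hypothesis(priors, likelihoods):
    """Return the hypothesis with the largest prior * likelihood product."""
    return max(priors, key=lambda h: priors[h] * likelihoods[h])

def sweep_prior(likelihoods, grid):
    """Vary the prior on 'aged_battery'; the remainder goes to 'heater_fault'."""
    return {
        p: top_hypothesis({"aged_battery": p, "heater_fault": 1 - p}, likelihoods)
        for p in grid
    }

grid = [0.1, 0.3, 0.5, 0.7, 0.9]

# Weak evidence: both hypotheses explain the symptom about equally well.
weak_evidence = {"aged_battery": 0.60, "heater_fault": 0.70}
# Strong evidence: the symptom is far more typical of a heater fault.
strong_evidence = {"aged_battery": 0.05, "heater_fault": 0.90}

weak_result = sweep_prior(weak_evidence, grid)
strong_result = sweep_prior(strong_evidence, grid)

for p in grid:
    print(f"prior={p:.1f}  weak: {weak_result[p]:13s} strong: {strong_result[p]}")
```

Under weak evidence the winner flips as the prior moves, so the prior is effectively making the decision. Under strong evidence the ranking is stable across the whole sweep.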
The ranker in Figure 3 assumes symptoms are conditionally independent given the hypothesis. This is almost never exactly true — but it is often close enough to produce the correct ranking, which is what you actually care about. If accuracy of the posterior values matters (not just order), move to a full Bayesian network.
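A ranker built on that conditional-independence assumption can be sketched as follows. This is an illustrative naive-Bayes implementation, not necessarily the one shown in Figure 3. It works in log space to avoid underflow when many symptoms are multiplied together. All names and probabilities are hypothetical, and every probability is assumed to lie strictly between 0 and 1.

```python
import math

def naive_bayes_rank(priors, symptom_likelihoods, observed):
    """Rank hypotheses given observed symptoms, assuming conditional independence.

    symptom_likelihoods[h][s] = P(symptom s present | hypothesis h)
    observed[s] = True if symptom s was seen, False if it was checked and absent
    """
    log_scores = {}
    for h, prior in priors.items():
        score = math.log(prior)
        for s, present in observed.items():
            p = symptom_likelihoods[h][s]
            score += math.log(p if present else 1 - p)
        log_scores[h] = score
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_scores.values())
    weights = {h: math.exp(v - m) for h, v in log_scores.items()}
    total = sum(weights.values())
    return sorted(((h, w / total) for h, w in weights.items()),
                  key=lambda kv: kv[1], reverse=True)

priors = {"low_tire_pressure": 0.5, "aged_battery": 0.3, "heater_fault": 0.2}
symptom_likelihoods = {
    "low_tire_pressure": {"reduced_range": 0.6, "sluggish": 0.2, "cold_only": 0.1},
    "aged_battery":      {"reduced_range": 0.9, "sluggish": 0.6, "cold_only": 0.7},
    "heater_fault":      {"reduced_range": 0.7, "sluggish": 0.1, "cold_only": 0.9},
}
observed = {"reduced_range": True, "sluggish": True, "cold_only": True}

ranking = naive_bayes_rank(priors, symptom_likelihoods, observed)
for name, p in ranking:
    print(f"{name:20s} {p:.3f}")
```

Correlated symptoms get counted as if they were separate pieces of evidence, which tends to push the posterior values toward 0 and 1 even when the ordering stays right.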
A diagnostic model is a living thing. Fleets age. Operating environments shift. New failure modes emerge as components are redesigned. Post-deployment monitoring must include calibration drift (are my predicted probabilities matching observed frequencies?) and structural drift (are there hypotheses I should be adding?). A model that was good two years ago can be silently misleading today.
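Calibration drift can be monitored with a simple binned comparison of predicted probabilities against confirmed outcomes. The sketch below is one common way to do it, assuming you log each prediction alongside whether the fault was later confirmed. The function names, bin count, and data are illustrative assumptions.

```python
def calibration_table(predicted, observed, n_bins=5):
    """Return (mean predicted, observed frequency, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    rows = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(y for _, y in b) / len(b)
            rows.append((mean_p, freq, len(b)))
    return rows

def expected_calibration_error(predicted, observed, n_bins=5):
    """Count-weighted average gap between predicted and observed frequency."""
    n = len(predicted)
    table = calibration_table(predicted, observed, n_bins)
    return sum(count / n * abs(mean_p - freq) for mean_p, freq, count in table)

# Hypothetical logged predictions and confirmed outcomes (1 = fault confirmed).
predicted = [0.1] * 10 + [0.9] * 10
at_launch = [0] * 9 + [1] + [1] * 9 + [0]   # outcomes match the predictions
two_years = [0] * 10 + [1] * 5 + [0] * 5    # high-confidence calls now right half the time

print("ECE at launch:      ", round(expected_calibration_error(predicted, at_launch), 3))
print("ECE two years later:", round(expected_calibration_error(predicted, two_years), 3))
```

A rising calibration error flags drift in the probabilities. It does not detect structural drift: a missing hypothesis shows up as cases where no ranked candidate is ever confirmed, which needs separate tracking.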