Fault-Tolerant Control — An Interactive Primer

1

The response after the diagnosis.

Fault detection asks: is something wrong? Fault-tolerant control asks the harder question that follows: now what? A driver has noticed a warning. The plane is still over the Atlantic. The line is still running. The job is not to restore perfection, because the machine is, by premise, broken. The job is to keep the mission alive.

FTC is the discipline of designing controllers that continue to operate, safely and usefully, when some part of the system they are commanding has failed. “Safely and usefully” is doing a lot of work in that sentence. Post-fault performance is almost always worse than nominal. The engineering question is how much worse, for how long, and whether the degradation is graceful or catastrophic.

Two broad philosophies divide the field. A passive scheme is designed, once, to absorb a pre-specified class of faults without ever knowing one has occurred. An active scheme relies on a fault detection and isolation (FDI) module (see Monograph No. 1) to detect and classify a fault, then reconfigures the controller in response. The first is simpler and more conservative; the second is more powerful and more dangerous, because it is only as good as its diagnosis.

Figure 1 · Interactive. Three systems tracking the same setpoint. At midpoint a partial actuator loss is injected. The baseline with no tolerance diverges. The passive scheme absorbs the fault and holds position, but with persistent error. The active scheme detects the fault, switches controllers, and recovers tracking. (Legend: no tolerance, passive, active, reference.)

2

Passive FTC — designed to absorb.

A passive fault-tolerant controller knows nothing and needs nothing. It is a single fixed controller, designed upfront to remain stable and reasonably performant across a set of plant behaviors — the nominal plant plus a pre-specified family of faults modeled as parametric uncertainty.

The conceptual tools are the robust control toolkit: H_∞ synthesis, μ-synthesis, sliding mode control, and Lyapunov-based robust designs. Each produces a controller that trades nominal performance for a performance floor that holds across all plants in the uncertainty set.

find $K$ that stabilizes every plant $P \in \mathcal{P}_{\text{nom}} \cup \mathcal{P}_{\text{fault},1} \cup \mathcal{P}_{\text{fault},2} \cup \cdots$

The nominal controller (teal) is fast on the ideal plant but wildly inconsistent across the uncertainty set. The robust controller (ink) is slower on the ideal plant but every trajectory stays tightly bundled — the definition of a useful performance floor.

Figure 2 · Interactive. The passive trade made visible. Robust design buys you tight envelopes at the cost of nominal speed. You cannot have both.
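The "find K" requirement above can be made concrete with a toy check. The sketch below assumes a scalar plant x' = a·x + b·e·u with state feedback u = −k·x, where the fault family is a set of actuator effectiveness values e. The plant numbers, the effectiveness set, and the two gains are illustrative assumptions, not values from this primer, and the code only verifies a candidate gain; it does not perform H_∞ or μ-synthesis.

```python
# Toy robust-stability check: does one fixed gain stabilize every plant
# in a small, hypothetical fault family?

def closed_loop_pole(a, b, effectiveness, k):
    """Pole of x' = a*x + b*effectiveness*u under u = -k*x."""
    return a - b * effectiveness * k


def stabilizes_all(a, b, effectiveness_set, k):
    """True if the closed loop is stable for every effectiveness value."""
    return all(closed_loop_pole(a, b, e, k) < 0.0 for e in effectiveness_set)


# Assumed example: open-loop unstable plant, nominal plus two fault cases.
A, B = 1.0, 2.0
EFFECTIVENESS_SET = [1.0, 0.7, 0.4]   # 1.0 = healthy actuator
K_NOMINAL = 0.8   # tuned only for the healthy plant
K_ROBUST = 2.0    # larger gain chosen to cover the whole family

if __name__ == "__main__":
    for name, k in (("nominal", K_NOMINAL), ("robust", K_ROBUST)):
        poles = [closed_loop_pole(A, B, e, k) for e in EFFECTIVENESS_SET]
        print(name, poles, stabilizes_all(A, B, EFFECTIVENESS_SET, k))
```

In this toy case the nominal gain is stable on the healthy plant but loses stability at 40% effectiveness, while the robust gain holds a stable pole across the whole set. The extra gain the robust design carries on healthy days is the conservatism described in the next subsection.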

What passive buys you

Simplicity: One controller, no mode logic, no FDI dependency. Certifiable.
Zero transient: Nothing switches. The fault is absorbed smoothly into the existing loop.
Predictable worst case: Guarantees are formal and hold over the full uncertainty set.

What it costs you

Conservatism: You pay for the worst case on every nominal flight, every healthy day. The system is never operating at its best because it is always armored against its worst.
Limited fault coverage: Only the faults included in the uncertainty set are tolerated. A fault outside that envelope is not absorbed; the design carries no guarantee for it at all.
Can require over-design: On some plants, no fixed controller can stabilize the whole uncertainty set. Then passive FTC is not an option.

3

Active FTC — detect, then reconfigure.

Active schemes pair an FDI module with reconfiguration logic. When the diagnostic says “actuator three has lost 50% of its authority,” the control law updates. This removes the conservatism of the passive approach at the cost of depending on an FDI module that might be wrong, late, or both.

Figure 3 · Interactive. The canonical active FTC architecture. FDI watches the loop. On a positive diagnosis, the supervisor selects a new controller from the bank. Click “inject fault” to see the active controller slot shift away from its initial mode, K_nominal.

3.1 Controller switching

The simplest active reconfiguration: hold a library of pre-designed controllers, one per fault scenario. When FDI fires, the supervisor picks the matching one. Elegant in principle; the practical problem is switching itself (§6).

A more sophisticated version is gain scheduling with fault-awareness: controller gains are interpolated continuously as a function of estimated fault severity, avoiding discrete jumps. This works when the fault can be parameterized on a continuum — actuator effectiveness, sensor bias magnitude, parameter drift.
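Fault-aware gain scheduling can be sketched as a lookup table indexed by the estimated fault severity. The example below assumes the FDI module supplies an actuator effectiveness estimate between 0 and 1, and that three controller gains were designed offline at the breakpoints shown. The table values and the function name `scheduled_gain` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical schedule: (estimated actuator effectiveness, loop gain).
# Gains grow as effectiveness drops so that gain * effectiveness stays near 2.
SCHEDULE = [(0.25, 8.0), (0.5, 4.0), (1.0, 2.0)]


def scheduled_gain(effectiveness_estimate, schedule=SCHEDULE):
    """Linearly interpolate the gain; clamp outside the designed range."""
    xs = [point[0] for point in schedule]
    e = min(max(effectiveness_estimate, xs[0]), xs[-1])
    i = bisect_right(xs, e)
    if i >= len(xs):                 # at the upper breakpoint
        return schedule[-1][1]
    (x0, k0), (x1, k1) = schedule[i - 1], schedule[i]
    w = (e - x0) / (x1 - x0)
    return (1.0 - w) * k0 + w * k1


if __name__ == "__main__":
    for e in (1.0, 0.75, 0.5, 0.1):
        print(e, scheduled_gain(e))
```

Because the gain moves continuously with the estimate, there is no discrete switching event to manage. Clamping below the lowest breakpoint is a design choice in this sketch; a production scheme would more likely hand over to a degraded mode there.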

3.2 Control allocation & redistribution

In over-actuated systems (aircraft with multiple control surfaces, multirotors with six or more rotors, four-wheel EVs with independent drive units) the number of physical actuators exceeds the number of virtual commands (three body-axis moments, two longitudinal/lateral forces, etc.). This redundancy is FTC’s greatest gift.

A control allocator sits between the high-level controller and the physical actuators. The high-level controller produces a virtual command — say, “apply 800 N·m yaw torque and 4000 N longitudinal force.” The allocator solves a constrained optimization to decide how to split that command across actuators. When an actuator fails, its column in the allocation matrix is removed and the optimization simply redistributes across what is left.

$$\min_{u} \; \lVert u - u_{\text{pref}} \rVert^2 \quad \text{subject to} \quad B u = v_{\text{cmd}}, \quad u_{\min} \le u \le u_{\max}$$

Figure 4 · Interactive. A four-wheel EV with independent drive motors, initially with all four motors healthy. The high-level controller commands longitudinal force and yaw moment. Click any wheel to fail its motor; the allocator redistributes across the surviving wheels to maintain the commanded motion. When too many fail, the command becomes infeasible.
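A minimal sketch of the allocation step for the four-wheel example, under simplifying assumptions: the preferred input is zero, so the problem reduces to a minimum-norm solve of B u = v_cmd over the healthy wheels, and the actuator bounds are only checked afterwards rather than enforced inside the optimization (a real allocator would typically use a constrained QP solver). The half-track value, the force limits, and the function name `allocate` are illustrative.

```python
# Wheel order: front-left, front-right, rear-left, rear-right.
HALF_TRACK = 0.8  # m, assumed
# Row 0: longitudinal force; row 1: yaw moment (right wheels give positive yaw).
B = [[1.0, 1.0, 1.0, 1.0],
     [-HALF_TRACK, HALF_TRACK, -HALF_TRACK, HALF_TRACK]]


def allocate(B, v_cmd, healthy, u_min, u_max):
    """Minimum-norm wheel forces meeting B u = v_cmd on healthy wheels.

    Returns (u, feasible). Failed wheels get zero force.
    """
    idx = [j for j, ok in enumerate(healthy) if ok]
    # Remove the columns of failed actuators.
    Br = [[row[j] for j in idx] for row in B]
    # G = Br * Br^T is 2x2, so it can be inverted by hand.
    g11 = sum(x * x for x in Br[0])
    g12 = sum(x * y for x, y in zip(Br[0], Br[1]))
    g22 = sum(y * y for y in Br[1])
    det = g11 * g22 - g12 * g12
    u = [0.0] * len(healthy)
    if abs(det) < 1e-9:
        # Rank-deficient: force and moment can no longer be set
        # independently. This sketch simply reports infeasibility.
        return u, False
    l1 = (g22 * v_cmd[0] - g12 * v_cmd[1]) / det
    l2 = (-g12 * v_cmd[0] + g11 * v_cmd[1]) / det
    for col, j in enumerate(idx):
        u[j] = Br[0][col] * l1 + Br[1][col] * l2   # u = Br^T * G^-1 * v
    feasible = all(u_min <= u[j] <= u_max for j in idx)
    return u, feasible


if __name__ == "__main__":
    v_cmd = [4000.0, 800.0]   # 4000 N longitudinal, 800 N*m yaw
    for healthy in ([True] * 4, [False, True, True, True],
                    [False, True, False, True]):
        print(healthy, allocate(B, v_cmd, healthy, -2500.0, 2500.0))
```

With all wheels healthy the command is split between the left and right sides; with the front-left motor failed, the remaining three wheels pick up the load. With both left motors failed, force and yaw moment become coupled and this sketch flags the command as infeasible.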

3.3 Model Predictive Control (MPC) with online updates

MPC is a natural home for FTC because it already solves a constrained optimization at every control cycle. When a fault changes either the model or the constraints, you simply update the optimization and keep solving. The reconfiguration is implicit in the problem statement.

Figure 5 · Interactive. The MPC horizon. The solid past is what was actually executed; the dashed future is the controller’s rolling plan, drawn inside the constraint band. When a fault shrinks the constraint band, the plan immediately updates to keep the predicted trajectory feasible.
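To illustrate how reconfiguration is implicit in the MPC problem statement, here is a deliberately tiny receding-horizon controller. It assumes a scalar integrator plant, a three-step horizon, and a brute-force search over a coarse input grid in place of a real QP solver. The fault is modeled as a simultaneous drop in actuator gain and input limit; all numbers and function names are hypothetical.

```python
from itertools import product

DT = 0.1  # sample time [s], assumed


def mpc_step(x, ref, gain, u_max, horizon=3, grid=11, rho=1e-3):
    """Return the first input of the best input sequence on a coarse grid."""
    candidates = [-u_max + 2.0 * u_max * i / (grid - 1) for i in range(grid)]
    best_cost, best_u0 = float("inf"), 0.0
    for seq in product(candidates, repeat=horizon):
        xp, cost = x, 0.0
        for u in seq:
            xp = xp + DT * gain * u            # prediction model
            cost += (xp - ref) ** 2 + rho * u * u
        if cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0


def simulate(steps=60, fault_step=20, ref_step=30):
    """Run the loop; returns a list of (x, u, u_max) per sample."""
    x, gain, u_max = 0.0, 1.0, 5.0
    history = []
    for k in range(steps):
        if k == fault_step:
            # Fault reported: update the model AND the constraint,
            # then keep solving the same optimization.
            gain, u_max = 0.5, 2.0
        ref = 1.0 if k < ref_step else 3.0
        u = mpc_step(x, ref, gain, u_max)
        x = x + DT * gain * u                  # plant assumed equal to model
        history.append((x, u, u_max))
    return history


if __name__ == "__main__":
    for k, (x, u, u_max) in enumerate(simulate()):
        print(k, round(x, 3), round(u, 2), u_max)
```

After the fault the controller code is unchanged; only the two numbers handed to the optimizer differ. The second reference step is still reached, more slowly, with every input inside the reduced limit.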

3.4 Adaptive & learning-based reconfiguration

Adaptive controllers change their own parameters online based on tracking error, without needing explicit FDI. The control gains become states of a second dynamical system, driven by the tracking error so that the primary loop keeps following its reference despite the fault. Classical MRAC (Model Reference Adaptive Control) and modern L₁ adaptive control live here.
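A minimal MRAC sketch makes the "gains as states" idea concrete. It assumes a scalar plant x' = a·x + λ·b·u whose actuator effectiveness λ drops from 1 to 0.5 at a fault time, a first-order reference model, and the textbook Lyapunov-based update laws for the two gains. The plant numbers, adaptation rate, and forward-Euler integration are illustrative choices.

```python
def simulate_mrac(gamma, t_end=40.0, dt=0.001, t_fault=10.0):
    """Return the final tracking error x - xm.

    gamma is the adaptation rate; gamma = 0 freezes the gains, which
    gives a fixed-controller baseline for comparison.
    """
    a, b, am = -1.0, 1.0, 2.0      # plant and reference-model parameters
    kx, kr = -1.0, 2.0             # gains matched to the healthy plant
    x = xm = 0.0
    r = 1.0                        # constant reference
    for i in range(int(t_end / dt)):
        t = i * dt
        lam = 1.0 if t < t_fault else 0.5   # actuator effectiveness
        u = kx * x + kr * r
        e = x - xm
        dx = a * x + lam * b * u            # faulty plant
        dxm = -am * xm + am * r             # reference model
        dkx = -gamma * x * e                # adaptive laws (valid for lam*b > 0)
        dkr = -gamma * r * e
        x += dt * dx
        xm += dt * dxm
        kx += dt * dkx
        kr += dt * dkr
    return x - xm


if __name__ == "__main__":
    print("fixed gains:   ", simulate_mrac(0.0))
    print("adaptive gains:", simulate_mrac(5.0))
```

With the gains frozen, the output settles well short of the reference model after the fault. With adaptation enabled, the gains drift until the tracking error is driven back toward zero, with no fault diagnosis involved.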

Learning-based schemes are the frontier. Reinforcement learning, Gaussian-process-augmented MPC, and meta-learning policies try to infer fault effects from interaction data and update the control law accordingly. Powerful and dangerous: they must be prevented, by architecture, from exploring their way into instability.

4

The reference governor — reshape what you’re asked to do.

Sometimes the controller is fine and the plant is fine; what is no longer feasible is the reference being demanded. A fault may have reduced actuator authority, tightened a constraint, or degraded sensor bandwidth. Asking the controller to track the original aggressive reference is now a recipe for saturation or violation.

A reference governor sits between the operator and the controller. It watches the commanded reference, predicts whether the closed-loop system can follow it safely, and — if not — attenuates or reshapes the command until the predicted response stays within all constraints. When the fault clears or operating conditions improve, the governor gracefully releases the attenuation.

Figure 6 · Interactive. The operator commands an aggressive step. At low fault severity, the governor lets it through. As severity rises, the governor smooths and caps the reference, keeping the actual output (ochre) inside the red constraint even as performance degrades. (Legend: operator command, governed reference, actual output, constraint. The severity slider starts at 30%.)
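The governor logic described above can be sketched for a scalar loop. The example assumes an integrator plant under proportional control, with the constraint placed on control effort |u| ≤ u_limit, where the limit shrinks with fault severity. At each sample the governor moves the applied reference v as far toward the operator command r as a forward prediction allows. The gains, limits, and function names are hypothetical.

```python
DT = 0.1       # sample time [s], assumed
K_GAIN = 2.0   # proportional gain of the inner loop, assumed
U_MAX = 2.0    # healthy actuator limit, assumed


def admissible(y, v, u_limit, horizon=30):
    """Predict the closed loop with v held constant; check |u| <= u_limit."""
    for _ in range(horizon):
        u = K_GAIN * (v - y)
        if abs(u) > u_limit + 1e-9:
            return False
        y += DT * u
    return True


def govern(v_prev, r, y, u_limit, grid=20):
    """Largest step from v_prev toward r that keeps the prediction safe."""
    for i in range(grid, -1, -1):
        v = v_prev + (i / grid) * (r - v_prev)
        if admissible(y, v, u_limit):
            return v
    return v_prev   # nothing admissible: hold the previous reference


def simulate(severity, governed=True, r=1.0, steps=100):
    """Return (outputs, inputs, u_limit) for a step command r."""
    u_limit = (1.0 - severity) * U_MAX
    y, v = 0.0, 0.0
    ys, us = [], []
    for _ in range(steps):
        v = govern(v, r, y, u_limit) if governed else r
        u = K_GAIN * (v - y)    # no saturation modeled, to expose violations
        y += DT * u
        ys.append(y)
        us.append(u)
    return ys, us, u_limit


if __name__ == "__main__":
    for governed in (False, True):
        ys, us, lim = simulate(0.5, governed)
        print(governed, "peak |u| =", max(abs(u) for u in us), "limit =", lim)
```

At 50% severity the ungoverned step demands twice the available control effort. The governed run stays within the limit and still reaches the commanded value, only later.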

5

Graceful degradation — the practical face of FTC.

Outside textbooks, FTC rarely hides a fault. It manages the consequences. The system declares a degraded mode, announces it to the operator, and reduces capability in a way that is safe, predictable, and reversible if the fault clears.

The industrial pattern is a ladder of discrete operating modes, ordered by capability. A fault demotes the system from the top rung (nominal) downward as severity rises. Each rung is designed with its own controller, its own constraint set, and its own driver feedback.

Nominal: Full capability. All actuators healthy, all sensors reporting, full performance envelope.
100% torque · 100% speed · all features active · driver sees nothing

Degraded: Minor fault tolerated. A redundant sensor failed, a non-critical feature disabled, a derating applied.
~80% torque · 100% speed · stability control alert on dashboard

Limp home: Major fault, mission-preserving. Vehicle drives to a safe stopping point at reduced capability.
~30% torque · 50 km/h cap · warning lamp · reduced features

Safe shutdown: Unsafe to continue. Controlled shutdown with maximum awareness preserved to the operator.
0% torque · steer & brake preserved · hazards on · stop-safely sequence

Figure 7 · Interactive. Drag the severity slider, which starts at 0% (healthy). The active tier shifts as severity crosses each threshold. Production systems engineer each transition deliberately, including the hysteresis so the mode doesn’t chatter as severity fluctuates near a boundary.
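The mode ladder with hysteresis can be sketched as a small state machine. The thresholds and hysteresis band below are made-up values for illustration; the primer does not specify them. Stepping down happens as soon as severity reaches a rung's entry threshold, while stepping back up requires severity to fall below that threshold by the full band.

```python
MODES = ["nominal", "degraded", "limp_home", "safe_shutdown"]
ENTER = [0.2, 0.5, 0.8]   # hypothetical severity to step down into MODES[i + 1]
HYSTERESIS = 0.05         # hypothetical band required before stepping back up


def next_mode(current, severity):
    """Return the index into MODES after observing a new severity value."""
    level = current
    # Step down as far as the severity demands.
    while level < len(ENTER) and severity >= ENTER[level]:
        level += 1
    # Step up only once severity is clearly below the entry threshold.
    while level > 0 and severity < ENTER[level - 1] - HYSTERESIS:
        level -= 1
    return level


if __name__ == "__main__":
    mode = 0
    for severity in (0.19, 0.21, 0.19, 0.21, 0.14, 0.9):
        mode = next_mode(mode, severity)
        print(severity, MODES[mode])
```

With severity hovering around the first threshold, the system enters the degraded mode once and stays there instead of toggling, and only returns to nominal when severity drops below 0.15. This sketch lets every mode recover; many real systems would latch the safe-shutdown rung until a service action clears it.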

6

Stability & performance during the transition.

This is the hardest theoretical problem in the field. Each controller — nominal and reconfigured — is individually stable. The controlled plant is stable under each. What is not guaranteed is that the system remains stable during the switch.

A switching event is a discontinuity in the control law. The plant state does not reset; it is whatever it was at the instant of switching, and it is now being driven by a different feedback law. Classical problems:

Bumpless transfer: The outgoing and incoming controllers generally command different values of u at the switch instant. That step discontinuity in u kicks the plant, and with a naive handover the output jolts.
Integrator windup: The outgoing controller’s integrator state holds history from the old loop. Handing that to the new controller produces a persistent transient.
Switching-induced instability: Most insidious: two individually stable controllers can, when switched between frequently, drive the system unstable. Dwell-time conditions and multiple-Lyapunov-function theory address this.

Figure 8 · Interactive. Two controllers handing off at the switch time. The hard switch (burgundy) shows a step in u and the consequent transient in the plant. The bumpless implementation (teal) pre-aligns the incoming controller’s state so the handover is invisible. (Legend: hard switch, bumpless transfer, reference.)

How practitioners handle it

Bumpless transfer architectures: The offline controller runs in tracking mode against the online one. At switch time, its internal state matches and no step occurs.
Dwell-time constraints: A formal minimum time the system must remain in one mode before another switch is allowed. Prevents chattering-induced instability.
Common / multiple Lyapunov functions: The theoretical tool for proving stability under switching. You exhibit either a single Lyapunov function that decreases under every mode, or a family of them, one per mode, whose values do not grow between successive activations of the same mode.
Soft blending: Instead of an instantaneous switch, the control signal is a time-varying convex combination of old and new, u = (1−α(t)) u_old + α(t) u_new, with α(t) rising from 0 to 1 over a short blend window.
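Two of the techniques listed above, soft blending and dwell-time constraints, are simple enough to sketch directly. The linear ramp for α(t), the class name `DwellTimeSupervisor`, and the timing values are illustrative assumptions; the sketch makes no stability claim by itself, since the required dwell time has to come from analysis of the specific plant and controllers.

```python
def blend_weight(t, t_switch, window):
    """Linear ramp alpha(t): 0 before the switch, 1 after the blend window."""
    if t <= t_switch:
        return 0.0
    return min((t - t_switch) / window, 1.0)


def blended_control(u_old, u_new, alpha):
    """Convex combination u = (1 - alpha) * u_old + alpha * u_new."""
    return (1.0 - alpha) * u_old + alpha * u_new


class DwellTimeSupervisor:
    """Rejects mode changes requested before a minimum dwell time elapses."""

    def __init__(self, mode, min_dwell, t0=0.0):
        self.mode = mode
        self.min_dwell = min_dwell
        self.t_last_switch = t0

    def request(self, mode, t):
        """Try to switch to `mode` at time t; return True if accepted."""
        if mode == self.mode:
            return False
        if t - self.t_last_switch < self.min_dwell:
            return False
        self.mode = mode
        self.t_last_switch = t
        return True


if __name__ == "__main__":
    supervisor = DwellTimeSupervisor("K_nominal", min_dwell=2.0)
    print(supervisor.request("K_fault_1", t=0.5))   # too early, rejected
    print(supervisor.request("K_fault_1", t=2.5))   # accepted
    for t in (2.5, 2.75, 3.0, 3.5):
        alpha = blend_weight(t, t_switch=2.5, window=0.5)
        print(t, blended_control(u_old=1.0, u_new=3.0, alpha=alpha))
```

During the blend window the applied input moves continuously from the old controller's output to the new one's, so the plant never sees a step in u. Both controllers must be kept running over the window for this to work.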

7

References & further reading

A short curated list. The first few are the canonical references; the rest point into the topics raised above.

Foundational texts

Blanke, M., Kinnaert, M., Lunze, J., Staroswiecki, M. — Diagnosis and Fault-Tolerant Control. 3rd ed., Springer, 2016. The field’s standard reference; unifies FDI and FTC under structural analysis.
Noura, H., Theilliol, D., Ponsart, J.-C., Chamseddine, A. — Fault-tolerant Control Systems: Design and Practical Applications. Springer, 2009. Heavier on implementation and case studies.
Zhang, Y. and Jiang, J. — “Bibliographical review on reconfigurable fault-tolerant control systems.” Annual Reviews in Control, 2008. The standard survey of the active-FTC landscape.
Patton, R. J. — “Fault-tolerant control systems: the 1997 situation.” IFAC SAFEPROCESS, 1997. The foundational taxonomy that passive-vs-active descends from.

Passive / robust control

Zhou, K., Doyle, J. C., Glover, K. — Robust and Optimal Control. Prentice Hall, 1996. The H_∞ and μ-synthesis reference.
Shtessel, Y., Edwards, C., Fridman, L., Levant, A. — Sliding Mode Control and Observation. Springer, 2014. Modern treatment of sliding-mode techniques including their FTC applications.
Jiang, J. and Zhang, Y. — “Accepting performance degradation in fault-tolerant control system design.” IEEE Trans. Control Systems Technology, 2006. On the conservatism trade.

Active FTC & control allocation

Johansen, T. A. and Fossen, T. I. — “Control allocation — a survey.” Automatica, 2013. The reference survey on allocation algorithms and reconfiguration.
Härkegård, O. — Backstepping and Control Allocation with Applications to Flight Control. PhD thesis, Linköping, 2003. Classical and still highly readable treatment.
Casavola, A., Mosca, E., Papini, M. — “Predictive teleoperation of constrained dynamic systems via Internet-like channels.” IEEE TCST, 2006. MPC-based reconfiguration under constraints.
Maciejowski, J. M. — Predictive Control with Constraints. Pearson, 2002. The MPC textbook most often cited in FTC contexts.

Reference governor

Gilbert, E. G. and Kolmanovsky, I. — “Nonlinear tracking control in the presence of state and control constraints: a generalized reference governor.” Automatica, 2002. The canonical formulation.
Garone, E., Di Cairano, S., Kolmanovsky, I. — “Reference and command governors for systems with constraints: A survey on theory and applications.” Automatica, 2017. Modern survey including automotive applications.

Switched systems & bumpless transfer

Liberzon, D. — Switching in Systems and Control. Birkhäuser, 2003. The textbook on switched-system stability, dwell-time, and multiple Lyapunov functions.
Hespanha, J. P. and Morse, A. S. — “Stability of switched systems with average dwell-time.” CDC, 1999. The foundational average-dwell-time result.
Zaccarian, L. and Teel, A. R. — Modern Anti-windup Synthesis. Princeton, 2011. Bumpless transfer and windup in the same modern framework.

Applications

Edwards, C., Lombaerts, T., Smaili, H. (eds.) — Fault Tolerant Flight Control: A Benchmark Challenge. Springer, 2010. The community-standard aircraft benchmark.
Ivanov, V., Savitski, D. — “Systematization of integrated motion control of ground vehicles.” IEEE Access, 2015. Wheel-torque allocation and vehicle-level reconfiguration.
Romero, J. C., Benosman, M., Lum, K. — various works on learning-based FTC for aerospace. A current line of active research.