Controllability — can you get there from here?
A system is controllable when any initial state can be steered to any desired final state in finite time using an admissible input. If not, some directions in state-space are forever beyond the reach of your actuators.
Consider the linear time-invariant system $\dot{x} = A x + B u$, with $x \in \mathbb{R}^n$ and $u \in \mathbb{R}^m$. The question "where can we drive $x$?" turns out to have a purely algebraic answer. The controllability matrix collects the directions we can excite, directly and through repeated application of the dynamics:
$$ \mathcal{C} = \begin{bmatrix} B & AB & A^2B & \cdots & A^{n-1}B \end{bmatrix} \in \mathbb{R}^{n \times nm}. $$
The pair $(A, B)$ is controllable if and only if $\mathrm{rank}(\mathcal{C}) = n$. The Cayley–Hamilton theorem guarantees we need no more than $n-1$ powers of $A$ — anything further is linearly dependent on what we already have.
An equivalent view — the PBH test
The Popov–Belevitch–Hautus test reformulates controllability mode-by-mode: $(A,B)$ is controllable iff $\mathrm{rank}\!\begin{bmatrix}\lambda I - A & B\end{bmatrix} = n$ for every eigenvalue $\lambda$ of $A$. A mode is uncontrollable precisely when the left eigenvector $w^\top$ satisfies $w^\top B = 0$ — the input channel cannot "see" that direction.
Planar system · $n=2$
━━ span of $B$ ━━ span of $AB$ shaded reachable set
Controllability matrix
Try the "Uncontrollable" preset. The matrix $A$ is diagonal, meaning the two state variables evolve independently, and $B$ only pokes at the first one. The second mode is structurally deaf to every input you could apply — the controllability matrix drops rank, and the reachable set collapses to a one-dimensional line.
Stabilizability — a weaker cousin
If every unstable mode is controllable, we can still stabilize the system even if some stable modes are uncontrollable. That looser condition is called stabilizability. In practice this is often all we need: we don't mind that some benign, already-stable dynamics are out of our reach.
Observability — what can you infer from what you measure?
A system is observable when the entire internal state can be reconstructed from a finite record of inputs and outputs. Otherwise, distinct initial conditions produce identical measurements, and no amount of cleverness will tell them apart.
Add an output equation: $y = C x$. Given the input $u(t)$ and the measurement $y(t)$ over a window, can we recover the state $x(t)$? The question is, remarkably, the precise mathematical dual of controllability — a correspondence first noticed by Kalman, and one of the most elegant in engineering.
$$ \mathcal{O} = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{n-1} \end{bmatrix} \in \mathbb{R}^{np \times n}. $$
The pair $(A, C)$ is observable iff $\mathrm{rank}(\mathcal{O}) = n$. The null space of $\mathcal{O}$ — the unobservable subspace — is the set of initial conditions that produce zero output for all time. Two states that differ only in the unobservable subspace are indistinguishable from any finite record of $y$.
Planar system · two trajectories, one measurement
━━ trajectory from $x_0^{(1)}$ ━━ trajectory from $x_0^{(2)} = x_0^{(1)} + v_\mathrm{unobs}$ ┅┅ $y(t)$
Observability matrix
Try "Decoupled · unobservable." $A$ is diagonal; $C$ only sees the first state. Two initial conditions that differ only in $x_2$ produce identical outputs forever — the two red and teal trajectories leave together by different paths and map to the same yellow measurement curve. No algorithm you can invent will separate them.
The observable canonical decomposition
If the system is not fully observable, a similarity transform splits the state into observable and unobservable parts. The unobservable part has no effect on $y$; it lives in a subspace that the measurement map is structurally blind to. The observable part is what a state estimator can actually reconstruct. We will visualize this decomposition next.
Partial observability & the Kalman decomposition
Real systems are rarely cleanly controllable or observable end-to-end. Kalman's structure theorem splits the state into four independent parts — a map of exactly what you can see, what you can touch, and what lives outside both.
For any linear system, there exists a similarity transformation $T$ such that the state decomposes into four orthogonal pieces, each with a distinct relationship to the input and output. This is the Kalman canonical decomposition, and it is the clearest way to see why the transfer function alone — input to output — cannot tell you everything about a system.
The four subspaces
Only the controllable & observable part appears in the transfer function $G(s) = C(sI-A)^{-1}B$. Everything else is hidden from the input–output map.
Legend & meaning
Canonical form
After the similarity transform:
Stabilizability & detectability
For practical control design we ask less: every unstable uncontrollable mode must not exist (stabilizability), and every unstable unobservable mode must not exist (detectability). These two conditions, together with a state-feedback controller and a state estimator, are exactly what the Separation Principle requires. Stabilizability + detectability is the lightest pair of conditions under which a linear system admits a stabilizing output-feedback controller. Nothing is free, but this is as close as it gets.
The Luenberger observer — or, how to guess the state correctly
Given a model of the plant and a measurement of its output, a Luenberger observer synthesizes a clean estimate of the full state. The trick is to correct the model in real time by the measurement residual, weighted by a carefully chosen gain.
Suppose we want $\hat{x} \to x$. The most natural candidate is to integrate a copy of the plant: $\dot{\hat{x}} = A\hat{x} + Bu$. This is called the open-loop observer, and it works exactly when the model is perfect and the initial condition is known — which is to say, never. Any mismatch grows with the plant's dynamics, and for unstable systems the estimation error diverges.
Luenberger's 1964 insight was to feed back the output residual $y - C\hat{x}$ through an observer gain $L$:
$$ \dot{\hat{x}} = A\hat{x} + Bu + L\,(y - C\hat{x}). $$
Defining the estimation error $e = x - \hat{x}$, a quick subtraction gives the error dynamics:
$$ \dot{e} = (A - LC)\,e. $$
The error is an autonomous linear system with dynamics $A - LC$. If $(A, C)$ is observable, we can place the eigenvalues of $A - LC$ anywhere we like — typically a few times faster than the closed-loop poles of the controller, so estimation converges before feedback "notices." Pushing the observer poles deeper into the left half-plane gives faster convergence at the cost of sensitivity to measurement noise. This tradeoff is the beating heart of every observer design.
Plant state vs. observer estimate · time response
━━ $x_1$ (true) ━━ $\hat{x}_1$ (estimate) ┅┅ $x_2$ (true) ┅┅ $\hat{x}_2$ (estimate)
━━ $\|e(t)\|$ — estimation error norm
Plant (2nd-order oscillator)
B = [[0], [1]], C = [1, 0]
Observer pole locations
Push the poles farther left (more negative) to make the error decay faster — notice the norm curve plunges to zero. Now imagine the output contains noise. Very fast observer poles mean high entries in $L$, which amplify any measurement noise into the estimate. There is no free lunch; fast estimators are loud estimators. This is exactly the tension that the Kalman filter resolves optimally.
Ackermann's formula, in one line
If the desired observer characteristic polynomial is $\alpha(s) = \prod_i (s - \lambda_i^{\mathrm{des}})$, then $L = \alpha(A)\, \mathcal{O}^{-1} e_n$, where $e_n$ is the last standard basis vector of $\mathbb{R}^n$. It is the dual of the better-known state-feedback Ackermann formula. In practice we use place() in MATLAB or scipy.signal.place_poles, which handles numerical conditioning better.
The Kalman filter — optimality under Gaussian noise
When the process and measurement noises are known (zero-mean, white, Gaussian), there is a unique observer gain $L$ that minimizes the steady-state error covariance. It is not a heuristic. It is the answer.
Luenberger's observer leaves open how to choose $L$. Make it fast and noise appears on the estimate. Make it slow and the estimate lags. The Kalman filter is the observer design that trades these two evils off in a provably optimal way, given statistical models of the disturbances.
Consider the discrete-time system with process noise $w_k \sim \mathcal{N}(0, Q)$ and measurement noise $v_k \sim \mathcal{N}(0, R)$:
$$ x_{k+1} = F x_k + G u_k + w_k, \quad y_k = H x_k + v_k. $$
The filter alternates a prediction (push the estimate through the model) and an update (correct using the measurement). The covariance $P_k$ tracks our uncertainty about $\hat{x}_k$:
The Kalman gain $K_{k+1}$ is the optimal weighting between trusting the model and trusting the measurement. When $R \ll H P H^\top$ (measurement is precise), $K \to H^{-1}$ and the update snaps to the measurement. When $R \gg H P H^\top$ (measurement is noisy), $K \to 0$ and the filter trusts the model. The filter performs this tradeoff automatically, per sample, per state.
Tracking a noisy 1-D object · position vs. time
━━ $x_1$ (true position) • $y_k$ (noisy measurement) ━━ $\hat{x}_1$ (KF estimate) ░░ $\pm 2\sigma$ confidence
━━ $K_{k,1}$ (position-gain) ━━ $K_{k,2}$ (velocity-gain) — converging toward steady-state values.
Model · constant velocity
F = [[1, Δt], [0, 1]]
H = [1, 0], Δt = 0.1 s
Process noise · $q$
Measurement noise · $r$
Notice: The filter's RMS error is typically a fraction of the raw measurement RMS — this is the filter's "free" information gain from fusing measurements over time through the model.
The steady-state / Riccati view
As $k \to \infty$, $P_k$ converges (for detectable $(F,H)$ and stabilizable $(F, Q^{1/2})$) to the unique positive-definite solution of the discrete algebraic Riccati equation:
$$ P_\infty = F P_\infty F^\top + Q - F P_\infty H^\top (H P_\infty H^\top + R)^{-1} H P_\infty F^\top. $$
The Kalman gain converges to $K_\infty = P_\infty H^\top (H P_\infty H^\top + R)^{-1}$, and the filter becomes — structurally — a Luenberger observer with an optimally chosen $L = F K_\infty$. In practice, many production filters run at steady-state gains computed offline: lower CPU, identical asymptotic performance, much simpler ASIL-relevant code.
What the filter is and is not
The Kalman filter assumes a linear model with known, Gaussian, white noise covariances. Real systems usually break at least one of these: model nonlinearity (handled by the EKF or UKF), non-Gaussian noise (handled by particle filters or robust estimators), and coloured or correlated noise (handled by state augmentation). But start here — the linear, Gaussian Kalman filter remains the beginning of every serious estimation architecture, and an astonishing number of production systems never need more.