Koopman Theory — A Linear Lens for Nonlinear Worlds

i · the big idea

Change what you look at, and nonlinearity disappears.

A dictionary of observables turns crooked trajectories into straight ones — at the price of living in a bigger space.

Suppose you have a dynamical system evolving in discrete time: $x_{k+1} = f(x_k)$, where $f$ is some messy nonlinear map. The state $x \in \mathbb{R}^n$ follows a curved path that is hard to predict. Classical control chases $x$ directly, and pays a steep price when $f$ becomes stiff, multi-modal, or ill-conditioned.

Koopman inverted the question. Rather than tracking the state, track functions of the state, called observables. An observable is any scalar quantity you can compute from $x$: its energy $\|x\|^2$, its components $x_1, x_2$, some nonlinear combination $\sin(x_1) x_2$, anything. The key fact: observables evolve linearly, even when the state does not.

State-space view

$x_{k+1} = f(x_k)$

Finite-dimensional, nonlinear, geometric. Tools: Lyapunov functions, feedback linearization, nonlinear MPC. Expensive and often intractable.

Observable view (Koopman)

$g_{k+1} = \mathcal{K}\, g_k$

Infinite-dimensional, but perfectly linear. Tools: spectral analysis, eigen-decomposition, linear MPC, LQR. Approximations are finite and practical.

The catch — and it is a real one — is that this linear operator $\mathcal{K}$ acts on an infinite-dimensional space of functions. To make the idea useful we need finite approximations: pick a good enough dictionary of observables, project the operator down, and pray the projection captures the dynamics you care about. Much of modern Koopman work is about that single engineering choice.

Linearity is not a property of dynamics, but of the space in which you choose to describe them. — the Koopman perspective, in one line

ii · the operator

The Koopman operator, formally.

A deceptively simple definition with surprising consequences.

Let $f: \mathcal{M} \to \mathcal{M}$ be a (possibly nonlinear) discrete-time dynamics map on some state space $\mathcal{M}$. Let $g: \mathcal{M} \to \mathbb{C}$ be an observable — a scalar-valued function of the state. The Koopman operator $\mathcal{K}$ is defined by how it acts on $g$:

Definition · Koopman operator $$ (\mathcal{K} g)(x) \;\triangleq\; g(f(x)) $$

Apply $\mathcal{K}$ to the observable $g$: the result is a new observable whose value at $x$ is what $g$ will be one step later. "Push the dynamics into the function."

That's it. The entire theory hangs off that one line. Two consequences deserve immediate attention:

Linearity is automatic. For any observables $g, h$ and scalars $\alpha, \beta$: $\mathcal{K}(\alpha g + \beta h) = \alpha g \circ f + \beta h \circ f = \alpha (\mathcal{K} g) + \beta (\mathcal{K} h)$. The underlying $f$ can be as vicious as you like; $\mathcal{K}$ doesn't care.
Infinite-dimensional, usually. The space of all observables on $\mathcal{M}$ is an infinite-dimensional function space (typically $L^2(\mathcal{M})$). So we've swapped a finite-dimensional nonlinear problem for an infinite-dimensional linear one, a classic functional-analytic trade.
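Both the definition and the linearity claim can be checked numerically. The sketch below is only an illustration: the logistic map, the two observables, and the helper name `koopman` are assumptions picked for the demo, not part of the theory.

```python
import numpy as np

def f(x):
    # An arbitrary nonlinear map (the logistic map), chosen only for illustration.
    return 3.7 * x * (1.0 - x)

def koopman(g):
    # (K g)(x) = g(f(x)): takes an observable and returns a new observable.
    return lambda x: g(f(x))

g = lambda x: x**2          # one observable
h = lambda x: np.sin(x)     # another observable
alpha, beta = 2.0, -0.5

x = np.linspace(0.0, 1.0, 11)   # sample states

# Left side: apply K to the combined observable alpha*g + beta*h.
lhs = koopman(lambda s: alpha * g(s) + beta * h(s))(x)
# Right side: combine the individually advanced observables.
rhs = alpha * koopman(g)(x) + beta * koopman(h)(x)

linearity_gap = np.max(np.abs(lhs - rhs))
print(linearity_gap)
```

The gap is zero up to rounding even though `f` itself fails the same superposition test, which is the whole point of the construction.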

Continuous-time version $$ \frac{d}{dt} g(x(t)) \;=\; (\mathcal{L} g)(x(t)), \qquad \mathcal{L} g = \nabla g \cdot F $$

For continuous systems $\dot{x} = F(x)$, the Koopman generator $\mathcal{L}$ plays the role of $\mathcal{K}$. Both tell the same story in different time flavours.
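As a small worked instance of the generator, take the scalar system $\dot{x} = a x$ (an assumption chosen for illustration) and the observable $g(x) = x^2$:

Worked example · generator acting on $x^2$ $$ \mathcal{L} g \;=\; \frac{dg}{dx}\, F(x) \;=\; 2x \cdot a x \;=\; 2a\, g $$

So $x^2$ is an eigenfunction of $\mathcal{L}$ with eigenvalue $2a$. Formally, the two time flavours are linked through the time-$\Delta t$ flow map: $\mathcal{K}_{\Delta t} = e^{\Delta t\, \mathcal{L}}$, so a generator eigenvalue $s$ corresponds to a discrete-time eigenvalue $e^{s \Delta t}$.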

Eigenfunctions: the secret ingredient

A Koopman eigenfunction is an observable $\varphi$ with a very special property: $\mathcal{K}\varphi = \lambda \varphi$. Along any trajectory, $\varphi$ evolves by pure scalar multiplication — its geometry is trivial. If you can find enough eigenfunctions to span your state of interest, you have a full change of coordinates that diagonalises the dynamics. This is the nonlinear analogue of finding the modes of a linear system.

A useful intuition.

Koopman eigenfunctions generalise the notion of a "normal mode" from linear vibration theory to arbitrary nonlinear systems. They aren't usually easy to compute, but when you can find them, the system unfolds.
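A concrete eigenfunction makes the "pure scalar multiplication" claim tangible. The sketch below uses the map $f(x) = x^2$ on $x > 0$, an assumption chosen because its eigenfunction $\varphi(x) = \ln x$ with eigenvalue $2$ is known in closed form.

```python
import numpy as np

def f(x):
    # Nonlinear map on x > 0 (illustrative choice, not from the text).
    return x**2

phi = np.log          # candidate eigenfunction
lam = 2.0             # its eigenvalue, since log(x**2) = 2 * log(x)

# Generate a short trajectory of the nonlinear map.
x0 = 0.9
xs = [x0]
for _ in range(5):
    xs.append(f(xs[-1]))
xs = np.array(xs)

# Along the trajectory, phi should simply be multiplied by lam each step.
phi_along_traj = phi(xs)
phi_predicted = lam ** np.arange(6) * phi(x0)
max_gap = np.max(np.abs(phi_along_traj - phi_predicted))
print(max_gap)
```

The state itself squares at every step, yet in the coordinate $\varphi$ the evolution is a geometric sequence.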

iii · lifting

A worked example you can see.

Three observables are enough to turn a specific nonlinear system into a 3×3 linear matrix. Exactly.

Consider the two-dimensional nonlinear system (a classic pedagogical example, due to Brunton):

System · nonlinear in state $$ \dot{x}_1 = \mu\, x_1, \qquad \dot{x}_2 = \lambda\,(x_2 - x_1^2) $$

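A slow attracting manifold with fast decay toward it when $\lambda \ll \mu < 0$. The manifold is exactly $x_2 = \frac{\lambda}{\lambda - 2\mu}\, x_1^2$, which is close to $x_2 = x_1^2$ when $|\lambda| \gg |\mu|$. Nonlinear because of the $x_1^2$ coupling.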

Define the lifted state $z = \big[x_1,\ x_2,\ x_1^2\big]^\top \in \mathbb{R}^3$. Differentiate and substitute:

Same system · linear in observables $$ \dot{z} = \underbrace{\begin{bmatrix} \mu & 0 & 0 \\ 0 & \lambda & -\lambda \\ 0 & 0 & 2\mu \end{bmatrix}}_{A_{\text{Koopman}}} z $$

The third coordinate $x_1^2$ is exactly the "missing" observable that closes the system. With it, a 3×3 constant matrix reproduces the dynamics perfectly — forever, for any initial condition. No linearisation error, because there was no linearisation: we changed spaces.
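The exactness claim is easy to test numerically. The sketch below integrates the nonlinear system and the lifted linear system side by side with a hand-written RK4 step; the step size, the horizon, and the helper names are assumptions, while the parameter values and initial condition are the defaults shown in the figure controls.

```python
import numpy as np

mu, lam = -0.05, -1.0   # parameter values shown in the figure controls

def F(x):
    # Nonlinear vector field in the original state (x1, x2).
    return np.array([mu * x[0], lam * (x[1] - x[0]**2)])

# Lifted linear dynamics for z = [x1, x2, x1**2].
A = np.array([[mu, 0.0, 0.0],
              [0.0, lam, -lam],
              [0.0, 0.0, 2.0 * mu]])

def rk4_step(rhs, y, dt):
    # Classical fourth-order Runge-Kutta step.
    k1 = rhs(y)
    k2 = rhs(y + 0.5 * dt * k1)
    k3 = rhs(y + 0.5 * dt * k2)
    k4 = rhs(y + dt * k3)
    return y + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

dt, steps = 0.01, 2000
x = np.array([1.0, 2.0])                 # initial state from the figure
z = np.array([x[0], x[1], x[0]**2])      # lifted initial condition
max_gap = 0.0
for _ in range(steps):
    x = rk4_step(F, x, dt)
    z = rk4_step(lambda v: A @ v, z, dt)
    # Compare the first two lifted coordinates with the state, and z3 with x1**2.
    max_gap = max(max_gap, np.max(np.abs(z[:2] - x)), abs(z[2] - x[0]**2))
print(max_gap)
```

Any remaining gap comes from the integrator, not from the lifting: the two systems agree to well below plotting precision over the whole run.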

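Below, watch a trajectory run in the original 2D state space (left) and in the lifted 3D observable space (right). The left portrait is generated by a nonlinear vector field; the right one is generated by the linear flow $\dot{z} = A z$, with the trajectory confined to the surface $z_3 = z_1^2$.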

Fig. 2 · State space ⟶ observable space (interactive). Left panel: nonlinear trajectory in $(x_1, x_2)$. Right panel: same trajectory in $(x_1, x_2, x_1^2)$. Default controls: μ = -0.05, λ = -1.00, x₁(0) = 1.00, x₂(0) = 2.00.

The curve on the left and the curve on the right describe the same physical trajectory. The only difference is the coordinate chart: in the lifted coordinates, a constant matrix advances the trajectory. This is the Koopman promise in miniature. The bad news: finding the right observables analytically is rarely possible. The good news: you can often learn them from data.

iv · data-driven koopman

Dynamic Mode Decomposition.

The practical algorithm that made Koopman famous, reduced to least squares.

Dynamic Mode Decomposition (DMD), introduced by Schmid (first presented in 2008, journal paper in 2010) and tied to Koopman theory by Rowley, Mezić and co-authors in 2009, is the workhorse. Strip away the mystique and DMD is just this: given snapshots of a system, find the best-fit linear operator that maps each snapshot to the next one.

The basic DMD recipe

Collect snapshots. Arrange $m+1$ consecutive state measurements as the columns of two matrices offset by one step: $X = [x_0, x_1, \dots, x_{m-1}]$ and $X' = [x_1, x_2, \dots, x_m]$.
Solve the least-squares problem. Find $A$ minimising $\|X' - A X\|_F$. The closed-form answer is $A = X'\, X^{+}$, where $X^+$ is the pseudoinverse.
Extract modes. Eigendecompose $A = W \Lambda W^{-1}$. Each eigenvector of $A$ is a DMD mode; the eigenvalue gives its frequency and growth/decay rate.
Predict. Future states are a linear combination of modes, each evolving as $\lambda_i^k$. You have a forecast.
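The four steps above can be sketched in a few lines of NumPy. The ground-truth system here (a slowly decaying rotation) and all variable names are assumptions made for the demo; with noise-free linear data, DMD should recover the operator essentially exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth linear system: a slowly decaying rotation.
theta, rho = 0.3, 0.98
A_true = rho * np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

# Step 1: collect m + 1 snapshots and split them into X and X'.
m = 50
snaps = np.empty((2, m + 1))
snaps[:, 0] = rng.standard_normal(2)
for k in range(m):
    snaps[:, k + 1] = A_true @ snaps[:, k]
X, Xp = snaps[:, :-1], snaps[:, 1:]

# Step 2: least-squares fit A = X' X^+.
A = Xp @ np.linalg.pinv(X)

# Step 3: eigenvalues (growth and frequency) and eigenvectors (modes).
eigvals, modes = np.linalg.eig(A)

# Step 4: forecast by evolving the mode amplitudes as lambda_i**k.
b = np.linalg.solve(modes, snaps[:, 0])   # amplitudes of x_0 in the mode basis
k = 20
x_k = (modes @ (eigvals**k * b)).real
forecast_error = np.max(np.abs(x_k - snaps[:, k]))
print(np.abs(eigvals), forecast_error)
```

The eigenvalue magnitudes recover the decay rate `rho`, and their angles recover the rotation frequency `theta`. For high-dimensional snapshots, practical implementations usually project onto a truncated SVD basis before the eigendecomposition; that step is omitted here.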

DMD applied to raw state data only captures the part of the Koopman operator that happens to be linear in $x$. For truly nonlinear systems you need Extended DMD (EDMD): lift each snapshot through a dictionary of observables first, then run DMD in that lifted space.

EDMD · in one equation $$ K \;=\; \underbrace{\Psi(X')\Psi(X)^{+}}_{\text{linear regression in dictionary space}} $$

Here $\Psi(x) = [\psi_1(x), \dots, \psi_N(x)]^\top$ is the dictionary — polynomials, RBFs, trigonometric terms, or a neural network. The matrix $K$ is the finite-rank approximation of the Koopman operator on the span of the dictionary.

A minimal Python implementation

This is the entire EDMD algorithm. No library required beyond NumPy:

import numpy as np

def dictionary(x):
    # x has shape (2, m): one column per snapshot.
    # Polynomial observables up to order 2, including the constant term.
    return np.vstack([np.ones_like(x[0]),
                      x[0], x[1],
                      x[0]**2, x[0]*x[1], x[1]**2])

def edmd(X, X_next):
    # Least-squares fit of K so that dictionary(X_next) ≈ K @ dictionary(X).
    Psi  = dictionary(X)         # N_dict × m
    PsiP = dictionary(X_next)    # N_dict × m
    K    = PsiP @ np.linalg.pinv(Psi)   # N_dict × N_dict
    return K

def predict(K, x0, steps):
    # Lift the initial state once, then advance purely in dictionary space.
    z = dictionary(x0.reshape(2,1))
    traj = [x0.copy()]
    for _ in range(steps):
        z = K @ z
        traj.append(np.array([z[1,0], z[2,0]]))  # rows 1 and 2 hold x[0], x[1]
    return np.array(traj)                        # shape (steps + 1, 2)

Below: Brunton's slow-manifold system, a 2D nonlinear system with an exact finite-dimensional Koopman embedding on $\{1, x_1, x_2, x_1^2\}$. With those four terms, the linear prediction (dashed) and the true nonlinear flow (solid) overlap almost perfectly. Drop $x_1^2$ from the dictionary and prediction quality collapses. The dictionary choice is everything.
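The same experiment can be reproduced offline. The sketch below is a stand-alone approximation of what the figure does, under several assumptions: the sampling range, the time step `dt`, the parameter values, and every helper name are hypothetical, and the training pairs come from the closed-form flow map of the system rather than from a numerical integrator.

```python
import numpy as np

mu, lam, dt = -0.05, -1.0, 0.1
b = lam / (lam - 2.0 * mu)

def flow(x):
    # Exact time-dt flow map of x1' = mu*x1, x2' = lam*(x2 - x1**2).
    # x has shape (2, m): one column per state.
    x1, x2 = x
    x1_next = x1 * np.exp(mu * dt)
    x2_next = (x2 - b * x1**2) * np.exp(lam * dt) + b * x1**2 * np.exp(2.0 * mu * dt)
    return np.vstack([x1_next, x2_next])

def dict_exact(x):
    # The four-term dictionary {1, x1, x2, x1**2}.
    return np.vstack([np.ones_like(x[0]), x[0], x[1], x[0]**2])

def dict_no_square(x):
    # Same dictionary with the x1**2 term removed.
    return np.vstack([np.ones_like(x[0]), x[0], x[1]])

def fit(dictionary, X, X_next):
    # EDMD regression in dictionary space.
    return dictionary(X_next) @ np.linalg.pinv(dictionary(X))

def rollout_rmse(dictionary, K, x0, steps):
    # Roll the linear model forward without re-lifting and compare with the truth.
    z = dictionary(x0)
    x_true = x0
    sq_err = []
    for _ in range(steps):
        z = K @ z
        x_true = flow(x_true)
        sq_err.append(np.sum((z[1:3] - x_true) ** 2))   # rows 1, 2 hold x1, x2
    return np.sqrt(np.mean(sq_err))

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(2, 200))   # random training states
X_next = flow(X)

x0 = np.array([[1.0], [2.0]])
rmse_exact = rollout_rmse(dict_exact, fit(dict_exact, X, X_next), x0, 300)
rmse_poor = rollout_rmse(dict_no_square, fit(dict_no_square, X, X_next), x0, 300)
print(rmse_exact, rmse_poor)
```

With the four-term dictionary the rollout error sits at rounding level; without $x_1^2$ the regression can only replace the missing term by its best affine stand-in, and the long-horizon error becomes large.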

Fig. 3 · EDMD prediction vs. the truth (interactive). Ground truth (nonlinear simulation) against the Koopman prediction over a 300-step horizon, with toggleable dictionary terms and readouts of dictionary size and RMSE. Resampling the training batch redraws the initial conditions and refits: with an exact dictionary the fit stays perfect; with a poor dictionary the RMSE degrades.

The dictionary problem.

If your dictionary is too small, EDMD cannot close the dynamics and predictions fail. If it's too large, the regression overfits and becomes ill-conditioned. A prominent line of modern Koopman research couples a neural-network autoencoder to the DMD step, letting gradient descent discover a dictionary whose span is approximately invariant under the operator. This is often called a deep Koopman model or Koopman autoencoder.
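The numerical side of the "too large" failure mode shows up directly in the condition number of the lifted data matrix, which the pseudoinverse has to invert. The sketch below assumes scalar training states and a plain monomial dictionary, both chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)     # hypothetical scalar training states

def monomial_dictionary(x, degree):
    # Rows are 1, x, x**2, ..., x**degree evaluated on every sample.
    return np.vstack([x**p for p in range(degree + 1)])

# Condition number of the lifted data matrix for growing dictionaries.
conds = {d: np.linalg.cond(monomial_dictionary(x, d)) for d in (2, 6, 12)}
for d, c in conds.items():
    print(f"degree {d:2d}: condition number {c:.2e}")
```

The condition number climbs steeply with the degree, so small perturbations in the data produce large swings in the fitted $K$. Better-conditioned bases or a learned dictionary are the usual responses.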

v · koopman mpc

Turning nonlinear MPC into a quadratic program.

Lift, then solve a convex problem. Predictions become matrix powers; control becomes a QP.

Model Predictive Control works beautifully when the plant is linear: the optimisation over a horizon reduces to a convex Quadratic Program (QP), solvable reliably in a few milliseconds. For nonlinear plants you face an NLP — slower, non-convex, and dependent on a good initial guess. Koopman offers a tidy escape:

Collect trajectory data from the plant under excitation (system ID).
Fit a Koopman model with inputs (the KIC family, for Koopman with inputs and control): $z_{k+1} = A z_k + B u_k$, where $z = \Psi(x)$.
Solve linear MPC on $(A,B)$ over the lifted state. Standard QP machinery — OSQP, qpOASES, HPIPM — applies directly.
Apply the first input, measure, re-lift, repeat.
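Steps 3 and 4 of the loop above can be sketched in their simplest form, assuming $A$ and $B$ are already available. Everything here is an illustrative assumption: the plant is an Euler-discretised slow-manifold system with an input added to the $x_2$ equation, the lifted matrices are written by hand rather than fitted, and the problem is left unconstrained so the QP collapses to one linear solve. A real deployment would hand the same matrices, plus constraints, to a QP solver.

```python
import numpy as np

mu, lam, dt = -0.05, -1.0, 0.1

def plant(x, u):
    # Hypothetical nonlinear plant (Euler-discretised, input enters x2).
    x1, x2 = x
    return np.array([x1 + dt * mu * x1,
                     x2 + dt * (lam * (x2 - x1**2) + u)])

def lift(x):
    # Dictionary z = [x1, x2, x1**2].
    return np.array([x[0], x[1], x[0]**2])

# Lifted model z+ = A z + B u; written by hand here, in practice fitted from data.
A = np.array([[1 + dt * mu, 0.0, 0.0],
              [0.0, 1 + dt * lam, -dt * lam],
              [0.0, 0.0, (1 + dt * mu)**2]])
B = np.array([[0.0], [dt], [0.0]])
C = np.array([[0.0, 1.0, 0.0]])       # output to track: x2

N, rho, ref = 10, 1e-3, 1.0           # horizon, input weight, reference

# Condensed prediction: stacked outputs Y = Phi z0 + G U over the horizon.
Phi = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(1, N + 1)])
G = np.zeros((N, N))
for i in range(N):
    for j in range(i + 1):
        G[i, j] = (C @ np.linalg.matrix_power(A, i - j) @ B)[0, 0]
H = G.T @ G + rho * np.eye(N)         # QP Hessian (constant, so built once)

x = np.array([1.0, 2.0])
for _ in range(200):
    z0 = lift(x)                                     # re-lift the measured state
    U = np.linalg.solve(H, G.T @ (ref - Phi @ z0))   # unconstrained QP minimiser
    x = plant(x, U[0])                               # apply the first input only
tracking_error = abs(x[1] - ref)
print(tracking_error)
```

Because the Hessian `H` does not depend on the measured state, all the expensive matrix work happens once, offline; the online cost is one lift and one linear solve per step. A small steady-state offset remains because the cost penalises the input magnitude.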

Koopman-with-inputs $$ \Psi(x_{k+1}) \;\approx\; A\, \Psi(x_k) \;+\; B\, u_k $$

The lifted state evolves linearly in $\Psi$ and linearly in $u$. Fit $A$ and $B$ jointly via a single least-squares problem on stacked data.
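A minimal sketch of that joint fit follows. The function name `fit_lifted_model` and the synthetic data are assumptions; the sanity check uses a linear system with an identity dictionary so the true matrices are known.

```python
import numpy as np

def fit_lifted_model(Psi_X, Psi_Xnext, U):
    # Psi_X, Psi_Xnext: (N_dict, m) lifted snapshots; U: (n_u, m) inputs.
    # Solves min || Psi_Xnext - [A B] [Psi_X; U] ||_F in one shot.
    stacked = np.vstack([Psi_X, U])
    AB = Psi_Xnext @ np.linalg.pinv(stacked)
    n = Psi_X.shape[0]
    return AB[:, :n], AB[:, n:]

# Sanity check on a hypothetical linear system with an identity dictionary.
rng = np.random.default_rng(1)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.0], [0.5]])
X = rng.standard_normal((2, 100))
U = rng.standard_normal((1, 100))
X_next = A_true @ X + B_true @ U

A_fit, B_fit = fit_lifted_model(X, X_next, U)
print(np.round(A_fit, 6), np.round(B_fit, 6))
```

For a nonlinear plant, the first two arguments would be the dictionary evaluated on the measured states. The inputs need to be sufficiently exciting, otherwise the stacked matrix loses rank and $B$ is poorly determined.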

Where it shines (and where it doesn't)

Strong fit

Fast, mildly-nonlinear dynamics

Soft robots, drone aerodynamics, power electronics, thermal plants. Whenever nonlinear MPC solve time is the bottleneck, and the nonlinearity can be captured by a modest dictionary.

Poor fit

Strongly discontinuous or chaotic

Contact dynamics, impacts, hybrid switching, turbulence near transition. The lifted model may predict plausibly on average but drift over long horizons. Ensemble or hybrid approaches help.

A domain example: electric motor control

PMSM torque control is an instructive case. The cross-coupled $dq$ voltage equations are mildly nonlinear due to $\omega L i$ terms; field-weakening and saturation make things worse. A polynomial-dictionary Koopman model fit from dyno data can replace a gain-scheduled PI + feed-forward controller with a single linear MPC that handles saturation constraints explicitly — without the real-time cost of a full nonlinear solve. The same principle scales to axial-flux machines where cross-coupling is stronger still.

Fig. 4 A Koopman-MPC tracking a step reference on a nonlinear plant (double-well potential with damping). Reference in amber, plant response in teal, applied control in coral below. The lifted model uses polynomial observables up to order 3.

vi · koopman in rl

Linear latent dynamics for sample-efficient learning.

If the world model is linear, planning, value iteration, and exploration all simplify.

Reinforcement Learning from raw pixels or high-dimensional sensors is hard partly because the learned world model has to be both expressive and tractable for planning. Koopman-flavoured ideas offer a natural compromise: learn a nonlinear encoder $\Psi_\theta$ and a linear dynamics matrix $K$ in the latent space.

Koopman autoencoder · the architecture $$ x_k \;\xrightarrow{\Psi_\theta}\; z_k \;\xrightarrow{K}\; z_{k+1} \;\xrightarrow{\Psi_\theta^{-1}}\; \hat{x}_{k+1} $$

Encoder $\Psi_\theta$ and decoder $\Psi_\theta^{-1}$ (often paired as an autoencoder) are learned; $K$ is a linear matrix that evolves the latent state. End-to-end loss: reconstruction + multi-step prediction error.
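The loss can be written down independently of any deep-learning framework. The sketch below is a plain NumPy version under stated assumptions: `encode` and `decode` are arbitrary callables standing in for the networks, the two terms are weighted equally, and the function name is hypothetical. Actual training would express the same computation in an autodiff framework.

```python
import numpy as np

def koopman_ae_loss(encode, decode, K, traj, horizon):
    # traj: (T, n) array of consecutive states x_0 ... x_{T-1}.
    # Reconstruction term: decode(encode(x)) should return x.
    recon = np.mean((decode(encode(traj)) - traj) ** 2)

    # Multi-step term: advance latents with K and compare decoded states.
    z = encode(traj[:-horizon])               # latents at every valid start time
    pred = 0.0
    for step in range(1, horizon + 1):
        z = z @ K.T                           # one latent step for all rows
        target = traj[step: len(traj) - horizon + step]
        pred += np.mean((decode(z) - target) ** 2)
    return recon + pred / horizon

# Sanity check: a linear system with identity encoder and decoder.
A_true = np.array([[0.9, -0.2], [0.2, 0.9]])
traj = np.empty((30, 2))
traj[0] = [1.0, 0.0]
for k in range(29):
    traj[k + 1] = A_true @ traj[k]

identity = lambda v: v
loss_exact = koopman_ae_loss(identity, identity, A_true, traj, horizon=5)
loss_wrong = koopman_ae_loss(identity, identity, np.eye(2), traj, horizon=5)
print(loss_exact, loss_wrong)
```

The multi-step term is what distinguishes this from an ordinary autoencoder: it penalises a latent space in which repeated application of $K$ drifts away from the decoded truth.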

Why this helps an RL agent

Planning is cheap. Rolling out $n$ steps in latent space is $n$ matrix-vector products. Monte Carlo tree search and cross-entropy methods become dramatically faster than rolling out a deep nonlinear dynamics network.
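Value functions are quadratic. If the latent dynamics are linear and the reward is quadratic in the latent, the value function is quadratic too, and Bellman updates have closed-form solutions à la Riccati. Critic networks become optional.
Stability is analysable. The eigenvalues of $K$ tell you whether the learned model will blow up over long horizons: rollouts decay when the spectral radius $\rho(K) < 1$ and grow without bound when $\rho(K) > 1$. During training you can regularise toward $\|K\|_2 \le 1$, a stricter but easier-to-penalise sufficient condition.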
Composition is linear. Two consecutive actions become $K_{u_2} K_{u_1}$, a matrix product. This opens clean options-frameworks and hierarchical RL.
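The stability point can be illustrated with a two-line spectral check. The matrix below is a hypothetical learned $K$, chosen so that the spectral radius and the spectral norm disagree: the eigenvalues say "stable", but the norm exceeds one, so a rollout grows for a few steps before it decays.

```python
import numpy as np

# Hypothetical learned latent dynamics; both eigenvalues equal 0.5.
K = np.array([[0.5, 2.0],
              [0.0, 0.5]])

spectral_radius = np.max(np.abs(np.linalg.eigvals(K)))   # governs long-run behaviour
spectral_norm = np.linalg.norm(K, 2)                     # largest singular value

# Planning rollout: n matrix-vector products in latent space.
z = np.array([0.0, 1.0])
norms = []
for _ in range(40):
    z = K @ z
    norms.append(np.linalg.norm(z))

print(spectral_radius, spectral_norm, max(norms), norms[-1])
```

Penalising the spectral norm rules out this transient growth as well, which is one reason it is a convenient training-time regulariser even though it is stricter than necessary for long-horizon stability.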

Representative work

Lusch, Kutz & Brunton (2018) introduced Koopman autoencoders for discovering linear embeddings of nonlinear dynamics. Li et al. (2020, "Learning Compositional Koopman Operators for Model-Based Control") used them for object-centric physics models. More recent work on Koopman for latent-space planning in pixel-based RL pairs the linear dynamics with learned control matrices $B_\theta(u)$, yielding a bilinear model: close enough to linear for efficient planning, flexible enough to capture action-conditioned behaviour.

Connection to the classical literature.

Koopman latent RL can be seen as a modern descendant of Embed to Control (E2C), which plans with locally linear latent dynamics, and a relative of model-based methods such as PILCO, with a more principled justification for why the latent should be linear.

vii · playground

Build your own Koopman pendulum.

Tune the dictionary, train on data, watch prediction quality tell the truth.

A damped, driven pendulum — the hello-world of nonlinear dynamics. The plant below is simulated with full nonlinear physics: $\ddot{\theta} + c \dot{\theta} + \sin\theta = u$. You pick the dictionary of observables; EDMD fits a Koopman model from a short burst of training data; we then roll the linear model forward and compare against the nonlinear truth over a fresh trajectory.

Watch what happens when you remove $\sin\theta$ from the dictionary. The model still looks reasonable short-term, then diverges — a textbook reminder that Koopman fidelity lives or dies by the dictionary.
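A rough offline version of this experiment is sketched below. Several things are assumptions: the input $u$ is set to zero, the time step is `dt = 0.05`, the "with" dictionary is just $\{\theta, \dot\theta, \sin\theta\}$, and the helper names are hypothetical. The damping, initial angle, and number of training steps are the playground defaults. Only the one-step training error is compared, since that is the quantity the regression actually minimises.

```python
import numpy as np

c, dt = 0.20, 0.05      # damping from the playground default; dt is an assumption

def pendulum_step(x):
    # One RK4 step of theta'' + c*theta' + sin(theta) = 0 (input u set to zero).
    def rhs(s):
        return np.array([s[1], -c * s[1] - np.sin(s[0])])
    k1 = rhs(x)
    k2 = rhs(x + 0.5 * dt * k1)
    k3 = rhs(x + 0.5 * dt * k2)
    k4 = rhs(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate(x0, steps):
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        traj.append(pendulum_step(traj[-1]))
    return np.array(traj).T          # shape (2, steps + 1)

def dict_with_sin(x):
    return np.vstack([x[0], x[1], np.sin(x[0])])

def dict_without_sin(x):
    return np.vstack([x[0], x[1]])

def one_step_rmse(dictionary, X, X_next):
    # Fit K by EDMD, then measure the one-step error on the state rows only.
    K = dictionary(X_next) @ np.linalg.pinv(dictionary(X))
    pred = (K @ dictionary(X))[:2]   # first two rows recover (theta, theta_dot)
    return np.sqrt(np.mean((pred - X_next) ** 2))

data = simulate([1.20, 0.0], 120)    # theta(0) and train steps from the playground
X, X_next = data[:, :-1], data[:, 1:]
rmse_with = one_step_rmse(dict_with_sin, X, X_next)
rmse_without = one_step_rmse(dict_without_sin, X, X_next)
print(rmse_with, rmse_without)
```

Unlike the slow-manifold system, the pendulum has no exact finite dictionary, so even the larger model is an approximation; the comparison only shows that including $\sin\theta$ lowers the fitting error on the training window.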

Fig. 5 · Damped pendulum: Koopman vs. truth (playground). Nonlinear truth against the Koopman prediction, with the training window marked. Controls: dictionary terms, damping c (default 0.20), θ(0) (default 1.20), train steps (default 120). Readouts: dictionary size, 1-step RMSE, long-horizon RMSE.