§ 01 Foundations & Workflow
Before any algebra, a mental model of the loop you're trying to close.
System identification is the art of turning experimental data into dynamic models. You give a physical plant some carefully chosen input, record its reaction, and fit an equation that predicts what it will do next time. The fit is never perfect — but a model that is simple, validated, and honest about its uncertainty is worth more to a controller designer than any block diagram drawn from first principles alone.
The three boxes: white, gray, black
| Approach | Structure | Parameters | Typical use |
|---|---|---|---|
| White-box | Full physics known | All known or computed | Simulators, digital twins |
| Gray-box | Structure known | Some identified from data | Motor parameter extraction, plant tuning |
| Black-box | Generic form (ARX, state-space) | All identified | Complex or uncertain plants |
The canonical workflow
- Design the experiment. Pick input signal, amplitude, sampling rate, duration. This is where 80% of success is won or lost.
- Collect and pre-process. Detrend, remove means, decimate if oversampled, apply anti-alias filtering.
- Choose a model structure. First-order? ARX? State-space of order 4? Start simple.
- Estimate parameters. Least squares, prediction-error, subspace — pick the tool that fits the structure.
- Validate. On fresh data. Check residuals, compare simulated vs measured output.
- Iterate. Increase order, try different structures, re-excite if residuals are coloured.
§ 02 Essential Terminology
The shared vocabulary of identification. Skim now, return when you hit it in the wild.
Persistent Excitation PE(n)
An input that excites enough frequencies to uniquely identify an n-parameter model. A step is PE of order 1; white noise and PRBS are PE of arbitrarily high order.
Signal-to-Noise Ratio SNR
Ratio of output variance due to input vs variance due to disturbance. Below ~10 dB your estimates will be noisy; above ~30 dB you may be ignoring nonlinearity.
PRBS u ∈ {−a, +a}
Pseudo-random binary sequence. A deterministic, periodic square-wave-like signal with near-white spectrum. Cheap to generate, easy on actuators, PE of high order.
Chirp / Swept Sine sin(ω(t)t)
Frequency-swept sinusoid. Gives direct control of which band is excited. Ideal for frequency-response estimation.
Nyquist Rate fₛ ≥ 2fₘₐₓ
Minimum sampling rate to avoid aliasing. In practice use 10–20× the dominant bandwidth and analog anti-alias filter before the ADC.
Residuals ε(t) = y − ŷ
Difference between measured output and model prediction. A well-identified model has residuals that look white and uncorrelated with past inputs.
Prediction Error ε̂(t|θ)
One-step-ahead error used as the objective in PEM. Minimising its variance yields maximum-likelihood estimates under Gaussian assumptions.
Model Order n
The number of poles (or states). Higher order fits data better but generalises worse and amplifies noise. Parsimony rules.
Parsimony Occam
Among models that fit the data similarly well, prefer the simplest. Penalised by AIC, BIC, and sanity.
FIT % NRMSE
100·(1 − ‖y − ŷ‖ / ‖y − ȳ‖). 100% is perfect, 0% is the mean. Below ~70% on validation data is usually a problem.
VAF %
Variance Accounted For = 100·(1 − var(y − ŷ)/var(y)). Similar spirit to FIT but in variance terms.
AIC / BIC −log L + kp
Information criteria that trade off fit quality against number of parameters. BIC penalises complexity harder than AIC.
Innovations e(t)
The unpredictable part of the output given past data. In an ideal model, residuals and innovations coincide.
Closed-loop ID u = K(r − y)
Identification while the loop is closed. Powerful but dangerous: disturbances correlate with the input, biasing naive least squares.
Detrending y ← y − ȳ
Removing the operating point. LTI models describe deviations from steady state — always subtract means before fitting.
Prewhitening u → W⁻¹u
Filtering the input so its spectrum becomes white. Makes correlation-based non-parametric estimates less biased.
§ 03 Non-parametric Methods
No model structure assumed. You get a curve or a frequency table — not coefficients.
Non-parametric methods estimate the plant's response directly — its impulse response, step response, or frequency response function — without committing to a transfer-function order. They are excellent for discovery, fast sanity checks, and feeding downstream parametric fits with a good initial guess.
Step response
Apply a sudden change in input, record the output. The shape tells you almost everything you need to pick a first-order or second-order starting model:
- Monotonic, no overshoot → first-order or over-damped second-order.
- Damped oscillation → under-damped second-order or higher.
- Long flat, then rise → dead-time / pure delay present.
- Inverse response (dips the wrong way first) → non-minimum-phase zero.
Impulse response
Theoretically elegant — the impulse response is the plant (for LTI). Practically limited because true impulses are hard to apply and SNR is poor. Usually obtained indirectly from correlation analysis rather than from an actual impulse.
Frequency response (Bode from data)
Inject a sinusoid of frequency $\omega_k$, wait for steady state, measure amplitude ratio and phase shift. Repeat for many frequencies. The result is $G(j\omega_k)$ sampled across the band of interest — a Bode plot built empirically. Modern variants use chirp or multisine inputs and FFT-based estimation, often called the Empirical Transfer Function Estimate (ETFE):
ETFE is unbiased but variance-noisy; smoothing with a spectral window (Hamming, Parzen) yields the spectral analysis estimate.
Correlation analysis
If you feed the plant a zero-mean input, the cross-correlation between input and output divided by the input auto-correlation gives the impulse response:
This is the engine behind PRBS-based identification: PRBS has nearly white auto-correlation, so the cross-correlation with the output essentially is the impulse response estimate.
§ 04 The First-Order Model
The workhorse. Two parameters, one exponential, enormous explanatory power.
A surprising fraction of industrial loops are well-approximated by a first-order transfer function with optional dead time:
Three scalars, three physical meanings: gain $K$ (how much the output eventually moves), time constant $\tau$ (how fast), and dead time $L$ (how long before you see anything).
Reading parameters off a step response
For a step of size $U$ applied at $t=0$, the response is $y(t) = KU(1 - e^{-t/\tau})$ for $t \geq L$. Three classic graphical rules:
- Gain. $K = \Delta y_{\infty} / \Delta u$. Read final value minus initial value, divide by input step size.
- Time constant — the 63.2% rule. At $t = \tau$ the output has reached $1 - e^{-1} \approx 0.632$ of its final value. Draw horizontal at 0.632·Δy∞, drop a vertical.
- Time constant — the two-point method. Use $t_{28.3\%}$ and $t_{63.2\%}$: then $\tau = 1.5\,(t_{63.2} - t_{28.3})$ and dead time $L = t_{63.2} - \tau$. More noise-robust than a single-point read.
Real data hides the exponential under a fog of noise. Move the sliders to match $\hat{K}$ and $\hat{\tau}$ to the hidden truth. Use the 63.2% rule: where does the data cross 63% of its final value?
When first-order is not enough
If the step response has a clear overshoot, an inflection point, or a visible delay followed by an S-shape, the plant needs more than one pole. The S-shape in particular usually signals either (a) dead time, or (b) multiple real poles whose combined behaviour mimics a sluggish second-order. The tangent method (Ziegler-Nichols) handles (b) by fitting an equivalent FOPDT to the steepest-slope tangent line.
§ 05 The Second-Order Model
Two poles, and suddenly the world oscillates.
The canonical underdamped second-order form:
Three knobs: DC gain $K$, natural frequency $\omega_n$ (how fast), and damping ratio $\zeta$ (how oscillatory). The ratio $\zeta$ tells the whole story:
- $\zeta = 0$ — undamped, pure sinusoid forever.
- $0 < \zeta < 1$ — underdamped, decaying oscillation.
- $\zeta = 1$ — critically damped, fastest approach without overshoot.
- $\zeta > 1$ — overdamped, two real poles, slow monotonic rise.
Reading parameters from the step response
For an underdamped step response, the following three relations usually give you everything:
So: measure overshoot $M_p$ and peak time $t_p$ from the plot. Invert the first relation for $\zeta$, then the second for $\omega_n$. Gain $K$ is again just $\Delta y_\infty / \Delta u$.
The logarithmic decrement method
More robust when noise contaminates a single peak reading, use two successive peaks (amplitudes $a_1$ and $a_2$ above steady state):
Three sliders. Use the first overshoot to lock in ζ̂, the peak time to lock in ω̂ₙ, and the final value to lock in K̂. Watch peak time and overshoot predictions update live.
Higher-order echoes of second-order thinking
Real plants often have many poles. A useful rule: if two poles are much closer to the $j\omega$ axis than all the others, the slow pair dominates — you can treat the plant as second-order for controller design, and push the faster dynamics into an unmodeled residue. This is the foundation of dominant-pole design.
§ 06 Parametric Methods
Commit to a model structure. Crank the data. Get numbers out.
Parametric identification assumes a model form with a finite parameter vector $\theta$ and finds the $\hat\theta$ that minimises a criterion — usually the one-step-ahead prediction error variance. The difference between ARX, ARMAX, OE, and BJ is where you allow the noise to enter the model.
ARX
- Linear in parameters → one-shot least squares
- Assumes white noise after A-filtering
- Biased if true noise is coloured; fast and robust
- Best first cut for any parametric campaign
ARMAX
- Separate noise-colouring polynomial C(q)
- Iterative (nonlinear) optimisation
- Handles coloured disturbances gracefully
- Standard choice when residual whiteness fails on ARX
OE (Output Error)
- Plant and noise models are independent
- Noise assumed additive at output — realistic for many rigs
- Consistent plant estimate regardless of disturbance colour
- Nonlinear in parameters; may have local minima
Box–Jenkins
- Most general: fully decoupled plant and noise
- Best statistical properties — and the most parameters
- Use when you care about accurate disturbance modelling (e.g. Kalman filter design)
State-space /
Subspace (N4SID, MOESP)
- Returns (A, B, C, D) directly
- Non-iterative, numerically stable (SVD-based)
- Excellent for MIMO systems
- Picks model order from singular-value drop-off
Instrumental Variables
- Bias-corrected variant of least squares
- Uses "instruments" Z uncorrelated with noise but correlated with regressors
- Staple tool for closed-loop identification
How ARX actually gets solved
Stack the regression $y(t) = \varphi(t)^{\!\top}\theta + e(t)$ where $\varphi(t) = [-y(t-1), \dots, -y(t-n_a),\, u(t-n_k), \dots, u(t-n_k-n_b+1)]^{\!\top}$. Minimising $\sum \varepsilon(t)^{2}$ yields the normal equations:
This is ordinary linear regression — the same mechanics as fitting a polynomial. The beauty of ARX is that a one-line QR solve delivers the parameters. The price is the (usually unrealistic) assumption that the driving noise $e(t)$ is white.
The prediction-error method (PEM)
A general framework: given a model $\mathcal{M}(\theta)$ that produces a one-step prediction $\hat y(t|t-1;\theta)$, choose $\theta$ to minimise
with $\ell(\cdot)$ typically the squared norm. ARX, ARMAX, OE, and BJ all sit inside this framework — they differ only in how the predictor is parameterised. In MATLAB, tfest, armax, oe, bj, and ssest all invoke variants of PEM.
Model-order selection
You want the smallest model that explains the data. Three tools:
- AIC: $\text{AIC} = N\log V_N + 2 d$ where $d$ is the number of free parameters. Penalises complexity lightly.
- BIC (MDL): $\text{BIC} = N\log V_N + d\log N$. Penalises complexity more — preferred when $N$ is large.
- Hankel singular values (subspace methods): plot them; an obvious gap indicates the right order.
§ 07 Input Design & Excitation
A good experiment matters more than a clever algorithm.
No fitting routine can recover dynamics you never excited. The question is not just "what input?" but "what spectrum?" — you need energy in every frequency band you care about modelling.
StepPE order 1
Simple, physical, great for quick first-order fits. Poor spectral coverage — energy concentrated at low frequencies.
PRBSPE ≈ 2ⁿ−1
Binary, periodic, near-white spectrum. Workhorse of parametric identification. Easy on actuators (bang-bang).
Chirpband-shaped
Swept sine. Direct control of excited bandwidth. Excellent for frequency-response estimation and resonance finding.
Amplitude selection — the Goldilocks problem
- Too small: output swamped by noise, SNR collapses, estimates become garbage.
- Too large: plant leaves its linear region — saturation, friction break-away, back-EMF nonlinearity in motors.
- Goldilocks: largest amplitude that still produces linear-looking output. Start at 5% of normal operating span and sweep up while watching for output asymmetry (response to +step ≠ − response to −step).
Duration and record length
Rules of thumb: collect data for at least 10–20 dominant time constants, or at least one full period of your lowest-frequency PRBS, whichever is longer. For statistical confidence, $N \gtrsim 20 \cdot d$ samples where $d$ is the number of parameters.
Sampling rate
Pick $f_s$ so that $f_s / f_{\text{bandwidth}} \in [10, 30]$. Slower and you miss dynamics; faster and you waste data and amplify quantisation noise. Always precede the ADC with an analog anti-alias filter at $f_s/2$.
§ 08 Validation & Model Choice
Never validate on the data you fit. Never.
Fitting is the easy part — anything will fit if you throw enough parameters at it. Validation is what separates a model you can stake a controller on from a pretty curve.
Three validation pillars
- Fresh-data simulation (FIT %). Simulate the model on input data it has never seen. Compare to measured output. FIT above 80% usually means the model is usable.
- Residual whiteness. Compute $R_{\varepsilon\varepsilon}(\tau)$; it should look like a delta at τ=0 and noise everywhere else. Colour in the residuals means the model is missing dynamics.
- Residual-input cross-correlation. $R_{\varepsilon u}(\tau)$ should be zero for $\tau \geq 0$. Non-zero values mean the input is still predicting something the model isn't capturing.
Simulation vs prediction — a subtle distinction
A one-step-ahead prediction uses past measured outputs as inputs. It almost always looks great, because yesterday's output is a superb predictor of today's output. Simulation uses only past inputs and the model's own past predictions — it's the honest test. Always report FIT on simulation, not prediction.
§ 09 Common Pitfalls
The traps every practitioner falls into at least once.
- Forgetting to detrend. A 5 V steady-state bias on a 10 V step test will wreck your least-squares fit. Always subtract operating-point means.
- Insufficient excitation. Identifying a 6-parameter model from a single step is mathematically possible but statistically reckless. Use PRBS.
- Closed-loop bias. With the controller running, noise affects both $y$ and $u$, so standard least-squares is biased. Use OE, BJ, or instrumental-variable methods — or break the loop if you can.
- Sampling too fast. At $f_s \gg$ bandwidth, model poles cluster near $z=1$, numerical conditioning degrades, and noise variance dominates the regression.
- Sampling too slow. You alias dynamics you cared about into frequencies you didn't want. No amount of clever fitting fixes this.
- Over-modelling. Adding states to chase residuals that are really measurement noise. Watch AIC/BIC; watch whether added parameters change sign between datasets.
- Under-modelling. A first-order fit to a system with oscillatory behaviour. Residuals will scream.
- Ignoring dead time. Even 2–3 samples of transport delay, if unmodelled, will ruin a good estimator. Always check cross-correlation for a peak offset from zero.
- Operating-point trap. A nonlinear plant fit linearly at one setpoint may behave completely differently at another. Identify across the envelope, not at a single op-point.
§ 10 Rules of Thumb
Ten commandments, refined by everyone who ever burned a day on bad data.
Sample 10–20× your bandwidth.
Faster than that, you amplify noise; slower, you alias. Always include an analog anti-alias filter before the ADC.
Excite every frequency you care about.
Steps excite only DC well. Use PRBS or multisine for broadband work. Design the spectrum to match the controller's bandwidth.
Collect 10–20 dominant time constants of data.
Less than that and low-frequency content is poorly conditioned. For PRBS, at least one full sequence period.
Always detrend.
Subtract means, remove linear trends. LTI models live in deviation variables, not absolute values.
Start with first-order plus dead time.
Always fit FOPDT first. If residuals are coloured or structured, then escalate to second-order or parametric.
Parsimony wins.
Smallest model with FIT > ~80% on fresh data beats a larger model that fits training data better. Use BIC when unsure.
Split your data: 70% fit, 30% validate.
Hold the 30% back from the very first line of code. No peeking.
Amplitude: largest still-linear.
Identification needs SNR. Push amplitude until you detect nonlinearity (check ±step symmetry), then back off 20%.
Look at residuals. Always.
If they are white and uncorrelated with the input, you are done. If not, the plant is telling you what it has that you don't.
Identify at the operating point you control at.
A motor identified at 200 rpm will not describe 4000 rpm. Span the envelope, or schedule models.