System Identification for Control — An Interactive Primer

§ 01 Foundations & Workflow

Before any algebra, a mental model of the loop you're trying to close.

System identification is the art of turning experimental data into dynamic models. You give a physical plant some carefully chosen input, record its reaction, and fit an equation that predicts what it will do next time. The fit is never perfect — but a model that is simple, validated, and honest about its uncertainty is worth more to a controller designer than any block diagram drawn from first principles alone.

The three boxes: white, gray, black

Approach	Structure	Parameters	Typical use
White-box	Full physics known	All known or computed	Simulators, digital twins
Gray-box	Structure known	Some identified from data	Motor parameter extraction, plant tuning
Black-box	Generic form (ARX, state-space)	All identified	Complex or uncertain plants

The canonical workflow

Design the experiment. Pick input signal, amplitude, sampling rate, duration. This is where 80% of success is won or lost.
Collect and pre-process. Detrend, remove means, decimate if oversampled, apply anti-alias filtering.
Choose a model structure. First-order? ARX? State-space of order 4? Start simple.
Estimate parameters. Least squares, prediction-error, subspace — pick the tool that fits the structure.
Validate. On fresh data. Check residuals, compare simulated vs measured output.
Iterate. Increase order, try different structures, re-excite if residuals are coloured.

Perspective The goal is not the "true" model. It is a model good enough for the purpose — usually controller design, fault diagnosis, or simulation. A second-order fit that captures the dominant dynamics often out-performs an over-fit eighth-order monster when it comes to robust control design.

§ 02 Essential Terminology

The shared vocabulary of identification. Skim now, return when you hit it in the wild.

Persistent Excitation PE(n)

An input that excites enough frequencies to uniquely identify an n-parameter model. A step is PE of order 1; white noise and PRBS are PE of arbitrarily high order.

Signal-to-Noise Ratio SNR

Ratio of output variance due to input vs variance due to disturbance. Below ~10 dB your estimates will be noisy; above ~30 dB you may be ignoring nonlinearity.

PRBS u ∈ {−a, +a}

Pseudo-random binary sequence. A deterministic, periodic square-wave-like signal with near-white spectrum. Cheap to generate, easy on actuators, PE of high order.

Chirp / Swept Sine sin(ω(t)t)

Frequency-swept sinusoid. Gives direct control of which band is excited. Ideal for frequency-response estimation.

Nyquist Rate fₛ ≥ 2fₘₐₓ

Minimum sampling rate to avoid aliasing. In practice use 10–20× the dominant bandwidth and analog anti-alias filter before the ADC.

Residuals ε(t) = y − ŷ

Difference between measured output and model prediction. A well-identified model has residuals that look white and uncorrelated with past inputs.

Prediction Error ε̂(t|θ)

One-step-ahead error used as the objective in PEM. Minimising its variance yields maximum-likelihood estimates under Gaussian assumptions.

Model Order n

The number of poles (or states). Higher order fits data better but generalises worse and amplifies noise. Parsimony rules.

Parsimony Occam

Among models that fit the data similarly well, prefer the simplest. Penalised by AIC, BIC, and sanity.

FIT % NRMSE

100·(1 − ‖y − ŷ‖ / ‖y − ȳ‖). 100% is perfect, 0% is the mean. Below ~70% on validation data is usually a problem.

VAF %

Variance Accounted For = 100·(1 − var(y − ŷ)/var(y)). Similar spirit to FIT but in variance terms.

AIC / BIC −log L + kp

Information criteria that trade off fit quality against number of parameters. BIC penalises complexity harder than AIC.

Innovations e(t)

The unpredictable part of the output given past data. In an ideal model, residuals and innovations coincide.

Closed-loop ID u = K(r − y)

Identification while the loop is closed. Powerful but dangerous: disturbances correlate with the input, biasing naive least squares.

Detrending y ← y − ȳ

Removing the operating point. LTI models describe deviations from steady state — always subtract means before fitting.

Prewhitening u → W⁻¹u

Filtering the input so its spectrum becomes white. Makes correlation-based non-parametric estimates less biased.

§ 03 Non-parametric Methods

No model structure assumed. You get a curve or a frequency table — not coefficients.

Non-parametric methods estimate the plant's response directly — its impulse response, step response, or frequency response function — without committing to a transfer-function order. They are excellent for discovery, fast sanity checks, and feeding downstream parametric fits with a good initial guess.

Step response

Apply a sudden change in input, record the output. The shape tells you almost everything you need to pick a first-order or second-order starting model:

Monotonic, no overshoot → first-order or over-damped second-order.
Damped oscillation → under-damped second-order or higher.
Long flat, then rise → dead-time / pure delay present.
Inverse response (dips the wrong way first) → non-minimum-phase zero.

Impulse response

Theoretically elegant — the impulse response is the plant (for LTI). Practically limited because true impulses are hard to apply and SNR is poor. Usually obtained indirectly from correlation analysis rather than from an actual impulse.

Frequency response (Bode from data)

Inject a sinusoid of frequency $\omega_k$, wait for steady state, measure amplitude ratio and phase shift. Repeat for many frequencies. The result is $G(j\omega_k)$ sampled across the band of interest — a Bode plot built empirically. Modern variants use chirp or multisine inputs and FFT-based estimation, often called the Empirical Transfer Function Estimate (ETFE):

$$ \hat{G}_N(e^{j\omega}) \;=\; \frac{Y_N(\omega)}{U_N(\omega)} $$ ETFE — ratio of DFTs

ETFE is unbiased but variance-noisy; smoothing with a spectral window (Hamming, Parzen) yields the spectral analysis estimate.

Correlation analysis

If you feed the plant a zero-mean input, the cross-correlation between input and output divided by the input auto-correlation gives the impulse response:

$$ \hat{g}(\tau) \;=\; \frac{R_{yu}(\tau)}{R_{uu}(0)} \quad \text{for white } u $$ correlation estimator

This is the engine behind PRBS-based identification: PRBS has nearly white auto-correlation, so the cross-correlation with the output essentially is the impulse response estimate.

Practical note Always start a campaign with a clean step or two before PRBS. The step tells you if the plant is even approximately linear at your operating point, and gives a ballpark for time-constant — which in turn sets the PRBS clock period.

§ 04 The First-Order Model

The workhorse. Two parameters, one exponential, enormous explanatory power.

A surprising fraction of industrial loops are well-approximated by a first-order transfer function with optional dead time:

$$ G(s) \;=\; \frac{K}{\tau s + 1}\,e^{-Ls} $$ FOPDT — first-order plus dead time

Three scalars, three physical meanings: gain $K$ (how much the output eventually moves), time constant $\tau$ (how fast), and dead time $L$ (how long before you see anything).

Reading parameters off a step response

For a step of size $U$ applied at $t=0$, the response is $y(t) = KU(1 - e^{-t/\tau})$ for $t \geq L$. Three classic graphical rules:

Gain. $K = \Delta y_{\infty} / \Delta u$. Read final value minus initial value, divide by input step size.
Time constant — the 63.2% rule. At $t = \tau$ the output has reached $1 - e^{-1} \approx 0.632$ of its final value. Draw horizontal at 0.632·Δy∞, drop a vertical.
Time constant — the two-point method. Use $t_{28.3\%}$ and $t_{63.2\%}$: then $\tau = 1.5\,(t_{63.2} - t_{28.3})$ and dead time $L = t_{63.2} - \tau$. More noise-robust than a single-point read.

Lab 1 · Fit a first-order model to noisy step data

Real data hides the exponential under a fog of noise. Move the sliders to match $\hat{K}$ and $\hat{\tau}$ to the hidden truth. Use the 63.2% rule: where does the data cross 63% of its final value?

Gain K̂ 1.00

Time constant τ̂ (s) 1.00

RMSE:— FIT %:— 63.2% point:— Residual mean:—

When first-order is not enough

If the step response has a clear overshoot, an inflection point, or a visible delay followed by an S-shape, the plant needs more than one pole. The S-shape in particular usually signals either (a) dead time, or (b) multiple real poles whose combined behaviour mimics a sluggish second-order. The tangent method (Ziegler-Nichols) handles (b) by fitting an equivalent FOPDT to the steepest-slope tangent line.

§ 05 The Second-Order Model

Two poles, and suddenly the world oscillates.

The canonical underdamped second-order form:

$$ G(s) \;=\; \frac{K\,\omega_n^{2}}{s^{2} + 2\zeta\omega_n s + \omega_n^{2}} $$ standard second-order

Three knobs: DC gain $K$, natural frequency $\omega_n$ (how fast), and damping ratio $\zeta$ (how oscillatory). The ratio $\zeta$ tells the whole story:

$\zeta = 0$ — undamped, pure sinusoid forever.
$0 < \zeta < 1$ — underdamped, decaying oscillation.
$\zeta = 1$ — critically damped, fastest approach without overshoot.
$\zeta > 1$ — overdamped, two real poles, slow monotonic rise.

Reading parameters from the step response

For an underdamped step response, the following three relations usually give you everything:

$$ M_p = \exp\!\left(\frac{-\pi\zeta}{\sqrt{1-\zeta^{2}}}\right),\quad t_p = \frac{\pi}{\omega_n\sqrt{1-\zeta^{2}}},\quad t_s \approx \frac{4}{\zeta\omega_n}\;(2\%\text{ band}) $$ overshoot · peak time · settling time

So: measure overshoot $M_p$ and peak time $t_p$ from the plot. Invert the first relation for $\zeta$, then the second for $\omega_n$. Gain $K$ is again just $\Delta y_\infty / \Delta u$.

The logarithmic decrement method

More robust when noise contaminates a single peak reading, use two successive peaks (amplitudes $a_1$ and $a_2$ above steady state):

$$ \delta = \ln(a_1/a_2) \;\;\Rightarrow\;\; \zeta = \frac{\delta}{\sqrt{4\pi^{2} + \delta^{2}}} $$ log-decrement formula

Lab 2 · Fit an underdamped second-order system

Three sliders. Use the first overshoot to lock in ζ̂, the peak time to lock in ω̂ₙ, and the final value to lock in K̂. Watch peak time and overshoot predictions update live.

Gain K̂ 1.00

ω̂ₙ (rad/s) 3.00

Damping ζ̂ 0.50

RMSE:— FIT %:— Predicted M_p:— Predicted t_p (s):—

Higher-order echoes of second-order thinking

Real plants often have many poles. A useful rule: if two poles are much closer to the $j\omega$ axis than all the others, the slow pair dominates — you can treat the plant as second-order for controller design, and push the faster dynamics into an unmodeled residue. This is the foundation of dominant-pole design.

Motor-control Note A PMSM current loop, once tuned, behaves almost as a first-order system (τ ≈ L/R in the rotor frame). The outer speed loop, by contrast, is often a gentle second-order with ζ between 0.6 and 0.8 — a deliberate design choice that trades a small overshoot for faster settling.

§ 06 Parametric Methods

Commit to a model structure. Crank the data. Get numbers out.

Parametric identification assumes a model form with a finite parameter vector $\theta$ and finds the $\hat\theta$ that minimises a criterion — usually the one-step-ahead prediction error variance. The difference between ARX, ARMAX, OE, and BJ is where you allow the noise to enter the model.

ARX

A(q) y(t) = B(q) u(t) + e(t)

Linear in parameters → one-shot least squares
Assumes white noise after A-filtering
Biased if true noise is coloured; fast and robust
Best first cut for any parametric campaign

ARMAX

A(q) y = B(q) u + C(q) e

Separate noise-colouring polynomial C(q)
Iterative (nonlinear) optimisation
Handles coloured disturbances gracefully
Standard choice when residual whiteness fails on ARX

OE (Output Error)

y = B(q)/F(q) · u + e

Plant and noise models are independent
Noise assumed additive at output — realistic for many rigs
Consistent plant estimate regardless of disturbance colour
Nonlinear in parameters; may have local minima

Box–Jenkins

y = (B/F) u + (C/D) e

Most general: fully decoupled plant and noise
Best statistical properties — and the most parameters
Use when you care about accurate disturbance modelling (e.g. Kalman filter design)

State-space /
Subspace (N4SID, MOESP)

x(k+1) = Ax + Bu + w, y = Cx + Du + v

Returns (A, B, C, D) directly
Non-iterative, numerically stable (SVD-based)
Excellent for MIMO systems
Picks model order from singular-value drop-off

Instrumental Variables

θ̂ = (Zᵀ φ)⁻¹ Zᵀ y

Bias-corrected variant of least squares
Uses "instruments" Z uncorrelated with noise but correlated with regressors
Staple tool for closed-loop identification

How ARX actually gets solved

Stack the regression $y(t) = \varphi(t)^{\!\top}\theta + e(t)$ where $\varphi(t) = [-y(t-1), \dots, -y(t-n_a),\, u(t-n_k), \dots, u(t-n_k-n_b+1)]^{\!\top}$. Minimising $\sum \varepsilon(t)^{2}$ yields the normal equations:

$$ \hat\theta_{\text{LS}} \;=\; \left(\sum_{t=1}^{N} \varphi(t)\varphi(t)^{\!\top}\right)^{\!-1}\!\sum_{t=1}^{N} \varphi(t) y(t) $$ ARX least-squares solution

This is ordinary linear regression — the same mechanics as fitting a polynomial. The beauty of ARX is that a one-line QR solve delivers the parameters. The price is the (usually unrealistic) assumption that the driving noise $e(t)$ is white.

The prediction-error method (PEM)

A general framework: given a model $\mathcal{M}(\theta)$ that produces a one-step prediction $\hat y(t|t-1;\theta)$, choose $\theta$ to minimise

$$ V_N(\theta) \;=\; \frac{1}{N}\sum_{t=1}^{N}\,\ell\!\left(\,y(t) - \hat y(t\mid t-1;\theta)\,\right) $$ prediction-error criterion

with $\ell(\cdot)$ typically the squared norm. ARX, ARMAX, OE, and BJ all sit inside this framework — they differ only in how the predictor is parameterised. In MATLAB, tfest, armax, oe, bj, and ssest all invoke variants of PEM.

Model-order selection

You want the smallest model that explains the data. Three tools:

AIC: $\text{AIC} = N\log V_N + 2 d$ where $d$ is the number of free parameters. Penalises complexity lightly.
BIC (MDL): $\text{BIC} = N\log V_N + d\log N$. Penalises complexity more — preferred when $N$ is large.
Hankel singular values (subspace methods): plot them; an obvious gap indicates the right order.

§ 07 Input Design & Excitation

A good experiment matters more than a clever algorithm.

No fitting routine can recover dynamics you never excited. The question is not just "what input?" but "what spectrum?" — you need energy in every frequency band you care about modelling.

StepPE order 1

Simple, physical, great for quick first-order fits. Poor spectral coverage — energy concentrated at low frequencies.

PRBSPE ≈ 2ⁿ−1

Binary, periodic, near-white spectrum. Workhorse of parametric identification. Easy on actuators (bang-bang).

Chirpband-shaped

Swept sine. Direct control of excited bandwidth. Excellent for frequency-response estimation and resonance finding.

Amplitude selection — the Goldilocks problem

Too small: output swamped by noise, SNR collapses, estimates become garbage.
Too large: plant leaves its linear region — saturation, friction break-away, back-EMF nonlinearity in motors.
Goldilocks: largest amplitude that still produces linear-looking output. Start at 5% of normal operating span and sweep up while watching for output asymmetry (response to +step ≠ − response to −step).

Duration and record length

Rules of thumb: collect data for at least 10–20 dominant time constants, or at least one full period of your lowest-frequency PRBS, whichever is longer. For statistical confidence, $N \gtrsim 20 \cdot d$ samples where $d$ is the number of parameters.

Sampling rate

Pick $f_s$ so that $f_s / f_{\text{bandwidth}} \in [10, 30]$. Slower and you miss dynamics; faster and you waste data and amplify quantisation noise. Always precede the ADC with an analog anti-alias filter at $f_s/2$.

§ 08 Validation & Model Choice

Never validate on the data you fit. Never.

Fitting is the easy part — anything will fit if you throw enough parameters at it. Validation is what separates a model you can stake a controller on from a pretty curve.

Three validation pillars

Fresh-data simulation (FIT %). Simulate the model on input data it has never seen. Compare to measured output. FIT above 80% usually means the model is usable.
Residual whiteness. Compute $R_{\varepsilon\varepsilon}(\tau)$; it should look like a delta at τ=0 and noise everywhere else. Colour in the residuals means the model is missing dynamics.
Residual-input cross-correlation. $R_{\varepsilon u}(\tau)$ should be zero for $\tau \geq 0$. Non-zero values mean the input is still predicting something the model isn't capturing.

Simulation vs prediction — a subtle distinction

A one-step-ahead prediction uses past measured outputs as inputs. It almost always looks great, because yesterday's output is a superb predictor of today's output. Simulation uses only past inputs and the model's own past predictions — it's the honest test. Always report FIT on simulation, not prediction.

Workflow Tip Split your data 70/30 before you even start fitting. Keep the 30% locked in a separate file. Only touch it after your model choice is final. If final-validation FIT is much lower than fit-data FIT, you've overfit.

§ 09 Common Pitfalls

The traps every practitioner falls into at least once.

Forgetting to detrend. A 5 V steady-state bias on a 10 V step test will wreck your least-squares fit. Always subtract operating-point means.
Insufficient excitation. Identifying a 6-parameter model from a single step is mathematically possible but statistically reckless. Use PRBS.
Closed-loop bias. With the controller running, noise affects both $y$ and $u$, so standard least-squares is biased. Use OE, BJ, or instrumental-variable methods — or break the loop if you can.
Sampling too fast. At $f_s \gg$ bandwidth, model poles cluster near $z=1$, numerical conditioning degrades, and noise variance dominates the regression.
Sampling too slow. You alias dynamics you cared about into frequencies you didn't want. No amount of clever fitting fixes this.
Over-modelling. Adding states to chase residuals that are really measurement noise. Watch AIC/BIC; watch whether added parameters change sign between datasets.
Under-modelling. A first-order fit to a system with oscillatory behaviour. Residuals will scream.
Ignoring dead time. Even 2–3 samples of transport delay, if unmodelled, will ruin a good estimator. Always check cross-correlation for a peak offset from zero.
Operating-point trap. A nonlinear plant fit linearly at one setpoint may behave completely differently at another. Identify across the envelope, not at a single op-point.

§ 10 Rules of Thumb

Ten commandments, refined by everyone who ever burned a day on bad data.

RULE 01

Sample 10–20× your bandwidth.

Faster than that, you amplify noise; slower, you alias. Always include an analog anti-alias filter before the ADC.

RULE 02

Excite every frequency you care about.

Steps excite only DC well. Use PRBS or multisine for broadband work. Design the spectrum to match the controller's bandwidth.

RULE 03

Collect 10–20 dominant time constants of data.

Less than that and low-frequency content is poorly conditioned. For PRBS, at least one full sequence period.

RULE 04

Always detrend.

Subtract means, remove linear trends. LTI models live in deviation variables, not absolute values.

RULE 05

Start with first-order plus dead time.

Always fit FOPDT first. If residuals are coloured or structured, then escalate to second-order or parametric.

RULE 06

Parsimony wins.

Smallest model with FIT > ~80% on fresh data beats a larger model that fits training data better. Use BIC when unsure.

RULE 07

Split your data: 70% fit, 30% validate.

Hold the 30% back from the very first line of code. No peeking.

RULE 08

Amplitude: largest still-linear.

Identification needs SNR. Push amplitude until you detect nonlinearity (check ±step symmetry), then back off 20%.

RULE 09

Look at residuals. Always.

If they are white and uncorrelated with the input, you are done. If not, the plant is telling you what it has that you don't.

RULE 10

Identify at the operating point you control at.

A motor identified at 200 rpm will not describe 4000 rpm. Span the envelope, or schedule models.

Closing Thought System identification is less a mathematical procedure than a disciplined conversation with a physical device. The data has something to tell you; your job is to build the simplest possible structure that lets it say so clearly — then design a controller worthy of the answer.

§ 01 Foundations & Workflow

The three boxes: white, gray, black

The canonical workflow

§ 02 Essential Terminology

Persistent Excitation PE(n)

Signal-to-Noise Ratio SNR

PRBS u ∈ {−a, +a}

Chirp / Swept Sine sin(ω(t)t)

Nyquist Rate fₛ ≥ 2fₘₐₓ

Residuals ε(t) = y − ŷ

Prediction Error ε̂(t|θ)

Model Order n

Parsimony Occam

FIT % NRMSE

VAF %

AIC / BIC −log L + kp

Innovations e(t)

Closed-loop ID u = K(r − y)

Detrending y ← y − ȳ

Prewhitening u → W⁻¹u

§ 03 Non-parametric Methods

Step response

Impulse response

Frequency response (Bode from data)

Correlation analysis

§ 04 The First-Order Model

Reading parameters off a step response

When first-order is not enough

§ 05 The Second-Order Model

Reading parameters from the step response

The logarithmic decrement method

Higher-order echoes of second-order thinking

§ 06 Parametric Methods

ARX

ARMAX

OE (Output Error)

Box–Jenkins

State-space /Subspace (N4SID, MOESP)

Instrumental Variables

How ARX actually gets solved

The prediction-error method (PEM)

Model-order selection

§ 07 Input Design & Excitation

StepPE order 1

PRBSPE ≈ 2ⁿ−1

Chirpband-shaped

Amplitude selection — the Goldilocks problem

Duration and record length

Sampling rate

§ 08 Validation & Model Choice

Three validation pillars

Simulation vs prediction — a subtle distinction

§ 09 Common Pitfalls

§ 10 Rules of Thumb

Sample 10–20× your bandwidth.

Excite every frequency you care about.

Collect 10–20 dominant time constants of data.

Always detrend.

Start with first-order plus dead time.

Parsimony wins.

Split your data: 70% fit, 30% validate.

Amplitude: largest still-linear.

Look at residuals. Always.

Identify at the operating point you control at.

State-space /
Subspace (N4SID, MOESP)