Most anomaly-detection algorithms try to characterize “normal” — fit a Gaussian, learn a density, draw a boundary — and then flag what falls outside it. The Isolation Forest does the opposite. It tries to characterize how easy something is to separate. The intuition is that an anomalous point is usually far from any cluster, which means a random axis-aligned split is likely to isolate it in a small number of cuts. A normal point, surrounded by other normal points, requires many more splits before it ends up alone in a leaf.

So you build a forest of trees that recursively split the data, you measure the average depth at which each point gets isolated, and you call short paths “anomalous.” It’s a remarkably elegant inversion of the usual framing, and it scales to high dimensions in a way that density-based methods don’t.

This is monograph № 2 of the representation/structure-learning series.

What it covers

Six sections, five live figures, about fifteen minutes to read.

§1 — The core idea, in one sentence. Anomalies are easy to isolate. The whole algorithm reduces to making that claim precise.

§2 — A point gets isolated. Live figure walking through a single isolation: drag a point, watch the splits accumulate until it’s alone.

§3 — Growing a single tree. The recursive split process. Why axis-aligned, why random splits work, the stopping conditions.

§4 — From one tree to a forest. Why one tree is noisy and many trees are stable. The averaging that turns isolation depth into an anomaly score.

§5 — Path lengths tell the whole story. The math that connects average path length to an anomaly score in [0, 1]. The normalization that makes the threshold interpretable.

§6 — Why it works so well. The properties of the algorithm that explain its remarkable performance. Linear scaling. No need for distance metrics. Tolerance to irrelevant features. The handful of failure modes (axis-aligned anomalies, near-duplicate points) and how to handle them.

Read it

Open the primer →

← Back to Autonomy