← Back to Autonomy
Robotics · A Visual Primer

Localization, Mapping & SLAM

Where am I? What's around me? And how do I figure out both at the same time?

1 — Three questions, three problems

A robot moving through space — a vacuum, a forklift, a drone, a rover, an AR headset on your face — almost always needs to answer some combination of three deceptively simple questions:

  1. Where am I? — given a map, figure out my pose in it.
  2. What's around me? — given that I know where I am, build a map.
  3. Both at once? — given neither, figure out both simultaneously.

These map onto three classical problems: Localization, Mapping, and SLAM. They look related, and they are, but the third is hilariously harder than the other two combined. This page works through all three with seven interactive demos so you can see how the algorithms behave, fail, and recover.

The unifying idea

All three problems are solved roughly the same way: maintain a probability distribution over what you don't know (pose, map, or both), and update it whenever you (a) move or (b) sense something. This is recursive Bayesian estimation. The rest is engineering: which distribution, which approximation, which sensor.

Reference implementation

The methods covered here are all implemented in AtsushiSakai/PythonRobotics, a widely-used open-source collection of Python sample code for robotics algorithms. Where we name a specific algorithm, you can find a runnable Python version in that repo. We borrow its taxonomy of methods throughout this primer.

2 — Localization: "Where am I?"

You hand the robot a map and switch it on somewhere on that map. It doesn't know where. As it drives and senses, it has to figure out which point on the map matches what it's seeing.

The intuition

Imagine being teleported, blindfolded, into an apartment whose floor plan you've memorized. You take off the blindfold. You see a kitchen counter to your left and a window to your right. There might be three rooms in the apartment that match that description — so you're not sure yet. You take a step forward and now you see a hallway. Only one room had a hallway exit in that direction. Now you know where you are.

That's exactly how a localization algorithm works. It starts with many hypotheses ("I could be anywhere"), and as observations arrive, the hypotheses that don't match get killed off until only the right ones survive. The four classical methods differ in how they represent those hypotheses:

The first three are below as live demos. The fourth shows up in §4 because it's also the workhorse of SLAM front ends.

Method 1 — Monte Carlo Localization (MCL) particle filter

The most flexible representation: maintain a swarm of weighted samples (particles) of where you could be. On every motion, each particle moves with motion noise. On every sensor observation, weight each particle by the likelihood of the observation given that particle's pose. Resample to keep the swarm focused on the high-likelihood regions.

Demo 1 — Monte Carlo Localization in a 1D corridor
A robot (green) drives along a known corridor with doors (blue) at known positions. It senses distance to the nearest door, with noise. The 300 red particles each represent one hypothesis. Watch them collapse onto the truth — and try the kidnap button to see what recovery looks like.
True robot pose Doors (landmarks) Particle hypotheses
step 0 true x = mean estimate = std dev =

Three things to notice:

Method 2 — Extended Kalman Filter (EKF) localization single Gaussian

If you trust your initial pose and your sensors aren't too crazy, you can summarize the entire belief with a single Gaussian: a mean (best guess) and a covariance matrix (how confident, in which direction). The EKF runs the same predict / update cycle as MCL, but instead of a swarm it carries the mean and covariance through linearized motion and observation models.

predict:   x̂⁻ = f(x̂, u),   P⁻ = F P Fᵀ + Q
update:   K = P⁻ Hᵀ (H P⁻ Hᵀ + R)⁻¹,   x̂ = x̂⁻ + K(z − h(x̂⁻)),   P = (I − K H) P⁻

The covariance ellipse is the visual signature of a Kalman filter: it grows when the robot moves (uncertainty accumulating) and shrinks when a measurement comes in (information gained).

Demo 2 — EKF localization with range measurements to landmarks
The robot drives a known trajectory (gray dashed). The red ellipse is the 2σ covariance of the EKF's estimate. Whenever a landmark falls within sensor range, the filter performs a range-only Kalman update — innovations shown as thin red lines. Watch the ellipse expand on prediction and shrink on update.
True trajectory EKF estimate 2σ covariance Landmarks (RFID-like)
step 0 tr(P) = position error updates 0

EKF localization is fast and composes naturally with multi-sensor fusion. It's what you find inside most production GPS+IMU+wheel-odometry stacks.

EKF vs UKF

The Unscented Kalman Filter replaces the EKF's Jacobian linearization with a deterministic set of sigma points propagated through the true nonlinear functions. The UKF uses a deterministic sampling approach to capture the mean and covariance of the state distribution, unlike the EKF which linearizes the nonlinear functions using Jacobians. UKF is usually slightly more accurate when motion or measurement models are strongly nonlinear; EKF is simpler to implement when you can write the Jacobians cleanly. Both are in PythonRobotics.

Method 3 — Histogram filter localization grid belief

The most literal Bayesian filter: discretize the world into cells, store one probability per cell, and update them all at every step. No assumptions about Gaussianity, no resampling — just multiplication and renormalization. It's slow for large worlds but conceptually the cleanest.

Demo 3 — Histogram filter localization
The world is divided into a grid. Each cell stores P(robot is here). Initially flat (no idea). Predict shifts the distribution by commanded motion and blurs it (motion noise). Update multiplies by the likelihood of the observed range to landmarks. Watch the heatmap concentrate.
High probability Low probability True robot pose Landmarks
step 0 entropy ≈ peak probability

The histogram filter handles the "I have no idea where I am" case naturally — start uniform and let the math do its work. Note how multimodality is handled: when only one landmark is in range, the belief has a circular ridge of equally-likely positions; a second observation breaks the symmetry.

Methods at a glance

MethodBelief representationPer-step costBest for…
EKF / UKFSingle GaussianO(n²) – O(n³)Tracking when the prior is good and the world is roughly linear.
Ensemble Kalman FilterSample-approximated GaussianO(N·n)Very high-dimensional state where full covariance is intractable.
Histogram filterProbability per grid cellO(cells)Small worlds, global localization, multimodal beliefs.
Particle filter (MCL)Weighted samplesO(N·sensors)Arbitrary distributions, kidnapped robot, long-term recovery.
Scan matching (ICP, NDT)No distribution; pure optimizationO(K·iterations)You already know roughly where you are and you have a lidar scan.

Use cases

Practical notes

Watch out for

Symmetric environments. Long featureless corridors, square rooms, and warehouse aisles produce multiple equally-good hypotheses. The filter will not converge. Add fiducials, use ceiling features, or bias the initial pose if you can.

Linearization breakdown (EKF/UKF). If your covariance grows large, the linear approximation around the mean stops being valid. Either reduce your update period, switch to UKF, or add a global recovery layer (PF on top of EKF, AMCL-style).

Sample impoverishment (PF). Resampling every step kills off particle diversity. Inject a small fraction of random particles each step, or only resample when effective sample size drops below a threshold.

The kidnapped robot problem. Pure EKF cannot recover from being moved without warning. MCL with random injection or a histogram filter that re-sees the world fresh will. Plan for this if your robot can be picked up or slipped.

3 — Mapping: "What's around me?"

Now the opposite: assume you know exactly where you are at every moment (perfect GPS, motion capture, wheel encoders with no slip). Build a map of what you observe.

The intuition

You're a surveyor with a known position at every step. You shine a laser at the world, it bounces back, and you get a distance. Multiply by direction and you have a 3D point. Do this thousands of times per second from a moving platform and you have a point cloud of the world.

Point clouds aren't enough for planning. The robot wants to know not just "there's a wall here" but also "this region is empty and safe to drive through." That's what occupancy grid maps give you: every cell stores a probability that it is occupied.

Demo 4 — Building an occupancy grid with a lidar
Click anywhere in the map to teleport the robot. The robot fires lidar rays in all directions. Cells that rays pass through become free (light). Cells that rays hit become occupied (dark). Drive around to fill in the map.
Occupied (hit) Free (cleared) Unknown Robot
cells observed 0 scans taken 0 coverage 0%

What you're watching is the textbook occupancy grid update from Moravec & Elfes (1985). Each cell stores a log-odds value:

ℓ(c) = log [ p(c is occupied) / p(c is free) ]

When a ray endpoint lands on a cell, that cell's log-odds goes up. When a ray passes through a cell, log-odds goes down. Adding log-odds is equivalent to multiplying probabilities, which is what makes this update so fast.

Methods at a glance

Map typeWhat it storesBest for
Occupancy gridP(occupied) per cell2D navigation, planning, vacuum robots
Gaussian grid mapGaussian per cell aggregating raw range hitsSmoother surfaces; sensor noise modeling
NDT mapOne Gaussian per voxel summarizing point distributionCompact registration target for NDT scan matching
Voxel / OctoMapP(occupied) per 3D voxel (sparse octree)3D navigation, drones, manipulators
Distance mapDistance to nearest obstacle per cellPath planning costs, obstacle avoidance fields
Point cloudList of 3D pointsVisualization, ICP, surveying
Feature / landmark mapList of (id, 3D position)Sparse SLAM, AR anchors
Object-shape mapFitted primitives: lines, circles, rectanglesCompact representation, semantic reasoning
Topological mapGraph of places + edgesHigh-level planning, biology-inspired navigation
Signed Distance Field (TSDF)Distance to nearest surface per voxel3D reconstruction (KinectFusion-style)
Neural fields (NeRF, Gaussian Splat)Implicit / volumetric scene representationPhotorealistic reconstruction, novel views
Beyond grids

Once you have raw lidar points, mapping often involves extracting structure rather than dumping points into cells. PythonRobotics shows several approaches: k-means clustering to group lidar returns into objects, circle fitting for round obstacles like trees and pillars, and rectangle fitting for cars and walls. The output is a much smaller, more semantically useful map than a dense grid.

Use cases

Practical notes

Watch out for

Resolution vs memory. A 5 cm grid over a 100 m × 100 m warehouse is 4 million cells. A 1 cm grid is 100 million. 3D voxel grids blow up cubically. Production systems use sparse structures (octrees, hash tables) instead of dense arrays.

Dynamic objects pollute the map. If a person walks in front of the lidar while you're mapping, you'll record them as a wall. Real systems either filter dynamic returns (semantic segmentation), or use a slow forgetting factor so transient occupancy decays away.

Sensor model matters more than you think. Glass walls, mirrors, and dark fabrics confuse lidars. Sunlight saturates structured-light depth cameras. Map quality is bounded by how well you've modeled your sensor's failure modes.

4 — SLAM: "Both at once."

Now the hard one. The robot has neither a map nor a known pose. It has to figure out both, simultaneously, from raw sensor data.

Why it's so much harder

It's a chicken-and-egg problem. To localize, I need a map. To map, I need to know where I am. The only way out is to estimate them jointly: every pose helps build the map, and every piece of map helps refine past poses.

Modern SLAM splits cleanly into two halves:

We'll look at one canonical front-end algorithm (ICP), then one classical back end (EKF-SLAM), and finally what loop closure does to a long trajectory.

Front end — Iterative Closest Point (ICP) scan matching

ICP is the workhorse of every lidar-based SLAM front end. Given two point clouds — the previous scan and the current scan — find the rigid transform (rotation + translation) that best aligns them. Repeat once per frame, accumulate, and you have lidar odometry.

The algorithm is alternating minimization:

  1. For each point in the moving cloud, find the nearest point in the fixed cloud (the correspondence).
  2. Solve a closed-form least-squares problem for the rotation R and translation t that minimize the sum of squared distances under those correspondences.
  3. Apply (R, t) to the moving cloud. Repeat until convergence.
Demo 5 — ICP scan matching
Two point clouds of the same scene, captured from slightly different poses, start misaligned (red) against a reference (gray). Each iteration finds nearest-point correspondences (thin lines) and solves for the best rigid transform. Step through to watch the alignment converge.
Reference cloud (fixed) Moving cloud Correspondences
iteration 0 mean residual Δrotation

The optimal rotation between two point sets has a beautiful closed form. Let H be the cross-covariance matrix of the centered correspondences. Then the SVD gives H = U Σ Vᵀ, and the optimal rotation is R = V Uᵀ. In 2D, this collapses to a single arctan:

θ* = atan2(Σ Hₓᵧ − Σ Hᵧₓ, Σ Hₓₓ + Σ Hᵧᵧ)

That's the formula running in the demo. The translation is then the difference between the two centroids after rotation. Variants like point-to-plane ICP, NDT, and generalized ICP use the same loop with smarter cost functions.

Back end — EKF-SLAM filter-based

Now the joint estimation. Stack the robot pose and every landmark position into one giant state vector, with one giant covariance matrix capturing every correlation between them. The EKF runs on that joint state, treating each landmark observation as a constraint between robot and landmark.

state x = [ rₓ, rᵧ,   lm₁ₓ, lm₁ᵧ,   lm₂ₓ, lm₂ᵧ,   … ]

The crucial property is that the covariance is not block-diagonal. Robot and landmarks become correlated as soon as the robot observes them. That correlation is what makes loop closure work: when the robot re-observes an old landmark, the constraint propagates through the whole covariance and drags the entire trajectory back into shape.

Demo 6 — EKF-SLAM with explicit landmark estimation
The robot drives the gray dashed loop. It has noisy odometry (so the red estimated path drifts) and observes nearby landmarks with noisy relative-position measurements. New landmarks are initialized into the state with high uncertainty (large blue ellipses) and shrink each time they're re-observed. Watch the loop closure when the robot returns to landmarks it saw at the start.
True path SLAM estimate Robot covariance True landmark Landmark covariance
step 0 landmarks tracked 0 state size 2 position error

EKF-SLAM has a famous flaw: the covariance matrix is dense once landmarks become correlated, so each update is O(n²) in the number of landmarks. Above a few hundred landmarks it becomes prohibitively slow. This is why the field moved on.

FastSLAM 1.0 and 2.0 particle filter

FastSLAM exploits a clever factorization. Conditional on the robot's trajectory, the landmarks become independent of one another. So you can use a particle filter for the trajectory (handling its non-Gaussian uncertainty) and a small EKF per landmark per particle for the landmark positions. FastSLAM is a feature-based SLAM example where the blue line is ground truth, the black line is dead reckoning, the red line is the estimated trajectory, the red points are particles, black points are landmarks, and blue crosses are estimated landmark positions. The 2.0 variant improves the proposal distribution to keep particle diversity over long runs.

Graph-based SLAM optimization

The modern formulation. Forget the covariance matrix. Build a graph where nodes are robot poses (and optionally landmarks) and edges are constraints (odometry, observations, loop closures). Find the configuration of the graph that best satisfies all constraints in the least-squares sense:

x* = argmin Σᵢⱼ ‖zᵢⱼ − h(xᵢ, xⱼ)‖²_Ωᵢⱼ

This is a sparse nonlinear least-squares problem. Sparsity (each pose only connects to a few neighbors) makes it tractable for tens of thousands of poses. Solvers like g²o, GTSAM, and Ceres are the standard back ends. Almost every modern SLAM system — ORB-SLAM, Cartographer, Kimera, VINS — runs a graph back end.

The conceptual loop-closure picture

Before the methods table, here's the shape of the problem. Drift accumulates linearly with distance. Loop closure is the single observation that yanks the trajectory back into place.

Demo 7 — Drift, then snap back: a high-level view of loop closure
Pure conceptual visualization. Dashed gray is the truth, solid red is what SLAM thinks happened. Without loop closure the estimate peels off; with loop closure on, recognizing the start point yanks the whole trajectory back. This is what graph-SLAM optimization does in one frame.
True path SLAM estimate Pose uncertainty Landmarks
step 0 position error 0.00 m loop closures 0

Methods at a glance

ApproachWhat it representsStrengths / weaknesses
EKF-SLAMOne Gaussian over (robot, all landmarks)Conceptually clean. O(n²) per update kills it above ~10³ landmarks.
FastSLAM 1.0Particles over robot trajectory + EKF per landmarkScales to many landmarks. Particle depletion on long runs is real.
FastSLAM 2.0Same, with smarter proposal distributionRobust to high motion/observation noise; standard for textbook PF SLAM.
Graph SLAM (g²o, GTSAM, Ceres)Pose graph + sparse least-squaresWhat everyone uses now. Scales to 10⁵+ poses; modular.
Visual SLAM (ORB-SLAM3, DSO, LSD-SLAM)Bundle adjustment on visual features or pixelsJust a camera; sensitive to motion blur, low texture, illumination.
Lidar SLAM (Cartographer, LOAM, KISS-ICP)Scan-to-scan + scan-to-map matching, pose graph back endRobust and accurate; lidars are expensive.
Visual-Inertial SLAM (VINS, OKVIS, ORB-SLAM3)Tightly couples camera + IMUState of the art for drones, AR/VR; metric scale comes for free.

Use cases

Practical notes

Watch out for

Drift is inevitable; the only fix is loop closure. Plan for closures: drive in loops, reuse known anchors, or fuse GPS when available.

False loop closures kill SLAM dead. Wrongly matching the current scene to a similar-looking part of the map will pull your trajectory into a horribly wrong shape. Robust back-end methods (switchable constraints, robust kernels, max-mixtures) exist exactly to reject these.

ICP fails when initialization is bad. The nearest-point heuristic gets stuck in local minima. Always seed ICP with the best estimate you have (odometry, IMU integration, prior frame). For very large displacements, use a global registration method first (e.g., FPFH features + RANSAC).

Initialization is fragile. Pure visual SLAM cannot recover scale from a single frame. The IMU helps, but you need a few seconds of motion before the system stabilizes.

Dynamic environments are still hard. Most SLAM systems assume the world is static. Crowds, traffic, and weather break that assumption; semantic segmentation to mask out moving objects is now standard in production stacks.

5 — Loop closure: the trick that saves SLAM

It's worth dwelling on loop closure because it's the single most important idea in modern SLAM and the one most outsiders miss.

Imagine walking around a city blindfolded, counting your steps. After two hours you'll have a guess at where you are, but your error in step count and turn angle has compounded. Now suppose halfway through, you take off the blindfold for one second and recognize a fountain you passed earlier. That single recognition doesn't just tell you where you are now — it tells you that your path between the two visits forms a closed loop, which means you can redistribute the error you've accumulated.

That's loop closure. The optimization step that pushes the error backward through the trajectory is called pose-graph optimization. Toggle it off in Demo 7 and you'll see exactly the error compounding I described.

A useful mental model

Think of the trajectory as a chain of rubber-band edges between poses. Odometry gives a soft constraint between consecutive poses ("step ~1 m, ~5° turn"). Loop closure adds a single rigid edge between two poses far apart in time. The optimizer relaxes the whole chain to minimize tension. The longer the loop, the more dramatically the chain reshapes.

How systems actually detect loop closures

6 — Side by side

LocalizationMappingSLAM
What's knownMapPoseNeither
What's estimatedPoseMapBoth, jointly
DifficultyEasy–mediumEasy if pose is goodHard
Failure modeWrong initial pose, perceptual aliasingPose drift corrupts the mapDrift + bad data association
Canonical algorithmMCL / EKF / Histogram filterOccupancy gridPose-graph optimization with ICP front end
Key trickMultimodal beliefsLog-odds updateLoop closure
Where you've used itVacuum returning to dockFirst mapping run of a vacuumPhone AR session

7 — What a real SLAM stack looks like

If you peel open a modern visual-inertial SLAM system (say ORB-SLAM3 or a typical AR runtime), you'll find a pipeline that looks roughly like this:

[ camera + IMU ]
    ↓
Front end  — feature detection (ORB, FAST), descriptor matching, IMU pre-integration
    ↓  (matched 2D–3D correspondences)
Tracking thread  — pose at frame rate via PnP / motion-only BA
    ↓  (keyframes, every ~10 frames)
Local mapping thread  — add new map points, local bundle adjustment
    ↓
Loop & map merging thread  — DBoW place recognition, pose-graph optimization, full BA
    ↓
[ globally consistent map + 6-DoF pose at 30+ Hz ]

The pattern — fast front-end loop, slow back-end optimization, occasional global correction — is universal. Replace the camera with a lidar (Cartographer, LIO-SAM) and the same architecture applies. Replace it with wheel encoders and beacon tags and you're back at warehouse-robot localization.

8 — References & further reading

Code & tutorials (start here)

  1. Sakai, A. et al. PythonRobotics — Python sample codes for almost every algorithm in this primer. Covers EKF / Ensemble / Unscented Kalman / Histogram / Particle filter localization; Gaussian-grid, NDT, ray-casting, lidar-to-grid, k-means clustering, circle/rectangle fitting, and Distance Map for mapping; ICP, EKF-SLAM, FastSLAM 1.0/2.0, and Graph-based SLAM. The most accessible reference codebase in the field.
  2. Sakai, A., Ingram, D., et al. "PythonRobotics: a Python code collection of robotics algorithms." arXiv:1808.10703 (2018). The companion paper.

Books

  1. Thrun, S., Burgard, W., & Fox, D. Probabilistic Robotics. MIT Press, 2005. — The canonical textbook.
  2. Barfoot, T. State Estimation for Robotics. Cambridge University Press, 2017. — A more modern, math-forward treatment with strong factor-graph emphasis.
  3. Siegwart, R., Nourbakhsh, I., & Scaramuzza, D. Introduction to Autonomous Mobile Robots. 2nd ed., MIT Press, 2011.

Foundational papers

  1. Smith, R., Self, M., & Cheeseman, P. "On the Representation and Estimation of Spatial Uncertainty." IJRR, 1986. — The paper that started it all.
  2. Moravec, H. & Elfes, A. "High Resolution Maps from Wide Angle Sonar." ICRA, 1985. — The original occupancy grid.
  3. Besl, P. & McKay, N. "A Method for Registration of 3-D Shapes." IEEE TPAMI, 1992. — The original ICP paper.
  4. Montemerlo, M., et al. "FastSLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem." AAAI, 2002.
  5. Dellaert, F. & Kaess, M. "Factor Graphs for Robot Perception." Foundations and Trends in Robotics, 2017.
  6. Cadena, C., et al. "Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age." IEEE T-RO, 2016.
  7. Durrant-Whyte, H. & Bailey, T. "Simultaneous Localization and Mapping: Part I & II." IEEE RAM, 2006.
  8. Julier, S. & Uhlmann, J. "A New Extension of the Kalman Filter to Nonlinear Systems." SPIE, 1997. — The UKF paper.

Influential systems

  1. Davison, A. J. "MonoSLAM: Real-Time Single Camera SLAM." IEEE TPAMI, 2007.
  2. Mur-Artal, R. & Tardós, J. D. "ORB-SLAM2 / ORB-SLAM3." IEEE T-RO, 2017 / 2021.
  3. Hess, W., et al. "Real-Time Loop Closure in 2D LIDAR SLAM." ICRA, 2016. — Cartographer.
  4. Zhang, J. & Singh, S. "LOAM: Lidar Odometry and Mapping in Real-time." RSS, 2014.
  5. Qin, T., Li, P., & Shen, S. "VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator." IEEE T-RO, 2018.
  6. Vizzo, I., et al. "KISS-ICP: In Defense of Point-to-Point ICP." IEEE RA-L, 2023.

Online courses and lectures

  1. Cyrill Stachniss — SLAM and Mobile Robotics lecture series on YouTube.
  2. Sebastian Thrun — Udacity "Artificial Intelligence for Robotics" (CS373).
  3. Daniel Cremers — TUM "Computer Vision II: Multiple View Geometry" lectures.
  4. Davide Scaramuzza — UZH Robotics & Perception lectures.

Open-source codebases worth reading

  1. GTSAM (Georgia Tech) — factor-graph back end used by many systems.
  2. g²o — the original general-purpose graph optimizer.
  3. Ceres Solver (Google) — generic nonlinear least squares; the optimizer behind a lot of SLAM and bundle adjustment.
  4. ORB-SLAM3 — clean, well-documented C++ reference for visual-inertial SLAM.
  5. Cartographer — production-grade 2D/3D lidar SLAM.
  6. OctoMap — the standard 3D occupancy mapping library.
  7. Open3D — modern point-cloud processing including ICP variants.