Robotics · A Visual Primer

Localization, Mapping & SLAM

Where am I? What's around me? And how do I figure out both at the same time?

By Majid Mazouchi

1 — Three questions, three problems

A robot moving through space — a vacuum, a forklift, a drone, a rover, an AR headset on your face — almost always needs to answer some combination of three deceptively simple questions:

Where am I? — given a map, figure out my pose in it.
What's around me? — given that I know where I am, build a map.
Both at once? — given neither, figure out both simultaneously.

These map onto three classical problems: Localization, Mapping, and SLAM. They look related, and they are, but the third is hilariously harder than the other two combined. This page works through all three with seven interactive demos so you can see how the algorithms behave, fail, and recover.

The unifying idea

All three problems are solved roughly the same way: maintain a probability distribution over what you don't know (pose, map, or both), and update it whenever you (a) move or (b) sense something. This is recursive Bayesian estimation. The rest is engineering: which distribution, which approximation, which sensor.

Reference implementation

The methods covered here are all implemented in AtsushiSakai/PythonRobotics, a widely-used open-source collection of Python sample code for robotics algorithms. Where we name a specific algorithm, you can find a runnable Python version in that repo. We borrow its taxonomy of methods throughout this primer.

2 — Localization: "Where am I?"

You hand the robot a map and switch it on somewhere on that map. It doesn't know where. As it drives and senses, it has to figure out which point on the map matches what it's seeing.

The intuition

Imagine being teleported, blindfolded, into an apartment whose floor plan you've memorized. You take off the blindfold. You see a kitchen counter to your left and a window to your right. There might be three rooms in the apartment that match that description — so you're not sure yet. You take a step forward and now you see a hallway. Only one room had a hallway exit in that direction. Now you know where you are.

That's exactly how a localization algorithm works. It starts with many hypotheses ("I could be anywhere"), and as observations arrive, the hypotheses that don't match get killed off until only the right ones survive. The four classical methods differ in how they represent those hypotheses:

Particles — a swarm of weighted samples (Monte Carlo Localization).
A Gaussian — a single mean and covariance (Kalman filter family).
A grid — one probability per cell (histogram filter).
A scan match — no probabilities, just optimization (ICP / NDT-based).

The first three are below as live demos. The fourth shows up in §4 because it's also the workhorse of SLAM front ends.

Method 1 — Monte Carlo Localization (MCL) particle filter

The most flexible representation: maintain a swarm of weighted samples (particles) of where you could be. On every motion, each particle moves with motion noise. On every sensor observation, weight each particle by the likelihood of the observation given that particle's pose. Resample to keep the swarm focused on the high-likelihood regions.

Demo 1 — Monte Carlo Localization in a 1D corridor

A robot (green) drives along a known corridor with doors (blue) at known positions. It senses distance to the nearest door, with noise. The 300 red particles each represent one hypothesis. Watch them collapse onto the truth — and try the kidnap button to see what recovery looks like.

Sensor noise

True robot pose Doors (landmarks) Particle hypotheses

step 0 true x = — mean estimate = — std dev = —

Three things to notice:

At start, particles are uniform — the robot has no idea where it is.
After a few steps, particles cluster at multiple spots — anywhere consistent with the sensor history. This is perceptual aliasing: a single measurement cannot disambiguate similar-looking places.
After more motion, the wrong clusters die off. Hit Kidnap robot to teleport it elsewhere — most filters fail at this; MCL with random particle injection recovers.

Method 2 — Extended Kalman Filter (EKF) localization single Gaussian

If you trust your initial pose and your sensors aren't too crazy, you can summarize the entire belief with a single Gaussian: a mean (best guess) and a covariance matrix (how confident, in which direction). The EKF runs the same predict / update cycle as MCL, but instead of a swarm it carries the mean and covariance through linearized motion and observation models.

predict: x̂⁻ = f(x̂, u), P⁻ = F P Fᵀ + Q
update: K = P⁻ Hᵀ (H P⁻ Hᵀ + R)⁻¹, x̂ = x̂⁻ + K(z − h(x̂⁻)), P = (I − K H) P⁻

The covariance ellipse is the visual signature of a Kalman filter: it grows when the robot moves (uncertainty accumulating) and shrinks when a measurement comes in (information gained).

Demo 2 — EKF localization with range measurements to landmarks

The robot drives a known trajectory (gray dashed). The red ellipse is the 2σ covariance of the EKF's estimate. Whenever a landmark falls within sensor range, the filter performs a range-only Kalman update — innovations shown as thin red lines. Watch the ellipse expand on prediction and shrink on update.

Process noise Q Sensor noise R

True trajectory EKF estimate 2σ covariance Landmarks (RFID-like)

step 0 tr(P) = — position error — updates 0

EKF localization is fast and composes naturally with multi-sensor fusion. It's what you find inside most production GPS+IMU+wheel-odometry stacks.

EKF vs UKF

The Unscented Kalman Filter replaces the EKF's Jacobian linearization with a deterministic set of sigma points propagated through the true nonlinear functions. The UKF uses a deterministic sampling approach to capture the mean and covariance of the state distribution, unlike the EKF which linearizes the nonlinear functions using Jacobians. UKF is usually slightly more accurate when motion or measurement models are strongly nonlinear; EKF is simpler to implement when you can write the Jacobians cleanly. Both are in PythonRobotics.

Method 3 — Histogram filter localization grid belief

The most literal Bayesian filter: discretize the world into cells, store one probability per cell, and update them all at every step. No assumptions about Gaussianity, no resampling — just multiplication and renormalization. It's slow for large worlds but conceptually the cleanest.

Demo 3 — Histogram filter localization

The world is divided into a grid. Each cell stores P(robot is here). Initially flat (no idea). Predict shifts the distribution by commanded motion and blurs it (motion noise). Update multiplies by the likelihood of the observed range to landmarks. Watch the heatmap concentrate.

Sensor noise

High probability Low probability True robot pose Landmarks

step 0 entropy ≈ — peak probability —

The histogram filter handles the "I have no idea where I am" case naturally — start uniform and let the math do its work. Note how multimodality is handled: when only one landmark is in range, the belief has a circular ridge of equally-likely positions; a second observation breaks the symmetry.

Methods at a glance

Method	Belief representation	Per-step cost	Best for…
EKF / UKF	Single Gaussian	O(n²) – O(n³)	Tracking when the prior is good and the world is roughly linear.
Ensemble Kalman Filter	Sample-approximated Gaussian	O(N·n)	Very high-dimensional state where full covariance is intractable.
Histogram filter	Probability per grid cell	O(cells)	Small worlds, global localization, multimodal beliefs.
Particle filter (MCL)	Weighted samples	O(N·sensors)	Arbitrary distributions, kidnapped robot, long-term recovery.
Scan matching (ICP, NDT)	No distribution; pure optimization	O(K·iterations)	You already know roughly where you are and you have a lidar scan.

Use cases

Warehouse robots (Amazon Kiva-style fleets) localize against a known floor plan and fiducial markers.
Self-driving cars localize against pre-built HD maps using lidar + GPS + IMU; centimeter accuracy is the bar.
Robot vacuums localize against a saved house map so they can resume cleaning where they stopped.
AR/VR headsets localize against a scene model the device built earlier and re-uses across sessions.
Indoor delivery and service robots use Adaptive MCL (the AMCL package in ROS is the canonical example) over occupancy grids.

Practical notes

Watch out for

Symmetric environments. Long featureless corridors, square rooms, and warehouse aisles produce multiple equally-good hypotheses. The filter will not converge. Add fiducials, use ceiling features, or bias the initial pose if you can.

Linearization breakdown (EKF/UKF). If your covariance grows large, the linear approximation around the mean stops being valid. Either reduce your update period, switch to UKF, or add a global recovery layer (PF on top of EKF, AMCL-style).

Sample impoverishment (PF). Resampling every step kills off particle diversity. Inject a small fraction of random particles each step, or only resample when effective sample size drops below a threshold.

The kidnapped robot problem. Pure EKF cannot recover from being moved without warning. MCL with random injection or a histogram filter that re-sees the world fresh will. Plan for this if your robot can be picked up or slipped.

3 — Mapping: "What's around me?"

Now the opposite: assume you know exactly where you are at every moment (perfect GPS, motion capture, wheel encoders with no slip). Build a map of what you observe.

The intuition

You're a surveyor with a known position at every step. You shine a laser at the world, it bounces back, and you get a distance. Multiply by direction and you have a 3D point. Do this thousands of times per second from a moving platform and you have a point cloud of the world.

Point clouds aren't enough for planning. The robot wants to know not just "there's a wall here" but also "this region is empty and safe to drive through." That's what occupancy grid maps give you: every cell stores a probability that it is occupied.

Demo 4 — Building an occupancy grid with a lidar

Click anywhere in the map to teleport the robot. The robot fires lidar rays in all directions. Cells that rays pass through become free (light). Cells that rays hit become occupied (dark). Drive around to fill in the map.

show probability Sensor range

Occupied (hit) Free (cleared) Unknown Robot

cells observed 0 scans taken 0 coverage 0%

What you're watching is the textbook occupancy grid update from Moravec & Elfes (1985). Each cell stores a log-odds value:

ℓ(c) = log [ p(c is occupied) / p(c is free) ]

When a ray endpoint lands on a cell, that cell's log-odds goes up. When a ray passes through a cell, log-odds goes down. Adding log-odds is equivalent to multiplying probabilities, which is what makes this update so fast.

Methods at a glance

Map type	What it stores	Best for
Occupancy grid	P(occupied) per cell	2D navigation, planning, vacuum robots
Gaussian grid map	Gaussian per cell aggregating raw range hits	Smoother surfaces; sensor noise modeling
NDT map	One Gaussian per voxel summarizing point distribution	Compact registration target for NDT scan matching
Voxel / OctoMap	P(occupied) per 3D voxel (sparse octree)	3D navigation, drones, manipulators
Distance map	Distance to nearest obstacle per cell	Path planning costs, obstacle avoidance fields
Point cloud	List of 3D points	Visualization, ICP, surveying
Feature / landmark map	List of (id, 3D position)	Sparse SLAM, AR anchors
Object-shape map	Fitted primitives: lines, circles, rectangles	Compact representation, semantic reasoning
Topological map	Graph of places + edges	High-level planning, biology-inspired navigation
Signed Distance Field (TSDF)	Distance to nearest surface per voxel	3D reconstruction (KinectFusion-style)
Neural fields (NeRF, Gaussian Splat)	Implicit / volumetric scene representation	Photorealistic reconstruction, novel views

Beyond grids

Once you have raw lidar points, mapping often involves extracting structure rather than dumping points into cells. PythonRobotics shows several approaches: k-means clustering to group lidar returns into objects, circle fitting for round obstacles like trees and pillars, and rectangle fitting for cars and walls. The output is a much smaller, more semantically useful map than a dense grid.

Use cases

Indoor floor plans. Robot vacuums and warehouse robots produce occupancy grids you can literally screenshot and use as a blueprint.
Mining and construction surveys. Mobile mapping rigs walk through tunnels and produce point clouds with sub-centimeter accuracy.
Agriculture. Tractors with RTK-GPS map fields and crop heights for variable-rate fertilizer application.
3D reconstruction for film and games. Drone photogrammetry rebuilds entire neighborhoods.
Autonomous-vehicle HD maps. Pre-recorded lidar maps with lane geometry, sign positions, and traffic-light locations.

Practical notes

Watch out for

Resolution vs memory. A 5 cm grid over a 100 m × 100 m warehouse is 4 million cells. A 1 cm grid is 100 million. 3D voxel grids blow up cubically. Production systems use sparse structures (octrees, hash tables) instead of dense arrays.

Dynamic objects pollute the map. If a person walks in front of the lidar while you're mapping, you'll record them as a wall. Real systems either filter dynamic returns (semantic segmentation), or use a slow forgetting factor so transient occupancy decays away.

Sensor model matters more than you think. Glass walls, mirrors, and dark fabrics confuse lidars. Sunlight saturates structured-light depth cameras. Map quality is bounded by how well you've modeled your sensor's failure modes.

4 — SLAM: "Both at once."

Now the hard one. The robot has neither a map nor a known pose. It has to figure out both, simultaneously, from raw sensor data.

Why it's so much harder

It's a chicken-and-egg problem. To localize, I need a map. To map, I need to know where I am. The only way out is to estimate them jointly: every pose helps build the map, and every piece of map helps refine past poses.

Modern SLAM splits cleanly into two halves:

The front end turns raw sensor data into observations: feature detection, scan matching, descriptor matching, RANSAC.
The back end takes those observations and produces a globally consistent estimate: filter-based or graph-based.

We'll look at one canonical front-end algorithm (ICP), then one classical back end (EKF-SLAM), and finally what loop closure does to a long trajectory.

Front end — Iterative Closest Point (ICP) scan matching

ICP is the workhorse of every lidar-based SLAM front end. Given two point clouds — the previous scan and the current scan — find the rigid transform (rotation + translation) that best aligns them. Repeat once per frame, accumulate, and you have lidar odometry.

The algorithm is alternating minimization:

For each point in the moving cloud, find the nearest point in the fixed cloud (the correspondence).
Solve a closed-form least-squares problem for the rotation R and translation t that minimize the sum of squared distances under those correspondences.
Apply (R, t) to the moving cloud. Repeat until convergence.

Demo 5 — ICP scan matching

Two point clouds of the same scene, captured from slightly different poses, start misaligned (red) against a reference (gray). Each iteration finds nearest-point correspondences (thin lines) and solves for the best rigid transform. Step through to watch the alignment converge.

Reference cloud (fixed) Moving cloud Correspondences

iteration 0 mean residual — Δrotation —

The optimal rotation between two point sets has a beautiful closed form. Let H be the cross-covariance matrix of the centered correspondences. Then the SVD gives H = U Σ Vᵀ, and the optimal rotation is R = V Uᵀ. In 2D, this collapses to a single arctan:

θ* = atan2(Σ Hₓᵧ − Σ Hᵧₓ, Σ Hₓₓ + Σ Hᵧᵧ)

That's the formula running in the demo. The translation is then the difference between the two centroids after rotation. Variants like point-to-plane ICP, NDT, and generalized ICP use the same loop with smarter cost functions.

Back end — EKF-SLAM filter-based

Now the joint estimation. Stack the robot pose and every landmark position into one giant state vector, with one giant covariance matrix capturing every correlation between them. The EKF runs on that joint state, treating each landmark observation as a constraint between robot and landmark.

state x = [ rₓ, rᵧ, lm₁ₓ, lm₁ᵧ, lm₂ₓ, lm₂ᵧ, … ]

The crucial property is that the covariance is not block-diagonal. Robot and landmarks become correlated as soon as the robot observes them. That correlation is what makes loop closure work: when the robot re-observes an old landmark, the constraint propagates through the whole covariance and drags the entire trajectory back into shape.

Demo 6 — EKF-SLAM with explicit landmark estimation

The robot drives the gray dashed loop. It has noisy odometry (so the red estimated path drifts) and observes nearby landmarks with noisy relative-position measurements. New landmarks are initialized into the state with high uncertainty (large blue ellipses) and shrink each time they're re-observed. Watch the loop closure when the robot returns to landmarks it saw at the start.

Sensor range

True path SLAM estimate Robot covariance True landmark Landmark covariance

step 0 landmarks tracked 0 state size 2 position error —

EKF-SLAM has a famous flaw: the covariance matrix is dense once landmarks become correlated, so each update is O(n²) in the number of landmarks. Above a few hundred landmarks it becomes prohibitively slow. This is why the field moved on.

FastSLAM 1.0 and 2.0 particle filter

FastSLAM exploits a clever factorization. Conditional on the robot's trajectory, the landmarks become independent of one another. So you can use a particle filter for the trajectory (handling its non-Gaussian uncertainty) and a small EKF per landmark per particle for the landmark positions. FastSLAM is a feature-based SLAM example where the blue line is ground truth, the black line is dead reckoning, the red line is the estimated trajectory, the red points are particles, black points are landmarks, and blue crosses are estimated landmark positions. The 2.0 variant improves the proposal distribution to keep particle diversity over long runs.

Graph-based SLAM optimization

The modern formulation. Forget the covariance matrix. Build a graph where nodes are robot poses (and optionally landmarks) and edges are constraints (odometry, observations, loop closures). Find the configuration of the graph that best satisfies all constraints in the least-squares sense:

x* = argmin Σᵢⱼ ‖zᵢⱼ − h(xᵢ, xⱼ)‖²_Ωᵢⱼ

This is a sparse nonlinear least-squares problem. Sparsity (each pose only connects to a few neighbors) makes it tractable for tens of thousands of poses. Solvers like g²o, GTSAM, and Ceres are the standard back ends. Almost every modern SLAM system — ORB-SLAM, Cartographer, Kimera, VINS — runs a graph back end.

The conceptual loop-closure picture

Before the methods table, here's the shape of the problem. Drift accumulates linearly with distance. Loop closure is the single observation that yanks the trajectory back into place.

Demo 7 — Drift, then snap back: a high-level view of loop closure

Pure conceptual visualization. Dashed gray is the truth, solid red is what SLAM thinks happened. Without loop closure the estimate peels off; with loop closure on, recognizing the start point yanks the whole trajectory back. This is what graph-SLAM optimization does in one frame.

Odometry noise

True path SLAM estimate Pose uncertainty Landmarks

step 0 position error 0.00 m loop closures 0

Methods at a glance

Approach	What it represents	Strengths / weaknesses
EKF-SLAM	One Gaussian over (robot, all landmarks)	Conceptually clean. O(n²) per update kills it above ~10³ landmarks.
FastSLAM 1.0	Particles over robot trajectory + EKF per landmark	Scales to many landmarks. Particle depletion on long runs is real.
FastSLAM 2.0	Same, with smarter proposal distribution	Robust to high motion/observation noise; standard for textbook PF SLAM.
Graph SLAM (g²o, GTSAM, Ceres)	Pose graph + sparse least-squares	What everyone uses now. Scales to 10⁵+ poses; modular.
Visual SLAM (ORB-SLAM3, DSO, LSD-SLAM)	Bundle adjustment on visual features or pixels	Just a camera; sensitive to motion blur, low texture, illumination.
Lidar SLAM (Cartographer, LOAM, KISS-ICP)	Scan-to-scan + scan-to-map matching, pose graph back end	Robust and accurate; lidars are expensive.
Visual-Inertial SLAM (VINS, OKVIS, ORB-SLAM3)	Tightly couples camera + IMU	State of the art for drones, AR/VR; metric scale comes for free.

Use cases

AR / VR. ARKit, ARCore, Quest, Vision Pro all run visual-inertial SLAM continuously to anchor virtual content.
Drones in GPS-denied environments. Indoor inspection, under-bridge surveys, search and rescue.
Self-driving in unmapped or stale-map areas. Fills the gaps between HD-map updates.
Planetary rovers. No GPS on Mars. Curiosity and Perseverance navigate with stereo visual odometry and SLAM-style optimization.
Robot vacuums and lawn mowers. Build the map on the first run, localize against it after.

Practical notes

Watch out for

Drift is inevitable; the only fix is loop closure. Plan for closures: drive in loops, reuse known anchors, or fuse GPS when available.

False loop closures kill SLAM dead. Wrongly matching the current scene to a similar-looking part of the map will pull your trajectory into a horribly wrong shape. Robust back-end methods (switchable constraints, robust kernels, max-mixtures) exist exactly to reject these.

ICP fails when initialization is bad. The nearest-point heuristic gets stuck in local minima. Always seed ICP with the best estimate you have (odometry, IMU integration, prior frame). For very large displacements, use a global registration method first (e.g., FPFH features + RANSAC).

Initialization is fragile. Pure visual SLAM cannot recover scale from a single frame. The IMU helps, but you need a few seconds of motion before the system stabilizes.

Dynamic environments are still hard. Most SLAM systems assume the world is static. Crowds, traffic, and weather break that assumption; semantic segmentation to mask out moving objects is now standard in production stacks.

5 — Loop closure: the trick that saves SLAM

It's worth dwelling on loop closure because it's the single most important idea in modern SLAM and the one most outsiders miss.

Imagine walking around a city blindfolded, counting your steps. After two hours you'll have a guess at where you are, but your error in step count and turn angle has compounded. Now suppose halfway through, you take off the blindfold for one second and recognize a fountain you passed earlier. That single recognition doesn't just tell you where you are now — it tells you that your path between the two visits forms a closed loop, which means you can redistribute the error you've accumulated.

That's loop closure. The optimization step that pushes the error backward through the trajectory is called pose-graph optimization. Toggle it off in Demo 7 and you'll see exactly the error compounding I described.

A useful mental model

Think of the trajectory as a chain of rubber-band edges between poses. Odometry gives a soft constraint between consecutive poses ("step ~1 m, ~5° turn"). Loop closure adds a single rigid edge between two poses far apart in time. The optimizer relaxes the whole chain to minimize tension. The longer the loop, the more dramatically the chain reshapes.

How systems actually detect loop closures

Bag-of-Words (DBoW2 / DBoW3) — the workhorse for visual SLAM. Each image is a histogram of visual-word occurrences from a pre-trained vocabulary. New frame is queried against all past keyframes; high similarity flags a candidate closure, which is then geometrically verified.
Scan-Context — for lidar SLAM. Convert each scan into a polar-coordinate fingerprint robust to viewpoint changes.
Learned descriptors — NetVLAD, MixVPR, AnyLoc. Increasingly common in production, especially for places with weather or lighting changes.
GPS / WiFi / known anchors — when available, basically free loop closures.

6 — Side by side

	Localization	Mapping	SLAM
What's known	Map	Pose	Neither
What's estimated	Pose	Map	Both, jointly
Difficulty	Easy–medium	Easy if pose is good	Hard
Failure mode	Wrong initial pose, perceptual aliasing	Pose drift corrupts the map	Drift + bad data association
Canonical algorithm	MCL / EKF / Histogram filter	Occupancy grid	Pose-graph optimization with ICP front end
Key trick	Multimodal beliefs	Log-odds update	Loop closure
Where you've used it	Vacuum returning to dock	First mapping run of a vacuum	Phone AR session

7 — What a real SLAM stack looks like

If you peel open a modern visual-inertial SLAM system (say ORB-SLAM3 or a typical AR runtime), you'll find a pipeline that looks roughly like this:

[ camera + IMU ]

    ↓

Front end  — feature detection (ORB, FAST), descriptor matching, IMU pre-integration

    ↓  (matched 2D–3D correspondences)

Tracking thread  — pose at frame rate via PnP / motion-only BA

    ↓  (keyframes, every ~10 frames)

Local mapping thread  — add new map points, local bundle adjustment

    ↓

Loop & map merging thread  — DBoW place recognition, pose-graph optimization, full BA

    ↓

[ globally consistent map + 6-DoF pose at 30+ Hz ]

The pattern — fast front-end loop, slow back-end optimization, occasional global correction — is universal. Replace the camera with a lidar (Cartographer, LIO-SAM) and the same architecture applies. Replace it with wheel encoders and beacon tags and you're back at warehouse-robot localization.

8 — References & further reading

Code & tutorials (start here)

Sakai, A. et al. PythonRobotics — Python sample codes for almost every algorithm in this primer. Covers EKF / Ensemble / Unscented Kalman / Histogram / Particle filter localization; Gaussian-grid, NDT, ray-casting, lidar-to-grid, k-means clustering, circle/rectangle fitting, and Distance Map for mapping; ICP, EKF-SLAM, FastSLAM 1.0/2.0, and Graph-based SLAM. The most accessible reference codebase in the field.
Sakai, A., Ingram, D., et al. "PythonRobotics: a Python code collection of robotics algorithms." arXiv:1808.10703 (2018). The companion paper.

Books

Thrun, S., Burgard, W., & Fox, D. Probabilistic Robotics. MIT Press, 2005. — The canonical textbook.
Barfoot, T. State Estimation for Robotics. Cambridge University Press, 2017. — A more modern, math-forward treatment with strong factor-graph emphasis.
Siegwart, R., Nourbakhsh, I., & Scaramuzza, D. Introduction to Autonomous Mobile Robots. 2nd ed., MIT Press, 2011.

Foundational papers

Smith, R., Self, M., & Cheeseman, P. "On the Representation and Estimation of Spatial Uncertainty." IJRR, 1986. — The paper that started it all.
Moravec, H. & Elfes, A. "High Resolution Maps from Wide Angle Sonar." ICRA, 1985. — The original occupancy grid.
Besl, P. & McKay, N. "A Method for Registration of 3-D Shapes." IEEE TPAMI, 1992. — The original ICP paper.
Montemerlo, M., et al. "FastSLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem." AAAI, 2002.
Dellaert, F. & Kaess, M. "Factor Graphs for Robot Perception." Foundations and Trends in Robotics, 2017.
Cadena, C., et al. "Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age." IEEE T-RO, 2016.
Durrant-Whyte, H. & Bailey, T. "Simultaneous Localization and Mapping: Part I & II." IEEE RAM, 2006.
Julier, S. & Uhlmann, J. "A New Extension of the Kalman Filter to Nonlinear Systems." SPIE, 1997. — The UKF paper.

Influential systems

Davison, A. J. "MonoSLAM: Real-Time Single Camera SLAM." IEEE TPAMI, 2007.
Mur-Artal, R. & Tardós, J. D. "ORB-SLAM2 / ORB-SLAM3." IEEE T-RO, 2017 / 2021.
Hess, W., et al. "Real-Time Loop Closure in 2D LIDAR SLAM." ICRA, 2016. — Cartographer.
Zhang, J. & Singh, S. "LOAM: Lidar Odometry and Mapping in Real-time." RSS, 2014.
Qin, T., Li, P., & Shen, S. "VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator." IEEE T-RO, 2018.
Vizzo, I., et al. "KISS-ICP: In Defense of Point-to-Point ICP." IEEE RA-L, 2023.

Online courses and lectures

Cyrill Stachniss — SLAM and Mobile Robotics lecture series on YouTube.
Sebastian Thrun — Udacity "Artificial Intelligence for Robotics" (CS373).
Daniel Cremers — TUM "Computer Vision II: Multiple View Geometry" lectures.
Davide Scaramuzza — UZH Robotics & Perception lectures.

Open-source codebases worth reading

GTSAM (Georgia Tech) — factor-graph back end used by many systems.
g²o — the original general-purpose graph optimizer.
Ceres Solver (Google) — generic nonlinear least squares; the optimizer behind a lot of SLAM and bundle adjustment.
ORB-SLAM3 — clean, well-documented C++ reference for visual-inertial SLAM.
Cartographer — production-grade 2D/3D lidar SLAM.
OctoMap — the standard 3D occupancy mapping library.
Open3D — modern point-cloud processing including ICP variants.