Topic · 06

Reinforcement Learning

Agents, rewards, policies. Where control theory and ML meet.

Jun 11, 2026

Imitation Learning — Learning by Watching

Imitation learning is the part of the machine-learning toolkit closest to how humans actually learn most skills — not from a numerical reward, but from watching someone competen...

RL
Jun 8, 2026

Reinforcement Learning in Agentic AI

Reinforcement learning shows up inside modern AI agents in two very different places: before deployment, as a training method that shapes the base model, and at runtime, as the pol...

RL
Apr 20, 2026

Practical RL Engineering: Hyperparameters, Debugging, and Silent Failures

The most pragmatic post in the field guide: the hyperparameters that almost always work, the order in which to debug a policy that won't learn, and the silent failures that cost days be...

RL
Apr 20, 2026

Robotics in Practice: Sim-to-Real, Offline RL, and Safe RL

Algorithms are maybe 30% of a successful RL robotics project. The other 70% is engineering: reward design, observation and action spaces, sim-to-real, learning from logged data,...

RL
Apr 20, 2026

Model-Based RL and MPC Hybrids

If you're a control engineer, this is the RL section written for you. Model-based RL learns dynamics and plans against them — 100× more sample-efficient than model-free on real ...

RL
Apr 20, 2026

Exploration and Modern Deep RL: SAC, PPO, TD3, DDPG

Every RL algorithm is secretly two algorithms sharing a body — one that exploits what it knows, one that explores for what it doesn't. This post covers the exploration methods t...

RL
Apr 20, 2026

Policy Gradients and Actor-Critic

From REINFORCE to PPO in one post — the policy gradient theorem, why its variance is ruinous by default, and the three tricks (baselines, critics, GAE) that make it work in prac...

RL
Apr 20, 2026

Temporal Difference Learning, SARSA, and Q-Learning

The Bellman equations say what a value function satisfies; temporal-difference learning says how to estimate it from samples. One update rule, two algorithms (SARSA and Q-learni...

RL
Apr 20, 2026

Foundations: MDPs, Value Functions, and the Bellman Equations

Every RL algorithm starts from the same place: write down the MDP. This post covers the formalism a control engineer actually needs — state, action, reward, the discount factor,...

RL
Apr 20, 2026

Reinforcement Learning for Control & Robotics — a Field Guide

A practitioner-oriented field guide to reinforcement learning for control and robotics. Start here: the RL family tree, a five-question algorithm selector, and the rules of thum...

RL