Machines that never stop learning.
An illustrated field guide to continual learning, reinforcement learning and the science of reasoning — where benchmarks lie, networks lose plasticity, and the world is always bigger than the agent.
The Essays
The Benchmark Gap in Continual RL: From Continual World to SPIRAL
Five years of continual-RL benchmarks all measured the same wrong thing. The proof — and a map of what we should measure instead.
The Plasticity Crisis in Continual Deep Learning
Neural networks quietly lose the ability to learn. Inside the plasticity collapse that benchmarks never run long enough to see.
The Big World Hypothesis: Why Continual Learning Is Inevitable
If the world is bigger than the agent, continual learning is not a feature to add — it is mathematically inevitable.
GVFs as Proto-World-Models: The Alberta Plan Vindicated?
General Value Functions predicted the world-model era by a decade. Was the Alberta Plan right all along?
The Forgetting Transformer: When Architecture Solves Plasticity
What if the cure for plasticity loss isn't a regularizer but the architecture itself? The case of learned forgetting.
Does RL Teach LLMs to Reason, or Just Refine Them?
Does reinforcement learning teach language models to reason — or merely sharpen what pre-training already knew?
Shape of Thought: Why Reasoning Format Matters More Than Correctness
The format of a chain of thought may matter more than whether it lands on the right answer. The geometry of reasoning.
Stable Deep RL at Scale: Gradients, KL, and the Shape of Learning
Gradients, KL divergence and the shape of learning: what it actually takes to keep deep RL stable at scale.
Reasoning at Scale: What DeepSeek-R1, ProRL, and Prolonged RL Reveal
DeepSeek-R1, ProRL and prolonged RL reveal how far reasoning can be pushed when you simply do not stop training.
Darwin-Gödel to ShinkaEvolve: The Case for Open-Ended AI
From the Darwin-Gödel Machine to ShinkaEvolve — the case for open-ended systems that never converge.
Thinking Without Tokens: CTM and Inference-Time Compute Beyond CoT
Continuous Thought Machines and inference-time compute that happens beyond the token stream. Thought without words.
RL as Educator: Training Teachers, Not Just Students
Stop optimizing the student. Train the teacher. Reinforcement learning reframed as the design of curricula.
“A field's benchmarks are not neutral measurement tools. They encode what the field believes the problem is.”
This series follows a single thread: intelligence that must keep learning, in a world too large to ever fully model. From the measurement gap in continual RL to thinking without tokens, each essay pairs a careful read of the primary literature with a picture you can actually hold in your head.
Continual Intelligence · in the spirit of the Alberta Plan