Teaching Robots to Learn from Their Mistakes in Real-Time
TL;DR — A new approach teaches robots both to think through actions before trying them and to update their decision-making after failures, turning deployment into a learning experience rather than endless trial-and-error.
What It Is
When you give a robot an LLM brain today, it makes the same mistakes over and over. Ask it to "collect toys and put them in boxes," and it might stuff a teddy bear in the only box big enough for a toy car—then make the exact same error tomorrow.
This research introduces two types of reflection that work together. Before acting, the robot generates multiple possible actions and internally scores them ("the orange box is too small for the car"). After acting, it evaluates what actually happened and updates both its scoring system and its action policy through test-time training (learning during deployment, not just during initial training). The key innovation is "retrospective reflection"—looking back at earlier decisions with hindsight to figure out which early choices led to later failures, solving the credit assignment problem that plagues long task sequences.
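The dual-reflection loop above can be sketched in a few lines. Everything here is a stand-in: `propose_actions`, `score`, `execute`, and `reflect_and_update` are hypothetical placeholders for the LLM policy, the LLM critic, the robot's environment, and the test-time training step, and the "update" is a toy dictionary nudge rather than a real weight update.

```python
def propose_actions(state, n=3):
    """Sample n candidate actions (stand-in for the LLM policy)."""
    return [f"action_{i}" for i in range(n)]

def score(action, state, weights):
    """Reflection-in-action: critic scores a candidate before execution."""
    return weights.get(action, 0.0)

def execute(action, state):
    """Run the action in the environment; return a success flag.
    Stub: pretend only one action actually works."""
    return action == "action_2"

def reflect_and_update(action, success, weights, lr=1.0):
    """Reflection-on-action: nudge the critic toward the observed outcome
    (stand-in for test-time training of the scorer and policy)."""
    weights[action] = weights.get(action, 0.0) + (lr if success else -lr)
    return weights

weights = {}            # the critic's "learned" preferences
state = "toys_on_floor"
for step in range(5):
    candidates = propose_actions(state)
    best = max(candidates, key=lambda a: score(a, state, weights))
    success = execute(best, state)
    weights = reflect_and_update(best, success, weights)
```

After a few steps the loop has penalized the failing actions and settled on the one that works, which is the whole point: execution outcomes feed back into the scorer instead of being thrown away.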
Why It Matters
- Your robot demos might actually improve over time — Instead of scripting recovery behaviors for every failure mode, the system learns from mistakes during deployment, potentially reducing the engineering overhead of handling edge cases.
- Test-time compute becomes doubly useful — You're not just generating more candidate actions (like o1-style inference scaling), you're also using execution outcomes to improve the model's judgment about which actions will work, creating a feedback loop.
- Long-horizon tasks become more tractable — The retrospective reflection mechanism addresses a core problem in robotics: figuring out that the action you took 10 steps ago is why you're stuck now, not the action you just took.
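One minimal way to picture that last point is discounted blame assignment over the whole episode. This is an illustrative heuristic, not the paper's mechanism; `retrospective_update` and the discount `gamma` are assumptions made for the sketch:

```python
def retrospective_update(trajectory, failed, values, gamma=0.8):
    """Propagate blame (or credit) backward through the trajectory with a
    discount, so a failure at the end also penalizes the early decisions
    that set it up."""
    signal = -1.0 if failed else 1.0
    for steps_back, step in enumerate(reversed(trajectory)):
        values[step] = values.get(step, 0.0) + signal * gamma ** steps_back
    return values

# A failed 3-step episode: putting the bear in the large box at step 0
# is what made the final step impossible.
episode = ["bear_into_large_box", "car_to_orange_box", "car_does_not_fit"]
values = retrospective_update(episode, failed=True, values={})
```

Here the final step takes the full penalty (-1.0) while the step-0 decision still absorbs a discounted share (-0.64), so repeated failures eventually surface the early mistake.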
One Thing to Try
If you're building LLM agents (even non-robotic ones), implement a simple version of reflection-in-action: generate 3-5 candidate next actions with high temperature, have the LLM score each with a brief self-critique, then execute the highest-scoring option. This costs more tokens but can catch obvious mistakes before they happen.
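A minimal sketch of that recipe, assuming a generic `llm(prompt, temperature)` wrapper around whatever chat-completion API you use. The stub below just returns a fake numeric score so the example runs; swap in a real call and a real score parser.

```python
def llm(prompt, temperature=0.0):
    """Placeholder for a real chat-completion call; swap in your provider.
    This stub returns a fake numeric score so the sketch is runnable."""
    return str(len(prompt) % 10)

def best_next_action(goal, n_candidates=4):
    # 1. Sample diverse candidate actions at high temperature (in a real
    #    LLM the diversity comes from sampling; here the variant index
    #    stands in for it).
    candidates = [
        llm(f"Propose the single best next action for: {goal} (variant {i})",
            temperature=1.2)
        for i in range(n_candidates)
    ]
    # 2. Reflection-in-action: have the model briefly critique each
    #    candidate and end with a 0-10 score (the stub returns only a score).
    scored = [
        (float(llm(f"Critique this action for '{goal}': {c}. "
                   f"End with a score from 0 to 10.", temperature=0.2)), c)
        for c in candidates
    ]
    # 3. Execute only the highest-scoring candidate.
    return max(scored)[1]
```

In practice you would parse the score out of a longer critique and log the critiques themselves; they are often the most useful debugging artifact this pattern produces.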