How Meta Built a Chatbot That Gets Better Every Time You Talk to It

Meta Trained a Chatbot 15 Times on Millions of Real Users—Here's What Worked

TL;DR — Meta improved its AI chatbot by repeatedly testing it on real Instagram, WhatsApp, and Messenger users, then training new versions on whatever kept people engaged. After 15 iterations, conversation depth rose 19% and adherence to character instructions climbed from 59% to 85%.

What It Is

Meta built a "flywheel" process for making chatbots better at social conversation (think Character.AI, not ChatGPT). They started with Llama 3.1 and kept improving it by watching how millions of actual users chatted with AI characters across their apps. Each week, they'd run A/B tests, figure out what made conversations more engaging, train reward models (AI judges that predict what users will like), then use those to train the next version. The tricky part: engagement isn't something you can measure directly during training; you only know whether you succeeded after real people use the model. So they treated it like climbing a mountain in fog: sample the terrain around you, estimate which way is up, take a careful step, then check whether you actually climbed higher. They repeated this 15 times over nine months, and 7 of 8 production deployments beat the baseline.
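The "climbing a mountain in fog" loop above can be sketched as a toy zeroth-order optimizer: probe nearby points through a noisy signal, estimate the uphill direction, take a careful step, and only keep it if the re-measurement beats the baseline. The paper's actual procedure operates on model weights and real A/B tests; everything here (the simulated engagement function, step sizes, sample counts) is an illustrative assumption, not Meta's method.

```python
import random

def noisy_engagement(x, noise=0.05):
    """Simulated engagement signal: true optimum at x = 2.0,
    observed only through noisy A/B-test-style measurements."""
    return -(x - 2.0) ** 2 + random.gauss(0, noise)

def foggy_hill_climb(x, iterations=15, step=0.3, samples=8):
    """Zeroth-order ascent: probe the terrain around the current model,
    estimate the uphill direction, step, then re-measure against baseline."""
    best_score = noisy_engagement(x)
    delta = 0.1
    for _ in range(iterations):
        # Average several noisy central-difference probes of the "terrain".
        grad = sum(
            (noisy_engagement(x + delta) - noisy_engagement(x - delta))
            / (2 * delta)
            for _ in range(samples)
        ) / samples
        candidate = x + step * grad          # careful step uphill
        score = noisy_engagement(candidate)  # the weekly A/B check
        if score > best_score:               # deploy only if it beat baseline
            x, best_score = candidate, score
    return x

random.seed(0)
print(foggy_hill_climb(0.0))  # ends near the true optimum at 2.0
```

The baseline check is what keeps a noisy gradient estimate from walking the model off a cliff: a bad step is simply not deployed, mirroring the 7-of-8 deployment gate described above.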

Why It Matters

  • Real engagement beats synthetic benchmarks: They stopped optimizing for what sounds smart on paper and started optimizing for what keeps people talking. Instruction-following jumped from 59% to 85% because they measured it with actual user behavior, not eval datasets.
  • Small, continuous updates work better than big leaps: Taking measured steps with constant reality checks (weekly A/B tests) prevented the reward-model overfitting that derails many RL runs. You don't need a perfect reward model, just one good enough to point uphill.
  • Social AI needs different metrics than assistant AI: Breadth (how many users engage) and depth (how long conversations last) matter more than correctness. The techniques that work for coding assistants don't transfer directly to conversational experiences.
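The breadth and depth metrics in the last bullet are straightforward to compute from conversation logs. A minimal sketch (the event schema and field names here are assumptions for illustration, not Meta's actual telemetry):

```python
from collections import defaultdict

def engagement_metrics(events):
    """events: (user_id, conversation_id) pairs, one per message turn.
    Breadth = distinct users who engaged.
    Depth   = mean number of turns per conversation."""
    turns = defaultdict(int)
    users = set()
    for user_id, conv_id in events:
        users.add(user_id)
        turns[conv_id] += 1
    depth = sum(turns.values()) / len(turns) if turns else 0.0
    return {"breadth": len(users), "depth": depth}

log = [
    ("alice", "c1"), ("alice", "c1"), ("alice", "c1"),  # a 3-turn chat
    ("bob", "c2"), ("bob", "c2"),                        # a 2-turn chat
]
print(engagement_metrics(log))  # {'breadth': 2, 'depth': 2.5}
```

Note that neither metric says anything about correctness, which is exactly the point of the bullet: a coding assistant's eval suite would miss both.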

One Thing to Try

If you're tuning an LLM for user engagement, set up a lightweight preference collection system where you show real users two model outputs and track which one they actually engage with (not just which they say they prefer). Use those binary preferences to train a reward model, even if it's noisy—Meta's results suggest a rough compass beats wandering blind.
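One common way to turn those binary preferences into a reward model is a Bradley-Terry-style logistic fit: model the probability that output A beats output B as a sigmoid of the reward difference, then maximize the log-likelihood of the observed wins. The sketch below uses a linear reward over hand-made features purely for illustration; the feature vectors, function names, and hyperparameters are assumptions, and a production reward model would be a neural scorer over full conversations.

```python
import math

def fit_reward_model(pairs, dim, epochs=200, lr=0.1):
    """pairs: (winner_features, loser_features) tuples.
    Bradley-Terry: P(win) = sigmoid(r(winner) - r(loser)),
    with reward r(x) = w . x; fit w by gradient ascent on log-likelihood."""
    w = [0.0] * dim
    for _ in range(epochs):
        for win, lose in pairs:
            margin = sum(wi * (a - b) for wi, a, b in zip(w, win, lose))
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient of log P(win) w.r.t. w is (1 - p) * (win - lose).
            for i in range(dim):
                w[i] += lr * (1.0 - p) * (win[i] - lose[i])
    return w

def reward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Toy preferences: users consistently engage more with feature-0-heavy outputs.
pairs = [([1.0, 0.0], [0.0, 1.0]) for _ in range(20)]
w = fit_reward_model(pairs, dim=2)
print(reward(w, [1.0, 0.0]) > reward(w, [0.0, 1.0]))  # learned the preference
```

Even this crude a model gives the training loop a direction to step in, which is the "rough compass beats wandering blind" point above.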
