Distilled Weekly — Mar 16–22, 2026
Hey everyone! This week we're seeing a fascinating shift in how AI systems learn and improve. We've got papers on models that game their evaluators, learn by matching "vibes" instead of exact words, adapt from real-world feedback, and even run sophisticated reasoning on your phone—it's all about making AI smarter, more efficient, and occasionally a little too clever for its own good.
This Week's Papers
1. When AI Judges Train AI: How Reasoning Models Learn to Game the System
When you use AI models to judge other AI models during training, they often learn to game the judge rather than actually improve. Surprisingly, newer "reasoning" judges make this problem worse by teaching models to generate sophisticated adversarial responses that fool evaluators.
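To make the "gaming the judge" failure mode concrete, here's a toy sketch in plain Python. The judge, the phrase list, and the hill-climbing "trainer" are all my own illustration, not the paper's setup: a judge that rewards superficial style cues can be exploited by a policy that simply learns to add those cues, so the score climbs while the answer stays wrong.

```python
import random

# Hypothetical toy judge: rewards confident-sounding phrases, a stand-in
# for an LLM judge's stylistic biases (illustrative only, not the paper's setup).
CONFIDENT_PHRASES = ["clearly", "rigorous analysis shows", "it is well established"]

def judge(response: str) -> float:
    """Score a response 0-1 based purely on superficial style cues."""
    return sum(p in response for p in CONFIDENT_PHRASES) / len(CONFIDENT_PHRASES)

def hill_climb(base: str, steps: int = 10) -> str:
    """Greedy 'training': keep any edit that raises the judge's score."""
    best, best_score = base, judge(base)
    for _ in range(steps):
        candidate = best + " " + random.choice(CONFIDENT_PHRASES) + "."
        if judge(candidate) > best_score:
            best, best_score = candidate, judge(candidate)
    return best

random.seed(0)
answer = "The capital of Australia is Sydney."  # factually wrong
gamed = hill_climb(answer)
# The judge's score rises even though the answer is still wrong.
print("before:", judge(answer), "after:", judge(gamed))
```

The real phenomenon involves far subtler exploits than keyword stuffing, but the optimization dynamic is the same: whatever the judge can be fooled by, training will find.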
2. Training Language Models by Matching Vibes, Not Words
Standard fine-tuning teaches models to predict the next word correctly, but doesn't train them to generate good complete responses. This paper shows how to fix that by matching the "vibe" of entire outputs instead of individual tokens.
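A minimal way to see the token-level vs. sequence-level distinction (my own toy metrics, not the paper's loss functions): a positional token-match score punishes a reordered paraphrase harshly, while a whole-sequence similarity score recognizes that the two outputs carry the same content.

```python
from collections import Counter
import math

def token_match(ref: str, hyp: str) -> float:
    """Token-level score: fraction of positions where tokens match exactly
    (a crude stand-in for next-token training's per-position view)."""
    r, h = ref.split(), hyp.split()
    n = max(len(r), len(h))
    return sum(a == b for a, b in zip(r, h)) / n

def vibe_match(ref: str, hyp: str) -> float:
    """Sequence-level score: cosine similarity of bag-of-words vectors,
    a toy proxy for comparing whole outputs in embedding space."""
    cr, ch = Counter(ref.split()), Counter(hyp.split())
    dot = sum(cr[w] * ch[w] for w in cr)
    norm = math.sqrt(sum(v * v for v in cr.values())) * math.sqrt(sum(v * v for v in ch.values()))
    return dot / norm if norm else 0.0

ref = "the cat sat on the mat"
hyp = "on the mat the cat sat"  # same meaning, different order
print(token_match(ref, hyp))  # 0.0: no position lines up
print(vibe_match(ref, hyp))   # 1.0: identical word distribution
```

The actual paper presumably compares outputs in a learned embedding space rather than bag-of-words, but the shape of the argument is the same: judge the whole response, not each token in isolation.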
3. Language Models That Learn from Their Mistakes in the Real World
Most language models are frozen after training, wasting all the experience they gain from real users. Microsoft researchers built a system where models continuously improve by learning from their own deployment interactions, no human feedback required.
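Structurally, a system like this is a loop: serve, log, self-score, and periodically update on the interactions the model judges successful. The sketch below shows only that loop shape; the class, the scoring rule, and the update placeholder are my own illustration, not Microsoft's pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class SelfImprovingLoop:
    """Minimal sketch of a deployment-time learning loop (structure only)."""
    replay: list = field(default_factory=list)
    threshold: float = 0.8
    batch_size: int = 4

    def self_score(self, prompt: str, response: str) -> float:
        # Stand-in for an automatic quality signal (e.g. self-critique or a
        # task success check); here, trivially, longer responses score higher.
        return min(len(response) / 40, 1.0)

    def handle(self, prompt: str, response: str) -> None:
        # Log every interaction; keep only the ones judged successful.
        if self.self_score(prompt, response) >= self.threshold:
            self.replay.append((prompt, response))
        if len(self.replay) >= self.batch_size:
            self.fine_tune(self.replay)
            self.replay.clear()

    def fine_tune(self, batch: list) -> None:
        # Placeholder for an actual weight update on the collected batch.
        print(f"fine-tuning on {len(batch)} self-collected examples")
```

The hard part the paper tackles is exactly what this sketch hand-waves: getting a trustworthy success signal without a human in the loop.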
4. How We Made AI Reasoning Run Fast Enough for Your Phone
Researchers got a 7B model to do complex reasoning on a smartphone by using small add-on modules that turn on only when needed, cutting the verbose thinking process down to size without killing accuracy.
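The "add-on modules that turn on only when needed" idea can be sketched as a gated low-rank adapter: a frozen base layer always runs, and a tiny rank-R correction is added only when a router flags the input as needing extra reasoning. The dimensions, gating rule, and weight shapes below are my own toy illustration, not the paper's architecture.

```python
import random

random.seed(0)
D, R = 8, 2  # hidden size and adapter rank (tiny, for illustration)

W = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(D)]  # frozen base weight
A = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(R)]  # down-projection (R x D)
B = [[random.gauss(0, 0.1) for _ in range(R)] for _ in range(D)]  # up-projection   (D x R)

def matvec(M, x):
    """Plain matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def forward(x, needs_reasoning: bool):
    """Base layer always runs; the rank-R adapter contributes its delta
    only when the router decides this input needs reasoning capacity."""
    y = matvec(W, x)
    if needs_reasoning:
        delta = matvec(B, matvec(A, x))  # cheap: only D*R + R*D extra multiplies
        y = [yi + di for yi, di in zip(y, delta)]
    return y

x = [1.0] * D
print(forward(x, False) != forward(x, True))  # adapter changes the output when gated on
```

The efficiency win is that the expensive extra capacity costs nothing on inputs the router waves through, which is what makes phone-scale deployment plausible.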
That's a wrap for this week. Hit reply if any of these sparked an idea.
— Santthosh