Training Language Models by Matching Vibes, Not Words

Your Language Model Is Optimizing the Wrong Thing

TL;DR — Standard fine-tuning teaches models to predict the next word correctly, but doesn't train them to generate good complete responses. This paper shows how to fix that by matching the "vibe" of entire outputs instead of individual tokens.

What It Is

When you fine-tune a language model the normal way (supervised fine-tuning), you're teaching it: "given the correct text so far, what's the next word?" But at inference time, the model generates its own text and has to keep going based on what it wrote, not the ground truth. This mismatch (often called exposure bias) causes models to drift off course in longer responses.
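A toy numeric sketch makes the mismatch concrete. The "model" below is hypothetical: it copies its conditioning token but flips it with a small probability, standing in for a small per-token error rate. Under teacher forcing each token gets an independent chance of being wrong; under free-running generation one flip persists in the model's own context and errors compound:

```python
import numpy as np

def run_trial(rng, n_tokens=32, flip_p=0.1):
    """Compare error counts under teacher forcing vs. free-running decoding."""
    # Toy "model": emits its conditioning token, flipped with prob. flip_p.
    def step(cond):
        return cond if rng.random() > flip_p else 1 - cond

    reference = [0] * n_tokens  # ground-truth continuation: all zeros

    # Teacher forcing: always conditioned on the *correct* prefix,
    # so each token is wrong with independent probability flip_p.
    tf_errors = sum(step(t) != 0 for t in reference)

    # Free-running: conditioned on the model's *own* previous output,
    # so a single flip persists and later tokens inherit the error.
    fr_errors, prev = 0, 0
    for _ in range(n_tokens):
        prev = step(prev)
        fr_errors += prev != 0
    return tf_errors, fr_errors
```

Averaged over many trials, the free-running error rate is far above the teacher-forced one, which is the drift the paper targets.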

The researchers created Energy-Based Fine-Tuning (EBFT), which trains models differently: it generates partial completions, embeds them into a vector space that captures their semantic meaning, then adjusts the model so its average output "features" match the training data's features. Think of it as teaching the model to match the overall character of good responses, not just predict tokens one at a time.
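A minimal numpy sketch of the feature-matching idea, assuming a frozen token-embedding table `W` as the feature extractor (a stand-in for whatever encoder the paper actually uses; the loss weighting and gradient estimator in EBFT proper will differ):

```python
import numpy as np

def embed(tokens, W):
    # Frozen encoder stand-in: mean of per-token embedding rows.
    return W[tokens].mean(axis=0)

def feature_matching_loss(model_rollouts, data_samples, W):
    """Squared distance between mean features of generated vs. reference text."""
    model_mean = np.mean([embed(r, W) for r in model_rollouts], axis=0)
    data_mean = np.mean([embed(d, W) for d in data_samples], axis=0)
    # Zero when the model's average output "character" matches the data's.
    return float(np.sum((model_mean - data_mean) ** 2))
```

The key design choice: the loss compares *aggregate* features of whole completions, not per-token predictions, so the model is pushed toward the overall character of good responses.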

The clever bit is efficiency: they generate multiple rollouts in parallel from overlapping prefixes and batch all the feature extraction together, making this practical to run.
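The batching trick can be sketched as follows. Names here are hypothetical, not the paper's: overlapping prefixes are cut from one sequence, right-padded to a common length, and embedded with a single batched lookup plus a masked mean, rather than one encoder call per rollout:

```python
import numpy as np

def overlapping_prefixes(seq, starts):
    # One rollout prefix per offset; all share the same underlying sequence.
    return [seq[:s] for s in starts]

def batch_features(prefixes, W, pad=0):
    """Embed many variable-length prefixes in one batched operation."""
    L = max(len(p) for p in prefixes)
    batch = np.full((len(prefixes), L), pad)
    mask = np.zeros((len(prefixes), L))
    for i, p in enumerate(prefixes):
        batch[i, :len(p)] = p
        mask[i, :len(p)] = 1.0
    emb = W[batch]  # (batch, length, dim) lookup in one shot
    # Masked mean over real (non-pad) positions per prefix.
    return (emb * mask[..., None]).sum(1) / mask.sum(1, keepdims=True)
```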

Why It Matters

  • You might not need reward models: EBFT improves downstream task performance without requiring verifiers, human preference data, or task-specific reward functions. This matters for domains where you can't easily score outputs.
  • Better calibration on long outputs: The method specifically addresses why models get worse as they generate longer sequences. If you're building applications that need multi-paragraph responses or complex reasoning chains, this directly targets that failure mode.
  • Lower validation loss AND better accuracy: Unlike RL approaches that trade off likelihood for task performance, EBFT improves both metrics simultaneously. Your model becomes better calibrated to the actual data distribution.

One Thing to Try

If you're fine-tuning models today, add a diagnostic: measure how your model's outputs diverge from training data as generation length increases. Embed your model's completions and reference completions with a frozen encoder, then track the distance between their mean embeddings at 8, 16, 32+ tokens. This tells you whether you're hitting the calibration problem this paper solves—and whether you need something beyond standard SFT.
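The diagnostic above might look like this in practice. This is a sketch, assuming `encoder` is any frozen embedding function you supply (e.g. a sentence encoder); the helper names are made up for illustration:

```python
import numpy as np

def mean_embedding(completions, encoder):
    # Average feature vector over a set of completions.
    return np.mean([encoder(c) for c in completions], axis=0)

def drift_by_length(model_comps, ref_comps, encoder, lengths=(8, 16, 32)):
    """Distance between mean embeddings of model vs. reference completions,
    truncated to each length, to see whether divergence grows with length."""
    out = {}
    for L in lengths:
        m = mean_embedding([c[:L] for c in model_comps], encoder)
        r = mean_embedding([c[:L] for c in ref_comps], encoder)
        out[L] = float(np.linalg.norm(m - r))
    return out
```

If the distances climb as the truncation length grows, your model is drifting from the data distribution over long generations, the calibration failure this method is built to fix.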
