LLMs That Learn On-The-Fly: Making AI Models Update Themselves During Use
TL;DR — Researchers found a way to let language models update their own weights during a conversation, making them better at handling long contexts without retraining the entire model from scratch.
What It Is
Most LLMs work like a book that's been printed: once training is done, the knowledge is frozen. If you feed them a 50-page document, they can only "remember" it by keeping it all in their attention window, which gets expensive fast. This paper introduces In-Place TTT, which lets models actually update a small part of themselves (specifically, the output layer of their MLP blocks) as they read new information. Think of it like taking notes in the margins as you read, rather than trying to memorize everything.
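To make the "small part of themselves" concrete, here is a toy sketch of the fast/frozen split. The module names below are placeholders for illustration, not Qwen3's actual parameter names: only each block's MLP output projection is treated as a fast weight, while attention and the MLP input projection stay frozen.

```python
# Toy illustration of the fast-weight split (module names are made up):
# only the MLP output projection in each block changes at inference time.

def fast_weight_names(n_blocks):
    """Names of the parameters an In-Place-TTT-style update would touch."""
    return [f"block{i}.mlp.down_proj.weight" for i in range(n_blocks)]

# Everything else, e.g. these, stays frozen during the conversation:
frozen_examples = ["block0.attn.qkv.weight", "block0.mlp.up_proj.weight"]

fast = fast_weight_names(2)
# fast == ["block0.mlp.down_proj.weight", "block1.mlp.down_proj.weight"]
```

Because the trainable surface is so small relative to the full model, the per-token cost of updating it stays modest.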
The clever bit is that this works on models that already exist. You can take something like Qwen3-4B, add this capability with minimal extra training, and suddenly it handles 128k-token contexts much better. The secret sauce is updating these "fast weights" in chunks and optimizing them directly for next-token prediction, the same objective the base model was trained on, rather than generic reconstruction objectives that don't match how language models actually generate text.
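The chunk-wise loop can be sketched in a few lines of toy numpy. This is not the paper's implementation: a single linear "fast weight" matrix stands in for the MLP output projection, a frozen embedding table stands in for the rest of the model, and the model takes one gradient step on the next-token loss after each chunk it reads.

```python
# Toy sketch of chunk-wise test-time training (not the In-Place TTT code):
# one small matrix is updated in place as the model "reads" a document.
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, chunk = 8, 16, 4

E = rng.normal(0.0, 0.3, (vocab, dim))   # frozen: stands in for the base model
W_fast = np.zeros((dim, vocab))          # fast weights, updated while reading

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def chunk_loss_and_grad(tokens, W):
    """Next-token cross-entropy over one chunk, and its gradient w.r.t. W."""
    h = E[tokens[:-1]]                   # "hidden states" for positions 0..n-2
    y = np.array(tokens[1:])             # targets: the next token at each step
    p = softmax(h @ W)
    loss = -np.log(p[np.arange(len(y)), y] + 1e-9).mean()
    p[np.arange(len(y)), y] -= 1.0       # dL/dlogits = p - one_hot(y)
    return loss, h.T @ p / len(y)

doc = [0, 1, 2, 3] * 17                  # a repetitive "document" worth memorizing
lr, losses = 0.5, []
for i in range(0, len(doc) - chunk, chunk):
    loss, grad = chunk_loss_and_grad(doc[i:i + chunk + 1], W_fast)
    W_fast -= lr * grad                  # in-place update after each chunk
    losses.append(loss)
# losses shrink as the fast weights absorb the document's pattern
```

The key property the sketch captures: the per-chunk loss drops as reading proceeds, because the pattern gets compressed into `W_fast` instead of being held in an ever-growing attention window.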
Why It Matters
- You can extend context windows without architectural surgery — This drops into existing transformer models without requiring a complete retrain, which means you could potentially adapt already-deployed models to handle longer contexts more efficiently.
- Better memory efficiency for long documents — Instead of keeping everything in expensive attention mechanisms, the model compresses what it learns into updated weights, which could mean lower inference costs for document analysis, long conversations, or code understanding tasks.
- Opens the door to continual learning — If models can safely update themselves at inference time, we're one step closer to systems that genuinely learn from interactions rather than requiring full retraining cycles every time you want them to know something new.
One Thing to Try
If you're working with open-source models and struggling with long-context tasks, try the reference implementation of In-Place TTT (the code is on GitHub at ByteDance-Seed/In-Place-TTT). Test whether chunk-wise weight updates give you better performance than context-window extension alone, especially for tasks where the model needs to "learn" domain-specific patterns from a long document before answering questions about it.