Distilled Weekly — Apr 13 - Apr 19, 2026
Welcome to this week's Distilled! We're diving into two papers that expose fundamental gaps between how AI systems actually work and how we think they work: vision-language models that can identify objects but stumble over simple spatial reasoning, and agents that reach for external tools when a glance at the image would do. This week is all about understanding where our AI still has some growing up to do.
This Week's Papers
1. Why Vision-Language AI Models Can "See" Images But Fail at Basic Reasoning (And How to Fix It)
Multimodal AI models built on a Mixture-of-Experts (MoE) architecture can read text from an image perfectly, yet fail at reasoning tasks they would solve easily if you typed the same text directly. The problem? The visual input distracts the routing system, so it stops activating the right "expert" subnetworks.
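To make the routing idea concrete, here is a toy sketch of top-k MoE gating: each token gets gating logits over the experts, and only the top-k experts run. The numbers below are purely illustrative (not from the paper); they just show how a shift in the gate scores, such as one induced by visual features, can send a token to different experts.

```python
import math

def route_token(logits, k=2):
    """Toy MoE router: pick the top-k experts for one token from gating logits."""
    # Softmax over expert logits (subtract max for numerical stability)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k experts with the highest gate probability
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return topk, [round(probs[i], 3) for i in topk]

# A text-only token routes cleanly to (say) the reasoning experts 0 and 3...
text_logits = [2.0, 0.1, -1.0, 0.3]
# ...but image-conditioned features can shift the gate toward other experts
# (hypothetical numbers for illustration)
image_logits = [0.2, 1.8, 1.5, -0.5]

print(route_token(text_logits))   # experts [0, 3] selected
print(route_token(image_logits))  # experts [1, 2] selected
```

Because only the selected experts ever run, a distracted router is not a soft degradation: the "right" expert for the reasoning step simply never fires.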
2. When AI Agents Need to Learn They Don't Always Need Tools
Multimodal AI agents call external tools way too often, even when they could answer questions just by looking at the image. A new training method teaches them when not to use tools, making them 50x more efficient without losing accuracy.
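One common way to train this behavior (a guess at the general shape, not the paper's actual objective) is reward shaping: give full credit for a correct answer, then subtract a small cost per tool call, so the agent learns to skip tools whenever the image alone suffices.

```python
def shaped_reward(correct: bool, tool_calls: int, penalty: float = 0.1) -> float:
    """Illustrative reward: credit for correctness minus a per-tool-call cost.

    This is a hypothetical shaping rule, not the method from the paper.
    """
    return (1.0 if correct else 0.0) - penalty * tool_calls

# A correct answer with zero tool calls now strictly beats the same
# answer reached after three calls, so frugality is learned, not imposed.
print(shaped_reward(True, 0))  # 1.0
print(shaped_reward(True, 3))  # 0.7
```

The penalty has to stay small relative to the correctness reward; otherwise the agent learns to skip tools even when it genuinely needs them, trading accuracy for efficiency.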
That's a wrap for this week. Hit reply if any of these sparked an idea.
— Santthosh