Distilled Weekly — Apr 13 - Apr 19, 2026
Welcome to this week's Distilled! We're diving into two papers that expose fundamental gaps between how AI systems actually work and how we think they work: vision-language models that can identify objects but stumble over simple spatial reasoning, and agents that reach for external tools when a glance at the image would do. This week is all about understanding where our AI still has some growing up to do.
This Week's Papers
1. Why Vision-Language AI Models Can "See" Images But Fail at Basic Reasoning (And How to Fix It)
Multimodal AI models built on a Mixture-of-Experts (MoE) architecture can read text from an image perfectly, yet fail at reasoning tasks they would solve easily if you typed the same text directly. The problem? The visual input distracts the routing system, so it stops activating the right "expert" subnetworks.
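To make the routing idea concrete, here is a toy sketch of top-k MoE gating: each token gets gating logits over the experts, and only the top-k experts run. The numbers below are purely illustrative (not from the paper); they just show how a shift in the gate scores, such as one induced by visual features, can send a token to different experts.

```python
import math

def route_token(logits, k=2):
    """Toy MoE router: pick the top-k experts for one token from gating logits."""
    # Softmax over expert logits (subtract max for numerical stability)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k experts with the highest gate probability
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return topk, [round(probs[i], 3) for i in topk]

# A text-only token routes cleanly to (say) the reasoning experts 0 and 3...
text_logits = [2.0, 0.1, -1.0, 0.3]
# ...but image-conditioned features can shift the gate toward other experts
# (hypothetical numbers for illustration)
image_logits = [0.2, 1.8, 1.5, -0.5]

print(route_token(text_logits))   # experts [0, 3] selected
print(route_token(image_logits))  # experts [1, 2] selected
```

Because only the selected experts ever run, a distracted router is not a soft degradation: the "right" expert for the reasoning step simply never fires.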
2. When AI Agents Need to Learn They Don't Always Need Tools
Multimodal AI agents call external tools way too often, even when they could answer questions just by looking at the image. A new training method teaches them when not to use tools, making them 50x more efficient without losing accuracy.
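One common way to train this behavior (a guess at the general shape, not the paper's actual objective) is reward shaping: give full credit for a correct answer, then subtract a small cost per tool call, so the agent learns to skip tools whenever the image alone suffices.

```python
def shaped_reward(correct: bool, tool_calls: int, penalty: float = 0.1) -> float:
    """Illustrative reward: credit for correctness minus a per-tool-call cost.

    This is a hypothetical shaping rule, not the method from the paper.
    """
    return (1.0 if correct else 0.0) - penalty * tool_calls

# A correct answer with zero tool calls now strictly beats the same
# answer reached after three calls, so frugality is learned, not imposed.
print(shaped_reward(True, 0))  # 1.0
print(shaped_reward(True, 3))  # 0.7
```

The penalty has to stay small relative to the correctness reward; otherwise the agent learns to skip tools even when it genuinely needs them, trading accuracy for efficiency.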
That's a wrap for this week. Hit reply if any of these sparked an idea.
— Santthosh