When AI Agents Need to Learn They Don't Always Need Tools

AI Agents Are Using Tools When They Should Just Think

TL;DR — Multimodal AI agents call external tools way too often, even when they could answer questions just by looking at the image. A new training method teaches them when not to use tools, making them 50x more efficient without losing accuracy.

What It Is

Researchers at Alibaba found that current AI agents have a serious judgment problem: they can't tell when to use their own knowledge versus when to call an external tool. Imagine asking "what color is this apple?" and the AI calling a color detection API instead of just looking at the image. Their data showed agents using tools 80-98% of the time, even on simple questions.

The team built a new training approach called HDPO that teaches agents two separate lessons: first, get the right answer (accuracy), then learn to get it efficiently (using fewer tools). Previous methods tried to balance these with a single reward score, which failed—penalize tool use too much and the agent becomes useless on hard questions; penalize too little and the variance drowns out the signal entirely. By splitting these into separate training channels, their model (called Metis) dropped tool usage from 98% to 2% while actually improving accuracy.
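The split described above can be sketched in a few lines. This is a hedged illustration of the idea, not the paper's actual implementation: the function name, trajectory fields, and penalty value are all assumptions.

```python
# Illustrative sketch of decoupled reward channels, in the spirit of HDPO.
# Field names ("correct", "tool_calls") and tool_penalty are hypothetical.

def decoupled_rewards(trajectories, tool_penalty=0.1):
    """Return (accuracy, efficiency) as two separate reward channels.

    Accuracy is scored for every trajectory. The efficiency penalty is
    applied only to trajectories that already answered correctly, so
    discouraging tool use can never push the model away from right answers.
    """
    accuracy = [1.0 if t["correct"] else 0.0 for t in trajectories]
    efficiency = [
        -tool_penalty * t["tool_calls"] if t["correct"] else 0.0
        for t in trajectories
    ]
    return accuracy, efficiency

# Two correct trajectories (0 vs. 3 tool calls) and one incorrect one.
trajs = [
    {"correct": True, "tool_calls": 0},
    {"correct": True, "tool_calls": 3},
    {"correct": False, "tool_calls": 1},
]
acc, eff = decoupled_rewards(trajs)
# acc rewards correctness only; eff penalizes the 3-call trajectory (~ -0.3)
# while leaving the wrong answer untouched.
```

Because the channels stay separate, the trainer can weight or normalize each one independently instead of hoping a single scalar balances them.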

Why It Matters

  • Latency kills user experience: Every tool call adds network round-trips. An agent that makes 50x fewer tool calls spends 50x less time waiting on them, turning sluggish experiences into snappy ones.
  • Cost scales with API calls: If you're paying per tool invocation (OCR, calculators, web search), dropping usage from 98% to 2% cuts your tool-call spend by roughly 98%.
  • Noise compounds errors: Each unnecessary tool call introduces potential errors—OCR misreads, APIs time out, parsers fail. Fewer calls mean fewer failure points in your reasoning chain.

One Thing to Try

If you're fine-tuning an agent with RL, stop mixing task accuracy and efficiency into one reward score. Instead, run two separate optimization passes: first optimize purely for correctness across all attempts, then add an efficiency penalty that only applies to trajectories that already got the right answer. This prevents the "conservative agent" problem where penalizing tool use makes your model worse at hard tasks.
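The recipe above can be sketched as a two-pass loop around a generic RL trainer. Everything here is a placeholder for your own stack: `sample_trajectories`, `train_step`, the trajectory fields, and the step counts are all hypothetical, and the reward helpers mirror the gating idea rather than any specific paper's code.

```python
# Hypothetical two-pass fine-tuning loop: optimize correctness first, then
# add an efficiency penalty gated on already-correct trajectories.

def accuracy_reward(traj):
    # Pass 1 signal: correctness only.
    return 1.0 if traj["correct"] else 0.0

def efficiency_reward(traj, tool_penalty=0.1):
    # Pass 2 signal: penalize tool calls ONLY when the answer was right,
    # so the model is never rewarded for giving up on hard questions.
    return -tool_penalty * traj["tool_calls"] if traj["correct"] else 0.0

def two_pass_training(policy, sample_trajectories, train_step,
                      accuracy_steps=1000, efficiency_steps=500):
    # Pass 1: pure correctness.
    for _ in range(accuracy_steps):
        trajs = sample_trajectories(policy)
        train_step(policy, trajs,
                   rewards=[accuracy_reward(t) for t in trajs])
    # Pass 2: keep the accuracy signal, add the gated efficiency penalty.
    for _ in range(efficiency_steps):
        trajs = sample_trajectories(policy)
        rewards = [accuracy_reward(t) + efficiency_reward(t) for t in trajs]
        train_step(policy, trajs, rewards=rewards)
    return policy
```

The key design choice is the gate in `efficiency_reward`: a wrong answer already gets zero reward, so stacking a tool penalty on top of it would only teach the model to avoid trying.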
