Can AI Agents Create Harder Math Problems By Writing Code?
Teaching AI to Write Its Own Math Homework (And Make It Harder)
TL;DR — Researchers built a system where AI coding agents take existing math problems and automatically generate harder versions that are still solvable, potentially solving the shortage of challenging problems needed to train advanced math AI.
What It Is
We're running into a strange problem: our best AI models are getting so good at math that we're running out of hard problems to train and test them on. Creating IMO-level math problems requires serious expertise, and humans can't make them fast enough.
The Code2Math team had an idea: what if AI agents could use code to explore mathematical spaces and evolve existing problems into harder ones? They built a multi-agent system that takes a seed problem, writes code to explore variations computationally, then generates new problems based on what it discovers. For example, starting with a problem about finding a list of numbers that sum to 30 with specific properties, the agent explored thousands of configurations through code and created a harder version asking for the maximum list length given a sum of 323.
The key insight is that many hard math problems come from computational exploration—trying lots of examples, finding patterns, searching for edge cases. Code agents can do this exploration automatically and at scale.
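To make the exploration idea concrete, here is a toy stand-in for the kind of search a code agent might run on the seed problem above. The article doesn't spell out the "specific properties" of the original lists, so "distinct positive integers" is a hypothetical constraint used purely for illustration:

```python
def max_length(total: int) -> int:
    """Longest list of distinct positive integers summing to `total`:
    greedily take 1, 2, 3, ... while the next term still fits."""
    length, running = 0, 0
    while running + (length + 1) <= total:
        length += 1
        running += length
    return length

# Exploring the difficulty landscape: how does the answer scale with the sum?
for s in (30, 100, 323):
    print(s, "->", max_length(s))  # 30 -> 7, 100 -> 13, 323 -> 24
```

Running a sweep like this over many target sums is cheap, and the resulting pattern (the answer grows roughly like the square root of the sum) is exactly the kind of structure an agent can mine to pose a harder variant with a confidently known answer.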
Why It Matters
- Training data bottleneck eased: If you're building or fine-tuning math reasoning models, you can generate challenging problems programmatically instead of waiting for human experts to write them
- Automatic difficulty scaling: You can take problems your model already solves and systematically generate harder variants, creating a curriculum that grows with your model's capabilities
- Validation built-in: Unlike purely language-based problem generation (which often creates unsolvable or trivial problems), code execution provides automatic verification that evolved problems are actually solvable and structurally sound
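The verification point can be sketched in a few lines. Again using the hypothetical "distinct positive integers summing to S" constraint (not from the paper), a verifier constructs an explicit witness before an evolved problem is accepted:

```python
def witness(total: int, length: int):
    """Return a list of `length` distinct positive integers summing to
    `total`, or None if impossible: take 1..length, add the slack on top."""
    base = list(range(1, length + 1))
    slack = total - sum(base)
    if slack < 0:
        return None       # even the minimal distinct list overshoots the sum
    base[-1] += slack     # elements stay distinct because slack >= 0
    return base

# Accept the evolved problem only if its claimed answer checks out in code:
evolved = {"sum": 323, "claimed_max_length": 24}
w = witness(evolved["sum"], evolved["claimed_max_length"])
assert w is not None and sum(w) == evolved["sum"]                  # solvable
assert witness(evolved["sum"], evolved["claimed_max_length"] + 1) is None  # tight
```

A language model can assert that a variant "should be solvable"; executed code either produces a witness or it doesn't.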
One Thing to Try
If you're evaluating a math-capable model, take 10 problems it solves correctly and run them through a code agent with instructions to "find a harder variant by exploring edge cases computationally." Test whether your model still succeeds on the evolved versions—this gives you a quick difficulty calibration and might reveal capability gaps that standard benchmarks miss.
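A minimal harness for this experiment might look like the following. `evolve_harder` and `model_solves` are placeholder names, not API calls from the paper; in practice the first would prompt your code agent and the second would run your model and check its answer:

```python
def evolve_harder(problem: str) -> str:
    # Stand-in: a real version would prompt a code agent to "find a harder
    # variant by exploring edge cases computationally".
    return problem + " (evolved)"

def model_solves(problem: str) -> bool:
    # Stand-in for running your model and grading its answer. This stub
    # pretends the model fails on every evolved variant.
    return not problem.endswith("(evolved)")

solved = [f"problem {i}" for i in range(10)]  # 10 problems the model gets right
evolved = [evolve_harder(p) for p in solved]
survival = sum(model_solves(p) for p in evolved) / len(evolved)
print(f"accuracy on evolved variants: {survival:.0%}")
```

The gap between 100% on the seeds and the survival rate on evolved variants is your difficulty calibration.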