GLM-5: Teaching AI to Actually Build Software, Not Just Suggest Code

TL;DR — GLM-5 shifts from helping you write code to actually acting as a software engineer that can handle complex, multi-hour development tasks autonomously.

What It Is

The team at Zhipu AI built GLM-5 to solve a problem we've all felt: current AI coding assistants are great at writing functions but terrible at being actual engineering partners. They point to "vibe coding" as the status quo—AI that produces plausible-looking code from loose prompts but can't carry a real software project end to end.

GLM-5 uses three key innovations to bridge this gap. First, it implements DSA, a sparse attention mechanism that spends compute only where it matters, making 200,000-token contexts cheaper to run. Second, they built a new reinforcement learning system that separates the "trying things" (rollout) phase from the "learning from mistakes" (update) phase, letting them train on complex, multi-step tasks much faster. Third, they trained the model in stages—reasoning first, then agent behavior, then general skills—using distillation between stages to keep it from forgetting what it learned earlier.
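The paper doesn't spell out DSA's internals here, but the core idea of sparse attention—scoring all keys yet attending only to the most relevant few—can be sketched in a few lines. This toy NumPy version (single query, top-k selection; all names and shapes are illustrative, not from the paper) shows where the compute savings come from: the softmax and value mixing run over k positions instead of the full context.

```python
import numpy as np

def sparse_attention(q, K, V, k=4):
    """Toy sketch of top-k sparse attention for one query vector.
    q: (d,) query; K: (n, d) keys; V: (n, d) values.
    Scores every key cheaply, then mixes values for only the
    k highest-scoring positions instead of all n."""
    scores = K @ q / np.sqrt(q.shape[0])         # similarity of query to every key
    top = np.argsort(scores)[-k:]                # indices of the k most relevant keys
    w = np.exp(scores[top] - scores[top].max())  # softmax over the selected subset only
    w /= w.sum()
    return w @ V[top]                            # weighted sum of the selected values

rng = np.random.default_rng(0)
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 8))
q = rng.normal(size=8)
out = sparse_attention(q, K, V, k=4)  # attends to 4 of 16 positions
```

With k equal to the sequence length this reduces to ordinary softmax attention; the production version presumably learns which positions to keep rather than using raw dot-product top-k.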

The results are striking: GLM-5 scores 50 on the Intelligence Index (first open-weights model to hit this milestone), ranks #1 among open models on LMArena for both text and code, and can run a simulated vending machine business for a full year while managing resources effectively.

Why It Matters

  • Long-horizon tasks are now viable: If you've been frustrated by AI losing the thread after a few turns, GLM-5's ability to maintain coherence over hours (not minutes) opens up entirely new use cases—think automated refactoring, multi-file feature implementation, or ongoing codebase maintenance.
  • Open weights at frontier performance: GLM-5 matches Claude Opus 4.5 and GPT-5.2 on many benchmarks while shipping open weights, meaning you can deploy it on your own infrastructure without per-token API costs eating your budget.
  • The training recipe matters more than ever: Their staged RL approach (reasoning → agentic → general) with cross-stage distillation is a blueprint for anyone fine-tuning models—it shows you can teach specialized skills without destroying general capabilities.
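The cross-stage distillation idea in that last bullet is worth making concrete. A common way to stop a new training stage from erasing earlier skills is to blend the new-stage loss with a KL penalty toward a frozen snapshot of the previous stage's model. The sketch below (NumPy; `alpha`, `T`, and the function names are illustrative assumptions, not details from the paper) shows that blended objective for a single output distribution.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, task_loss, alpha=0.5, T=2.0):
    """Illustrative cross-stage distillation objective:
    alpha weights the new-stage task loss; the remaining weight goes to
    a temperature-softened KL term pulling the student toward a frozen
    previous-stage teacher, so new skills don't overwrite old ones."""
    p_t = softmax(teacher_logits / T)            # teacher's softened distribution
    p_s = softmax(student_logits / T)            # student's softened distribution
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)))
    return alpha * task_loss + (1 - alpha) * (T * T) * kl

# When the student still matches the teacher, the KL term vanishes
# and only the weighted task loss remains.
logits = np.array([1.0, 2.0, 3.0])
loss = distill_loss(logits, logits, task_loss=2.0, alpha=0.5)
```

The T² factor is the standard correction that keeps gradient magnitudes comparable across temperatures; the real recipe likely distills over full token sequences rather than a single distribution.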

One Thing to Try

If you're evaluating coding assistants for your team, test them on a real multi-hour task from your backlog—something requiring reading multiple files, making a plan, and executing across several PRs. Don't just measure "can it write this function" but "can it actually complete this feature ticket." GLM-5's performance on Vending-Bench 2 (managing a business for a full year) suggests the bar for "agentic" capability should be much higher than we've been setting it.
