How We Built AI Embeddings That Work in 200+ Languages Without Breaking the Bank

The Embedding Model That Actually Speaks Your Language

TL;DR — A new family of embedding models covers 200+ languages (including underserved ones) in 8 different sizes, beats current leaders on 11 benchmarks, and releases everything openly so you can actually see how it was built.

What It Is

F2LLM-v2 is a collection of text embedding models: the AI systems that convert sentences into numbers so computers can measure semantic similarity. What makes these different is radical inclusivity: they trained on 60 million examples covering over 200 natural languages, from English and Chinese down to languages like Khmer and Lao that usually get ignored. The models come in 8 sizes (80 million to 14 billion parameters), so you can pick what fits your hardware budget. They used clever techniques like matryoshka representation learning (training embeddings that still work after being truncated to smaller dimensions) and knowledge distillation (teaching small models to mimic large ones) to keep the smaller versions punchy. The 14B model now ranks first on 11 different MTEB language benchmarks, including Persian, Vietnamese, and Indic languages.
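The payoff of matryoshka training is that you can truncate a vector and renormalize it at query time, trading a little accuracy for a lot of storage. A minimal NumPy sketch of that trick (the 1024 and 256 dimensions here are illustrative, not the models' actual output sizes):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length,
    the standard way matryoshka-trained embeddings are shrunk."""
    head = vec[:dim]
    return head / np.linalg.norm(head)

rng = np.random.default_rng(0)
full = rng.standard_normal(1024)
full /= np.linalg.norm(full)          # unit-length "full" embedding

small = truncate_embedding(full, 256)
print(small.shape)                     # (256,)
print(np.linalg.norm(small))           # ~1.0
```

With a matryoshka-trained model the truncated vector stays a usable embedding; with an ordinary model the leading dimensions carry no special meaning and this trick degrades badly.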

Why It Matters

  • You can finally build multilingual RAG systems that don't suck: Most embedding models claim to be "multilingual" but really just work well in English and maybe Chinese. If you're building search or retrieval for global markets, you've probably noticed your Polish or Vietnamese results are garbage—F2LLM-v2 actually performs in mid-resource languages.
  • Pick your performance/cost tradeoff: The same team trained everything from 80M to 14B parameters with the same data and techniques, so you're comparing apples to apples when choosing a size. The 600M model beats previous state-of-the-art models twice its size.
  • You can actually see the recipe: Unlike competitors such as Qwen3-Embedding or Gemini-Embedding, they released the training data, code, and intermediate checkpoints. If you need to understand why something works (or doesn't) in production, you can actually dig in.
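Once you have multilingual embeddings, the retrieval half of a RAG system is just nearest-neighbor search over normalized vectors, regardless of which model produced them. A model-agnostic sketch in NumPy (the random toy vectors below stand in for real F2LLM-v2 outputs):

```python
import numpy as np

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k most similar corpus rows by cosine similarity.
    Assumes all vectors are already L2-normalized, so dot product == cosine."""
    scores = corpus @ query
    return np.argsort(-scores)[:k]

# Toy stand-ins for real document embeddings.
rng = np.random.default_rng(1)
corpus = rng.standard_normal((5, 8))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# A query that is a slightly perturbed copy of document 2.
query = corpus[2] + 0.05 * rng.standard_normal(8)
query /= np.linalg.norm(query)

print(top_k(query, corpus))  # document 2 ranks first
```

The pipeline is identical whether the query is in English and the documents in Vietnamese or vice versa; the model's cross-lingual quality is what determines whether the right document actually ranks first.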

One Thing to Try

If you're running embeddings on a budget or at the edge, swap your current model for F2LLM-v2-1.7B and benchmark it against whatever you're using now. It's small enough to run without a massive GPU bill but outperforms most larger models—especially if your users speak anything other than English. All models are on HuggingFace under codefuse-ai/f2llm.
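Assuming the checkpoints follow standard sentence-transformers conventions (an assumption — check the model card on HuggingFace for the exact model ID and recommended usage; the ID below is inferred from the collection name), a drop-in trial looks roughly like this:

```python
from sentence_transformers import SentenceTransformer

# Model ID is a guess based on the codefuse-ai/f2llm collection; verify it.
model = SentenceTransformer("codefuse-ai/F2LLM-v2-1.7B")

sentences = [
    "Where can I renew my passport?",
    "Gdzie moge odnowic paszport?",        # Polish
    "Toi co the gia han ho chieu o dau?",  # Vietnamese
]
embeddings = model.encode(sentences, normalize_embeddings=True)

# With normalized embeddings, dot products are cosine similarities;
# cross-lingual paraphrases should score high against each other.
similarities = embeddings @ embeddings.T
print(similarities)
```

Run the same script against your current model and compare the similarity matrices (or, better, your own retrieval metrics) on your real queries before switching.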
