How We Built AI Embeddings That Work in 200+ Languages Without Breaking the Bank

The Embedding Model That Actually Speaks Your Language

TL;DR — A new family of embedding models covers 200+ languages (including underserved ones) in 8 different sizes, beats current leaders on 11 benchmarks, and releases everything openly so you can actually see how it was built.

What It Is

F2LLM-v2 is a collection of text embedding models: the AI systems that convert sentences into numbers so computers can measure semantic similarity. What makes these different is radical inclusivity: they trained on 60 million examples covering over 200 natural languages, from English and Chinese down to languages like Khmer and Lao that usually get ignored. The models come in 8 sizes (80 million to 14 billion parameters), so you can pick what fits your hardware budget. They used clever techniques like matryoshka representation learning (training embeddings that still work after being truncated to smaller dimensions) and knowledge distillation (teaching small models to mimic large ones) to keep the smaller versions punchy. The 14B model now ranks first on 11 different MTEB language benchmarks, including Persian, Vietnamese, and Indic languages.
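The payoff of matryoshka training is that you can truncate a vector and renormalize it at query time, trading a little accuracy for a lot of storage. A minimal NumPy sketch of that trick (the 1024 and 256 dimensions here are illustrative, not the models' actual output sizes):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length,
    the standard way matryoshka-trained embeddings are shrunk."""
    head = vec[:dim]
    return head / np.linalg.norm(head)

rng = np.random.default_rng(0)
full = rng.standard_normal(1024)
full /= np.linalg.norm(full)          # unit-length "full" embedding

small = truncate_embedding(full, 256)
print(small.shape)                     # (256,)
print(np.linalg.norm(small))           # ~1.0
```

With a matryoshka-trained model the truncated vector stays a usable embedding; with an ordinary model the leading dimensions carry no special meaning and this trick degrades badly.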

Why It Matters

  • You can finally build multilingual RAG systems that don't suck: Most embedding models claim to be "multilingual" but really just work well in English and maybe Chinese. If you're building search or retrieval for global markets, you've probably noticed your Polish or Vietnamese results are garbage—F2LLM-v2 actually performs in mid-resource languages.
  • Pick your performance/cost tradeoff: The same team trained everything from 80M to 14B parameters with the same data and techniques, so you're comparing apples to apples when choosing a size. The 600M model beats previous state-of-the-art models twice its size.
  • You can actually see the recipe: Unlike competitors such as Qwen3-Embedding or Gemini-Embedding, they released the training data, code, and intermediate checkpoints. If you need to understand why something works (or doesn't) in production, you can actually dig in.
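Once you have multilingual embeddings, the retrieval half of a RAG system is just nearest-neighbor search over normalized vectors, regardless of which model produced them. A model-agnostic sketch in NumPy (the random toy vectors below stand in for real F2LLM-v2 outputs):

```python
import numpy as np

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k most similar corpus rows by cosine similarity.
    Assumes all vectors are already L2-normalized, so dot product == cosine."""
    scores = corpus @ query
    return np.argsort(-scores)[:k]

# Toy stand-ins for real document embeddings.
rng = np.random.default_rng(1)
corpus = rng.standard_normal((5, 8))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# A query that is a slightly perturbed copy of document 2.
query = corpus[2] + 0.05 * rng.standard_normal(8)
query /= np.linalg.norm(query)

print(top_k(query, corpus))  # document 2 ranks first
```

The pipeline is identical whether the query is in English and the documents in Vietnamese or vice versa; the model's cross-lingual quality is what determines whether the right document actually ranks first.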

One Thing to Try

If you're running embeddings on a budget or at the edge, swap your current model for F2LLM-v2-1.7B and benchmark it against whatever you're using now. It's small enough to run without a massive GPU bill but outperforms most larger models—especially if your users speak anything other than English. All models are on HuggingFace under codefuse-ai/f2llm.
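Assuming the checkpoints follow standard sentence-transformers conventions (an assumption — check the model card on HuggingFace for the exact model ID and recommended usage; the ID below is inferred from the collection name), a drop-in trial looks roughly like this:

```python
from sentence_transformers import SentenceTransformer

# Model ID is a guess based on the codefuse-ai/f2llm collection; verify it.
model = SentenceTransformer("codefuse-ai/F2LLM-v2-1.7B")

sentences = [
    "Where can I renew my passport?",
    "Gdzie moge odnowic paszport?",        # Polish
    "Toi co the gia han ho chieu o dau?",  # Vietnamese
]
embeddings = model.encode(sentences, normalize_embeddings=True)

# With normalized embeddings, dot products are cosine similarities;
# cross-lingual paraphrases should score high against each other.
similarities = embeddings @ embeddings.T
print(similarities)
```

Run the same script against your current model and compare the similarity matrices (or, better, your own retrieval metrics) on your real queries before switching.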
