Tools · May 14, 2026

IBM releases Granite Embedding Multilingual R2 models with 32K context support and Apache 2.0 license

IBM and Hugging Face release two new open-source multilingual embedding models: a 97M-parameter compact model scoring 60.3 on MTEB Multilingual Retrieval and a 311M full-size model scoring 65.2, both supporting 200+ languages with 32K-token context and code retrieval capabilities.

Trust: 76
Hype: Low

1 source · cross-referenced

TL;DR
  • IBM released granite-embedding-97m-multilingual-r2, a 97M-parameter model that scores 60.3 on MTEB Multilingual Retrieval, 9.4 points ahead of the previous best open-source multilingual embedder under 100M parameters.
  • The full-size granite-embedding-311m-multilingual-r2 scores 65.2 on the same benchmark, ranks second among open models under 500M parameters, and supports Matryoshka embedding dimensions.
  • Both models support 200+ languages with enhanced training for 52 of them, handle a 32,768-token context (a 64x increase over R1), cover nine programming languages, and are released under the Apache 2.0 license.
  • Models are built on ModernBERT architecture with Flash Attention 2.0 support and ship with ONNX and OpenVINO weights for CPU-optimized inference.
  • Both models integrate as drop-in replacements in sentence-transformers, transformers, LangChain, LlamaIndex, Haystack, and Milvus with a single model name change.

IBM announced two new multilingual embedding models built on the ModernBERT architecture. The 97M-parameter granite-embedding-97m-multilingual-r2 scores 60.3 on MTEB Multilingual Retrieval across 18 languages, the best result among open-source models under 100M parameters and 9.4 points above the previous leader in that class. The full-size 311M-parameter model scores 65.2 on the same benchmark, placing it second among open models under 500M parameters.

Both models cover 200+ languages, with enhanced training for 52 of them, including Amharic, Arabic, Bengali, Chinese, French, German, Hindi, Japanese, Korean, Russian, Spanish, Turkish, and Vietnamese. They also cover nine programming languages (Python, Go, Java, JavaScript, PHP, Ruby, SQL, C, and C++), enabling code retrieval in multilingual development environments.

The R2 generation is a ground-up redesign of R1. The shift from XLM-RoBERTa to ModernBERT brings alternating global and local attention layers for efficient long-sequence processing, rotary position embeddings that enable the 32K-token context window, and Flash Attention 2.0 support. The new models use custom multilingual tokenizers optimized for language coverage and parameter efficiency, replacing the previous 250K-token XLM-RoBERTa vocabulary.
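To make the context-window change concrete, the sketch below counts how many windows a long document would need under each limit. The 512-token R1 figure is not stated directly; it is derived here from the article's "64x increase" claim (32,768 / 64). The function name `windows_needed` and the `overlap` parameter are illustrative assumptions, not part of any Granite tooling.

```python
import math

R2_CONTEXT_TOKENS = 32_768                    # stated in the article
R1_CONTEXT_TOKENS = R2_CONTEXT_TOKENS // 64   # derived from the "64x increase" claim


def windows_needed(doc_tokens: int, context_tokens: int, overlap: int = 0) -> int:
    """Count the sliding windows needed to cover a document.

    `overlap` is the number of tokens shared by consecutive windows
    (a hypothetical chunking choice, not something the article specifies).
    """
    if doc_tokens <= 0:
        return 0
    if not 0 <= overlap < context_tokens:
        raise ValueError("overlap must be in [0, context_tokens)")
    if doc_tokens <= context_tokens:
        return 1
    stride = context_tokens - overlap
    # One full window, plus enough strides to cover the remaining tokens.
    return 1 + math.ceil((doc_tokens - context_tokens) / stride)


if __name__ == "__main__":
    doc = 20_000  # an example document length in tokens
    print("R1-sized windows:", windows_needed(doc, R1_CONTEXT_TOKENS))
    print("R2-sized windows:", windows_needed(doc, R2_CONTEXT_TOKENS))
```

Under these assumptions, a 20,000-token document that would have to be split into dozens of chunks at the derived R1 limit fits in a single R2 window, which is the practical meaning of the longer context for retrieval pipelines.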

Both models ship with ONNX and OpenVINO weights for CPU-optimized inference and integrate as one-line substitutes in sentence-transformers, LangChain, LlamaIndex, Haystack, and Milvus. The 311M model supports Matryoshka embeddings, which can be truncated to fewer dimensions when storage or latency matters. The models are released under the Apache 2.0 license and trained on IBM-curated datasets filtered for licensing compliance and commercial deployment safety.
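As a rough illustration of the two claims above, the sketch below shows Matryoshka-style truncation (keep the leading dimensions, then re-normalize) alongside a sentence-transformers call where the model name is the only thing to swap. The hub ID in `MODEL_ID` is an assumption based on the model name in the article, and the helper names are hypothetical; the truncation helpers use only the standard library so they can be tried without downloading anything.

```python
import math
from typing import List, Sequence

# Assumed Hugging Face hub ID; the article gives only the model name.
MODEL_ID = "ibm-granite/granite-embedding-311m-multilingual-r2"


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and re-normalize to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to normalize
    return [x / norm for x in head]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def embed(texts: List[str], model_id: str = MODEL_ID) -> List[List[float]]:
    """Encode texts with sentence-transformers; swapping models means
    changing only `model_id`. Imported lazily so the helpers above work
    without the library installed."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(texts, normalize_embeddings=True).tolist()
```

With real embeddings, one would call `embed([...])` once, store `truncate_embedding(v, 256)` (or another size the model card lists as supported) for each vector, and compare with `cosine`. Which truncation sizes preserve retrieval quality is something to confirm against the model card rather than assume.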

Sources
  1. Hugging Face, "Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context"


© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.
