IBM releases Granite Embedding Multilingual R2 models with 32K context support and Apache 2.0 license
IBM and Hugging Face release two new open-source multilingual embedding models: a 97M-parameter compact model scoring 60.3 on MTEB Multilingual Retrieval and a 311M full-size model scoring 65.2, both supporting 200+ languages with 32K-token context and code retrieval capabilities.
- IBM released granite-embedding-97m-multilingual-r2, a 97M-parameter model that scores 60.3 on MTEB Multilingual Retrieval, beating the previous best open-source multilingual embedder under 100M parameters by 9.4 points.
- The full-size granite-embedding-311m-multilingual-r2 scores 65.2 on the same benchmark, ranks second among open models under 500M parameters, and supports Matryoshka embedding dimensions.
- Both models support 200+ languages with enhanced training for 52 of them, handle a 32,768-token context (a 64x increase over R1), cover nine programming languages, and are released under the Apache 2.0 license.
- The models are built on the ModernBERT architecture with Flash Attention 2.0 support and ship with ONNX and OpenVINO weights for CPU-optimized inference.
- Both models integrate as drop-in replacements in sentence-transformers, transformers, LangChain, LlamaIndex, Haystack, and Milvus with a single model name change.
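To make the "single model name change" concrete, here is a minimal retrieval sketch in Python. It assumes the sentence-transformers package is installed, and the `ibm-granite/` prefix in `MODEL_NAME` is a guess at the Hub path, since the article gives only the bare model name. The helpers `load_encoder`, `cosine`, and `rank` are illustrative names, not part of any library.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Assumption: the "ibm-granite/" organisation prefix is a guess at the Hub path;
# the article only gives the bare model name.
MODEL_NAME = "ibm-granite/granite-embedding-97m-multilingual-r2"

Vector = Sequence[float]
Encoder = Callable[[List[str]], List[Vector]]


def load_encoder(model_name: str = MODEL_NAME) -> Encoder:
    """Return an encode function backed by sentence-transformers."""
    # Imported lazily so the rest of this sketch runs without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(texts, normalize_embeddings=True).tolist()


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: str, docs: List[str], encode: Encoder) -> List[Tuple[str, float]]:
    """Sort docs by cosine similarity to the query, best match first."""
    query_vec, *doc_vecs = encode([query] + docs)
    scored = [(doc, cosine(query_vec, vec)) for doc, vec in zip(docs, doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With the library available, `rank("how do I reverse a list?", docs, load_encoder())` would score `docs` against the query. Under the same path assumption, switching to the 311M model should only require passing a different name to `load_encoder`.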
IBM announced two new multilingual embedding models built on the ModernBERT architecture. The 97M-parameter granite-embedding-97m-multilingual-r2 scores 60.3 on MTEB Multilingual Retrieval across 18 languages, the highest result among open-source models under 100M parameters and 9.4 points ahead of the previous leader. The full-size 311M-parameter model scores 65.2 on the same benchmark, placing it second among open models under 500M parameters.
Both models cover 200+ languages, with enhanced training for 52 of them, including Amharic, Arabic, Bengali, Chinese, French, German, Hindi, Japanese, Korean, Russian, Spanish, Turkish, and Vietnamese. They also cover nine programming languages (Python, Go, Java, JavaScript, PHP, Ruby, SQL, C, and C++), enabling code retrieval in multilingual development environments.
The R2 generation represents a ground-up redesign from R1. The shift from XLM-RoBERTa to ModernBERT introduces alternating attention lengths for efficient long-sequence processing, rotary position embeddings enabling 32K-token context windows, and Flash Attention 2.0 support. The new models employ custom multilingual tokenizers optimized for language coverage and parameter efficiency, replacing the previous 250K-token XLM-RoBERTa vocabulary.
Both models ship with ONNX and OpenVINO weights for CPU-optimized inference and work as one-line substitutes in sentence-transformers, LangChain, LlamaIndex, Haystack, and Milvus. The 311M model supports Matryoshka embeddings, which allow vectors to be truncated to fewer dimensions at query time. The models are released under the Apache 2.0 license and were trained on IBM-curated datasets filtered for licensing compliance and commercial deployment safety.
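Matryoshka-style reduction usually means keeping the first k components of an embedding and renormalizing the result. The sketch below shows that step in plain Python. The function name `truncate_embedding` and the 8-dimensional sample vector are hypothetical, and the article does not say which output sizes the 311M model was trained to support.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    # An all-zero prefix cannot be normalised; return it unchanged.
    return [x / norm for x in head] if norm else head


# Hypothetical 8-dimensional embedding; real model outputs are much wider.
full = [0.6, 0.0, 0.8, 0.0, 0.1, -0.2, 0.05, 0.3]
small = truncate_embedding(full, 4)
```

Shorter vectors reduce index size and speed up similarity search, usually at some cost in retrieval quality. In sentence-transformers, the `truncate_dim` argument to `SentenceTransformer` does the truncation at encode time.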