Granite R2: Open Multilingual Embeddings Just Got Better

Great news for anyone working with multilingual data! We’re thrilled to announce the Granite Embedding Multilingual R2 release, bringing two powerful new embedding models designed to excel in diverse linguistic environments. These models address a common challenge: balancing extensive language support with efficient model size.

Whether you’re building retrieval-augmented generation (RAG) systems, implementing cross-lingual search, or supporting code retrieval for international teams, these models aim to deliver strong retrieval quality without giving up speed or practicality. They are designed for enterprise use across a wide range of applications.

Meet the New Multilingual Champions

The R2 release introduces two multilingual embedding models, both built on the ModernBERT architecture and available under the Apache 2.0 license. Both improve significantly on their R1 predecessors.

  • Granite Embedding 97M Multilingual R2: This compact model has just 97 million parameters yet scores 60.3 on MTEB Multilingual Retrieval, outperforming every other open-source multilingual embedder under 100M parameters. It is a good choice when speed and resource efficiency matter most.
  • Granite Embedding 311M Multilingual R2: Our full-size model boasts 311 million parameters and scores an impressive 65.2 on MTEB Multilingual Retrieval, securing its spot as #2 among open models under 500M parameters. This model also supports Matryoshka embeddings, offering flexible dimensionality for optimized performance and storage.

Both models support over 200 languages, with retrieval quality specifically tuned for 52 of them. They also add code retrieval across 9 programming languages, which makes them useful for development teams. Context length rises to 32,768 tokens, a 64x increase over the 512-token limit of the R1 models.

Integration is seamless: they work out-of-the-box with popular frameworks like sentence-transformers, transformers, LangChain, LlamaIndex, Haystack, and Milvus. Switching to these models often requires just a one-line code change, instantly extending multilingual support to your applications without additional dependencies.
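As a rough illustration of that one-line switch, here is a minimal sketch using sentence-transformers. The model ID below is an assumption based on the release name, so check the model card for the exact identifier. The `load_encoder` and `rank` helpers are hypothetical names introduced only for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Assumed Hugging Face model ID (not confirmed by the article); verify on the model card.
MODEL_ID = "ibm-granite/granite-embedding-97m-multilingual-r2"

Encoder = Callable[[Sequence[str]], List[List[float]]]


def load_encoder(model_id: str = MODEL_ID) -> Encoder:
    """Return a function that maps texts to embedding vectors."""
    # Imported lazily so the ranking helper below works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)  # switching models means changing this ID

    def encode(texts: Sequence[str]) -> List[List[float]]:
        return model.encode(list(texts), normalize_embeddings=True).tolist()

    return encode


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(encode: Encoder, query: str, docs: Sequence[str]) -> List[Tuple[float, str]]:
    """Score each document against the query and return them best first."""
    vectors = encode([query, *docs])
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query_vec, vec), doc) for vec, doc in zip(doc_vecs, docs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With the library installed, something like `rank(load_encoder(), "How do I reset my password?", docs)` would score a mixed-language list of documents against an English query. Pointing `MODEL_ID` at the 311M variant should be the only change needed to swap models in this sketch.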

Enterprise-Ready and Responsibly Built

These models are designed with enterprise deployment in mind. They are trained on a carefully curated blend of IBM-proprietary, publicly available, and synthetically generated datasets. IBM’s rigorous quality, deduplication, and governance processes are applied to all public web-derived data, minimizing risks for commercial use.

Crucially, we’ve intentionally avoided datasets with explicit non-commercial licensing restrictions, such as MS-MARCO, ensuring clear usage rights. The pretraining leverages GneissWeb, an IBM-curated dataset, along with other high-quality sources, all subjected to stringent IBM governance reviews for licensing, ownership, and personal data risks. This commitment ensures responsible AI development and deployment.

Under the Hood: R2 Innovations

The R2 generation represents a complete architectural overhaul from its R1 predecessors, which were based on XLM-RoBERTa encoders with a limited 512-token context. The new R2 models are built upon ModernBERT, an advanced encoder architecture that re-examines the original BERT design with contemporary transformer techniques.

This shift brings practical benefits: alternating attention lengths reduce computation for long sequences, rotary position embeddings enable the 32K context window, and Flash Attention 2.0 support speeds up encoding on modern GPUs. The multilingual tokenizers are new as well. The 311M model uses the Gemma 3 tokenizer, while the 97M model uses a compact 180K-token vocabulary derived from GPT-OSS, which preserves multilingual coverage while shrinking the embedding table.

Exceptional Performance and Flexibility

Our benchmarks highlight the impressive gains. The 97M model scores 60.3 on MTEB Multilingual Retrieval, a +9.4 point gap over its closest sub-100M competitor, multilingual-e5-small. The full-size 311M model achieves 65.2 on the same benchmark, representing a +13.0 point gain over its R1 predecessor.

For speed and throughput, the 97M model encodes over 2,500 documents per second on an NVIDIA H100 GPU, offering comparable speed to multilingual-e5-small but with substantially higher retrieval quality. The 311M model, encoding around 1,800 docs/sec, surpasses jina-embeddings-v5-text-nano in retrieval quality at over 5.5x the speed.
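To put those throughput figures in context, the following back-of-the-envelope sketch estimates how long a corpus would take to encode at a given rate. The corpus size is a made-up example, and real throughput depends on document length, batch size, and hardware, so treat the result as a rough planning number only.

```python
def encoding_seconds(num_docs: int, docs_per_second: float) -> float:
    """Estimated wall-clock seconds to encode num_docs at a steady rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return num_docs / docs_per_second


# Hypothetical corpus of 10 million documents, using the rates quoted in the article.
CORPUS_SIZE = 10_000_000
small_model_minutes = encoding_seconds(CORPUS_SIZE, 2_500) / 60  # 97M model
large_model_minutes = encoding_seconds(CORPUS_SIZE, 1_800) / 60  # 311M model
```

At these rates the hypothetical corpus takes roughly 67 minutes on the 97M model and roughly 93 minutes on the 311M model, assuming a single H100 and documents similar to those used in the benchmark.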

The 311M model also features Matryoshka Representation Learning, allowing you to truncate embeddings from 768 dimensions down to 512, 384, 256, or even 128 with minimal quality degradation. For instance, reducing to 256 dimensions (a 3x storage reduction) lowers the MTEB Multilingual Retrieval score by only 0.5 points, which makes truncation a practical way to cut storage and computation costs at a small cost in retrieval quality.
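The truncation step itself is simple: keep the first N values of each vector and renormalize. The sketch below shows one way to do that, along with the storage arithmetic. It assumes float32 storage (4 bytes per value), and the helper names are hypothetical rather than part of any library.

```python
import math
from typing import List, Sequence

# Truncation sizes named in the article for the 311M model.
MRL_DIMS = (768, 512, 384, 256, 128)


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` values and rescale the result to unit length."""
    if dim not in MRL_DIMS:
        raise ValueError(f"dim must be one of {MRL_DIMS}")
    if len(vec) < dim:
        raise ValueError("vector is shorter than the requested dimension")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]


def storage_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw index size, assuming float32 values unless told otherwise."""
    return num_vectors * dim * bytes_per_value
```

For one million vectors, this works out to about 3.07 GB at 768 dimensions and about 1.02 GB at 256 dimensions, which is the 3x reduction mentioned above. Renormalizing after truncation keeps cosine and dot-product scores comparable. sentence-transformers also offers a `truncate_dim` option when loading a model, which may be more convenient than truncating by hand.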

Source: Hugging Face Blog

Kristine Vior

With a deep passion for the intersection of technology and digital media, Kristine leads the editorial vision of HubNextera News. Her expertise lies in deciphering technical roadmaps and translating them into comprehensive news reports for a global audience. Every article is reviewed by Kristine to ensure it meets our standards for original perspective and technical depth.
