Hugging Face releases six Ettin reranker models with distillation training recipe
A new family of open-source reranking models built on ModernBERT architectures, scaled from 17M to 1B parameters, with full training methodology and code publicly available.
- Hugging Face released six Sentence Transformers CrossEncoder reranker models (17M to 1B parameters) built on Ettin ModernBERT encoders, designed for retrieve-then-rerank pipelines.
- Models were trained via distillation on mixedbread-ai reranker outputs over curated datasets, achieving state-of-the-art performance at their respective sizes on MTEB retrieval benchmarks.
- The complete training recipe, code, and an AI agent skill for fine-tuning rerankers on custom data are publicly available; the models support 8K-token context windows and run 1.7x to 8.3x faster with flash attention and bfloat16 precision.
- The retrieve-then-rerank pattern allows combining fast embedding models for candidate retrieval with accurate but expensive cross-encoders to reorder only top-K results.
Hugging Face published six new reranker models scaled from 17 million to 1 billion parameters, all built on the ModernBERT architecture from Johns Hopkins University's Ettin suite. The models are released as Sentence Transformers CrossEncoder implementations, integrable into production pipelines with minimal code.
Unlike embedding models, which encode query and document separately, rerankers perform joint encoding where query and document attend to each other across all transformer layers. This produces more accurate relevance scores but at higher computational cost, making them impractical to run over entire document corpora. The published models are intended for retrieve-then-rerank workflows: a fast embedding model retrieves top-K candidates, then the reranker re-orders those K with higher accuracy.
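The two-stage flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the released models' code: `toy_embed_score` and `toy_rerank_score` are hypothetical stand-ins (token overlap and bigram overlap) for a real embedding model and a real cross-encoder, and the corpus is invented for the example.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    embed_score: Scorer,
    rerank_score: Scorer,
    top_k: int = 3,
) -> List[Tuple[int, str]]:
    """Return the top_k documents, ordered by the (expensive) reranker."""
    # Stage 1: the cheap scorer sees the whole corpus; keep only top_k candidates.
    candidates = sorted(
        range(len(corpus)),
        key=lambda i: embed_score(query, corpus[i]),
        reverse=True,
    )[:top_k]
    # Stage 2: the expensive scorer sees only those candidates and re-orders them.
    reranked = sorted(
        candidates,
        key=lambda i: rerank_score(query, corpus[i]),
        reverse=True,
    )
    return [(i, corpus[i]) for i in reranked]


def toy_embed_score(query: str, doc: str) -> float:
    # Hypothetical stand-in for embedding similarity: shared-token count.
    return float(len(set(query.split()) & set(doc.split())))


def toy_rerank_score(query: str, doc: str) -> float:
    # Hypothetical stand-in for a cross-encoder: shared-bigram count,
    # which looks at query and document tokens together and in order.
    def bigrams(text: str) -> set:
        tokens = text.split()
        return set(zip(tokens, tokens[1:]))

    return float(len(bigrams(query) & bigrams(doc)))


corpus = [
    "rerankers score a query and a document jointly",
    "embedding models encode the query and document separately",
    "flash attention speeds up long context inference",
    "how do you do a query score",
]
query = "how do rerankers score a query"

for idx, doc in retrieve_then_rerank(
    query, corpus, toy_embed_score, toy_rerank_score, top_k=3
):
    print(idx, doc)
```

In this toy run, the first stage ranks document 3 highest on raw token overlap, and the second stage promotes document 0 because its word order matches the query more closely. In a real pipeline the second scorer would be a cross-encoder forward pass per (query, document) pair, which is why it is applied only to the K retrieved candidates.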
The six models were trained using knowledge distillation: each student is fit with a pointwise MSE loss to the relevance scores produced by mixedbread-ai's mxbai-rerank-large-v2, which serves as the teacher. Training data combined lightonai's embedding pre-training and fine-tuning datasets with additional reranking-specific curation. All models support up to 8,192 tokens of input context and achieve measured speedups of 1.7x to 8.3x when using flash attention and bfloat16 precision.
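To make the pointwise MSE objective concrete, the sketch below fits a tiny student to teacher scores in pure Python. It is an illustration under stated assumptions, not the published recipe: the student here is a hypothetical one-feature linear model rather than a cross-encoder, and the features and teacher scores are invented numbers.

```python
from typing import List, Sequence, Tuple


def pointwise_mse(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between student and teacher scores, pair by pair."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher must score the same pairs")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(teacher)


def predict(w: float, b: float, features: Sequence[float]) -> List[float]:
    # Hypothetical student: one scalar feature per (query, document) pair.
    return [w * x + b for x in features]


def train_student(
    features: Sequence[float],
    teacher: Sequence[float],
    steps: int = 500,
    lr: float = 0.05,
) -> Tuple[float, float]:
    """Plain gradient descent on the pointwise MSE loss."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(steps):
        preds = predict(w, b, features)
        # Gradients of mean((w*x + b - t)^2) with respect to w and b.
        grad_w = sum(2 * (p - t) * x for p, t, x in zip(preds, teacher, features)) / n
        grad_b = sum(2 * (p - t) for p, t in zip(preds, teacher)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Invented toy data: one feature and one teacher score per (query, document) pair.
features = [0.0, 1.0, 2.0, 3.0]
teacher_scores = [0.1, 0.9, 2.1, 2.9]

loss_before = pointwise_mse(predict(0.0, 0.0, features), teacher_scores)
w, b = train_student(features, teacher_scores)
loss_after = pointwise_mse(predict(w, b, features), teacher_scores)
print(f"loss before: {loss_before:.3f}, loss after: {loss_after:.3f}")
```

The loss is "pointwise" because each (query, document) pair contributes its own squared error against the teacher's score, with no comparison between documents for the same query.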
Hugging Face accompanied the release with a full training recipe, evaluation methodology, and dataset composition. The organization also added a new agent skill to Sentence Transformers v5.5.0 enabling developers to invoke AI coding assistants to fine-tune rerankers on custom data without manual engineering.
Performance is reported on MTEB English v2 retrieval tasks, with each reranker paired with a first-stage embedding model. Beyond the headline google/embeddinggemma-300m pairing, the blog post details results for five further embedders.