Tools · May 20, 2026

Hugging Face releases six Ettin reranker models with distillation training recipe

A new family of open-source reranking models built on ModernBERT architectures, scaled from 17M to 1B parameters, with full training methodology and code publicly available.

1 source · cross-referenced

TL;DR

Hugging Face released six Sentence Transformers CrossEncoder reranker models (17M to 1B parameters) built on Ettin ModernBERT encoders, designed for retrieve-then-rerank pipelines.
The models were distilled from mixedbread-ai reranker scores over curated datasets and are reported to be state of the art for their respective sizes on MTEB retrieval benchmarks.
The complete training recipe, code, and an AI agent skill for fine-tuning rerankers on custom data are publicly available; the models support 8K-token context windows and show a measured 1.7x to 8.3x speedup with flash attention and bfloat16.
The retrieve-then-rerank pattern allows combining fast embedding models for candidate retrieval with accurate but expensive cross-encoders to reorder only top-K results.

Hugging Face published six new reranker models ranging from 17 million to 1 billion parameters, all built on the ModernBERT-style encoders of Johns Hopkins University's Ettin suite. The models are released as Sentence Transformers CrossEncoder models and can be integrated into production pipelines with minimal code.

Unlike embedding models, which encode query and document separately, rerankers perform joint encoding where query and document attend to each other across all transformer layers. This produces more accurate relevance scores but at higher computational cost, making them impractical to run over entire document corpora. The published models are intended for retrieve-then-rerank workflows: a fast embedding model retrieves top-K candidates, then the reranker re-orders those K with higher accuracy.
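The two-stage pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the released models' API: `retrieve_then_rerank`, `token_overlap`, and `overlap_ratio` are hypothetical names, and the two toy scorers stand in for an embedding-similarity function and a cross-encoder respectively.

```python
from typing import Callable, Sequence


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    cheap_score: Callable[[str, str], float],
    pair_score: Callable[[str, str], float],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Two-stage search: cheap scoring over everything, costly scoring over top_k."""
    # Stage 1: rank the whole corpus with the cheap scorer and keep top_k candidates.
    candidates = sorted(
        corpus, key=lambda doc: cheap_score(query, doc), reverse=True
    )[:top_k]
    # Stage 2: run the expensive scorer only on those candidates, then re-order them.
    reranked = [(doc, pair_score(query, doc)) for doc in candidates]
    reranked.sort(key=lambda item: item[1], reverse=True)
    return reranked


# Toy stand-ins (assumptions): a real pipeline would use embedding similarity
# for stage 1 and a cross-encoder's relevance score for stage 2.
def token_overlap(query: str, doc: str) -> float:
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def overlap_ratio(query: str, doc: str) -> float:
    doc_tokens = set(doc.lower().split())
    shared = set(query.lower().split()) & doc_tokens
    return len(shared) / len(doc_tokens) if doc_tokens else 0.0


corpus = [
    "rerankers score a query and document jointly",
    "embedding models encode the query and each document separately",
    "flash attention speeds up long inputs",
    "a reranker reorders the top candidates for a query",
]
query = "how does a reranker score a query"

results = retrieve_then_rerank(query, corpus, token_overlap, overlap_ratio, top_k=2)
for doc, score in results:
    print(f"{score:.3f}  {doc}")
```

The point of the design is that the expensive scorer runs `top_k` times per query no matter how large the corpus is, so its cost stays fixed while the first stage absorbs the corpus size.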

The six models were trained using knowledge distillation: each student is fit with a pointwise MSE loss against relevance scores produced by the teacher, mixedbread-ai's mxbai-rerank-large-v2. Training data combined lightonai's embedding pre-training and fine-tuning datasets with additional reranking-specific curation. All models support up to 8,192 tokens of input context and achieve measured speedups of 1.7x to 8.3x when using flash attention and bfloat16 precision.
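Pointwise MSE distillation means the student is penalized by the squared gap between its own score and the teacher's score on each query-document pair. The sketch below shows the idea on a deliberately tiny problem. Everything in it is an assumption for illustration: the one-feature linear student, the feature values, and the teacher scores are made up, whereas the actual recipe trains a transformer cross-encoder.

```python
def mse_loss(student_scores: list[float], teacher_scores: list[float]) -> float:
    """Mean squared gap between student and teacher scores, one term per pair."""
    return sum(
        (s - t) ** 2 for s, t in zip(student_scores, teacher_scores)
    ) / len(teacher_scores)


def train_linear_student(
    features: list[float],
    teacher_scores: list[float],
    lr: float = 0.1,
    steps: int = 500,
) -> tuple[float, float]:
    """Fit score = w * feature + b to the teacher scores by gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(steps):
        preds = [w * x + b for x in features]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(
            2 * (p - t) * x for p, t, x in zip(preds, teacher_scores, features)
        ) / n
        grad_b = sum(2 * (p - t) for p, t in zip(preds, teacher_scores)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: one scalar feature per (query, document) pair and the
# relevance score a teacher reranker assigned to that pair.
features = [0.0, 0.5, 1.0]
teacher_scores = [0.1, 0.5, 0.9]

initial_loss = mse_loss([0.0 for _ in features], teacher_scores)
w, b = train_linear_student(features, teacher_scores)
final_loss = mse_loss([w * x + b for x in features], teacher_scores)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.2e}")
```

Because the loss is computed per pair, the student needs only the teacher's score for each example, not ranked lists or pairwise preferences.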

Hugging Face accompanied the release with a full training recipe, evaluation methodology, and dataset composition. The organization also added a new agent skill to Sentence Transformers v5.5.0 enabling developers to invoke AI coding assistants to fine-tune rerankers on custom data without manual engineering.

Performance is reported on MTEB English v2 retrieval tasks, with each reranker paired with an embedding model that handles first-stage retrieval. The headline results use google/embeddinggemma-300m as the embedder; the blog post details results for five additional embedders.

Sources

01 Hugging Face: Introducing the Ettin Reranker Family
