Hugging Face and Treble Technologies launch open far-field ASR benchmark with live leaderboard
The FFASR Leaderboard is the first community-driven, standardized evaluation for automatic speech recognition in acoustically complex, real-world conditions, combining simulated and lab-measured data across 14 rooms.
- The FFASR Leaderboard is the first open, community-driven benchmark to evaluate ASR models under realistic far-field acoustic conditions, including reverberation, background noise, and varying microphone distances.
Hugging Face and Treble Technologies launched the Far-Field ASR (FFASR) Leaderboard, the first open, community-driven benchmark designed to evaluate automatic speech recognition (ASR) models under realistic far-field acoustic conditions. The leaderboard is live and invites community submissions, with results and analysis available on Hugging Face Spaces.
The benchmark evaluates models across nine conditions, with four primary ranking scores: near-field (dry) speech in an anechoic chamber; far-field high SNR (above 14 dB); far-field mid SNR (8 to 12 dB); and far-field low SNR (below 6 dB). Additional columns include Lab Measured and Lab Simulated tracks for sim-to-real validation, as well as moving-source splits in beta to reflect dynamic acoustic geometries.
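As a rough illustration of the three far-field SNR bands quoted above, the thresholds can be written as a small lookup. This is only a sketch: the function name `far_field_snr_bucket` is hypothetical, treating the 8 to 12 dB band as inclusive is an assumption, and because the article does not say how values between the stated bands (6 to 8 dB, 12 to 14 dB) are handled, the sketch returns `None` for them rather than guessing.

```python
from typing import Optional


def far_field_snr_bucket(snr_db: float) -> Optional[str]:
    """Map an SNR in dB to one of the far-field bands named in the article.

    Hypothetical helper; not part of the leaderboard's code.
    Returns None for values the quoted ranges do not cover.
    """
    if snr_db > 14:
        return "far-field high SNR"
    if 8 <= snr_db <= 12:  # assumption: band endpoints are inclusive
        return "far-field mid SNR"
    if snr_db < 6:
        return "far-field low SNR"
    return None  # falls between the stated bands


if __name__ == "__main__":
    for value in (20.0, 10.0, 3.0, 13.0):
        print(value, "->", far_field_snr_bucket(value))
```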
Acoustic data is generated using Treble Technologies’ hybrid simulation engine, which combines wave-based solvers at low to mid frequencies with geometrical-acoustics modeling at higher frequencies. The benchmark includes 14 fully furnished rooms ranging from 20 to 470 m³, covering spaces such as bathrooms, living rooms, offices, classrooms, and restaurants, each with one target speaker and up to three noise sources.
For each submission, the leaderboard reports Word Error Rate (WER) and RTFx, the inverse real-time factor (seconds of audio processed per second of inference, so higher is faster), evaluated on an NVIDIA L4 GPU under identical conditions. The Analysis tab presents a Pareto front to visualize the tradeoff between accuracy and speed, reflecting real deployment priorities.
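To make the two metrics and the Pareto front concrete, here is a minimal sketch of how they are conventionally computed. It is not the leaderboard's own evaluation code: the function names, the whitespace tokenization, the absence of text normalization, and the example model entries are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumes plain whitespace tokenization with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def rtfx(audio_seconds: float, inference_seconds: float) -> float:
    """Audio seconds processed per second of inference (higher is faster)."""
    if inference_seconds <= 0:
        raise ValueError("inference_seconds must be positive")
    return audio_seconds / inference_seconds


def pareto_front(models: Sequence[Tuple[str, float, float]]) -> List[str]:
    """Return names of models not dominated on (WER, RTFx).

    Each entry is (name, wer, rtfx). A model is dominated if another one
    is at least as good on both metrics and strictly better on one.
    """
    front = []
    for name, wer, speed in models:
        dominated = any(
            other_wer <= wer
            and other_speed >= speed
            and (other_wer < wer or other_speed > speed)
            for _, other_wer, other_speed in models
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))
    print(rtfx(60.0, 1.5))
    # Hypothetical entries, not real leaderboard results.
    print(pareto_front([("a", 0.10, 50.0), ("b", 0.08, 20.0), ("c", 0.12, 40.0)]))
```

In the hypothetical entries, model `c` is both less accurate and slower than model `a`, so it drops off the front, while `a` and `b` each win on one axis and both remain.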
Early results show a consistent pattern: across all submitted models, far-field WER at low SNR is several times higher than near-field WER on the same speech content, highlighting the gap between clean-speech benchmarks and real-world performance.