Tools · Jun 25, 2026

Hugging Face and Treble Technologies launch open far-field ASR benchmark with live leaderboard

The FFASR Leaderboard is the first community-driven, standardized evaluation for automatic speech recognition in acoustically complex, real-world conditions, combining simulated and lab-measured data across 14 rooms.


1 source · cross-referenced

TL;DR
  • The FFASR Leaderboard is the first open, community-driven benchmark to evaluate ASR models under realistic far-field acoustic conditions, including reverberation, background noise, and varying microphone distances.

Hugging Face and Treble Technologies launched the Far-Field ASR (FFASR) Leaderboard, the first open, community-driven benchmark designed to evaluate automatic speech recognition (ASR) models under realistic far-field acoustic conditions. The leaderboard is live and invites community submissions, with results and analysis available on Hugging Face Spaces.

The benchmark evaluates models across nine conditions, with four primary ranking scores: near-field (dry) speech in an anechoic chamber; far-field high SNR (above 14 dB); far-field mid SNR (8 to 12 dB); and far-field low SNR (below 6 dB). Additional columns include Lab Measured and Lab Simulated tracks for sim-to-real validation, as well as moving-source splits in beta to reflect dynamic acoustic geometries.
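As a rough illustration of how a recording might be assigned to one of the three far-field tiers, the sketch below computes SNR in decibels from speech and noise samples and maps it onto the stated ranges. The function names are hypothetical, and the article does not say how the leaderboard measures SNR or handles values that fall between tiers, so this version simply returns `None` for them.

```python
import math


def snr_db(speech, noise):
    """Return SNR in dB as the ratio of mean speech power to mean noise power."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_speech / p_noise)


def snr_tier(snr):
    """Map an SNR in dB onto the tier ranges quoted in the article.

    Assumption: values in the gaps between the quoted ranges
    (6 to 8 dB and 12 to 14 dB) belong to no tier.
    """
    if snr > 14:
        return "high"
    if 8 <= snr <= 12:
        return "mid"
    if snr < 6:
        return "low"
    return None
```

For example, speech with 100 times the mean power of the noise has an SNR of about 20 dB and would land in the high tier.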

Acoustic data is generated using Treble Technologies’ hybrid simulation engine, which combines wave-based solvers at low to mid frequencies with geometrical-acoustics modeling at higher frequencies. The benchmark includes 14 fully furnished rooms ranging from 20 to 470 m³, covering spaces such as bathrooms, living rooms, offices, classrooms, and restaurants, each with one target speaker and up to three noise sources.

For each submission, the leaderboard reports Word Error Rate (WER) and RTFx (seconds of audio processed per second of inference; higher is faster), both evaluated on an NVIDIA L4 GPU under identical conditions. The Analysis tab plots a Pareto front to visualize the tradeoff between accuracy and speed, reflecting real deployment priorities.
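To make the two reported metrics concrete, here is a minimal sketch of how WER, RTFx, and a Pareto front over the two could be computed. It is not the leaderboard's scoring code: the function names are hypothetical, WER is computed on whitespace-split words with no text normalization (real evaluations typically normalize case and punctuation first), and the Pareto rule assumes lower WER and higher RTFx are both better.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def rtfx(audio_seconds, inference_seconds):
    """Seconds of audio processed per second of inference."""
    return audio_seconds / inference_seconds


def pareto_front(entries):
    """Return names of entries not dominated on (WER, RTFx).

    entries: iterable of (name, wer, rtfx) tuples. One entry dominates
    another if it is at least as good on both metrics and strictly
    better on at least one.
    """
    entries = list(entries)
    front = []
    for name, w, speed in entries:
        dominated = any(
            (w2 <= w and s2 >= speed) and (w2 < w or s2 > speed)
            for _, w2, s2 in entries
        )
        if not dominated:
            front.append(name)
    return front
```

With the reference "turn on the kitchen lights" and the hypothesis "turn on kitchen light", there is one deletion and one substitution over five reference words, so `wer` returns 0.4. A model that transcribes 60 seconds of audio in 1.5 seconds has an RTFx of 40.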

Early results show a consistent pattern: across all submitted models, far-field WER at low SNR is several times higher than near-field WER on the same speech content, highlighting the gap between clean-speech benchmarks and real-world performance.

Sources
  1. Hugging Face, "Introducing the FFASR Leaderboard: Benchmarking ASR in the Real World"


© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.
