Open Agent Leaderboard measures full systems, not just models, across diverse real-world tasks
Hugging Face and IBM Research launched the Open Agent Leaderboard, an evaluation framework that benchmarks complete agent systems (model plus architecture), rather than isolated models, across six real-world task categories.