Study finds external feedback drives agent improvement more than self-feedback or unguided refinement
Researchers introduce a controlled student-teacher protocol to isolate the effects of feedback on multi-turn agent performance across four benchmarks.
- A new arXiv preprint introduces a controlled student-teacher protocol to evaluate when natural-language feedback improves agent performance beyond repeated attempts alone.
Researchers propose a controlled student-teacher protocol to determine when natural-language feedback yields measurable improvement in multi-turn language agents beyond what can be achieved through repeated attempts alone. The study evaluates thirteen open-weight models in both student and teacher roles across four benchmarks: Omni-MATH, Codeforces, BBEH Linguini, and ARC-AGI1.
The protocol compares three conditions: external feedback, self-feedback, and unguided self-refinement, while varying interaction history, task difficulty, and teacher access to privileged task information. Across settings, the authors find that multi-turn improvement is often not evidence of feedback use: self-generated feedback adds little beyond unguided self-refinement, whereas the strongest external teachers produce substantially larger feedback-specific gains.
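The three conditions can be pictured as one multi-turn loop in which only the message returned after a failed attempt changes. The sketch below is illustrative and is not the authors' released framework: the names `run_episode`, `RETRY_PROMPT`, and the callable signatures are assumptions, and it further assumes the student is told after each wrong attempt that it may try again.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces, assumed for illustration only.
Student = Callable[[str, List[str]], str]   # (task, history) -> answer
Teacher = Callable[[str, str], str]         # (task, answer) -> feedback text
Checker = Callable[[str], bool]             # answer -> is it correct?

# Generic message used when no feedback is given (assumed wording).
RETRY_PROMPT = "Please try again."


def run_episode(
    task: str,
    student: Student,
    check: Checker,
    teacher: Optional[Teacher] = None,
    max_turns: int = 3,
) -> Optional[int]:
    """Return the 1-based turn of the first correct answer, or None.

    teacher=None      -> unguided self-refinement (generic retry only)
    teacher=own model -> self-feedback
    teacher=other     -> external feedback
    """
    history: List[str] = []
    for turn in range(1, max_turns + 1):
        answer = student(task, history)
        if check(answer):
            return turn
        # The only difference between conditions is this message.
        message = teacher(task, answer) if teacher else RETRY_PROMPT
        history += [answer, message]
    return None
```

Holding the loop fixed and swapping only the `teacher` argument is what lets gains from feedback be separated from gains that come simply from having more attempts.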
The results indicate that useful feedback must provide guidance beyond a generic retry, and that interactive gains are driven more by the student's ability to use feedback than by the teacher's identity. The authors argue that feedback-based agents should be evaluated against repeated-attempt baselines, and that the ability to act on feedback, not merely its availability, is a central bottleneck for interactive improvement.
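One plausible way to score an agent against a repeated-attempt baseline is to subtract the accuracy reached by unguided retries from the accuracy reached with feedback, under the same turn budget. The sketch below assumes that definition; the function names are hypothetical, the paper's exact metric may differ, and the outcome lists are made-up toy data rather than reported results.

```python
from typing import Sequence


def accuracy(outcomes: Sequence[bool]) -> float:
    """Fraction of tasks solved within the turn budget."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(outcomes) / len(outcomes)


def feedback_specific_gain(
    with_feedback: Sequence[bool], retry_only: Sequence[bool]
) -> float:
    """Accuracy with feedback minus accuracy from unguided retries.

    Both lists must cover the same tasks under the same turn budget,
    so the difference is not inflated by extra attempts alone.
    """
    if len(with_feedback) != len(retry_only):
        raise ValueError("conditions must cover the same tasks")
    return accuracy(with_feedback) - accuracy(retry_only)


# Toy per-task outcomes (illustrative only, not from the study).
retry_only = [True] * 4 + [False] * 6   # unguided self-refinement
self_fb = [True] * 4 + [False] * 6      # self-feedback
external_fb = [True] * 7 + [False] * 3  # external teacher

self_gain = feedback_specific_gain(self_fb, retry_only)
external_gain = feedback_specific_gain(external_fb, retry_only)
```

In this toy data the self-feedback condition improves on a single attempt only as much as plain retrying does, so its feedback-specific gain is zero, while the external condition shows a positive gain.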
The team releases a controlled student-teacher evaluation framework at a public URL to support reproducibility and further research.