Research · Jul 1, 2026

Study finds external feedback drives agent improvement more than self-feedback or unguided refinement

Researchers introduce a controlled student-teacher protocol to isolate the effects of feedback on multi-turn agent performance across four benchmarks.

TL;DR

A new arXiv preprint introduces a controlled student-teacher protocol to evaluate when natural-language feedback improves agent performance beyond repeated attempts alone.

Researchers propose a controlled student-teacher protocol to determine when natural-language feedback yields measurable improvement in multi-turn language agents beyond what repeated attempts alone achieve. The study evaluates thirteen open-weight models in both student and teacher roles across four benchmarks: Omni-MATH, Codeforces, BBEH Linguini, and ARC-AGI-1.

The protocol compares three conditions: external feedback, self-feedback, and unguided self-refinement, while varying interaction history, task difficulty, and teacher access to privileged task information. Across settings, the authors find that multi-turn improvement is often not evidence of feedback use: self-generated feedback adds little beyond unguided self-refinement, whereas the strongest external teachers produce substantially larger feedback-specific gains.
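The three conditions can be pictured as one multi-turn loop that differs only in where the feedback comes from. The sketch below is an illustration under stated assumptions, not the authors' released framework: the names `run_episode`, `student`, `teacher`, and `check`, the prompt wording, and the turn budget are all hypothetical.

```python
from typing import Callable, List, Optional

# Hypothetical alias: a "model" maps a prompt string to a response string.
Model = Callable[[str], str]


def run_episode(
    task: str,
    student: Model,
    check: Callable[[str], bool],
    condition: str,
    teacher: Optional[Model] = None,
    max_turns: int = 3,
) -> List[bool]:
    """Run one task under one condition and return per-turn correctness.

    condition is one of:
      "external" - a separate teacher model writes the feedback
      "self"     - the student writes feedback on its own attempt
      "unguided" - the student is only asked to try again
    """
    if condition not in {"external", "self", "unguided"}:
        raise ValueError(f"unknown condition: {condition}")
    if condition == "external" and teacher is None:
        raise ValueError("external feedback requires a teacher model")

    history = f"Task: {task}"
    outcomes: List[bool] = []
    for _ in range(max_turns):
        attempt = student(history)
        solved = check(attempt)
        outcomes.append(solved)
        if solved:
            break
        history += f"\nAttempt: {attempt}"
        if condition == "unguided":
            # Repeated-attempt baseline: no feedback content at all.
            history += "\nPlease try again."
        else:
            # Same loop, only the author of the feedback changes.
            critic = teacher if condition == "external" else student
            feedback = critic(history + "\nGive feedback on the last attempt.")
            history += f"\nFeedback: {feedback}"
    return outcomes
```

Holding the loop fixed and swapping only the feedback source is what lets a gain be attributed to the feedback itself rather than to the extra turns.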

The results indicate that useful feedback must provide guidance beyond generic retry, and that interactive gains are driven more by the student's ability to use feedback than by the teacher's identity. The authors argue that feedback-based agents should be evaluated against repeated-attempt baselines, and that the ability to act on feedback—not merely its availability—is a central bottleneck for interactive improvement.
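One way to read the repeated-attempt argument is that feedback should be credited only with the gain left over after subtracting what unguided retries achieve. The sketch below assumes that reading; the function names and the exact definition of the metric are hypothetical and may differ from the paper's.

```python
from typing import Dict, Sequence


def final_accuracy(solved: Sequence[bool]) -> float:
    """Fraction of tasks solved by the end of the interaction."""
    if not solved:
        raise ValueError("need at least one task outcome")
    return sum(solved) / len(solved)


def feedback_specific_gain(
    first_turn: Sequence[bool],
    with_feedback: Sequence[bool],
    unguided_retry: Sequence[bool],
) -> Dict[str, float]:
    """Split the raw multi-turn gain into a retry part and a feedback part.

    Each argument holds one boolean per task (assumed definition):
      first_turn     - solved on the first attempt
      with_feedback  - solved by the last turn when feedback was given
      unguided_retry - solved by the last turn with retries only
    """
    base = final_accuracy(first_turn)
    fb = final_accuracy(with_feedback)
    retry = final_accuracy(unguided_retry)
    return {
        "raw_gain": fb - base,                # what a naive evaluation reports
        "retry_gain": retry - base,           # achievable without any feedback
        "feedback_specific_gain": fb - retry, # credit attributable to feedback
    }
```

Under this decomposition, a large `raw_gain` paired with a near-zero `feedback_specific_gain` would indicate that the extra turns, not the feedback, did the work.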

The team has publicly released its controlled student-teacher evaluation framework to support reproducibility and further research.

Sources

01 arXiv cs.AI: What Drives Interactive Improvement from Feedback?
