Research · Jul 1, 2026

Study finds external feedback drives agent improvement more than self-feedback or unguided refinement

Researchers introduce a controlled student-teacher protocol to isolate the effects of feedback on multi-turn agent performance across four benchmarks.

TL;DR
  • A new arXiv preprint introduces a controlled student-teacher protocol to evaluate when natural-language feedback improves agent performance beyond repeated attempts alone.

Researchers propose a controlled student-teacher protocol to determine when natural-language feedback yields measurable improvement in multi-turn language agents beyond what can be achieved through repeated attempts alone. The study evaluates thirteen open-weight models in both student and teacher roles across four benchmarks: Omni-MATH, Codeforces, BBEH Linguini, and ARC-AGI1.

The protocol compares three conditions: external feedback, self-feedback, and unguided self-refinement, while varying interaction history, task difficulty, and teacher access to privileged task information. Across settings, the authors find that multi-turn improvement is often not evidence of feedback use: self-generated feedback adds little beyond unguided self-refinement, whereas the strongest external teachers produce substantially larger feedback-specific gains.
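To make the comparison concrete, here is a minimal sketch of how a feedback-specific gain could be separated from plain multi-turn improvement. It assumes the gain is measured as each condition's first-to-final-turn accuracy change minus the same change under unguided self-refinement; that definition, the names `ConditionResult`, `multi_turn_gain`, and `feedback_specific_gain`, and all accuracy values are illustrative assumptions, not the paper's metric or reported results.

```python
from dataclasses import dataclass


@dataclass
class ConditionResult:
    """Accuracy of one student model under one condition (hypothetical values)."""
    name: str
    first_turn_acc: float   # accuracy on the first attempt
    final_turn_acc: float   # accuracy after the last turn


def multi_turn_gain(result: ConditionResult) -> float:
    """Raw improvement from the first to the final turn."""
    return result.final_turn_acc - result.first_turn_acc


def feedback_specific_gain(condition: ConditionResult,
                           baseline: ConditionResult) -> float:
    """Improvement beyond what the repeated-attempt baseline already achieves."""
    return multi_turn_gain(condition) - multi_turn_gain(baseline)


# Made-up numbers chosen only to illustrate the pattern described in the text.
unguided = ConditionResult("unguided self-refinement", 0.40, 0.50)
self_fb = ConditionResult("self-feedback", 0.40, 0.51)
external = ConditionResult("external feedback", 0.40, 0.62)

gains = {
    r.name: feedback_specific_gain(r, unguided)
    for r in (self_fb, external)
}

for name, gain in gains.items():
    print(f"{name}: feedback-specific gain = {gain:+.2f}")
```

In this toy setup all three conditions improve over turns, but only the external-feedback condition shows a gain that clearly exceeds the unguided baseline, which is the distinction the protocol is designed to expose.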

The results indicate that useful feedback must provide guidance beyond a generic retry, and that interactive gains are driven more by the student's ability to use feedback than by the teacher's identity. The authors argue that feedback-based agents should be evaluated against repeated-attempt baselines, and that the ability to act on feedback, not merely its availability, is a central bottleneck for interactive improvement.

The team has publicly released its controlled student-teacher evaluation framework to support reproducibility and further research.

Sources
  1. arXiv cs.AI: What Drives Interactive Improvement from Feedback?