Paper introduces RSEA, an agent that rewrites its own strategy, skills, and playbook without model updates
Recursive Self-Evolving Agent (RSEA) uses a three-layer natural-language state and a held-out selection gate to improve over a frozen LLM policy across four benchmarks without weight updates.
- RSEA is a new agent framework that evolves its own strategy, reusable skills, and procedural playbook as natural-language artifacts conditioned on a frozen LLM.
- The evaluation covers four benchmarks (ALFWorld, GAIA, τ-bench, WebShop) and six baselines. On ALFWorld, RSEA is the strongest single-pass method (69.3% vs. 64.6% for ReAct; p=0.015) and reaches 79.4% with retry.
- Unguarded context evolution (e.g., Dynamic Cheatsheet) can collapse on some tasks (WebShop: 0.14 vs. 0.43 for ReAct), while RSEA’s held-out gate prevents regression and falls back to the base agent when the evolved context would hurt.
- The authors release the paper on arXiv under cs.AI (arXiv:2606.28374).
A new paper on arXiv introduces the Recursive Self-Evolving Agent (RSEA), a method that improves a frozen LLM policy by iteratively rewriting a compact three-layer natural-language state: an imperative strategy, reusable skills, and a procedural playbook. Unlike approaches that update model weights, RSEA evolves these artifacts from the agent’s own trajectories and commits changes only if they do not regress on a disjoint held-out split, using a strict keep-better gate.
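The gating loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `AgentState`, `evolve`, `propose`, and `evaluate` are hypothetical, the scoring function is left to the caller, and the strict `>` comparison is an assumption about what "keep-better" means in practice.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class AgentState:
    """Hypothetical three-layer natural-language state (names are illustrative)."""
    strategy: str = ""
    skills: Tuple[str, ...] = ()
    playbook: Tuple[str, ...] = ()


# Both callables are assumptions: the model stays frozen, so `propose` would
# rewrite the state from trajectories on the training split, and `evaluate`
# would return a success rate for a state on a set of tasks.
Propose = Callable[[AgentState, Sequence[str]], AgentState]
Evaluate = Callable[[AgentState, Sequence[str]], float]


def evolve(
    base: AgentState,
    propose: Propose,
    evaluate: Evaluate,
    train_tasks: Sequence[str],
    heldout_tasks: Sequence[str],
    rounds: int = 3,
) -> Tuple[AgentState, float]:
    """Keep a rewritten state only if it strictly beats the incumbent on held-out tasks."""
    best = base
    best_score = evaluate(base, heldout_tasks)
    for _ in range(rounds):
        candidate = propose(best, train_tasks)
        score = evaluate(candidate, heldout_tasks)
        # Strict gate: ties and regressions are discarded.
        if score > best_score:
            best, best_score = candidate, score
    # If no candidate ever passes the gate, the untouched base state is returned.
    return best, best_score
```

Because the incumbent is replaced only on a strict held-out improvement, a run in which every proposal hurts simply returns the base state, which mirrors the fallback behavior the article describes.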
The authors compare RSEA on equal footing with six baselines (ReAct, Reflexion, GEPA, AWM, ACE, and Dynamic Cheatsheet) on four diverse benchmarks: ALFWorld, GAIA, τ-bench, and WebShop. All methods share a single local backbone to ensure comparability.
The results show that no single type of artifact wins on every task. On ALFWorld, RSEA achieves 69.3% in a single pass versus 64.6% for ReAct (McNemar test, p=0.015), and reaches 79.4% with retry, the best overall result reported. However, concrete-workflow induction (represented by AWM) performs best on strong-backbone tool-use tasks.
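For readers unfamiliar with the McNemar test cited for the ALFWorld comparison, the sketch below shows one common variant, the exact two-sided form. It compares two methods run on the same tasks using only the discordant pairs: tasks solved by one method but not the other. The article does not report those counts or say which variant the authors used, so the function name `mcnemar_exact` and any counts passed to it are purely illustrative.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value.

    b: tasks where method A succeeded and method B failed.
    c: tasks where method A failed and method B succeeded.
    Under the null hypothesis each discordant pair is equally likely to
    favor either method, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Probability of a split at least as lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double for the two-sided test and cap at 1.
    return min(1.0, 2 * tail)
```

Tasks that both methods solve, or both fail, do not enter the statistic, which is why a paired test can detect a difference that a comparison of the two headline accuracies alone would not establish.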
Unguarded context evolution can be high-variance and unsafe. The Dynamic Cheatsheet method, which curates context online without a held-out gate, is near-best on ALFWorld at 70.7%, but collapses on WebShop with a score of 0.14 compared to 0.43 for ReAct. By contrast, RSEA’s strict held-out selection makes recursive self-evolution monotone-safe: it never significantly underperforms the base agent on any benchmark and falls back to vanilla ReAct when evolved context would hurt.
The paper is available as arXiv:2606.28374 in the cs.AI category.