Agents · Jun 30, 2026

Paper introduces RSEA, an agent that rewrites its own strategy, skills, and playbook without model updates

Recursive Self-Evolving Agent (RSEA) uses a three-layer natural-language state and a held-out selection gate to improve over a frozen LLM policy across four benchmarks without weight updates.


TL;DR

RSEA is a new agent framework that evolves its own strategy, reusable skills, and procedural playbook as natural-language artifacts conditioned on a frozen LLM.
Evaluated against six baselines on four benchmarks (ALFWorld, GAIA, τ-bench, WebShop), RSEA is reported as the strongest single-pass method on ALFWorld (69.3% vs. 64.6% for ReAct; p=0.015) and reaches 79.4% with retry.
Unguarded context evolution (e.g., Dynamic Cheatsheet) can collapse on some tasks (WebShop: 0.14 vs. 0.43 for ReAct), while RSEA’s held-out gate prevents regression and falls back to the base agent when harmful.
The authors release the paper on arXiv under cs.AI (arXiv:2606.28374).

A new paper on arXiv introduces the Recursive Self-Evolving Agent (RSEA), a method that improves a frozen LLM policy by iteratively rewriting a compact three-layer natural-language state: an imperative strategy, reusable skills, and a procedural playbook. Unlike approaches that update model weights, RSEA evolves these artifacts from the agent’s own trajectories and commits changes only if they do not regress on a disjoint held-out split, using a strict keep-better gate.
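The gating idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the AgentState class, the propose and heldout_score callables, and the rounds parameter are all hypothetical names introduced here, and the sketch assumes "strict keep-better" means a candidate must score strictly higher on the held-out split to be committed.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class AgentState:
    """Hypothetical container for the three natural-language layers.

    The LLM itself is frozen; only these text artifacts change.
    """
    strategy: str = ""                # imperative strategy
    skills: Tuple[str, ...] = ()      # reusable skills
    playbook: Tuple[str, ...] = ()    # procedural playbook


# Assumed signatures: a proposer rewrites the state from trajectories,
# a scorer evaluates a state on a disjoint held-out split.
Proposer = Callable[[AgentState], AgentState]
Scorer = Callable[[AgentState], float]


def evolve(base: AgentState, propose: Proposer,
           heldout_score: Scorer, rounds: int = 3) -> AgentState:
    """Keep a rewritten state only if it beats the current best on held-out data."""
    best, best_score = base, heldout_score(base)
    for _ in range(rounds):
        candidate = propose(best)
        score = heldout_score(candidate)
        # Strict gate: ties and regressions are rejected.
        if score > best_score:
            best, best_score = candidate, score
    # If no candidate ever passes the gate, this is still the base state.
    return best


if __name__ == "__main__":
    def add_skill(state: AgentState) -> AgentState:
        return replace(state, skills=state.skills + ("check inventory first",))

    # Toy scorer that rewards each added skill.
    print(evolve(AgentState(), add_skill, lambda s: 0.5 + 0.1 * len(s.skills)))
```

Under these assumptions, an empty AgentState stands in for the unmodified base agent, so a run in which every proposal fails the gate simply returns the base agent unchanged.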

The authors compare RSEA under identical conditions against six baselines (ReAct, Reflexion, GEPA, AWM, ACE, and Dynamic Cheatsheet) on four benchmarks: ALFWorld, GAIA, τ-bench, and WebShop. All methods share a single local backbone model to keep the comparison fair.

Results show that no single artifact type wins on every task. On ALFWorld, RSEA achieves 69.3% in a single pass versus 64.6% for ReAct (McNemar test, p=0.015), and reaches 79.4% with retry, the best overall result reported. However, concrete-workflow induction (represented by AWM) performs best on strong-backbone tool-use tasks.
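For readers unfamiliar with the McNemar test cited above: it compares two methods on the same episodes and looks only at the episodes where they disagree. The sketch below computes the exact (binomial) two-sided version. The summary does not say which variant the authors used or give the disagreement counts, so the function name and the counts in the usage lines are illustrative assumptions, not the paper's data.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value.

    b: episodes solved by method A but not by method B
    c: episodes solved by method B but not by method A
    Episodes where both succeed or both fail carry no information here.
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis each disagreement is a fair coin flip,
    # so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the call.
    print(mcnemar_exact(b=30, c=14))
```

Because the test depends only on the disagreeing pairs, a gap of a few percentage points can be significant or not depending on how many episodes the two methods split on.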

Unguarded context evolution can be high-variance and unsafe. The Dynamic Cheatsheet method, which curates context online without a held-out gate, is near-best on ALFWorld at 70.7%, but collapses on WebShop with a score of 0.14 compared to 0.43 for ReAct. By contrast, RSEA’s strict held-out selection makes recursive self-evolution monotone-safe: it never significantly underperforms the base agent on any benchmark and falls back to vanilla ReAct when evolved context would hurt.

The paper is available as arXiv:2606.28374 in the cs.AI category.

Sources

01 arXiv cs.AI: Recursive Self-Evolving Agents via Held-Out Selection
