New framework enables LLMs to discover and reuse skills for long-horizon game-playing tasks
COSPLAY co-evolves a decision-making agent and a skill-discovery agent, showing an average reward improvement of over 25% on single-player benchmarks with an 8B model.
1 source · cross-referenced
- Researchers presented COSPLAY, a framework where an LLM decision agent retrieves skills from a learnable skill bank while a parallel agent extracts reusable skills from unlabeled rollouts.
- Experiments across six game environments showed the 8B-parameter base model achieved over 25.1% average reward improvement versus four frontier LLM baselines on single-player games.
- The framework addresses a core limitation of LLMs in long-horizon reasoning: the inability to discover, retain, and reuse structured skills across multiple episodes.
- COSPLAY remained competitive on multi-player social reasoning games, suggesting broad applicability beyond single-agent scenarios.
Researchers from multiple institutions have introduced COSPLAY, a co-evolutionary framework designed to improve LLM agent performance in long-horizon interactive environments. The system operates through dual mechanisms: an LLM decision agent that selects and chains skills, and a parallel skill-discovery pipeline that automatically extracts reusable action patterns from accumulated experience.
The core technical contribution addresses a known gap in LLM agent behavior: while these models can reason about individual steps effectively, they struggle to maintain coherent multi-step policies over extended episodes, particularly under delayed reward feedback and partial observability. COSPLAY addresses this by maintaining an evolving skill bank that the decision agent retrieves from and the skill-discovery agent contributes to during training.
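To make the skill-bank idea concrete, here is a minimal Python sketch. It is not COSPLAY's method: the paper's discovery agent is an LLM, while this toy substitutes frequent action-pair mining as a stand-in, and every name in it (`extract_skills`, `SkillBank`, `update`, `retrieve`, the action strings) is hypothetical. It only illustrates the loop of extracting reusable patterns from unlabeled rollouts and retrieving them later.

```python
from collections import Counter


def extract_skills(rollouts, length=2, min_count=2):
    """Toy stand-in for skill discovery: keep action n-grams that recur.

    Assumption: a rollout is just a list of action names, with no labels.
    """
    counts = Counter()
    for actions in rollouts:
        for i in range(len(actions) - length + 1):
            counts[tuple(actions[i:i + length])] += 1
    return [seq for seq, n in counts.items() if n >= min_count]


class SkillBank:
    """Hypothetical skill bank shared by the two agents."""

    def __init__(self):
        self.skills = set()

    def update(self, rollouts):
        # Discovery side: add newly extracted skills to the bank.
        self.skills.update(extract_skills(rollouts))

    def retrieve(self, first_action):
        # Decision side: look up skills that start with the given action.
        return sorted(s for s in self.skills if s[0] == first_action)


rollouts = [
    ["move", "pick", "move", "drop"],
    ["move", "pick", "wait", "drop"],
]
bank = SkillBank()
bank.update(rollouts)
print(bank.retrieve("move"))
```

In this toy run only the pair ("move", "pick") appears in both rollouts, so it is the only skill stored. A real system would retrieve by task context rather than by first action.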
The authors evaluated COSPLAY using an 8B-parameter base model across six game environments. On single-player benchmarks, the framework achieved an average reward improvement of over 25.1% against four frontier LLM baselines. It remained competitive on multi-player social reasoning tasks, suggesting the approach generalizes beyond isolated decision-making scenarios.
The paper specifies that skills are extracted with formal 'contracts' (likely specifications of preconditions and effects), which allows for structured composition. This stands in contrast to unstructured prompt-based skill injection, and may help explain the consistency gains observed across multiple environment types.
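As a rough illustration of why contracts help composition, the Python sketch below models a contract as a set of preconditions and a set of effects and checks whether a chain of skills is valid. This is an assumption about what the contracts contain, not the paper's format; `SkillContract`, `can_chain`, and the example skills are hypothetical, and effects here only add facts (nothing is ever removed from the state).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillContract:
    """Hypothetical contract: facts required before, facts true after."""
    name: str
    preconditions: frozenset
    effects: frozenset


def can_chain(state, skills):
    """Return True if every skill's preconditions hold when it runs."""
    facts = set(state)
    for skill in skills:
        if not skill.preconditions <= facts:
            return False
        # Simplification: effects only add facts to the state.
        facts |= skill.effects
    return True


get_key = SkillContract("get_key", frozenset({"at_key"}), frozenset({"has_key"}))
open_door = SkillContract("open_door", frozenset({"has_key"}), frozenset({"door_open"}))

print(can_chain({"at_key"}, [get_key, open_door]))
print(can_chain({"at_key"}, [open_door, get_key]))
```

The first chain is valid because `get_key` produces the fact that `open_door` requires; the reversed order fails the check. Unstructured prompt text offers no comparable way to verify a chain before executing it.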