Research · Apr 24, 2026

New framework enables LLMs to discover and reuse skills for long-horizon game-playing tasks

COSPLAY co-evolves decision-making and skill discovery agents, reporting an average reward improvement of about 25% over frontier LLM baselines on single-player benchmarks with an 8B model.

TL;DR

Researchers presented COSPLAY, a framework where an LLM decision agent retrieves skills from a learnable skill bank while a parallel agent extracts reusable skills from unlabeled rollouts.
In experiments across six game environments, COSPLAY with an 8B-parameter base model achieved an average reward improvement of over 25.1% against four frontier LLM baselines on single-player games.
The framework addresses a core limitation of LLMs in long-horizon reasoning: the inability to discover, retain, and reuse structured skills across multiple episodes.
COSPLAY remained competitive on multi-player social reasoning games, suggesting broad applicability beyond single-agent scenarios.

Researchers from multiple institutions have introduced COSPLAY, a co-evolutionary framework designed to improve LLM agent performance in long-horizon interactive environments. The system operates through dual mechanisms: an LLM decision agent that selects and chains skills, and a parallel skill-discovery pipeline that automatically extracts reusable action patterns from accumulated experience.

The core technical contribution targets a known gap in LLM agent behavior: these models can reason effectively about individual steps, but they struggle to maintain coherent multi-step policies over extended episodes, particularly under delayed rewards and partial observability. COSPLAY addresses this by maintaining an evolving skill bank that both agents learn from and contribute to during training.
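To make the skill bank idea concrete, here is a minimal Python sketch of the two roles described above: one function that mines recurring action patterns from unlabeled rollouts, and a bank the decision agent can query. It is a toy illustration under stated assumptions, not the paper's method. The names `Skill`, `SkillBank`, and `discover_skills`, the n-gram counting heuristic, and the first-action retrieval rule are all hypothetical stand-ins; COSPLAY uses LLM agents for both steps.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Skill:
    # Hypothetical representation: a named, reusable sequence of actions.
    name: str
    actions: tuple


@dataclass
class SkillBank:
    # Hypothetical skill bank keyed by skill name.
    skills: dict = field(default_factory=dict)

    def add(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def retrieve(self, next_action: str) -> Optional[Skill]:
        """Toy retrieval: return the first stored skill that starts with next_action."""
        for skill in self.skills.values():
            if skill.actions[0] == next_action:
                return skill
        return None


def discover_skills(rollouts, length=2, min_count=2):
    """Toy discovery: keep action n-grams that recur across unlabeled rollouts."""
    counts = Counter()
    for rollout in rollouts:
        for i in range(len(rollout) - length + 1):
            counts[tuple(rollout[i:i + length])] += 1
    return [Skill("+".join(p), p) for p, c in counts.items() if c >= min_count]


# Two made-up rollouts that share the pattern "pick_key" followed by "open_door".
rollouts = [
    ["move", "pick_key", "open_door", "move"],
    ["wait", "pick_key", "open_door"],
]

bank = SkillBank()
for skill in discover_skills(rollouts):
    bank.add(skill)

chosen = bank.retrieve("pick_key")
print(chosen)
```

In this sketch only the pattern shared by both rollouts enters the bank, which mirrors the idea that skills are reusable precisely because they recur across episodes.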

The authors evaluated COSPLAY using an 8B-parameter model across six game environments. On single-player benchmarks, the framework outperformed four frontier LLM baselines by more than 25.1% in average reward. It remained competitive on multi-player social reasoning tasks, suggesting the approach generalizes beyond isolated decision-making scenarios.

The paper specifies that skills are extracted with formal 'contracts' (likely specifications of preconditions and effects), which allows for structured composition. This contrasts with unstructured prompt-based skill injection and may explain the consistency gains observed across multiple environment types.
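As an illustration of why contracts help with composition, the sketch below assumes a contract is a pair of precondition and effect sets, following the interpretation suggested above. The paper's actual contract format is not described here, so `SkillContract`, `can_chain`, and the example facts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillContract:
    # Assumed contract shape: facts required before the skill runs,
    # and facts the skill is expected to make true afterwards.
    name: str
    preconditions: frozenset
    effects: frozenset


def can_chain(skills, initial_state):
    """Return True if every skill's preconditions hold at the point it is reached."""
    state = set(initial_state)
    for skill in skills:
        if not skill.preconditions <= state:
            return False
        state |= skill.effects
    return True


get_key = SkillContract("get_key", frozenset({"key_visible"}), frozenset({"has_key"}))
open_door = SkillContract("open_door", frozenset({"has_key"}), frozenset({"door_open"}))

print(can_chain([get_key, open_door], {"key_visible"}))  # True
print(can_chain([open_door, get_key], {"key_visible"}))  # False
```

A check like this can reject an invalid skill ordering before any action is taken, which is one plausible way structured contracts could yield more consistent behavior than free-form skill descriptions in a prompt.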

Sources

01 arXiv cs.AI: Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
