Research · Jun 29, 2026

Researchers propose three-stage training paradigm to internalize future-aware planning in LLM agents

A new arXiv pre-print introduces a capability-first training pipeline (WM-AMT, FE-SFT, and FC-RL) that trains LLM agents to simulate future outcomes and estimate plan success. The authors report that it outperforms training baselines on search and math tasks.


TL;DR

A new arXiv pre-print proposes a three-stage training paradigm to enable LLM agents to internalize future-aware planning.

A new arXiv pre-print introduces a unified agentic training paradigm designed to internalize future-aware planning in large language model (LLM) agents. The authors argue that standard agents lack an internal world model to simulate future outcomes, making them fundamentally reactive in long-horizon tasks. To address this, they propose training a single autoregressive model to verbalize both a prospective state rollout and a plan-conditioned success estimate—a textual analogue of the Q-value.
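To make the idea of a verbalized rollout and success estimate concrete, here is a minimal sketch of how an agent's text output could be parsed and used to rank candidate plans. The tag names (`<plan>`, `<rollout>`, `<success>`), the `Foresight` class, and the selection rule are illustrative assumptions; the article does not describe the format the authors actually use.

```python
import re
from dataclasses import dataclass

# Hypothetical output format: these tag names are assumptions for illustration,
# not the format used in the paper.
FORESIGHT_PATTERN = re.compile(
    r"<plan>(?P<plan>.*?)</plan>\s*"
    r"<rollout>(?P<rollout>.*?)</rollout>\s*"
    r"<success>(?P<success>[01](?:\.\d+)?)</success>",
    re.DOTALL,
)


@dataclass
class Foresight:
    plan: str       # the candidate plan, as text
    rollout: str    # the model's verbalized prediction of what happens next
    success: float  # verbalized success estimate in [0, 1], a textual Q-value analogue


def parse_foresight(text: str) -> list[Foresight]:
    """Extract every (plan, rollout, success) triple from generated text."""
    results = []
    for match in FORESIGHT_PATTERN.finditer(text):
        estimate = float(match.group("success"))
        estimate = min(max(estimate, 0.0), 1.0)  # clamp to a valid probability
        results.append(
            Foresight(
                plan=match.group("plan").strip(),
                rollout=match.group("rollout").strip(),
                success=estimate,
            )
        )
    return results


def best_plan(candidates: list[Foresight]) -> Foresight:
    """Pick the candidate with the highest verbalized success estimate."""
    if not candidates:
        raise ValueError("no foresight blocks found")
    return max(candidates, key=lambda f: f.success)


sample = """
<plan>Search for the author's birthplace first</plan>
<rollout>The search returns a biography page naming the city.</rollout>
<success>0.8</success>
<plan>Guess the answer from memory</plan>
<rollout>No evidence is retrieved; the answer may be wrong.</rollout>
<success>0.3</success>
"""

chosen = best_plan(parse_foresight(sample))
print(chosen.plan, chosen.success)
```

Choosing the highest-scoring plan mirrors greedy action selection over Q-values. Whether the authors use the estimate this way at inference time is not stated in the summary.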

The proposed approach includes a three-stage training pipeline: World Model Agentic Mid-Training (WM-AMT) to inject latent predictive capabilities into the policy; Format-Eliciting Supervised Fine-Tuning (FE-SFT) to structure this capability; and Foresight-Conditioned Reinforcement Learning (FC-RL) to refine the calibration and utility of generated simulations. The authors identify a 'format-capability gap,' noting that fine-tuning agents on look-ahead traces without this structured pipeline leads to superficial mimicry of foresight rather than genuine predictive grounding.
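The ordering of the three stages can be sketched as a simple sequential pipeline. This is only a structural illustration: the stage names come from the article, while `make_stage`, `run_pipeline`, and the dictionary standing in for a model are hypothetical. The data, objectives, and hyperparameters of each stage are not described in this summary.

```python
from typing import Callable

# A "model" here is just a dict that records which stages it has passed through.
# Real stages would update network weights; this stand-in is an assumption.
Model = dict
Stage = Callable[[Model], Model]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that appends its name to the training history."""
    def stage(model: Model) -> Model:
        return {**model, "history": model.get("history", []) + [name]}
    return stage


wm_amt = make_stage("WM-AMT")  # inject predictive capability (mid-training)
fe_sft = make_stage("FE-SFT")  # elicit the structured foresight format
fc_rl = make_stage("FC-RL")    # refine calibration and utility with RL


def run_pipeline(model: Model, stages: list[Stage]) -> Model:
    """Apply the stages in order, each starting from the previous stage's output."""
    for stage in stages:
        model = stage(model)
    return model


# Capability-first ordering described in the article.
full = run_pipeline({"name": "base"}, [wm_amt, fe_sft, fc_rl])

# Fine-tuning on look-ahead traces alone, the setting the authors associate
# with the format-capability gap.
format_only = run_pipeline({"name": "base"}, [fe_sft])

print(full["history"])
print(format_only["history"])
```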

Evaluated on search and mathematical reasoning tasks, the approach is reported to consistently outperform other training baselines. The authors conclude that effective internal world modeling in LLM agents requires a capability-first training pipeline to achieve grounded and calibrated foresight.
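One common way to check whether success estimates are calibrated is the Brier score, the mean squared difference between predicted probabilities and actual outcomes. The summary does not say which calibration metric the authors report, so the sketch below is an assumption about how such an evaluation could look, with made-up example numbers.

```python
def brier_score(predicted: list[float], outcomes: list[bool]) -> float:
    """Mean squared error between predicted success probabilities and outcomes.

    Lower is better; 0.0 means every estimate was exactly right.
    """
    if len(predicted) != len(outcomes):
        raise ValueError("predicted and outcomes must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    total = sum((p - float(o)) ** 2 for p, o in zip(predicted, outcomes))
    return total / len(predicted)


# Illustrative values only, not results from the paper.
predicted = [0.9, 0.2, 0.7, 0.4]       # verbalized success estimates
outcomes = [True, False, True, False]  # whether each plan actually succeeded

score = brier_score(predicted, outcomes)
baseline = brier_score([0.5] * 4, outcomes)  # an uninformative constant guess
print(score, baseline)
```

In this toy example the estimates score roughly 0.075 against 0.25 for the constant guess of 0.5, which is the kind of gap a calibrated agent would be expected to show.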

Sources

1. arXiv cs.AI, "Internalizing the Future: A Unified Agentic Training Paradigm for World Model Planning"
