Researchers propose three-stage training paradigm to internalize future-aware planning in LLM agents
A new arXiv pre-print introduces a capability-first training pipeline (WM-AMT, FE-SFT, and FC-RL) that trains LLM agents to simulate future outcomes and estimate plan success. The authors report that it outperforms other training baselines on search and math tasks.
- A new arXiv pre-print proposes a three-stage training paradigm to enable LLM agents to internalize future-aware planning.
A new arXiv pre-print introduces a unified agentic training paradigm designed to internalize future-aware planning in large language model (LLM) agents. The authors argue that standard agents lack an internal world model for simulating future outcomes, which leaves them fundamentally reactive in long-horizon tasks. To address this, they propose training a single autoregressive model to verbalize two things: a prospective state rollout and a plan-conditioned success estimate, which they describe as a textual analogue of the Q-value.
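For context, the Q-value in standard reinforcement learning is the expected discounted return from taking action a in state s and then following policy π. The notation below is the textbook convention, not a formula taken from the pre-print, whose verbalized estimate is conditioned on a plan rather than on a single action:

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_0 = s,\ a_0 = a \right]
```

Here \gamma \in [0, 1) is the discount factor and r_{t+1} is the reward received after step t.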
The proposed approach includes a three-stage training pipeline: World Model Agentic Mid-Training (WM-AMT) to inject latent predictive capabilities into the policy; Format-Eliciting Supervised Fine-Tuning (FE-SFT) to structure this capability; and Foresight-Conditioned Reinforcement Learning (FC-RL) to refine the calibration and utility of generated simulations. The authors identify a 'format-capability gap,' noting that fine-tuning agents on look-ahead traces without this structured pipeline leads to superficial mimicry of foresight rather than genuine predictive grounding.
Evaluated on search and mathematical reasoning tasks, the approach is reported to consistently outperform other training baselines. The authors conclude that effective internal world modeling in LLM agents requires a capability-first training pipeline to achieve grounded and calibrated foresight.