Microsoft Research proposes SkillOpt to train agent skills as external parameters
SkillOpt reframes agent skill editing as an optimization process, improving reliability without modifying model weights and achieving the best or tied-best result in all 52 evaluation cells.
- SkillOpt treats agent skill files as trainable parameters outside a frozen target model, turning skill writing into a controlled optimization process.
- Evaluated across six benchmarks, seven target models, and three execution modes, SkillOpt achieved the best or tied-best results in all 52 evaluation cells.
- Improvements included a +23.5-point absolute gain on the six-benchmark average for GPT-5.5 in direct chat, with the largest gains on procedural benchmarks such as OfficeQA (+39.0 points) and SpreadsheetBench (+38.9 points).
- The method keeps skills compact and auditable through bounded text edits, validation gating, rejected-edit feedback, and slow/meta updates.
AI agents often fail in production because their instructions, commonly called skills, are edited by hand with no guarantee that a change helps, and they tend to degrade over time. Microsoft Research’s SkillOpt reframes this problem by treating the agent skill file as a trainable parameter outside a frozen target model, turning one-shot prompting into a controlled optimization process.
SkillOpt organizes skill editing as a forward–backward–update cycle in text space. In the forward pass, a frozen target model executes tasks using the current skill; the rollout batch size controls how much evidence each update receives. In the backward pass, a separate optimizer model analyzes the resulting trajectories in reflection minibatches, distilling which patterns to preserve from successes and which to correct from failures. The update step then proposes small, bounded edits (add, delete, or replace), with a textual learning rate capping the edit budget per step.
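To make the cycle concrete, here is a minimal Python sketch of one forward-backward-update step. It assumes a skill stored as a list of lines and treats the textual learning rate as a maximum number of edits per step. The `Edit` format, `apply_bounded_edits`, `training_step`, and the callables `run_task` (standing in for the frozen target model) and `reflect` (standing in for the optimizer model) are all hypothetical illustrations, not SkillOpt's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Literal, Optional, Sequence


@dataclass(frozen=True)
class Edit:
    """One bounded text edit (hypothetical format, not SkillOpt's schema)."""
    op: Literal["add", "delete", "replace"]
    index: int                  # target line position in the current skill
    text: Optional[str] = None  # new line content for "add" / "replace"


def apply_bounded_edits(skill: List[str], edits: Sequence[Edit], learning_rate: int) -> List[str]:
    """Apply at most `learning_rate` edits; the input skill is not mutated."""
    chosen: List[Edit] = []
    seen = set()
    for e in edits:
        if len(chosen) >= learning_rate:
            break  # textual learning rate = per-step edit budget (assumed unit: edits)
        if e.index in seen:
            continue  # sketch simplification: at most one edit per line index
        seen.add(e.index)
        chosen.append(e)
    new_skill = list(skill)
    # Apply from the bottom up so inserts/deletes do not shift pending indices.
    for e in sorted(chosen, key=lambda e: e.index, reverse=True):
        if e.op == "add":
            new_skill.insert(e.index, e.text or "")
        elif e.op == "delete":
            del new_skill[e.index]
        elif e.op == "replace":
            new_skill[e.index] = e.text or ""
        else:
            raise ValueError(f"unknown edit op: {e.op}")
    return new_skill


def training_step(
    skill: List[str],
    tasks: Sequence[str],                          # one rollout batch
    run_task: Callable[[List[str], str], dict],    # forward: frozen target model (hypothetical)
    reflect: Callable[[List[dict]], List[Edit]],   # backward: optimizer model (hypothetical)
    reflection_minibatch: int,
    learning_rate: int,
) -> List[str]:
    """One forward-backward-update step that returns a candidate skill."""
    # Forward: run every task in the rollout batch with the current skill.
    trajectories = [run_task(skill, t) for t in tasks]
    # Backward: the optimizer reads trajectories one reflection minibatch at a time.
    proposed: List[Edit] = []
    for i in range(0, len(trajectories), reflection_minibatch):
        proposed.extend(reflect(trajectories[i:i + reflection_minibatch]))
    # Update: keep only as many edits as the textual learning rate allows.
    return apply_bounded_edits(skill, proposed, learning_rate)
```

The returned skill is only a candidate; whether it replaces the current skill is decided by the validation gate described next.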
Every candidate skill must pass a strict validation gate: it is adopted only if it scores strictly higher than the current skill on a held-out validation split. Rejected edits are retained in a buffer and passed as negative feedback to later optimizer calls within the same epoch. On a slower cadence, epoch-wise slow/meta updates consolidate longer-horizon lessons that single batches cannot reveal. Together, these mechanisms constrain skill optimization and prevent uncontrolled drift.
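The acceptance logic can be sketched the same way. This hypothetical loop assumes `propose`, `validate`, and `meta_update` callables that wrap the optimizer model, scoring on the held-out validation split, and the epoch-wise slow/meta update. It stores rejected candidate skills as a stand-in for the rejected-edit buffer, and it assumes, consistent with the rule that every candidate must pass the gate, that the meta update is gated as well. None of these names come from SkillOpt itself.

```python
from typing import Callable, List, Tuple

Skill = List[str]
# One entry per candidate evaluated in an epoch: (candidate, score, accepted).
EpochLog = List[Tuple[Skill, float, bool]]


def optimize_skill(
    skill: Skill,
    epochs: int,
    steps_per_epoch: int,
    propose: Callable[[Skill, List[Skill]], Skill],   # optimizer call; sees the rejected buffer
    validate: Callable[[Skill], float],               # score on the held-out validation split
    meta_update: Callable[[Skill, EpochLog], Skill],  # slow, epoch-level consolidation
) -> Tuple[Skill, float]:
    """Validation-gated skill optimization (illustrative sketch, hypothetical API)."""
    best_score = validate(skill)
    for _ in range(epochs):
        rejected: List[Skill] = []  # negative feedback, cleared at each new epoch
        log: EpochLog = []
        for _ in range(steps_per_epoch):
            candidate = propose(skill, rejected)
            score = validate(candidate)
            accepted = score > best_score  # strictly higher; ties are rejected
            log.append((candidate, score, accepted))
            if accepted:
                skill, best_score = candidate, score
            else:
                rejected.append(candidate)
        # Epoch-wise slow/meta update sees the whole epoch; assumed to be gated too.
        consolidated = meta_update(skill, log)
        score = validate(consolidated)
        if score > best_score:
            skill, best_score = consolidated, score
    return skill, best_score
```

The strict `>` comparison matters here: a candidate that merely ties the current score is rejected, which keeps validation noise from nudging the skill back and forth between equally scored variants.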
The method was evaluated on six benchmarks (SearchQA, SpreadsheetBench, OfficeQA, DocVQA, LiveMathematicianBench, and ALFWorld), seven target models ranging from the frontier-scale GPT-5.5 to the small open-weight Qwen3.5-4B, and three execution modes (direct chat, Codex, and Claude Code). Across all 52 evaluation cells, SkillOpt delivered the best or tied-best result.
Performance improvements were substantial. For GPT-5.5 in direct chat, SkillOpt raised the six-benchmark average from 58.8 to 82.3, a +23.5-point absolute improvement and +5.4 points above an oracle that selects the single best competing method per cell. The largest gains appeared on procedural benchmarks: SpreadsheetBench rose from 41.8 to 80.7 (+38.9), OfficeQA from 33.1 to 72.1 (+39.0), and LiveMathematicianBench from 37.6 to 66.9 (+29.3).
SkillOpt also narrowed the gap between smaller and frontier models without changing weights or adding inference-time model calls. After optimization, GPT-5.4-mini’s six-benchmark average (64.3) exceeded the no-skill baseline of the larger GPT-5.4 (59.7), and GPT-5.4-nano (57.4) exceeded the baseline of a larger model variant.
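The reported deltas are easy to recheck. This short Python snippet recomputes the absolute gains from the baseline and SkillOpt scores quoted above, derives the oracle's six-benchmark average implied by the +5.4-point margin (a value the article does not state directly), and computes GPT-5.4-mini's lead over the GPT-5.4 no-skill baseline. The constant and function names are illustrative.

```python
# Baseline and SkillOpt scores quoted in the article (GPT-5.5, direct chat).
REPORTED = {
    "six-benchmark average": (58.8, 82.3),
    "SpreadsheetBench": (41.8, 80.7),
    "OfficeQA": (33.1, 72.1),
    "LiveMathematicianBench": (37.6, 66.9),
}
ORACLE_MARGIN = 5.4         # SkillOpt's reported lead over the per-cell oracle
MINI_AFTER_SKILLOPT = 64.3  # GPT-5.4-mini six-benchmark average after optimization
GPT54_NO_SKILL = 59.7       # GPT-5.4 no-skill baseline


def gain(before: float, after: float) -> float:
    """Absolute improvement in points, rounded to the article's one decimal."""
    return round(after - before, 1)


gains = {name: gain(b, a) for name, (b, a) in REPORTED.items()}
# Not stated in the article: the oracle average implied by the reported margin.
implied_oracle = round(REPORTED["six-benchmark average"][1] - ORACLE_MARGIN, 1)
mini_margin = gain(GPT54_NO_SKILL, MINI_AFTER_SKILLOPT)

for name, g in gains.items():
    print(f"{name}: {g:+.1f} points")
print(f"implied oracle average: {implied_oracle:.1f}")
print(f"GPT-5.4-mini (optimized) vs GPT-5.4 (no skill): {mini_margin:+.1f} points")
```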