Agents · Jul 2, 2026

Microsoft Research proposes SkillOpt to train agent skills as external parameters

SkillOpt reframes agent skill editing as an optimization process, improving reliability across 52 evaluation cells without modifying model weights.

Trust: 79
Hype: Low

1 source · cross-referenced

TL;DR
  • SkillOpt treats agent skill files as trainable parameters outside a frozen target model, turning skill writing into a controlled optimization process.
  • Evaluated across six benchmarks, seven target models, and three execution modes, SkillOpt achieved the best or tied-best results in all 52 evaluation cells.
  • Improvements included a +23.5-point absolute gain on the six-benchmark average for GPT-5.5 in direct chat, with the largest gains on procedural benchmarks such as OfficeQA (+39.0 points) and SpreadsheetBench (+38.9 points).
  • The method keeps skills compact and auditable through bounded text edits, validation gating, rejected-edit feedback, and slow/meta updates.

AI agents often fail in production because their instructions, commonly called skills, are edited by hand with no guarantee of improvement and tend to degrade over time. Microsoft Research’s SkillOpt reframes this problem by treating the agent skill file as a trainable parameter outside a frozen target model, turning one-shot prompting into a controlled optimization process.

SkillOpt organizes skill editing as a forward–backward–update cycle in text space. In the forward pass, a frozen target model executes tasks using the current skill; the rollout batch size controls how much evidence each update receives. In the backward pass, a separate optimizer model analyzes trajectories in reflection minibatches, distilling patterns to preserve from successes and correct from failures. The update step proposes small, bounded edits—add, delete, or replace—with a textual learning rate constraining the per-step edit budget.
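The update step's bounded edits can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not SkillOpt's actual implementation: the skill is modeled as a list of lines, the `Edit` type and `apply_bounded_edits` function are hypothetical names, and the textual learning rate is approximated as a cap on the number of edits per step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edit:
    """One proposed change to a skill (hypothetical structure)."""
    op: str                     # "add", "delete", or "replace"
    index: int                  # line position the edit targets
    text: Optional[str] = None  # new line content for add/replace


def apply_bounded_edits(skill: List[str], edits: List[Edit], max_edits: int) -> List[str]:
    """Apply a proposal only if it fits the per-step edit budget.

    max_edits stands in for the textual learning rate (an assumption:
    the source does not say how the budget is measured).
    """
    if len(edits) > max_edits:
        raise ValueError("proposal exceeds the per-step edit budget")
    lines = list(skill)  # copy so the current skill is left untouched
    # Apply from the highest index down so earlier indices stay valid.
    for edit in sorted(edits, key=lambda e: e.index, reverse=True):
        if edit.op == "add":
            lines.insert(edit.index, edit.text)
        elif edit.op == "delete":
            del lines[edit.index]
        elif edit.op == "replace":
            lines[edit.index] = edit.text
        else:
            raise ValueError(f"unknown op: {edit.op}")
    return lines


skill = ["Read the task.", "Answer quickly.", "Stop."]
proposal = [
    Edit("replace", 1, "Verify the answer before replying."),
    Edit("add", 3, "Log the result."),
]
candidate = apply_bounded_edits(skill, proposal, max_edits=2)
```

Rejecting an over-budget proposal outright, rather than truncating it, is one possible design choice; it keeps each candidate skill a small, reviewable step away from the current one.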

Every candidate skill must pass a validation gate: it is adopted only if it scores strictly higher than the current skill on a held-out validation split. Rejected edits are retained in a buffer as negative feedback for later optimizer calls within the same epoch. On a slower cadence, epoch-wise slow/meta updates consolidate longer-horizon lessons that single batches cannot reveal. Together, these mechanisms constrain skill optimization and prevent uncontrolled drift.
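The gating rule can be sketched in a few lines. This is a hedged illustration rather than the published method: `gated_update` is a hypothetical name, `score` stands in for evaluation on the held-out validation split, and the returned list plays the role of the rejected-edit buffer for a single epoch.

```python
from typing import Callable, List, Tuple


def gated_update(
    current: str,
    candidates: List[str],
    score: Callable[[str], float],
) -> Tuple[str, List[str]]:
    """Adopt a candidate only if it strictly beats the current skill.

    Returns the surviving skill and the rejected candidates, which a
    later optimizer call could receive as negative feedback.
    """
    rejected: List[str] = []
    best, best_score = current, score(current)
    for candidate in candidates:
        candidate_score = score(candidate)
        if candidate_score > best_score:  # strict: a tie is rejected
            best, best_score = candidate, candidate_score
        else:
            rejected.append(candidate)
    return best, rejected


# Toy validation scores (made up for illustration, not from the source).
toy_scores = {"v0": 0.50, "v1": 0.50, "v2": 0.62, "v3": 0.40}
adopted, rejected_buffer = gated_update("v0", ["v1", "v2", "v3"], lambda s: toy_scores[s])
```

In this toy run `v1` ties with the current skill and is rejected, `v2` is adopted, and `v3` is rejected for scoring lower than `v2`.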

The method was evaluated across six benchmarks—SearchQA, SpreadsheetBench, OfficeQA, DocVQA, LiveMathematicianBench, and ALFWorld—seven target models ranging from frontier-scale GPT-5.5 to the small open-weight Qwen3.5-4B, and three execution modes: direct chat, Codex, and Claude Code. Across 52 evaluation cells, SkillOpt delivered the best or tied-for-best results in every case.

Performance improvements were substantial. For GPT-5.5 in direct chat, SkillOpt raised the six-benchmark average from 58.8 to 82.3, a +23.5-point absolute improvement and +5.4 points above an oracle that selects the single best competing method per cell. The largest gains appeared on procedural benchmarks: SpreadsheetBench rose from 41.8 to 80.7 (+38.9), OfficeQA from 33.1 to 72.1 (+39.0), and LiveMathematicianBench from 37.6 to 66.9 (+29.3).
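The reported gains can be checked directly from the before and after scores quoted above. The short script below recomputes each delta; the oracle average is not stated in the article, so the value derived here is only what the +5.4-point margin implies, assuming that margin refers to the same six-benchmark average.

```python
# (before, after) scores for GPT-5.5 in direct chat, as quoted in the article.
scores = {
    "six-benchmark average": (58.8, 82.3),
    "SpreadsheetBench": (41.8, 80.7),
    "OfficeQA": (33.1, 72.1),
    "LiveMathematicianBench": (37.6, 66.9),
}

# Absolute gain in points, rounded to one decimal to absorb float noise.
gains = {name: round(after - before, 1) for name, (before, after) in scores.items()}

# Implied (not reported) oracle average: SkillOpt's average minus its margin.
implied_oracle_average = round(82.3 - 5.4, 1)

for name, gain in gains.items():
    print(f"{name}: +{gain}")
print(f"implied oracle average: {implied_oracle_average}")
```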

SkillOpt also narrowed the gap between smaller and frontier models without changing weights or adding inference-time model calls. After optimization, GPT-5.4-mini’s six-benchmark average (64.3) exceeded the no-skill baseline of the larger GPT-5.4 (59.7), and GPT-5.4-nano (57.4) exceeded the baseline of a larger model variant.

Sources
  1. Microsoft Research. SkillOpt: Agent skills as trainable parameters
Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly — every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.