Agents · Apr 19, 2026

Notion Details Five Rebuilds of Custom Agents, Architecture for Enterprise AI Workflows

Notion's cofounder and AI lead discusses the iterative development of knowledge work agents, tool composition patterns, and pricing models for agentic features now in production.

Trust: 59
Hype: Some hype

1 source · cross-referenced

TL;DR
  • Notion Custom Agents required four to five complete rebuilds before production readiness, driven by early limitations in model tool-calling, context windows, and reliability.
  • The product uses progressive tool disclosure, shared databases as memory primitives, and manager agents that supervise specialized agents for complex workflows.
  • Notion employs dedicated Model Behavior Engineers who write evals intentionally designed to fail ~30% of the time to identify capability frontiers rather than confirm current performance.
  • The company prices Custom Agents via credits abstracted over tokens, model type, and serving tier, with auto-selection logic to match model capabilities to task requirements.
  • Notion prioritizes retrieval and ranking optimization as search shifts from human-driven to agent-driven queries across meeting transcripts and collaboration data.

Notion has shipped Custom Agents as a production feature after multiple complete rebuilds of the underlying system. In a podcast interview, Notion's cofounder and head of AI discussed why early attempts failed: no standardized tool-calling protocols existed, context windows were too short, model outputs were unreliable, and the product exposed too much complexity directly to the model layer. Each major revision addressed a different constraint rather than applying incremental fixes to a single architecture.

The final design borrows from what Notion calls the 'Agent Lab' thesis—recognizing that successful agentic systems require understanding how teams collaborate, not merely wrapping a frontier model with tool access. The product implements agents that compose through shared databases as memory primitives, with 'manager agents' orchestrating dozens of specialized agents for tasks like email triage, data enrichment via web search, and structured database writes. Agents can also configure themselves, inspect their own failures, and edit instructions within permission guardrails.
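The composition pattern described above — specialized agents reading and writing a shared database that acts as memory, coordinated by a manager agent — can be sketched as follows. This is an illustrative sketch only; the class names, task routing, and row schema are assumptions, not Notion's actual implementation.

```python
# Hypothetical sketch: specialized agents communicate through a shared
# database (the "memory primitive"), and a manager agent routes tasks.

class SharedDatabase:
    """Rows act as shared memory visible to every agent."""
    def __init__(self):
        self.rows = []

    def write(self, row: dict):
        self.rows.append(row)

    def query(self, **filters):
        return [r for r in self.rows
                if all(r.get(k) == v for k, v in filters.items())]

class Agent:
    def __init__(self, name: str, handles: set[str]):
        self.name = name
        self.handles = handles  # task types this agent can run

    def run(self, task: dict, db: SharedDatabase):
        # A real agent would call a model with tools here; we just
        # record a structured result row other agents can query.
        db.write({"task": task["type"], "agent": self.name, "status": "done"})

class ManagerAgent:
    """Dispatches each task to whichever specialized agent handles its type."""
    def __init__(self, workers: list[Agent]):
        self.workers = workers

    def dispatch(self, tasks: list[dict], db: SharedDatabase):
        for task in tasks:
            worker = next(w for w in self.workers if task["type"] in w.handles)
            worker.run(task, db)

db = SharedDatabase()
manager = ManagerAgent([
    Agent("triage", handles={"email_triage"}),
    Agent("enrich", handles={"web_enrichment"}),
])
manager.dispatch([{"type": "email_triage"}, {"type": "web_enrichment"}], db)
print([r["agent"] for r in db.rows])  # → ['triage', 'enrich']
```

The database-as-memory choice means agents coordinate through durable, queryable state rather than fragile message passing, which is what lets dozens of specialized agents compose under one manager.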

Notion employs a dedicated role, Model Behavior Engineer, distinct from traditional software engineers. These roles focus on eval writing, failure analysis, and understanding model capabilities. The company uses three types of evals: regression evals that catch quality backslides after changes, launch-quality evals that gate release decisions, and 'frontier/headroom' evals designed to pass only ~30% of the time — intentionally revealing where model capabilities are still developing rather than confirming existing performance.
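The three eval tiers imply different interpretations of the same pass-rate metric. The sketch below illustrates that logic with invented thresholds and suite names; the key idea from the source is that a frontier suite is expected to mostly fail, so a rising pass rate means the capability frontier has moved and the suite is stale.

```python
# Illustrative triage of eval suites by tier. Thresholds are assumptions.

def pass_rate(results: list[bool]) -> float:
    return sum(results) / len(results)

def triage_eval_suites(suites: dict[str, list[bool]]) -> dict[str, str]:
    verdicts = {}
    for name, results in suites.items():
        rate = pass_rate(results)
        if name.startswith("regression"):
            # Regression suites must stay near-perfect.
            verdicts[name] = "ok" if rate >= 0.99 else "REGRESSION"
        elif name.startswith("launch"):
            # Launch-quality suites gate shipping decisions.
            verdicts[name] = "ship" if rate >= 0.95 else "hold"
        else:
            # Frontier/headroom suites target ~30% pass. If one starts
            # passing well above that, the frontier moved: refresh it.
            verdicts[name] = "stale" if rate > 0.5 else "frontier"
    return verdicts

suites = {
    "regression_tool_calls": [True] * 100,
    "launch_custom_agents": [True] * 96 + [False] * 4,
    "frontier_multistep": [True] * 30 + [False] * 70,
}
print(triage_eval_suites(suites))
# → {'regression_tool_calls': 'ok', 'launch_custom_agents': 'ship',
#    'frontier_multistep': 'frontier'}
```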

Pricing for Custom Agents uses credits as an abstraction over tokens, model type, serving tier, and web search costs, with future charges anticipated for sandbox execution. Rather than pure token-level usage-based pricing, Notion includes auto-selection logic that matches the appropriate model to the task at hand. The company decided against training its own foundation model, instead focusing on retrieval and ranking optimization as agent-driven queries increasingly replace human search across meeting transcripts and collaboration data.
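The pricing mechanics — credits abstracting over tokens, model tier, and web-search calls, plus auto-selection of the model per task — can be sketched minimally. All rates, tier names, and the complexity mapping below are invented for illustration; the source confirms only the general shape of the abstraction.

```python
# Hypothetical credit model: credits per 1K tokens vary by model tier,
# web searches cost a flat amount, and a selector picks the tier.

CREDIT_RATES = {          # credits per 1K tokens (invented numbers)
    "fast": 1,
    "standard": 4,
    "frontier": 20,
}
WEB_SEARCH_CREDITS = 2    # flat credits per search call (assumption)

def select_tier(task_complexity: str) -> str:
    """Auto-selection: cheap models for simple tasks, frontier for hard ones."""
    return {"simple": "fast", "moderate": "standard", "hard": "frontier"}[task_complexity]

def credits_for_run(task_complexity: str, tokens: int, web_searches: int = 0) -> int:
    tier = select_tier(task_complexity)
    token_credits = (tokens // 1000) * CREDIT_RATES[tier]
    return token_credits + web_searches * WEB_SEARCH_CREDITS

# Same token budget, very different cost once auto-selection kicks in:
print(credits_for_run("simple", tokens=8000))                 # → 8
print(credits_for_run("hard", tokens=8000, web_searches=3))   # → 166
```

The abstraction matters because users reason about one unit (credits) while the system varies the underlying model and serving tier per task.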

The product architecture distinguishes between CLI-based tool integration and Model Context Protocol (MCP). Notion's leaders noted that CLIs offer better self-debugging behavior and determinism, while MCP remains useful for specific capability and permissioning scenarios. Internal tool definitions evolved from JavaScript and custom XML to Markdown and SQL-like abstractions, with progressive disclosure limiting what agents can access until explicitly needed, and system prompts kept deliberately short.
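Progressive disclosure, as described above, means the agent's context starts with a minimal tool list that expands only when a capability is explicitly needed, keeping system prompts short. A minimal sketch of one way to implement that, with an assumed registry structure and invented tool names:

```python
# Sketch of progressive tool disclosure: only disclosed tools enter the
# model's context; the agent requests a capability group to expand.

class ToolRegistry:
    def __init__(self, tools: dict[str, str], core: set[str]):
        self._tools = tools          # name -> one-line description
        self._visible = set(core)    # tools currently exposed in the prompt

    def visible_tools(self) -> dict[str, str]:
        """Only disclosed tools appear in the system prompt."""
        return {n: d for n, d in self._tools.items() if n in self._visible}

    def disclose(self, group_prefix: str):
        """Expand disclosure when the agent asks for a capability group."""
        self._visible |= {n for n in self._tools if n.startswith(group_prefix)}

registry = ToolRegistry(
    tools={
        "search_pages": "Search workspace pages",
        "db.query": "Read rows from a database",
        "db.write": "Insert or update rows",
        "email.triage": "Label and sort inbox items",
    },
    core={"search_pages"},
)
print(sorted(registry.visible_tools()))  # → ['search_pages']
registry.disclose("db.")
print(sorted(registry.visible_tools()))  # → ['db.query', 'db.write', 'search_pages']
```

Keeping the default tool surface small is what lets the system scale past 100 tools without bloating every prompt.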

Sources
  1. Latent Space (swyx) — Notion's Token Town: 5 Rebuilds, 100+ Tools, MCP vs CLIs and the Software Factory Future

Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly — every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.

© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.

Built by a person who cares about honest AI news.