Notion Details Five Rebuilds of Custom Agents, Architecture for Enterprise AI Workflows
Notion cofounder and AI lead discuss iterative development of knowledge work agents, tool composition patterns, and pricing models for agentic features now in production.
- Notion Custom Agents required four to five complete rebuilds before production readiness, driven by early limitations in model tool-calling, context windows, and reliability.
- The product uses progressive tool disclosure, shared databases as memory primitives, and manager agents that supervise specialized agents for complex workflows.
- Notion employs dedicated Model Behavior Engineers who write evals intentionally designed to pass only ~30% of the time, identifying capability frontiers rather than confirming current performance.
- The company prices Custom Agents via credits abstracted over tokens, model type, and serving tier, with auto-selection logic to match model capabilities to task requirements.
- Notion prioritizes retrieval and ranking optimization as search shifts from human-driven to agent-driven queries across meeting transcripts and collaboration data.
Notion has shipped Custom Agents as a production feature after multiple complete rebuilds of the underlying system. In a podcast interview, Notion's cofounder and head of AI discussed why early attempts failed: no standardized tool-calling protocols existed, context windows were too short, model outputs were unreliable, and the product exposed too much complexity directly to the model layer. Each major revision addressed a different constraint rather than incrementally patching a single architecture.
The final design borrows from what Notion calls the 'Agent Lab' thesis—recognizing that successful agentic systems require understanding how teams collaborate, not merely wrapping a frontier model with tool access. The product implements agents that compose through shared databases as memory primitives, with 'manager agents' orchestrating dozens of specialized agents for tasks like email triage, data enrichment via web search, and structured database writes. Agents can also configure themselves, inspect their own failures, and edit instructions within permission guardrails.
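The orchestration pattern described above, manager agents routing work to specialized agents that communicate through a shared database, can be sketched in a few lines. The class names, routing logic, and row schema here are hypothetical illustrations, not Notion's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class SharedDatabase:
    """A shared database acting as a memory primitive between agents."""
    rows: list[dict] = field(default_factory=list)

    def write(self, row: dict) -> None:
        self.rows.append(row)

    def query(self, **filters) -> list[dict]:
        return [r for r in self.rows
                if all(r.get(k) == v for k, v in filters.items())]

class SpecializedAgent:
    """An agent with one narrow job, e.g. email triage or web enrichment."""
    def __init__(self, name: str, task: str):
        self.name, self.task = name, task

    def run(self, item: dict, memory: SharedDatabase) -> None:
        # Placeholder for a model call; a real agent would invoke an LLM here
        # and record its result where sibling agents can read it.
        memory.write({"agent": self.name, "task": self.task,
                      "input": item, "status": "done"})

class ManagerAgent:
    """Supervises specialized agents and routes work items to them."""
    def __init__(self, workers: dict[str, SpecializedAgent]):
        self.workers = workers

    def dispatch(self, item: dict, memory: SharedDatabase) -> None:
        self.workers[item["kind"]].run(item, memory)  # route by work type

memory = SharedDatabase()
manager = ManagerAgent({
    "email": SpecializedAgent("triage", "email triage"),
    "enrich": SpecializedAgent("enricher", "web-search enrichment"),
})
manager.dispatch({"kind": "email", "subject": "Q3 report"}, memory)
manager.dispatch({"kind": "enrich", "company": "Acme"}, memory)
print(len(memory.query(agent="triage")))  # prints 1
```

Because both agents write to the same database, the manager (or any other agent) can inspect intermediate results by querying it, which is the composition property the article attributes to shared databases as memory.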
Notion employs a dedicated role, the Model Behavior Engineer, distinct from traditional software engineering. These engineers focus on eval writing, failure analysis, and understanding model capabilities. The company uses three types of evals: regression evals that guard already-shipped behavior, launch-quality evals that gate release decisions, and 'frontier/headroom' evals designed to pass only ~30% of the time, intentionally revealing where model capabilities are still developing rather than confirming existing performance.
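One way to operationalize these three eval tiers is to classify a suite by its pass rate. In the sketch below, the thresholds are illustrative assumptions; only the ~30% frontier figure comes from the article:

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of eval cases the model passed."""
    return sum(results) / len(results)

def classify_eval(rate: float) -> str:
    # Hypothetical thresholds: regression suites should pass near 100%,
    # while frontier/headroom suites are expected to pass only ~30%.
    if rate >= 0.95:
        return "regression"       # guards already-shipped behavior
    if rate >= 0.70:
        return "launch-quality"   # gates a release decision
    return "frontier"             # reveals remaining model headroom

results = [True, False, False, True, False, False, False, True, False, False]
print(classify_eval(pass_rate(results)))  # prints "frontier" (30% pass rate)
```

The useful inversion here is that a low pass rate is a feature, not a bug: a frontier suite that starts passing reliably has stopped measuring headroom and should graduate into the launch-quality or regression tier.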
Pricing for Custom Agents uses credits as an abstraction over tokens, model type, serving tier, and web search costs, with future charges anticipated for sandbox execution. Rather than pure token-level usage-based pricing, Notion includes auto-selection logic that matches the appropriate model to the task at hand. The company decided against training its own foundation model, instead focusing on retrieval and ranking optimization as agent-driven queries increasingly replace human search across meeting transcripts and collaboration data.
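A credit abstraction of this kind might look like the following sketch, pairing a usage-to-credits conversion with auto-selection of a model tier. The rates, thresholds, and web-search surcharge are invented for illustration and are not Notion's actual pricing:

```python
# Hypothetical credit rates per 1K tokens, by model tier.
CREDIT_RATES = {"small": 1, "standard": 4, "frontier": 20}
WEB_SEARCH_CREDITS = 2  # hypothetical flat surcharge per search call

def select_model(task_complexity: float) -> str:
    """Auto-select a tier for a task; thresholds are illustrative."""
    if task_complexity < 0.3:
        return "small"
    if task_complexity < 0.7:
        return "standard"
    return "frontier"

def charge(tokens: int, tier: str, web_searches: int = 0) -> int:
    """Convert raw usage into credits, hiding tokens and tier from the user."""
    per_1k = CREDIT_RATES[tier]
    return (tokens // 1000) * per_1k + web_searches * WEB_SEARCH_CREDITS

tier = select_model(0.8)  # complex task routed to the frontier tier
print(tier, charge(12_000, tier, web_searches=3))  # prints: frontier 246
```

The point of the abstraction is that users reason about one number (credits) while the system remains free to change models, serving tiers, and token accounting underneath.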
The product architecture distinguishes between CLI-based tool integration and Model Context Protocol (MCP). Notion's leaders noted that CLIs offer better self-debugging behavior and determinism, while MCP remains useful for specific capability and permissioning scenarios. Internal tool definitions evolved from JavaScript and custom XML to Markdown and SQL-like abstractions, with progressive disclosure limiting what agents can access until explicitly needed, and system prompts kept deliberately short.
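Progressive disclosure can be modeled as a registry that hides most tool definitions until an agent explicitly requests them. Everything here, including the permission hook, is a hypothetical sketch rather than Notion's API:

```python
class ToolRegistry:
    """Progressive disclosure: agents see a small core tool set and must
    explicitly request extra tools before their definitions are exposed."""
    def __init__(self, core: dict[str, str], extended: dict[str, str]):
        self._core = core            # always visible to the agent
        self._extended = extended    # hidden until requested
        self._unlocked: set[str] = set()

    def visible_tools(self) -> dict[str, str]:
        tools = dict(self._core)
        tools.update({n: self._extended[n] for n in self._unlocked})
        return tools

    def request(self, name: str) -> bool:
        # A real system would run a permission check here before unlocking.
        if name in self._extended:
            self._unlocked.add(name)
            return True
        return False

registry = ToolRegistry(
    core={"search": "Query the workspace index"},
    extended={"db_write": "Write a structured row to a database"},
)
print(sorted(registry.visible_tools()))  # prints ['search']
registry.request("db_write")
print(sorted(registry.visible_tools()))  # prints ['db_write', 'search']
```

Keeping the visible tool set small serves the same goal as the short system prompts the article mentions: less context spent describing tools the agent does not currently need.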
Source: Apr 24, 2026 · Latent Space (swyx)