OpenAI describes WebSocket optimization for agent API performance
OpenAI's Responses API now supports WebSocket connections with connection-scoped caching to reduce latency in multi-turn agent workflows.
- OpenAI published technical guidance on using WebSockets in its Responses API to speed up agentic workflows
- The approach uses connection-scoped caching to cut per-request API overhead and reduce model latency
- The post documents a case study of the Codex agent loop and how WebSocket optimization improved its performance
OpenAI published technical documentation on a WebSocket-based optimization for its Responses API, targeting developers who build multi-turn agent systems. The approach introduces connection-scoped caching, designed to reduce redundant API calls and improve latency in agent loops that make sequential requests to the model.
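The connection-scoped caching pattern can be illustrated with a minimal mock. This is a hypothetical sketch of the general technique, not OpenAI's actual Responses API wire format: the `MockModelServer` class, its method names, and the message shapes are all assumptions. The key idea is that conversation state lives in a cache keyed by the open connection, so each turn sends only its delta rather than the full history.

```python
# Hypothetical sketch of connection-scoped caching. All names and the
# wire format are illustrative assumptions, not OpenAI's actual API.

class MockModelServer:
    """Simulates a server that caches conversation state per connection."""

    def __init__(self):
        self._caches = {}  # connection_id -> list of prior turns

    def open_connection(self, connection_id):
        # One cache per connection: state lives only as long as the socket.
        self._caches[connection_id] = []

    def handle_turn(self, connection_id, new_message):
        # Only the new message crosses the wire; prior turns are
        # replayed from the connection-scoped cache server-side.
        history = self._caches[connection_id]
        history.append(new_message)
        return f"response to turn {len(history)}"

    def close_connection(self, connection_id):
        # The cache is discarded when the connection closes.
        del self._caches[connection_id]


server = MockModelServer()
server.open_connection("conn-1")
print(server.handle_turn("conn-1", "plan the task"))  # server sees turn 1
print(server.handle_turn("conn-1", "run step 1"))     # delta only; turn 2
server.close_connection("conn-1")
```

In a real deployment the connection would be a WebSocket rather than an in-memory object, but the cache-lifetime-equals-connection-lifetime design is the same.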
The company used the Codex agent loop as a case study to demonstrate the optimization's practical impact. While the full technical write-up was not accessible, the published summary indicates the optimization targets a common performance bottleneck in agentic systems: the cumulative overhead of repeated API round-trips in a sequential loop.
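The scale of that round-trip overhead can be estimated with back-of-the-envelope arithmetic. The sketch below assumes a stateless client that re-sends the full conversation history on every request; the turn count and per-turn token size are illustrative numbers, not measurements from the post.

```python
# Rough comparison of cumulative upload in a multi-turn agent loop.
# All quantities are illustrative assumptions, not measured values.

TURNS = 20
TOKENS_PER_TURN = 500  # assumed size of each new message

# Stateless requests: every call re-sends the entire history so far,
# so cumulative upload grows quadratically with the number of turns.
stateless_tokens = sum(turn * TOKENS_PER_TURN for turn in range(1, TURNS + 1))

# Connection-scoped caching: each turn sends only its delta over the
# persistent connection, so cumulative upload grows linearly.
cached_tokens = TURNS * TOKENS_PER_TURN

print(stateless_tokens)                   # 105000 tokens uploaded in total
print(cached_tokens)                      # 10000 tokens uploaded in total
print(stateless_tokens / cached_tokens)   # 10.5x reduction in this scenario
```

The persistent connection also amortizes TCP/TLS handshake cost across the whole loop instead of paying it once per request, a saving this sketch does not model.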
This addition to the Responses API reflects a broader trend of production infrastructure refinements for agent deployment, as developers move beyond single-turn interactions into complex, iterative workflows.