OpenAI describes WebSocket optimization for agent API performance
OpenAI's Responses API now supports WebSocket connections with connection-scoped caching to reduce latency in multi-turn agent workflows.
- OpenAI published technical guidance on using WebSockets in its Responses API to speed up agentic workflows
- The approach uses connection-scoped caching to cut per-request API overhead and reduce model latency
- The post documents a case study of the Codex agent loop and how WebSocket optimization improved its performance
OpenAI published technical documentation on a WebSocket-based optimization for its Responses API, targeting developers who build multi-turn agent systems. The approach introduces connection-scoped caching, designed to reduce redundant API calls and improve latency in agent loops that make sequential requests to the model.
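The connection-scoped caching pattern can be illustrated with a minimal mock. This is a hypothetical sketch of the general technique, not OpenAI's actual Responses API wire format: the `MockModelServer` class, its method names, and the message shapes are all assumptions. The key idea is that conversation state lives in a cache keyed by the open connection, so each turn sends only its delta rather than the full history.

```python
# Hypothetical sketch of connection-scoped caching. All names and the
# wire format are illustrative assumptions, not OpenAI's actual API.

class MockModelServer:
    """Simulates a server that caches conversation state per connection."""

    def __init__(self):
        self._caches = {}  # connection_id -> list of prior turns

    def open_connection(self, connection_id):
        # One cache per connection: state lives only as long as the socket.
        self._caches[connection_id] = []

    def handle_turn(self, connection_id, new_message):
        # Only the new message crosses the wire; prior turns are
        # replayed from the connection-scoped cache server-side.
        history = self._caches[connection_id]
        history.append(new_message)
        return f"response to turn {len(history)}"

    def close_connection(self, connection_id):
        # The cache is discarded when the connection closes.
        del self._caches[connection_id]


server = MockModelServer()
server.open_connection("conn-1")
print(server.handle_turn("conn-1", "plan the task"))  # server sees turn 1
print(server.handle_turn("conn-1", "run step 1"))     # delta only; turn 2
server.close_connection("conn-1")
```

In a real deployment the connection would be a WebSocket rather than an in-memory object, but the cache-lifetime-equals-connection-lifetime design is the same.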
The company used the Codex agent loop as a case study to demonstrate the optimization's practical impact. While the full technical write-up was not accessible, the published summary indicates the optimization targets a common performance bottleneck in agentic systems: the cumulative overhead of repeated API round-trips in a sequential loop.
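The scale of that round-trip overhead can be estimated with back-of-the-envelope arithmetic. The sketch below assumes a stateless client that re-sends the full conversation history on every request; the turn count and per-turn token size are illustrative numbers, not measurements from the post.

```python
# Rough comparison of cumulative upload in a multi-turn agent loop.
# All quantities are illustrative assumptions, not measured values.

TURNS = 20
TOKENS_PER_TURN = 500  # assumed size of each new message

# Stateless requests: every call re-sends the entire history so far,
# so cumulative upload grows quadratically with the number of turns.
stateless_tokens = sum(turn * TOKENS_PER_TURN for turn in range(1, TURNS + 1))

# Connection-scoped caching: each turn sends only its delta over the
# persistent connection, so cumulative upload grows linearly.
cached_tokens = TURNS * TOKENS_PER_TURN

print(stateless_tokens)                   # 105000 tokens uploaded in total
print(cached_tokens)                      # 10000 tokens uploaded in total
print(stateless_tokens / cached_tokens)   # 10.5x reduction in this scenario
```

The persistent connection also amortizes TCP/TLS handshake cost across the whole loop instead of paying it once per request, a saving this sketch does not model.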
This addition to the Responses API reflects a broader trend of production infrastructure refinements for agent deployment, as developers move beyond single-turn interactions into complex, iterative workflows.