Models · May 9, 2026

OpenAI releases GPT-Realtime-2 voice model with expanded reasoning and 128K context window

Three new streaming audio models—GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper—are now available via the Realtime API, with independent benchmarks showing instruction retention improvements and faster response times.

Trust: 69
Hype: Some hype

3 sources · cross-referenced

TL;DR
  • OpenAI launched GPT-Realtime-2, positioning it as a speech-to-speech model with 'GPT-5-class reasoning' capable of tool use, interruption recovery, and longer conversations via an expanded 128K context window.
  • Two companion models were released alongside it: GPT-Realtime-Translate for streaming translation from 70+ input languages into 13 output languages, and GPT-Realtime-Whisper for low-latency streaming transcription.
  • Scale AI reported GPT-Realtime-2 achieved top ranking on its Audio MultiChallenge leaderboard with instruction retention rising from 36.7% to 70.8%, while Artificial Analysis measured 96.6% on Big Bench Audio speech-to-speech reasoning.
  • Adjustable reasoning effort levels (minimal through xhigh) let developers trade reasoning depth against response latency: minimal reasoning achieves a 1.12s time-to-first-audio versus 2.33s at high reasoning.
  • Early adopters Glean and Genspark reported gains in internal evaluations: a 42.9% relative improvement in helpfulness and a 26% increase in effective conversation completion rate, respectively.

OpenAI has released three new streaming audio models integrated into its Realtime API. GPT-Realtime-2 is positioned as a native speech-to-speech model designed for production voice agents that can reason during conversation, invoke multiple tools concurrently, recover gracefully from user interruptions, and maintain longer dialogue sessions. The context window has expanded from 32K to 128K tokens, supporting more complex conversational histories.

Two companion models address adjacent use cases. GPT-Realtime-Translate enables streaming translation from over 70 input languages into 13 output languages in real time. GPT-Realtime-Whisper provides low-latency streaming transcription for captions, note-taking, and continuous speech understanding. All three models are available immediately via the Realtime API; a ChatGPT voice upgrade incorporating these improvements remains pending.

Independent evaluators have published performance metrics. Scale AI's Audio MultiChallenge leaderboard ranks GPT-Realtime-2 first overall, with instruction retention rising from 36.7% for the prior GPT-Realtime-1.5 model to 70.8%. Artificial Analysis measured 96.6% accuracy on Big Bench Audio speech-to-speech reasoning and 96.1% on its Conversational Dynamics benchmark. Time-to-first-audio ranges from 1.12 seconds at minimal reasoning effort to 2.33 seconds at high reasoning effort. Audio pricing remains unchanged at $1.15 per hour for input and $4.61 per hour for output.
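To make the pricing and benchmark figures concrete, here is a minimal arithmetic sketch. It uses only the per-hour audio prices and the instruction retention scores quoted above; the function names and the 10-minute call are hypothetical, and it assumes audio is billed in proportion to duration, which the sources do not spell out.

```python
# Figures quoted in the article; everything else here is illustrative.
INPUT_USD_PER_HOUR = 1.15
OUTPUT_USD_PER_HOUR = 4.61


def audio_cost_usd(input_minutes: float, output_minutes: float) -> float:
    """Estimate audio cost for one session, assuming billing proportional to duration."""
    if input_minutes < 0 or output_minutes < 0:
        raise ValueError("durations must be non-negative")
    input_cost = (input_minutes / 60) * INPUT_USD_PER_HOUR
    output_cost = (output_minutes / 60) * OUTPUT_USD_PER_HOUR
    return input_cost + output_cost


def improvement(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after_pct - before_pct
    relative = absolute / before_pct * 100
    return absolute, relative


if __name__ == "__main__":
    # Hypothetical 10-minute call: 6 minutes of caller audio, 4 minutes of model audio.
    print(f"cost: ${audio_cost_usd(6, 4):.4f}")
    abs_gain, rel_gain = improvement(36.7, 70.8)
    print(f"instruction retention: +{abs_gain:.1f} points, +{rel_gain:.1f}% relative")
```

Under these assumptions the hypothetical call costs about $0.42, and the instruction retention jump works out to 34.1 percentage points, or roughly a 92.9% relative gain over the prior model.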

Developers can now control reasoning intensity across five adjustable levels (minimal, low, medium, high, and extra-high, abbreviated xhigh), with low set as the default. The model supports preambles that precede main responses (e.g., 'let me check that') and audible tool transparency during execution (e.g., 'checking your calendar now'), designed to sustain user engagement while the model processes requests. Early enterprise adopters report measurable gains: Glean observed a 42.9% relative improvement in helpfulness in internal evaluations of real-time organizational voice interactions, while Genspark reported a 26% increase in effective conversation completion rate for its Call for Me Agent.
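One way to reason about the effort and latency trade-off is to pick the highest effort level that fits a latency budget. The sketch below is only an illustration: `pick_effort` and its data table are hypothetical names, the only latencies used are the two time-to-first-audio figures reported above (minimal and high), and it does not call or represent the Realtime API itself.

```python
from typing import Optional

# Ordered from least to most reasoning, as listed in the article.
EFFORT_LEVELS = ["minimal", "low", "medium", "high", "xhigh"]

# Time-to-first-audio in seconds. Only these two levels have reported figures;
# the other levels are deliberately left out rather than guessed.
MEASURED_TTFA_S = {"minimal": 1.12, "high": 2.33}


def pick_effort(latency_budget_s: float) -> Optional[str]:
    """Return the highest measured effort level whose TTFA fits the budget, or None."""
    best = None
    for level in EFFORT_LEVELS:
        ttfa = MEASURED_TTFA_S.get(level)
        if ttfa is not None and ttfa <= latency_budget_s:
            best = level  # later levels in the list mean more reasoning
    return best


if __name__ == "__main__":
    for budget in (1.0, 1.5, 2.5):
        print(budget, pick_effort(budget))
```

With a 1.5-second budget this selects minimal, with 2.5 seconds it selects high, and with 1.0 second neither measured level fits. A real deployment would need its own latency measurements for the remaining levels before extending the table.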

Sources
  1. Latent Space (swyx): [AINews] GPT-Realtime-2, -Translate, and -Whisper: new SOTA realtime voice APIs
  2. Artificial Analysis: GPT-Realtime-2 Benchmark Report
  3. Scale AI Labs: Audio MultiChallenge S2S Leaderboard

Stories may contain errors. Dispatch is assembled with AI assistance and curated by human editors; despite the trust-score filter, mistakes happen. We correct publicly — every article links to its revision history. Nothing here is financial, legal, or medical advice. Verify before relying on any claim.

© 2026 Dispatch. No ads. No sponsorships. No paid placement. Reader-supported via Ko-fi.

Built by a person who cares about honest AI news.