Research · Jun 16, 2026

Google DeepMind releases Gemini 3.5 Live Translate for near real-time speech-to-speech translation in over 70 languages

The new audio model integrates with Google AI Studio, Google Translate, and Google Meet, enabling fluid, low-latency translation while preserving speaker intonation and handling noisy environments.

TL;DR

Gemini 3.5 Live Translate is a new audio model from Google DeepMind that delivers near real-time speech-to-speech translation in over 70 languages.

Google DeepMind announced the release of Gemini 3.5 Live Translate, a new audio model designed for near real-time speech-to-speech translation in over 70 languages. The model automatically detects languages and generates translated speech that preserves the speaker’s intonation, pacing, and pitch, aiming to reduce awkward pauses and maintain synchronization with the speaker.

Unlike turn-based translation systems that wait for the speaker to finish before responding, Gemini 3.5 Live Translate processes speech continuously, balancing the need for context with the demand for immediacy. The model is reported to stay just a few seconds behind the speaker throughout a session, delivering fluid audio output.
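The latency difference between the two approaches can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Google's method: the `Chunk` type, the one-second chunk length, the two-chunk context window, and the 0.5-second processing time are all hypothetical values chosen for illustration, and no real translation or Gemini API call is involved.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    end_time: float  # seconds since the speaker started, when this chunk ends
    text: str        # stand-in for the audio content of the chunk


def turn_based_delays(chunks, processing=0.5):
    """Translate only after the whole turn ends: each chunk waits for the last one."""
    turn_end = chunks[-1].end_time
    return [turn_end + processing - c.end_time for c in chunks]


def streaming_delays(chunks, context=2, processing=0.5):
    """Translate chunk i once `context` later chunks have arrived (or the turn ends)."""
    delays = []
    for i, c in enumerate(chunks):
        ready_index = min(i + context, len(chunks) - 1)
        ready_time = chunks[ready_index].end_time
        delays.append(ready_time + processing - c.end_time)
    return delays


# Hypothetical ten-second utterance split into one-second chunks.
chunks = [Chunk(end_time=float(t), text=f"segment {t}") for t in range(1, 11)]
turn = turn_based_delays(chunks)
stream = streaming_delays(chunks)

print(f"turn-based worst-case delay: {max(turn):.1f}s")
print(f"streaming worst-case delay:  {max(stream):.1f}s")
```

In this toy setup the turn-based delay grows with the length of the utterance, while the streaming delay is capped by the context window plus processing time. That is the trade-off the article describes: a larger window gives the translator more context but pushes the output further behind the speaker.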

The model is rolling out across multiple Google products: developers can access it in public preview via the Gemini Live API and Google AI Studio; enterprises can preview it this month in Google Meet; and it is available to all users via the Google Translate app on Android and iOS.

Gemini 3.5 Live Translate is designed to handle multilingual inputs without manual configuration and includes noise robustness features to function in loud or unpredictable environments. Google highlights potential use cases such as live interpretation for multilingual calls, meetings, lessons, and broadcasts.

Google also notes partnerships with companies like Agora, Fishjam, LiveKit, Pipecat, and Vision Agents, which are integrating the model to build voice translation applications. These partners manage real-time media streaming infrastructure, allowing developers to focus on user experience.

Early adopters, including Grab, CJ ENM, and LiveKit, have provided positive feedback on the model’s translation quality, accuracy, and low latency. Grab, which facilitates over 10 million voice calls per month, is testing the model to enable near real-time multilingual communication between drivers and travelers.

In Google Meet, the update expands speech translation from five languages to over 70, and from English-only pairings to over 2,000 language combinations. The interface is also updated to give instant access to speech translation. A private preview for select Google Workspace customers starts this month, with a broader rollout planned later this year.
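As a rough sanity check on the "over 2,000 combinations" figure, the arithmetic below counts language pairs. It assumes exactly 70 languages, each of which can be paired with every other; the source does not say how Google counts combinations, so both the unordered and the direction-sensitive counts are shown.

```python
import math

languages = 70  # assumption: the article says "over 70", so this is a lower bound

# Unordered pairs: English<->Spanish counted once.
unordered_pairs = math.comb(languages, 2)

# Ordered pairs: English->Spanish and Spanish->English counted separately.
ordered_pairs = math.perm(languages, 2)

print(unordered_pairs)  # 2415
print(ordered_pairs)    # 4830
```

Under these assumptions the unordered count of 2,415 is consistent with "over 2,000 combinations", which suggests, though it does not confirm, that each language pair is counted once regardless of direction.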

Sources

01 Google DeepMind, Blog: "Fluid, natural voice translation with Gemini 3.5 Live Translate"
