GPT Realtime API for speech and audio

Azure OpenAI's GPT Realtime API delivers low-latency, speech-in/speech-out conversational capabilities and can be used via WebRTC for client apps or WebSocket for server-to-server scenarios. This article covers supported models, authentication options, deployment steps in the Azure AI Foundry portal, and example client code in JavaScript, Python, and TypeScript.

Azure OpenAI's GPT Realtime API supports interactive, low-latency "speech in, speech out" conversations and is part of the GPT-4o model family. You can stream audio to the model and receive audio responses in real time via WebRTC or WebSocket. The documentation recommends WebRTC for client-side applications such as web and mobile apps because it is designed for low-latency audio streaming, and suggests WebSocket for server-to-server scenarios where ultra-low latency is not required.

The article lists the supported realtime models and their recommended versions: gpt-4o-realtime-preview and gpt-4o-mini-realtime-preview (both version 2024-12-17), gpt-realtime (version 2025-08-28), and gpt-realtime-mini (version 2025-10-06). Realtime API support was first added in API version 2024-10-01-preview (now retired), and the generally available API version 2025-08-28 is recommended where possible. To deploy a model, follow the Azure AI Foundry portal workflow: create or select a project, open Models + endpoints under My assets, choose Deploy model > Deploy base model, select gpt-realtime, and complete the deployment wizard.
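Once a model is deployed, the deployment name and recommended api-version combine into the connection URL. Below is a minimal Python sketch, assuming the standard Azure OpenAI realtime WebSocket URL shape (`wss://<resource>.openai.azure.com/openai/realtime`); the resource and deployment names are placeholders, not values from the article.

```python
def realtime_ws_url(resource: str, deployment: str,
                    api_version: str = "2025-08-28") -> str:
    """Build the WebSocket URL for an Azure OpenAI realtime session.

    `resource` is the Azure OpenAI resource name and `deployment` is the
    model deployment name (e.g. a gpt-realtime deployment). The default
    api-version is the GA version the article recommends.
    """
    return (
        f"wss://{resource}.openai.azure.com/openai/realtime"
        f"?api-version={api_version}&deployment={deployment}"
    )

# Example with placeholder names:
url = realtime_ws_url("my-resource", "gpt-realtime")
```

A server-to-server WebSocket client would connect to this URL, authenticating with either an api-key header or a Microsoft Entra ID bearer token, as described in the prerequisites below.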

The guide covers prerequisites and authentication options. You need an Azure subscription, a deployed gpt-realtime or gpt-realtime-mini model, and a Node.js, Python, or TypeScript environment depending on the sample code. Microsoft Entra ID keyless authentication is recommended; it requires the Azure CLI and assignment of the Cognitive Services User role. API-key authentication is also described, with a caution to store keys securely. The example session configuration sets audio input and output options: transcription with whisper-1, audio/pcm at 24000 Hz, server_vad turn detection, and the output voice alloy. Client samples in JavaScript, Python, and TypeScript demonstrate handling of the session.created, session.updated, response.output_audio.delta, response.output_audio_transcript.delta, and response.done events, and include example console output of transcript deltas and audio chunk sizes to illustrate real-time interaction patterns.
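The session settings and server events described above can be sketched as plain dictionaries and a small dispatch routine. This is a minimal Python illustration, not the official SDK: the event type names come from the article, but the exact field layout of session.update varies by API version, so treat the payload shape and the helper names as assumptions.

```python
import json

def build_session_update() -> dict:
    """A session.update event mirroring the configuration the article
    describes: whisper-1 transcription, 24000 Hz PCM audio, server-side
    voice activity detection, and the alloy output voice. The nesting
    shown here is illustrative; check the API reference for your
    api-version."""
    return {
        "type": "session.update",
        "session": {
            "audio": {
                "input": {
                    "transcription": {"model": "whisper-1"},
                    "format": {"type": "audio/pcm", "rate": 24000},
                    "turn_detection": {"type": "server_vad"},
                },
                "output": {
                    "format": {"type": "audio/pcm", "rate": 24000},
                    "voice": "alloy",
                },
            },
        },
    }

def handle_event(raw: str, transcript: list, audio_chunks: list) -> str:
    """Route one server event by its type, as the article's client
    samples do; unrecognized event types are ignored."""
    event = json.loads(raw)
    etype = event.get("type", "")
    if etype == "response.output_audio_transcript.delta":
        transcript.append(event["delta"])      # incremental transcript text
    elif etype == "response.output_audio.delta":
        audio_chunks.append(event["delta"])    # base64-encoded audio chunk
    elif etype == "response.done":
        print("transcript:", "".join(transcript))
    return etype

# Feeding two transcript deltas accumulates "Hel" + "lo":
transcript, chunks = [], []
handle_event('{"type": "response.output_audio_transcript.delta", "delta": "Hel"}',
             transcript, chunks)
handle_event('{"type": "response.output_audio_transcript.delta", "delta": "lo"}',
             transcript, chunks)
```

In a real client, each `raw` string would arrive over the WebRTC data channel or WebSocket, and the audio chunks would be decoded and played back as they stream in.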


CSEM France pushes responsible AI

CSEM France is positioning itself as a key force in France's push for responsible AI, combining technical research with ethics, policy engagement, and industry partnerships. Its work centers on trustworthy systems designed for transparency, fairness, and public accountability.

EU Parliament backs ban on AI nudifier apps

European parliament committees have endorsed changes to the Artificial Intelligence Act that would ban apps used to create non-consensual nude or sexually explicit images of real people. Lawmakers also backed delays and targeted adjustments to compliance rules for high-risk systems and watermarking requirements.

Chancellor sets principles for UK-EU alignment

Rachel Reeves has outlined a growth plan built around closer UK-EU ties, faster AI adoption, and stronger regional development. The strategy sets new principles for regulatory alignment, expands support for innovation, and shifts more investment power to city regions.

Nvidia denies report on Groq chip plans for China

Nvidia says a report that it is preparing Groq inferencing chips for shipment to China is “totally false,” even as interest in H200 sales to the country remains strong. The dispute highlights how closely watched Nvidia’s China strategy has become across training and inferencing hardware.
