feat(openai): emit interim input audio transcription deltas from Realtime API by F1nnM · Pull Request #5544 · livekit/agents

F1nnM · 2026-04-24T09:29:02Z

Summary

The OpenAI Realtime API sends conversation.item.input_audio_transcription.delta events with
streaming transcription text as the user speaks, but the plugin currently discards them in a
no-op pass branch. This PR handles those deltas by accumulating text per item and emitting
input_audio_transcription_completed with is_final=False, matching the pattern already used
by the Google Gemini realtime plugin.

The .completed event still emits is_final=True as before and now also clears the accumulated
state for that item.

- Accumulate delta text per item_id in RealtimeSession._input_transcriptions
- Emit InputTranscriptionCompleted(is_final=False) on each delta
- Clean up state on .completed
- Fix existing test_input_audio_transcription to wait for is_final=True (interims now fire first)
- Add test_input_audio_transcription_interim asserting interim events arrive before final
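The accumulate-and-emit behavior listed above can be sketched in a few lines. This is a minimal illustration, not the PR's code: it assumes raw server events arrive as dicts carrying the OpenAI field names item_id, delta and transcript, and the names TranscriptionEvent and InterimTranscriptAccumulator are hypothetical stand-ins for the plugin's InputTranscriptionCompleted event and the RealtimeSession._input_transcriptions state.

```python
from dataclasses import dataclass


@dataclass
class TranscriptionEvent:
    # Hypothetical stand-in for the plugin's InputTranscriptionCompleted event.
    item_id: str
    transcript: str
    is_final: bool


class InterimTranscriptAccumulator:
    """Sketch of per-item delta accumulation (all names are illustrative)."""

    def __init__(self) -> None:
        # item_id -> text accumulated from deltas so far
        self._partial: dict[str, str] = {}

    def on_delta(self, event: dict) -> TranscriptionEvent:
        # A .delta event carries only the new fragment; append it to this item's text
        # and emit the running transcript as an interim (is_final=False) event.
        item_id = event["item_id"]
        text = self._partial.get(item_id, "") + event["delta"]
        self._partial[item_id] = text
        return TranscriptionEvent(item_id=item_id, transcript=text, is_final=False)

    def on_completed(self, event: dict) -> TranscriptionEvent:
        # The .completed event carries the full transcript; drop the accumulated state
        # and emit the server's transcript as the final event.
        item_id = event["item_id"]
        self._partial.pop(item_id, None)
        return TranscriptionEvent(item_id=item_id, transcript=event["transcript"], is_final=True)


acc = InterimTranscriptAccumulator()
first = acc.on_delta({"item_id": "item_1", "delta": "Hello"})
second = acc.on_delta({"item_id": "item_1", "delta": " world"})
final = acc.on_completed({"item_id": "item_1", "transcript": "Hello world."})
print(second.transcript, second.is_final)  # Hello world False
print(final.transcript, final.is_final)    # Hello world. True
```

Keying the state by item_id keeps text from separate user turns apart, and in this sketch the final event uses the transcript from .completed rather than the accumulated deltas, so any server-side corrections in the final text win.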

Context

The same plugin's openai.STT class (in stt.py) already handles these deltas over its own
WebSocket connection. This PR brings the RealtimeModel path to parity, so that AgentSession's
user_input_transcribed event also receives streaming (interim) transcripts in realtime mode.
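On the application side, a consumer of user_input_transcribed now sees cumulative interim transcripts followed by a final one. A minimal live-caption sketch, assuming the event exposes transcript and is_final fields; UserTranscript and LiveCaption are hypothetical names, and in an application the handler would presumably be registered on AgentSession for user_input_transcribed rather than called directly as below.

```python
from dataclasses import dataclass


@dataclass
class UserTranscript:
    # Hypothetical stand-in modelling only the two fields this handler reads;
    # assumed to mirror transcript/is_final on the user_input_transcribed event.
    transcript: str
    is_final: bool


class LiveCaption:
    """Shows interim text, then commits it once the final transcript arrives."""

    def __init__(self) -> None:
        self.committed: list[str] = []
        self.pending: str = ""

    def on_user_input_transcribed(self, ev: UserTranscript) -> None:
        if ev.is_final:
            # The final transcript replaces the interim text and is committed.
            self.committed.append(ev.transcript)
            self.pending = ""
        else:
            # Interim transcripts are cumulative per item, so overwrite rather than append.
            self.pending = ev.transcript

    def render(self) -> str:
        return " ".join(self.committed + ([self.pending] if self.pending else []))


caption = LiveCaption()
caption.on_user_input_transcribed(UserTranscript("Hello", False))
caption.on_user_input_transcribed(UserTranscript("Hello world", False))
print(caption.render())  # Hello world
caption.on_user_input_transcribed(UserTranscript("Hello world.", True))
print(caption.render())  # Hello world.
```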

Test plan

- make check passes (format, lint, type-check)
- All 18 existing OpenAI realtime tests pass
- New test_input_audio_transcription_interim validates interim deltas arrive before final

Handle `conversation.item.input_audio_transcription.delta` events from the OpenAI Realtime API instead of dropping them. Deltas are accumulated per item and emitted as `input_audio_transcription_completed` with `is_final=False`, matching the pattern already used by the Gemini realtime plugin. The completed event cleans up state and continues to emit `is_final=True` as before. This enables streaming user input transcription for applications that subscribe to `user_input_transcribed` on `AgentSession`.

The existing test_input_audio_transcription now waits for is_final=True before asserting, since interim deltas now fire first. New test_input_audio_transcription_interim validates that interim transcription events (is_final=False) arrive before the final transcript.
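The reasoning behind both test changes can be sketched with a small helper: once interim events precede the final one, a test that takes the first event it sees gets an interim, so the helper drains events until is_final=True and returns the interims it skipped, which the interim test can then assert on. The queue-based plumbing, collect_until_final and the stand-in event class are assumptions for illustration, not the PR's test code.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class TranscriptionEvent:
    # Hypothetical stand-in for the realtime session's transcription event.
    item_id: str
    transcript: str
    is_final: bool


async def collect_until_final(
    queue: "asyncio.Queue[TranscriptionEvent]", timeout: float = 5.0
) -> tuple[list[TranscriptionEvent], TranscriptionEvent]:
    """Drain events until one has is_final=True; return (interims, final)."""
    interims: list[TranscriptionEvent] = []
    while True:
        # Bound each wait so a missing final event fails the test instead of hanging.
        ev = await asyncio.wait_for(queue.get(), timeout)
        if ev.is_final:
            return interims, ev
        interims.append(ev)


async def main() -> tuple[list[TranscriptionEvent], TranscriptionEvent]:
    queue: asyncio.Queue = asyncio.Queue()
    # Simulated event stream: two interim events, then the final transcript.
    for ev in (
        TranscriptionEvent("item_1", "What's", False),
        TranscriptionEvent("item_1", "What's the weather", False),
        TranscriptionEvent("item_1", "What's the weather?", True),
    ):
        queue.put_nowait(ev)
    return await collect_until_final(queue)


interims, final = asyncio.run(main())
print(len(interims), final.transcript)  # 2 What's the weather?
```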

F1nnM added 2 commits April 24, 2026 11:05


fix(openai): clean up delta state on transcription failure

dfcec3c
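The follow-up commit covers items whose transcription never completes: without cleanup, their partial text would stay in the per-item state indefinitely. A minimal sketch, assuming raw events as dicts with OpenAI's type and item_id fields and that failures arrive as conversation.item.input_audio_transcription.failed; handle_transcription_event is a hypothetical name, not the plugin's handler, and it does not inspect the failure's error payload.

```python
from typing import Optional

DELTA = "conversation.item.input_audio_transcription.delta"
COMPLETED = "conversation.item.input_audio_transcription.completed"
FAILED = "conversation.item.input_audio_transcription.failed"


def handle_transcription_event(
    partial: dict[str, str], event: dict
) -> Optional[tuple[str, bool]]:
    """Update per-item state; return (transcript, is_final) to emit, or None."""
    item_id = event["item_id"]
    kind = event["type"]
    if kind == DELTA:
        partial[item_id] = partial.get(item_id, "") + event["delta"]
        return partial[item_id], False
    if kind == COMPLETED:
        partial.pop(item_id, None)
        return event["transcript"], True
    if kind == FAILED:
        # Drop the partial text so a failed item does not leak state; nothing is emitted.
        partial.pop(item_id, None)
    return None


state: dict[str, str] = {}
r1 = handle_transcription_event(state, {"type": DELTA, "item_id": "item_1", "delta": "Hel"})
r2 = handle_transcription_event(state, {"type": FAILED, "item_id": "item_1"})
print(r1, r2, state)  # ('Hel', False) None {}
```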

F1nnM wants to merge 3 commits into livekit:main from F1nnM:feat/openai-realtime-streaming-input-transcription
