Conversation
Pull request overview
Adds an initial DeepL Voice API (v3) integration to the DeepL .NET SDK, providing a session-based WebSocket streaming API for real-time transcription/translation, plus the required option types and models.
Changes:
- Introduces `IVoiceManager`/`IVoiceSession` and a `ClientWebSocket`-based `VoiceSession` implementation (send audio, receive transcript/media/error events, manual reconnect).
- Adds Voice API option types and model DTOs (session info, transcript updates/segments, media chunks, stream errors) plus supporting enums/constants.
- Extends `DeepLClient` with `CreateVoiceSessionAsync` and updates the project/test suite to support and validate the new API surface.
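Taken together, the new surface sketches out roughly as follows; member names are taken from this PR's file descriptions, but the exact signatures are assumptions:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch of the API surface described above; signatures are assumed, not verbatim.
public interface IVoiceManager {
  Task<IVoiceSession> CreateVoiceSessionAsync(
        VoiceSessionOptions options,
        CancellationToken cancellationToken = default);
}

public interface IVoiceSession : IDisposable {
  string SessionId { get; }

  event EventHandler<TranscriptUpdate> SourceTranscriptUpdated;   // tentative + concluded source text
  event EventHandler<TranscriptUpdate> TargetTranscriptUpdated;   // translated transcript
  event EventHandler<TargetMediaChunk> TargetMediaChunkReceived;  // TTS audio (closed beta)
  event EventHandler<VoiceStreamError> ErrorReceived;             // server-sent stream errors
  event EventHandler StreamEnded;

  Task SendAudioAsync(ArraySegment<byte> audio, CancellationToken cancellationToken = default);
  Task EndAudioAsync(CancellationToken cancellationToken = default);
  Task ReconnectAsync(CancellationToken cancellationToken = default);
}
```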
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| DeepL/DeepLClient.cs | Implements IVoiceManager.CreateVoiceSessionAsync (session POST + WebSocket connect). |
| DeepL/DeepL.csproj | Adds System.Net.WebSockets.Client reference for netstandard2.0. |
| DeepL/IVoiceManager.cs | New interface for creating Voice API sessions. |
| DeepL/IVoiceSession.cs | New streaming session interface (events + send/end/reconnect). |
| DeepL/VoiceSession.cs | WebSocket session implementation with background receive loop and message dispatch. |
| DeepL/VoiceSessionOptions.cs | Session creation options (formats, languages, glossary, formality, beta TTS knobs). |
| DeepL/SourceMediaContentType.cs | Constants for supported source audio content types. |
| DeepL/VoiceMessageFormat.cs | Enum + API-value mapping for JSON/MessagePack. |
| DeepL/SourceLanguageMode.cs | Enum + API-value mapping for auto vs fixed source language. |
| DeepL/TargetMediaVoice.cs | Enum + API-value mapping for target TTS voice selection. |
| DeepL/Model/VoiceSessionInfo.cs | DTO for session creation response (streaming_url, token, session_id). |
| DeepL/Model/TranscriptSegment.cs | DTO for transcript segment (text). |
| DeepL/Model/TranscriptUpdate.cs | DTO for transcript updates (concluded, tentative, optional language). |
| DeepL/Model/TargetMediaChunk.cs | DTO for target media chunks (base64 packets + metadata; closed beta). |
| DeepL/Model/VoiceStreamError.cs | DTO for server-sent stream errors (code, reason, message). |
| DeepLTests/VoiceSessionTest.cs | Unit/client validation tests for new Voice types and DeepLClient input validation. |
Context (`DeepLTests/VoiceSessionTest.cs`):

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
```

Copilot comment: `using System.Collections.Generic;` appears unused in this test file. With `TreatWarningsAsErrors` enabled in the test project, this will produce CS8019 and fail the build; please remove the unused using (or use it).

```suggestion
using System.Text.Json;
```
Context (`DeepL/VoiceSessionOptions.cs`):

```csharp
public string SourceMediaContentType { get; set; } = DeepL.SourceMediaContentType.Auto;

/// <summary>
/// Message encoding format for WebSocket communication. Defaults to <see cref="VoiceMessageFormat.Json" />.
```

Copilot comment: The XML docs say `MessageFormat` defaults to `VoiceMessageFormat.Json`, but the property's default value is `null` (meaning it won't be sent in the request). Either set the property default to `Json`, or adjust the docs to clarify that `null` means "use API default (currently JSON)".

```suggestion
/// Message encoding format for WebSocket communication. If <c>null</c>, the API default is used
/// (currently <see cref="VoiceMessageFormat.Json" />).
```
Context (`DeepL/VoiceSessionOptions.cs`):

```csharp
/// Controls how the <see cref="SourceLanguage" /> value is used.
/// Defaults to <see cref="DeepL.SourceLanguageMode.Auto" /> if not specified.
/// </summary>
public SourceLanguageMode? SourceLanguageMode { get; set; }
```

Copilot comment: The docs state `SourceLanguageMode` defaults to `Auto`, but the property is nullable and defaults to `null` (i.e., not sent). Please either set a default value (`Auto`) or clarify in the docs that `null` means "use API default (auto)".

```suggestion
public SourceLanguageMode? SourceLanguageMode { get; set; } = DeepL.SourceLanguageMode.Auto;
```
Context (`DeepL/VoiceSessionOptions.cs`):

```csharp
/// Languages specified here will automatically be added to <see cref="TargetLanguages" /> if not already present.
/// Maximum 5 target media languages per session.
```

Copilot comment: The `TargetMediaLanguages` docs claim languages will be automatically added to `TargetLanguages` if missing, but `CreateVoiceSessionAsync` currently sends `target_languages` and `target_media_languages` independently and does not merge them or enforce the combined maximum of 5. Either implement the merge + validation (recommended), or update the docs to match the current behavior.

```suggestion
/// This list is sent independently of <see cref="TargetLanguages" /> and is not merged automatically.
/// Maximum 5 target media languages per session for this property.
```
Context (`DeepL/DeepLClient.cs`, `CreateVoiceSessionAsync`):

```csharp
if (options.TargetLanguages == null || options.TargetLanguages.Length == 0) {
  throw new ArgumentException("At least one target language must be specified");
}

if (options.TargetLanguages.Length > 5) {
  throw new ArgumentException("Maximum 5 target languages per session");
}

var requestData = new Dictionary<string, object> {
  ["source_media_content_type"] = options.SourceMediaContentType,
  ["target_languages"] = options.TargetLanguages
};

if (options.MessageFormat != null) {
  requestData["message_format"] = options.MessageFormat.Value.ToApiValue();
}

if (options.SourceLanguage != null) {
  requestData["source_language"] = options.SourceLanguage;
}

if (options.SourceLanguageMode != null) {
  requestData["source_language_mode"] = options.SourceLanguageMode.Value.ToApiValue();
}

if (options.TargetMediaLanguages != null) {
  requestData["target_media_languages"] = options.TargetMediaLanguages;
}
```

Copilot comment: `CreateVoiceSessionAsync` validates the `TargetLanguages` count, but does not validate `TargetMediaLanguages` (max 5 per docs) nor enforce that target media languages are included in `target_languages`. This can lead to avoidable API-side errors; consider merging `TargetMediaLanguages` into `TargetLanguages` (deduplicating) and validating the combined count before sending the request.
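The recommended merge + validation could look like the following sketch; `MergeTargetLanguages` is a hypothetical helper, not code from this PR:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class VoiceSessionValidation {
  // Merge target media languages into the target languages list (deduplicated,
  // case-insensitively) and enforce the combined maximum of 5 per session.
  public static string[] MergeTargetLanguages(string[] targetLanguages, string[] targetMediaLanguages) {
    var merged = new List<string>(targetLanguages);
    foreach (var lang in targetMediaLanguages ?? Array.Empty<string>()) {
      if (!merged.Contains(lang, StringComparer.OrdinalIgnoreCase)) {
        merged.Add(lang);  // ensure media languages are also requested as target languages
      }
    }
    if (merged.Count > 5) {
      throw new ArgumentException("Maximum 5 target languages per session (including target media languages)");
    }
    return merged.ToArray();
  }
}
```

`CreateVoiceSessionAsync` would then send the merged result as `target_languages` while still sending `target_media_languages` separately.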
Context (`DeepL/VoiceSession.cs`):

```csharp
/// <inheritdoc />
public async Task ReconnectAsync(CancellationToken cancellationToken = default) {
  // Stop current receive loop
  _receiveCts.Cancel();
  if (_receiveTask != null) {
    try {
      await _receiveTask.ConfigureAwait(false);
    } catch (OperationCanceledException) {
      // Expected
    }
  }

  // Close existing WebSocket if still open
  if (_webSocket.State == WebSocketState.Open || _webSocket.State == WebSocketState.CloseReceived) {
    try {
      await _webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, "Reconnecting", CancellationToken.None)
        .ConfigureAwait(false);
    } catch (WebSocketException) {
      // Ignore close errors during reconnection
    }
  }

  _webSocket.Dispose();

  // Request new token via GET v3/voice/realtime?token=<lastToken>
  var queryParams = new[] { ("token", _lastToken) };
  using var responseMessage = await _httpClient.ApiGetAsync("v3/voice/realtime", cancellationToken, queryParams)
    .ConfigureAwait(false);
  await DeepLHttpClient.CheckStatusCodeAsync(responseMessage).ConfigureAwait(false);
  var sessionInfo = await JsonUtils.DeserializeAsync<VoiceSessionInfo>(responseMessage).ConfigureAwait(false);

  _lastToken = sessionInfo.Token;
  SessionId = sessionInfo.SessionId;

  // Establish new WebSocket connection
  var wsUri = new Uri($"{sessionInfo.StreamingUrl}?token={Uri.EscapeDataString(sessionInfo.Token)}");
  _webSocket = new ClientWebSocket();
  await _webSocket.ConnectAsync(wsUri, cancellationToken).ConfigureAwait(false);
```

Copilot comment: `VoiceSession` mutates and disposes `_webSocket` inside `ReconnectAsync` while `SendAudioAsync`/`EndAudioAsync` and `ReceiveLoopAsync` use `_webSocket` without synchronization. This can race (send on a disposed/old socket, or the receive loop reading from a replaced socket). Use a lock/`SemaphoreSlim` to serialize reconnect/send/dispose, or capture the current socket into a local variable under lock and only dispose/swap once no other operations are using it.
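One way to serialize access, per the comment above, is a `SemaphoreSlim` guarding every operation that touches `_webSocket`. The fragment below is a sketch of class members (it belongs inside `VoiceSession`, so it is not standalone), and the `_socketLock` field name is hypothetical:

```csharp
// Sketch only: serialize send/reconnect so ReconnectAsync cannot dispose or
// replace _webSocket while another operation is using it.
private readonly SemaphoreSlim _socketLock = new SemaphoreSlim(1, 1);

public async Task SendAudioAsync(ArraySegment<byte> audio, CancellationToken cancellationToken = default) {
  await _socketLock.WaitAsync(cancellationToken).ConfigureAwait(false);
  try {
    await _webSocket.SendAsync(audio, WebSocketMessageType.Binary, endOfMessage: true, cancellationToken)
      .ConfigureAwait(false);
  } finally {
    _socketLock.Release();
  }
}

public async Task ReconnectAsync(CancellationToken cancellationToken = default) {
  await _socketLock.WaitAsync(cancellationToken).ConfigureAwait(false);
  try {
    // Stop the receive loop, close and dispose the old socket, fetch a new
    // token, and connect the replacement socket (as in the body above),
    // all while holding the lock so no send observes a disposed socket.
  } finally {
    _socketLock.Release();
  }
}
```

Note that `ClientWebSocket` itself permits at most one outstanding send and one outstanding receive, so the receive loop can keep running outside the lock as long as socket replacement happens only after the loop has stopped.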
Context (`DeepL/VoiceSession.cs`, `ReceiveLoopAsync`):

```csharp
} catch (OperationCanceledException) {
  // Normal cancellation
} catch (WebSocketException) {
  // Connection lost — consumer should call ReconnectAsync
```

Copilot comment: `ReceiveLoopAsync` only catches `OperationCanceledException` and `WebSocketException`. Any other exception (e.g., `ObjectDisposedException` during shutdown, unexpected parsing failures) will fault the background task; `ReconnectAsync` awaits `_receiveTask` and will then throw, preventing reconnection. Catch a broader `Exception` here and either surface it via an event/callback or ensure it doesn't fault the task.

```suggestion
  // Connection lost — consumer should call ReconnectAsync
} catch (Exception) {
  // Unexpected exception — swallow to prevent the background task from faulting
```
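If the team prefers surfacing the failure rather than silently swallowing it, the broader catch could report through the session's events; `OnErrorReceived` and `OnStreamEnded` below are hypothetical raise-helpers, not code from this PR:

```csharp
} catch (OperationCanceledException) {
  // Normal cancellation
} catch (WebSocketException) {
  // Connection lost; consumer should call ReconnectAsync
} catch (Exception ex) {
  // Unexpected failure (e.g. ObjectDisposedException during shutdown, or a
  // JSON parse error): keep the background task from faulting, but tell the
  // consumer what happened instead of hiding it.
  OnErrorReceived(new VoiceStreamError { Message = ex.Message });  // hypothetical helper
} finally {
  OnStreamEnded();  // hypothetical helper
}
```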
DeepL Voice API Integration
Real-time speech transcription and translation support for the DeepL .NET SDK, implementing the DeepL Voice API.
Overview
The Voice API uses a two-step flow:

1. POST to `v3/voice/realtime` to obtain a WebSocket URL and an ephemeral token
2. Connect to that WebSocket to stream audio and receive transcript/media events

This implementation exposes the flow through `DeepLClient.CreateVoiceSessionAsync()`, returning an `IVoiceSession` that wraps the WebSocket lifecycle with an event-driven API.

Usage
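A minimal usage sketch, assuming the member names listed in this PR (`ReadAudioChunks` is an illustrative helper, not part of the SDK):

```csharp
using System;
using DeepL;

using var client = new DeepLClient("your-auth-key");

var session = await client.CreateVoiceSessionAsync(new VoiceSessionOptions {
  SourceMediaContentType = SourceMediaContentType.Auto,
  TargetLanguages = new[] { "de" },
});

session.SourceTranscriptUpdated += (sender, update) =>
  Console.WriteLine($"source: {update.Concluded.Length} concluded, {update.Tentative.Length} tentative segment(s)");
session.TargetTranscriptUpdated += (sender, update) =>
  Console.WriteLine($"target: {update.Concluded.Length} concluded segment(s)");
session.ErrorReceived += (sender, error) =>
  Console.Error.WriteLine($"stream error {error.Code}: {error.Message}");

// Stream audio, then signal end-of-audio so the server flushes final results.
foreach (var chunk in ReadAudioChunks("meeting.ogg"))  // illustrative helper
  await session.SendAudioAsync(chunk);
await session.EndAudioAsync();
```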
Reconnection
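Since reconnection is manual in this iteration, the consumer wires it up explicitly. Whether to trigger on `StreamEnded`, `ErrorReceived`, or a failed send is a policy choice, so the trigger below is only illustrative:

```csharp
// Illustrative reconnect policy: retry once when the stream ends unexpectedly.
// Whether StreamEnded is the right trigger is an assumption; the PR states only
// that the consumer should call ReconnectAsync after a lost connection.
var intentionalShutdown = false;

session.StreamEnded += async (sender, args) => {
  if (intentionalShutdown) return;
  try {
    await session.ReconnectAsync();  // fetches a fresh token, reconnects the WebSocket
  } catch (Exception ex) {
    Console.Error.WriteLine($"reconnect failed: {ex.Message}");
  }
};
```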
New Files
Enums & Constants
- `DeepL/SourceMediaContentType.cs`: constants for supported source audio content types (`audio/auto`, `audio/ogg;codecs=opus`, PCM variants, etc.)
- `DeepL/VoiceMessageFormat.cs`: `Json`/`MessagePack` for WebSocket message encoding
- `DeepL/SourceLanguageMode.cs`: `Auto`/`Fixed` for source language handling
- `DeepL/TargetMediaVoice.cs`: `Male`/`Female` for synthesized speech voice (closed beta)

Models (`DeepL/Model/`)
- `VoiceSessionInfo.cs`: `StreamingUrl`, `Token`, `SessionId`
- `TranscriptSegment.cs`: `Text` property
- `TranscriptUpdate.cs`: `Concluded[]`, `Tentative[]`, optional `Language`
- `TargetMediaChunk.cs`: `ContentType`, `Headers`, `Data[]`, `Text`, `Language`, `Duration` (closed beta)
- `VoiceStreamError.cs`: `Code`, `Reason`, `Message`

Options & Interfaces
- `DeepL/VoiceSessionOptions.cs`: session creation options (formats, languages, glossary, formality, beta TTS knobs)
- `DeepL/IVoiceSession.cs`: events (`SourceTranscriptUpdated`, `TargetTranscriptUpdated`, `TargetMediaChunkReceived`, `ErrorReceived`, `StreamEnded`) + methods (`SendAudioAsync`, `EndAudioAsync`, `ReconnectAsync`)
- `DeepL/IVoiceManager.cs`: `CreateVoiceSessionAsync(VoiceSessionOptions)`

Core Implementation
- `DeepL/VoiceSession.cs`: `ClientWebSocket`-based session with a background receive loop, JSON message dispatch, and reconnection support

Tests
- `DeepLTests/VoiceSessionTest.cs`: unit/client validation tests for the new Voice types and `DeepLClient` input validation

Modified Files
- `DeepL/DeepLClient.cs`: added `IVoiceManager` to the class declaration; implemented `CreateVoiceSessionAsync` (POST JSON to `v3/voice/realtime`, then WebSocket connect)
- `DeepL/DeepL.csproj`: `System.Net.WebSockets.Client` v4.3.2 conditional reference for `netstandard2.0`

Architecture Decisions
- Event-based API: `IAsyncEnumerable` can be layered on top in a follow-up.
- `DeepLClient` only: the Voice API is v3, so it is not added to the legacy `Translator` class, consistent with other v3 features (multilingual glossaries, style rules).
- Manual `ReconnectAsync()`: an automatic reconnection policy is deferred to a future iteration.

API Constraints