
Draft for a DeepL Voice implementation #75

Open
DeeJayTC wants to merge 1 commit into main from tc/add-voice

Conversation

@DeeJayTC (Member)

DeepL Voice API Integration

Real-time speech transcription and translation support for the DeepL .NET SDK, implementing the DeepL Voice API.

Overview

The Voice API uses a two-step flow:

  1. Request a session — POST to v3/voice/realtime to obtain a WebSocket URL and ephemeral token
  2. Stream via WebSocket — Send audio chunks, receive real-time transcripts and translations

This implementation exposes the flow through DeepLClient.CreateVoiceSessionAsync(), returning an IVoiceSession that wraps the WebSocket lifecycle with an event-driven API.

Usage

```csharp
using var client = new DeepLClient("your-auth-key");

var session = await client.CreateVoiceSessionAsync(new VoiceSessionOptions {
    SourceMediaContentType = SourceMediaContentType.PcmS16le16000,
    SourceLanguage = "en",
    SourceLanguageMode = SourceLanguageMode.Fixed,
    TargetLanguages = new[] { "de", "fr" },
    Formality = "formal"
});

session.SourceTranscriptUpdated += (s, update) => {
    var text = string.Join("", Array.ConvertAll(update.Concluded, seg => seg.Text));
    Console.WriteLine($"[Source] {text}");
};

session.TargetTranscriptUpdated += (s, update) => {
    var text = string.Join("", Array.ConvertAll(update.Concluded, seg => seg.Text));
    Console.WriteLine($"[{update.Language}] {text}");
};

session.StreamEnded += (s, e) => Console.WriteLine("Stream complete");

// Stream audio chunks (50-250 ms recommended)
await session.SendAudioAsync(audioChunkBytes);
// ... more chunks ...

// Signal end of audio to finalize transcripts
await session.EndAudioAsync();
```

Reconnection

```csharp
// If the WebSocket disconnects unexpectedly:
await session.ReconnectAsync();
// Resume sending audio — the session continues where it left off
```

New Files

Enums & Constants

| File | Description |
| --- | --- |
| `DeepL/SourceMediaContentType.cs` | String constants for all supported audio formats (`audio/auto`, `audio/ogg;codecs=opus`, PCM variants, etc.) |
| `DeepL/VoiceMessageFormat.cs` | Enum: `Json` / `MessagePack` for WebSocket message encoding |
| `DeepL/SourceLanguageMode.cs` | Enum: `Auto` / `Fixed` for source language handling |
| `DeepL/TargetMediaVoice.cs` | Enum: `Male` / `Female` for synthesized speech voice (closed beta) |

Models (DeepL/Model/)

| File | Description |
| --- | --- |
| `VoiceSessionInfo.cs` | POST response: `StreamingUrl`, `Token`, `SessionId` |
| `TranscriptSegment.cs` | Single text segment with `Text` property |
| `TranscriptUpdate.cs` | Transcript update with `Concluded[]`, `Tentative[]`, optional `Language` |
| `TargetMediaChunk.cs` | Translated audio chunk: `ContentType`, `Headers`, `Data[]`, `Text`, `Language`, `Duration` (closed beta) |
| `VoiceStreamError.cs` | WebSocket error: `Code`, `Reason`, `Message` |

Options & Interfaces

| File | Description |
| --- | --- |
| `DeepL/VoiceSessionOptions.cs` | Session creation options: audio format, languages, formality, glossary, closed beta TTS settings |
| `DeepL/IVoiceSession.cs` | Session interface: events (`SourceTranscriptUpdated`, `TargetTranscriptUpdated`, `TargetMediaChunkReceived`, `ErrorReceived`, `StreamEnded`) + methods (`SendAudioAsync`, `EndAudioAsync`, `ReconnectAsync`) |
| `DeepL/IVoiceManager.cs` | Factory interface: `CreateVoiceSessionAsync(VoiceSessionOptions)` |

Core Implementation

| File | Description |
| --- | --- |
| `DeepL/VoiceSession.cs` | Internal `ClientWebSocket`-based session with background receive loop, JSON message dispatch, and reconnection support |

Tests

| File | Description |
| --- | --- |
| `DeepLTests/VoiceSessionTest.cs` | 11 unit tests for options defaults, enum API values, model deserialization, and client input validation |

Modified Files

| File | Change |
| --- | --- |
| `DeepL/DeepLClient.cs` | Added `IVoiceManager` to class declaration; implemented `CreateVoiceSessionAsync` (POST JSON to `v3/voice/realtime`, then WebSocket connect) |
| `DeepL/DeepL.csproj` | Added `System.Net.WebSockets.Client` v4.3.2 conditional reference for `netstandard2.0` |

Architecture Decisions

  • Event-based API — Multiple concurrent streams (source transcript, target transcript, target media) arrive on the same WebSocket, mapping naturally to C# events. IAsyncEnumerable can be layered on top in a follow-up.
  • JSON only for v1 — MessagePack support deferred (requires additional NuGet dependency).
  • DeepLClient only — Voice API is v3; not added to the legacy Translator class, consistent with other v3 features (multilingual glossaries, style rules).
  • Manual ReconnectAsync() — Automatic reconnection policy deferred to a future iteration.
  • Thread safety — Events fire on a background thread. Consumers handle synchronization context marshaling (standard .NET pattern).
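For consumers that need event callbacks on a specific thread, the standard pattern looks roughly like this (a sketch building on the Usage snippet above; `session` is the `IVoiceSession` returned by `CreateVoiceSessionAsync`):

```csharp
// Sketch: marshal background-thread session events onto a captured
// SynchronizationContext (e.g. a WinForms/WPF UI thread). Capture the
// context on the consumer's thread *before* wiring the handler.
var uiContext = SynchronizationContext.Current;

session.SourceTranscriptUpdated += (s, update) => {
    var text = string.Join("", Array.ConvertAll(update.Concluded, seg => seg.Text));
    if (uiContext != null)
        uiContext.Post(_ => Console.WriteLine($"[Source] {text}"), null); // runs on the captured context
    else
        Console.WriteLine($"[Source] {text}"); // console app: no context to marshal to
};
```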

API Constraints

  • Maximum 5 target languages per session
  • Audio chunk size ≤ 100 KB or 1 second duration
  • Recommended chunk duration: 50–250 ms
  • Max audio stream speed: 2× real-time
  • 30-second inactivity timeout
  • 1-hour max connection duration (reconnect to continue)
  • Tokens are single-use; reuse terminates the session
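The chunking constraints above can be honored with a small helper. A sketch assuming PCM S16LE at 16 kHz mono (32,000 bytes per second of audio; `AudioChunker` is an illustrative name, not part of the SDK):

```csharp
using System;
using System.Collections.Generic;

static class AudioChunker {
  // 16 kHz * 16-bit mono = 32,000 bytes per second of audio.
  private const int BytesPerSecond = 16000 * 2;

  // Split a PCM buffer into fixed-duration chunks (default 100 ms, inside
  // the recommended 50-250 ms window and well under the 100 KB cap).
  public static IEnumerable<byte[]> Chunk(byte[] audio, int chunkMs = 100) {
    var chunkBytes = BytesPerSecond * chunkMs / 1000; // 3,200 bytes at 100 ms
    for (var offset = 0; offset < audio.Length; offset += chunkBytes) {
      var len = Math.Min(chunkBytes, audio.Length - offset);
      var chunk = new byte[len];
      Array.Copy(audio, offset, chunk, 0, len);
      yield return chunk;
    }
  }
}
```

Each chunk can then be passed to `session.SendAudioAsync(chunk)`; pacing sends with a short `Task.Delay` between chunks keeps the stream under the 2× real-time limit.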


Copilot AI left a comment


Pull request overview

Adds an initial DeepL Voice API (v3) integration to the DeepL .NET SDK, providing a session-based WebSocket streaming API for real-time transcription/translation, plus the required option types and models.

Changes:

  • Introduces IVoiceManager / IVoiceSession and a ClientWebSocket-based VoiceSession implementation (send audio, receive transcript/media/error events, manual reconnect).
  • Adds Voice API option types and model DTOs (session info, transcript updates/segments, media chunks, stream errors) plus supporting enums/constants.
  • Extends DeepLClient with CreateVoiceSessionAsync and updates the project/test suite to support and validate the new API surface.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `DeepL/DeepLClient.cs` | Implements `IVoiceManager.CreateVoiceSessionAsync` (session POST + WebSocket connect). |
| `DeepL/DeepL.csproj` | Adds `System.Net.WebSockets.Client` reference for `netstandard2.0`. |
| `DeepL/IVoiceManager.cs` | New interface for creating Voice API sessions. |
| `DeepL/IVoiceSession.cs` | New streaming session interface (events + send/end/reconnect). |
| `DeepL/VoiceSession.cs` | WebSocket session implementation with background receive loop and message dispatch. |
| `DeepL/VoiceSessionOptions.cs` | Session creation options (formats, languages, glossary, formality, beta TTS knobs). |
| `DeepL/SourceMediaContentType.cs` | Constants for supported source audio content types. |
| `DeepL/VoiceMessageFormat.cs` | Enum + API-value mapping for JSON/MessagePack. |
| `DeepL/SourceLanguageMode.cs` | Enum + API-value mapping for auto vs fixed source language. |
| `DeepL/TargetMediaVoice.cs` | Enum + API-value mapping for target TTS voice selection. |
| `DeepL/Model/VoiceSessionInfo.cs` | DTO for session creation response (`streaming_url`, `token`, `session_id`). |
| `DeepL/Model/TranscriptSegment.cs` | DTO for transcript segment (`text`). |
| `DeepL/Model/TranscriptUpdate.cs` | DTO for transcript updates (`concluded`, `tentative`, optional `language`). |
| `DeepL/Model/TargetMediaChunk.cs` | DTO for target media chunks (base64 packets + metadata; closed beta). |
| `DeepL/Model/VoiceStreamError.cs` | DTO for server-sent stream errors (`code`, `reason`, `message`). |
| `DeepLTests/VoiceSessionTest.cs` | Unit/client validation tests for new Voice types and `DeepLClient` input validation. |



```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
```

Copilot AI Mar 31, 2026

`using System.Collections.Generic;` appears unused in this test file. With `TreatWarningsAsErrors` enabled in the test project, this will produce CS8019 and fail the build; please remove the unused using (or use it).

Suggested change:

```csharp
using System.Text.Json;
```
```csharp
public string SourceMediaContentType { get; set; } = DeepL.SourceMediaContentType.Auto;

/// <summary>
/// Message encoding format for WebSocket communication. Defaults to <see cref="VoiceMessageFormat.Json" />.
```

Copilot AI Mar 31, 2026

The XML docs say `MessageFormat` defaults to `VoiceMessageFormat.Json`, but the property's default value is `null` (meaning it won't be sent in the request). Either set the property default to `Json`, or adjust the docs to clarify that `null` means "use API default (currently JSON)".

Suggested change:

```diff
-/// Message encoding format for WebSocket communication. Defaults to <see cref="VoiceMessageFormat.Json" />.
+/// Message encoding format for WebSocket communication. If <c>null</c>, the API default is used
+/// (currently <see cref="VoiceMessageFormat.Json" />).
```
```csharp
/// Controls how the <see cref="SourceLanguage" /> value is used.
/// Defaults to <see cref="DeepL.SourceLanguageMode.Auto" /> if not specified.
/// </summary>
public SourceLanguageMode? SourceLanguageMode { get; set; }
```

Copilot AI Mar 31, 2026

The docs state `SourceLanguageMode` defaults to `Auto`, but the property is nullable and defaults to `null` (i.e., not sent). Please either set a default value (`Auto`) or clarify in the docs that `null` means "use API default (auto)".

Suggested change:

```diff
-public SourceLanguageMode? SourceLanguageMode { get; set; }
+public SourceLanguageMode? SourceLanguageMode { get; set; } = DeepL.SourceLanguageMode.Auto;
```
Comment on lines +45 to +46

```csharp
/// Languages specified here will automatically be added to <see cref="TargetLanguages" /> if not already present.
/// Maximum 5 target media languages per session.
```

Copilot AI Mar 31, 2026

The `TargetMediaLanguages` docs claim languages will be automatically added to `TargetLanguages` if missing, but `CreateVoiceSessionAsync` currently sends `target_languages` and `target_media_languages` independently and does not merge them or enforce the combined max of 5. Either implement the merge + validation (recommended), or update the docs to match the current behavior.

Suggested change:

```diff
-/// Languages specified here will automatically be added to <see cref="TargetLanguages" /> if not already present.
-/// Maximum 5 target media languages per session.
+/// This list is sent independently of <see cref="TargetLanguages" /> and is not merged automatically.
+/// Maximum 5 target media languages per session for this property.
```
Comment on lines +951 to +979

```csharp
if (options.TargetLanguages == null || options.TargetLanguages.Length == 0) {
  throw new ArgumentException("At least one target language must be specified");
}

if (options.TargetLanguages.Length > 5) {
  throw new ArgumentException("Maximum 5 target languages per session");
}

var requestData = new Dictionary<string, object> {
  ["source_media_content_type"] = options.SourceMediaContentType,
  ["target_languages"] = options.TargetLanguages
};

if (options.MessageFormat != null) {
  requestData["message_format"] = options.MessageFormat.Value.ToApiValue();
}

if (options.SourceLanguage != null) {
  requestData["source_language"] = options.SourceLanguage;
}

if (options.SourceLanguageMode != null) {
  requestData["source_language_mode"] = options.SourceLanguageMode.Value.ToApiValue();
}

if (options.TargetMediaLanguages != null) {
  requestData["target_media_languages"] = options.TargetMediaLanguages;
}
```

Copilot AI Mar 31, 2026

`CreateVoiceSessionAsync` validates the `TargetLanguages` count, but does not validate `TargetMediaLanguages` (max 5 per docs) nor enforce that target media languages are included in `target_languages`. This can lead to avoidable API-side errors; consider merging `TargetMediaLanguages` into `TargetLanguages` (deduping) and validating the combined count before sending the request.
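The merge-and-validate the review suggests could look roughly like this (a sketch with an assumed helper name, not code from the PR):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class VoiceOptionHelpers {
  // Fold TargetMediaLanguages into TargetLanguages (deduped, case-insensitive)
  // and enforce the combined 5-language limit before building the request.
  public static string[] MergeTargetLanguages(string[] targetLanguages, string[] targetMediaLanguages) {
    var merged = new List<string>(targetLanguages);
    foreach (var lang in targetMediaLanguages ?? Array.Empty<string>()) {
      if (!merged.Contains(lang, StringComparer.OrdinalIgnoreCase)) {
        merged.Add(lang);
      }
    }
    if (merged.Count > 5) {
      throw new ArgumentException("Maximum 5 target languages per session (including target media languages)");
    }
    return merged.ToArray();
  }
}
```

The merged array would then be sent as `target_languages`, with `target_media_languages` passed through unchanged.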

Comment on lines +108 to +146

```csharp
/// <inheritdoc />
public async Task ReconnectAsync(CancellationToken cancellationToken = default) {
  // Stop current receive loop
  _receiveCts.Cancel();
  if (_receiveTask != null) {
    try {
      await _receiveTask.ConfigureAwait(false);
    } catch (OperationCanceledException) {
      // Expected
    }
  }

  // Close existing WebSocket if still open
  if (_webSocket.State == WebSocketState.Open || _webSocket.State == WebSocketState.CloseReceived) {
    try {
      await _webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, "Reconnecting", CancellationToken.None)
        .ConfigureAwait(false);
    } catch (WebSocketException) {
      // Ignore close errors during reconnection
    }
  }

  _webSocket.Dispose();

  // Request new token via GET v3/voice/realtime?token=<lastToken>
  var queryParams = new[] { ("token", _lastToken) };
  using var responseMessage = await _httpClient.ApiGetAsync("v3/voice/realtime", cancellationToken, queryParams)
    .ConfigureAwait(false);
  await DeepLHttpClient.CheckStatusCodeAsync(responseMessage).ConfigureAwait(false);
  var sessionInfo = await JsonUtils.DeserializeAsync<VoiceSessionInfo>(responseMessage).ConfigureAwait(false);

  _lastToken = sessionInfo.Token;
  SessionId = sessionInfo.SessionId;

  // Establish new WebSocket connection
  var wsUri = new Uri($"{sessionInfo.StreamingUrl}?token={Uri.EscapeDataString(sessionInfo.Token)}");
  _webSocket = new ClientWebSocket();
  await _webSocket.ConnectAsync(wsUri, cancellationToken).ConfigureAwait(false);
```

Copilot AI Mar 31, 2026

`VoiceSession` mutates and disposes `_webSocket` inside `ReconnectAsync` while `SendAudioAsync`/`EndAudioAsync` and `ReceiveLoopAsync` use `_webSocket` without synchronization. This can race (send on a disposed/old socket, or receive loop reading from a replaced socket). Use a lock/`SemaphoreSlim` to serialize reconnect/send/dispose, or capture the current socket into a local variable under lock and only dispose/swap once no other operations are using it.
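One way to address this race (a sketch, not the PR's code; `_socketLock` is an assumed new field on `VoiceSession`) is to serialize all socket access with a `SemaphoreSlim`:

```csharp
private readonly SemaphoreSlim _socketLock = new SemaphoreSlim(1, 1);

// Serialize every operation that touches _webSocket so ReconnectAsync
// cannot swap or dispose the socket while a send is in flight.
public async Task SendAudioAsync(byte[] chunk, CancellationToken cancellationToken = default) {
  await _socketLock.WaitAsync(cancellationToken).ConfigureAwait(false);
  try {
    await _webSocket.SendAsync(new ArraySegment<byte>(chunk), WebSocketMessageType.Binary,
      endOfMessage: true, cancellationToken).ConfigureAwait(false);
  } finally {
    _socketLock.Release();
  }
}
// ReconnectAsync would take the same semaphore around the close/dispose/swap
// sequence, so no sender ever observes a half-replaced socket.
```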

```csharp
} catch (OperationCanceledException) {
  // Normal cancellation
} catch (WebSocketException) {
  // Connection lost — consumer should call ReconnectAsync
```

Copilot AI Mar 31, 2026

`ReceiveLoopAsync` only catches `OperationCanceledException` and `WebSocketException`. Any other exception (e.g., `ObjectDisposedException` during shutdown, unexpected parsing failures) will fault the background task; `ReconnectAsync` awaits `_receiveTask` and will then throw, preventing reconnection. Catch a broader `Exception` here and either surface it via an event/callback or ensure it doesn't fault the task.

Suggested change:

```diff
 // Connection lost — consumer should call ReconnectAsync
+} catch (Exception) {
+  // Unexpected exception — swallow to prevent the background task from faulting
```