This document describes exactly what Raddir encrypts, what the server can and cannot see, and how the end-to-end encryption works. No vague claims — just facts.
Raddir uses two layers of encryption for voice, video, and text:
- Voice: DTLS-SRTP (standard WebRTC transport encryption)
- Signaling: WSS (WebSocket over TLS) when configured with HTTPS
- Protects against network eavesdroppers between client and server
- Voice: AES-256-GCM applied to each Opus frame via Insertable Streams API
- Video & screen share: same AES-256-GCM Insertable Streams pipeline as voice; VP8/VP9 frames encrypted before reaching the SFU
- Text chat: AES-256-GCM applied to message content before sending
- Protects against the server itself — it cannot decrypt content
- Listen to voice audio — Opus frames are encrypted before reaching the SFU
- View video or screen shares — VP8/VP9 frames are encrypted before reaching the SFU
- Read text messages — server stores only ciphertext
- Perform content analysis — no speech recognition, no keyword detection, no image analysis
- Record decryptable audio or video — even if packets are captured, they are AES-256-GCM encrypted
- Recover encryption keys — keys are exchanged directly between clients; the server only relays opaque blobs
- Connection metadata: who is connected, from which IP, at what time
- Channel membership: who is in which channel
- User state: muted/deafened status, nickname, role assignments
- User avatars and server icons: uploaded as plaintext images (intentionally public — see below)
- Server name, description, and configuration: server-side metadata
- Roles and permissions: server-side authorization data
- RTP packet metadata: packet sizes, timing, sequence numbers (needed for SFU routing)
- Signaling messages: join/leave events, transport negotiation parameters
This is the same trust model as Signal: the server handles routing and metadata, but content is cryptographically inaccessible.
The following data is not end-to-end encrypted by design:
| Data | Why it's not E2EE | Risk |
|---|---|---|
| User avatars | Intentionally public — visible to all server members | None (cosmetic) |
| Server icon / name / description | Server-side metadata managed by admins | None (cosmetic) |
| Roles and permissions | Server-side authorization — the server must evaluate permissions to enforce access control | None (authorization metadata) |
| Nicknames | Displayed to all members; server needs them for routing and member lists | None (public identity) |
These are analogous to a Discord server icon or a TeamSpeak server name — they are metadata that the server must know to function. Encrypting them would provide no security benefit since they are shown to all members anyway.
E2EE protects content (voice audio, video, screen share, and text messages). Metadata and cosmetic data are not content.
- When a client joins a channel, it generates an ephemeral ECDH P-256 keypair
- The client announces its ECDH public key to other channel members via the server
- The server relays these announcements as opaque blobs — it does not parse or store them
- The channel's key holder (first member or admin) generates a random AES-256 channel key
- The key holder encrypts the channel key to each member's ECDH public key and sends it via the server
- Each member decrypts the channel key using their ECDH private key
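The key distribution steps above can be sketched as follows. This is a minimal illustration using Node's synchronous `crypto` module rather than the Web Crypto API, and it assumes details the text does not state: that the wrapping key is derived with HKDF-SHA-256 from the raw ECDH shared secret, and that the channel key is wrapped with AES-256-GCM. The function names and the `"channel-key-wrap"` label are hypothetical.

```typescript
import {
  createCipheriv, createDecipheriv, diffieHellman,
  generateKeyPairSync, hkdfSync, randomBytes, KeyObject,
} from "node:crypto";

interface WrappedKey { iv: Buffer; ciphertext: Buffer; tag: Buffer }

// Ephemeral ECDH P-256 keypair, generated when a client joins a channel.
function newEphemeralKeyPair() {
  return generateKeyPairSync("ec", { namedCurve: "prime256v1" });
}

// Assumption: wrapping key = HKDF-SHA-256 over the ECDH shared secret.
function deriveWrappingKey(ownPrivate: KeyObject, peerPublic: KeyObject): Buffer {
  const shared = diffieHellman({ privateKey: ownPrivate, publicKey: peerPublic });
  return Buffer.from(hkdfSync("sha256", shared, Buffer.alloc(0), "channel-key-wrap", 32));
}

// Key holder side: encrypt the channel key for one member.
function wrapChannelKey(channelKey: Buffer, wrappingKey: Buffer): WrappedKey {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", wrappingKey, iv);
  const ciphertext = Buffer.concat([cipher.update(channelKey), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Member side: recover the channel key; throws if the blob was tampered with.
function unwrapChannelKey(wrapped: WrappedKey, wrappingKey: Buffer): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", wrappingKey, wrapped.iv);
  decipher.setAuthTag(wrapped.tag);
  return Buffer.concat([decipher.update(wrapped.ciphertext), decipher.final()]);
}
```

Both sides derive the same wrapping key from their own private key and the other party's public key, so the server only ever relays the public keys and the wrapped blob.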
- Client captures microphone audio via `getUserMedia`
- Browser encodes audio to Opus
- Insertable Streams intercepts the encoded frame before RTP packetization
- Frame is encrypted with AES-256-GCM using the channel key
- 1 byte of the Opus frame header (TOC byte) is left unencrypted so the RTP stack can parse the frame
- A random 12-byte IV is generated per frame (via `crypto.getRandomValues`)
- Encrypted frame layout: `[1-byte header | 12-byte IV | AES-256-GCM ciphertext + 16-byte auth tag]`
- Encrypted frame is packetized into RTP and sent via DTLS-SRTP to the SFU
- SFU forwards the encrypted RTP packets to other channel members (it cannot decrypt them)
- Receiving client's Insertable Streams decrypts the frame using the same channel key
- Browser decodes Opus and plays audio
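The frame layout described above can be illustrated with a small sketch. It uses Node's `crypto` module for brevity (the client itself would run inside an Insertable Streams transform), and the function names and the `headerLen` parameter are hypothetical. For Opus, `headerLen` would be 1.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_LEN = 12;
const TAG_LEN = 16;

// Produces [header | IV | ciphertext | tag]; the first headerLen bytes stay in the clear.
function encryptFrame(frame: Buffer, key: Buffer, headerLen: number): Buffer {
  const header = frame.subarray(0, headerLen);
  const iv = randomBytes(IV_LEN); // fresh random IV per frame
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(frame.subarray(headerLen)), cipher.final()]);
  return Buffer.concat([header, iv, body, cipher.getAuthTag()]);
}

// Returns null (meaning: drop the frame) if it is too short or fails authentication.
function decryptFrame(packet: Buffer, key: Buffer, headerLen: number): Buffer | null {
  if (packet.length < headerLen + IV_LEN + TAG_LEN) return null;
  const header = packet.subarray(0, headerLen);
  const iv = packet.subarray(headerLen, headerLen + IV_LEN);
  const body = packet.subarray(headerLen + IV_LEN, packet.length - TAG_LEN);
  const tag = packet.subarray(packet.length - TAG_LEN);
  try {
    const decipher = createDecipheriv("aes-256-gcm", key, iv);
    decipher.setAuthTag(tag);
    return Buffer.concat([header, decipher.update(body), decipher.final()]);
  } catch {
    return null; // wrong key or modified ciphertext
  }
}
```

Each encrypted frame is 28 bytes longer than the plaintext frame (12-byte IV plus 16-byte tag). This sketch does not bind the clear header as additional authenticated data; whether Raddir does so is not stated here.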
- Client captures webcam or screen via `getUserMedia` / desktop capture API
- Browser encodes video to VP8 or VP9
- Simulcast: the browser produces multiple quality layers per producer:
  - Webcam, 3 layers: quarter resolution (rid `q`, 150 kbps), half resolution (rid `h`, ~525 kbps), full resolution (rid `f`, up to 1.5 Mbps)
  - Screen share, 2 layers: half resolution (rid `q`, ~625 kbps), full resolution (rid `f`, up to 2.5 Mbps)
- All simulcast layers go through a single `RTCRtpSender`, so the Insertable Streams transform encrypts every frame regardless of which layer it belongs to
- 10 bytes of the VP8/VP9 frame header (payload descriptor + keyframe indicator) are left unencrypted so the SFU can detect keyframes and select which layer to forward
- A random 12-byte IV is generated per frame
- Encrypted frame layout: `[10-byte header | 12-byte IV | AES-256-GCM ciphertext + 16-byte auth tag]`
- SFU selects which simulcast layer to forward to each consumer based on the unencrypted header bytes and the consumer's preferred layer setting
- Receiving client's Insertable Streams decrypts the frame using the channel key (with `mediaKind: "video"` to match the 10-byte header)
- Browser decodes VP8/VP9 and renders video
What the 10 unencrypted header bytes reveal: Only the VP8/VP9 payload descriptor — whether a frame is a keyframe, the picture ID, and temporal layer index. This is structural metadata needed for SFU routing. The actual pixel data is always encrypted.
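As an illustration, the simulcast layers listed above could be expressed as encoding parameters like the following. The field names mirror the standard WebRTC `RTCRtpEncodingParameters` dictionary; the `scaleResolutionDownBy` values are an assumption about what "quarter" and "half" resolution mean here, and the constant names are hypothetical.

```typescript
// Local structural type so the sketch also compiles outside a browser.
interface EncodingLayer {
  rid: string;
  scaleResolutionDownBy: number; // assumed mapping: quarter = 4, half = 2, full = 1
  maxBitrate: number;            // bits per second
}

// Webcam: three layers (q, h, f) with the bitrates quoted above.
const webcamEncodings: EncodingLayer[] = [
  { rid: "q", scaleResolutionDownBy: 4, maxBitrate: 150000 },
  { rid: "h", scaleResolutionDownBy: 2, maxBitrate: 525000 },
  { rid: "f", scaleResolutionDownBy: 1, maxBitrate: 1500000 },
];

// Screen share: two layers (q, f).
const screenEncodings: EncodingLayer[] = [
  { rid: "q", scaleResolutionDownBy: 2, maxBitrate: 625000 },
  { rid: "f", scaleResolutionDownBy: 1, maxBitrate: 2500000 },
];

// Worst-case uplink if every layer is sent at its cap.
function maxUplinkBps(layers: EncodingLayer[]): number {
  return layers.reduce((sum, layer) => sum + layer.maxBitrate, 0);
}
```

Because every layer passes through the same sender, the encrypt transform does not need to know which `rid` a frame belongs to.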
- Client encrypts message content with AES-256-GCM using the channel key
- Ciphertext + IV + key epoch are sent to the server
- Server stores the ciphertext (it cannot read it) and relays to channel members
- Receiving clients decrypt using their copy of the channel key
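A minimal sketch of the text message flow above, assuming a JSON-friendly envelope with base64 fields and the GCM auth tag appended to the ciphertext. The `ChatEnvelope` shape and function names are hypothetical; only the three transmitted items (ciphertext, IV, key epoch) come from the description.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface ChatEnvelope { ciphertext: string; iv: string; keyEpoch: number }

function encryptMessage(text: string, key: Buffer, keyEpoch: number): ChatEnvelope {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(text, "utf8"), cipher.final()]);
  const sealed = Buffer.concat([body, cipher.getAuthTag()]); // tag appended (assumption)
  return { ciphertext: sealed.toString("base64"), iv: iv.toString("base64"), keyEpoch };
}

// Looks up the key for the message's epoch; returns null if it is unknown or decryption fails.
function decryptMessage(env: ChatEnvelope, keysByEpoch: Map<number, Buffer>): string | null {
  const key = keysByEpoch.get(env.keyEpoch);
  if (!key) return null;
  const raw = Buffer.from(env.ciphertext, "base64");
  if (raw.length < 16) return null;
  try {
    const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(env.iv, "base64"));
    decipher.setAuthTag(raw.subarray(raw.length - 16));
    const plain = Buffer.concat([decipher.update(raw.subarray(0, raw.length - 16)), decipher.final()]);
    return plain.toString("utf8");
  } catch {
    return null;
  }
}
```

Carrying the key epoch alongside the ciphertext lets a receiver pick the right key after a rotation instead of guessing.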
- Member leaves: A new channel key is generated and distributed to remaining members. The departing member cannot decrypt future audio.
- Member joins: The existing channel key is shared with the new member. They cannot decrypt audio from before they joined.
- Periodic ratchet: Optional HKDF-based key chain for long-lived channels.
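One way an HKDF-based key chain like the periodic ratchet above could work is to derive each new key from the previous one. This is only a sketch: the `info` label, the use of the epoch number in it, and the empty salt are assumptions, not details from the description.

```typescript
import { hkdfSync } from "node:crypto";

// Derives the key for `epoch` from the previous key. HKDF is one-way, so a
// later key does not reveal earlier keys in the chain.
function ratchetKey(previousKey: Buffer, epoch: number): Buffer {
  const info = Buffer.from(`channel-key-ratchet:${epoch}`); // hypothetical label
  return Buffer.from(hkdfSync("sha256", previousKey, Buffer.alloc(0), info, 32));
}
```

Every member holding the previous key can compute the next one locally, so a ratchet step needs no new key distribution. It does not lock out a member who already holds the current key, which is why a member leaving triggers a fresh random key instead.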
- Each client has a long-lived ECDSA P-256 identity keypair stored in the Electron main process via `safeStorage` (never in `localStorage`)
- All E2EE control messages (`public-key-announce`, `encrypted-channel-key`, `key-ratchet`) are mandatorily signed with the sender's identity key
- Signatures include channel context (`channelId` + `serverId`) to prevent replay/misroute across channels or servers
- Unsigned, context-mismatched, or invalid-signature messages are hard-rejected
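The signing rules above can be sketched as follows, using ECDSA P-256 with SHA-256 from Node's `crypto` module. The `ControlMessage` shape and the JSON-array serialization of the signed bytes are assumptions made for illustration.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

interface ControlMessage { type: string; serverId: string; channelId: string; payload: string }

// The signed bytes cover the message type, the channel context, and the payload.
function signingInput(m: ControlMessage): Buffer {
  return Buffer.from(JSON.stringify([m.type, m.serverId, m.channelId, m.payload]));
}

function signControl(m: ControlMessage, identityPrivate: KeyObject): Buffer {
  return sign("sha256", signingInput(m), identityPrivate);
}

// Rejects messages without a signature, with the wrong context, or with a bad signature.
function verifyControl(
  m: ControlMessage,
  signature: Buffer | undefined,
  identityPublic: KeyObject,
  current: { serverId: string; channelId: string },
): boolean {
  if (!signature) return false;
  if (m.serverId !== current.serverId || m.channelId !== current.channelId) return false;
  return verify("sha256", signingInput(m), identityPublic, signature);
}
```

Because the context is inside the signed bytes, a message captured in one channel cannot be replayed into another without invalidating the signature.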
- On first contact with a peer on a given server, the peer's identity public key is pinned (Trust On First Use)
- Pinned keys are persisted per server in the Electron app data directory (`userData/identity-pins/<serverId>.json`)
- On subsequent sessions, if a peer's identity key changes, all E2EE messages from that peer are hard-rejected (possible MITM)
- This prevents a malicious server from substituting identity keys after initial contact
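The pinning behaviour above reduces to a small lookup. The sketch below keeps pins in memory only and leaves out the JSON persistence; the class and method names are hypothetical.

```typescript
type PinResult = "pinned" | "match" | "mismatch";

class IdentityPinStore {
  // Key: "<serverId>:<userId>", value: the pinned identity public key (encoded).
  private pins = new Map<string, string>();

  check(serverId: string, userId: string, identityKey: string): PinResult {
    const slot = `${serverId}:${userId}`;
    const pinned = this.pins.get(slot);
    if (pinned === undefined) {
      this.pins.set(slot, identityKey); // first contact: trust and pin
      return "pinned";
    }
    // A mismatch never overwrites the existing pin.
    return pinned === identityKey ? "match" : "mismatch";
  }
}
```

A caller would hard-reject all E2EE messages from a peer whenever `check` returns `"mismatch"`.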
- A safety number (fingerprint) is derived from both parties' identity public keys
- Users can compare safety numbers out-of-band (e.g., in person, via secure channel)
- The UI shows a lock icon and verification status per user
- Audio will not transmit or receive until the E2EE channel key is established
- If the key is not established within 10 seconds, audio setup aborts — no unencrypted fallback
- Late-arriving producers are also rejected if no E2EE key is active
- Video (webcam and screen share) will not produce unless `e2eeActive` is true in the voice store; this is checked before `getUserMedia` is even called
- If the E2EE key is null at frame time, the encrypt transform drops the frame (never enqueued), a second failsafe independent of the gating check
- If encryption fails for any reason, the frame is silently dropped — never sent unencrypted
- On the receiving side, if the key is null or decryption fails, the frame is silently dropped — never rendered
- Frames too short to contain a valid ciphertext (< header + IV length) are also dropped
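The fail-closed rule for outgoing frames can be sketched as a wrapper around any frame cipher. The names are hypothetical and the real transform operates on encoded-frame objects inside a `TransformStream`, but the control flow is the point: there is no code path that forwards the original frame.

```typescript
type FrameCipher = (frame: Uint8Array, key: Uint8Array) => Uint8Array;

function makeFailClosedTransform(
  getKey: () => Uint8Array | null,
  cipher: FrameCipher,
  enqueue: (frame: Uint8Array) => void,
): (frame: Uint8Array) => void {
  return (frame) => {
    const key = getKey();
    if (key === null) return; // no key yet: drop, never enqueue plaintext
    let sealed: Uint8Array;
    try {
      sealed = cipher(frame, key);
    } catch {
      return; // encryption failed: drop silently
    }
    enqueue(sealed);
  };
}
```

The receiving side follows the same pattern with a decrypting cipher and an extra length check before decryption.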
- The server enforces configurable limits on concurrent webcam and screen share producers per channel
- Defaults: 5 webcams, 1 screen share per channel (configurable 0–50 and 0–10 via admin panel)
- When the limit is reached, the server rejects the `produce` request with a `PRODUCER_LIMIT` error
- The client handles the rejection by stopping the camera/capture track and reverting UI state
- Setting a limit to 0 effectively disables that media type server-wide
- The key holder is elected deterministically by `min(SHA-256(identityPublicKey))` across channel members
- This is not gameable by the server (it cannot influence identity key hashes)
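A sketch of the election rule, assuming each member entry carries the raw identity public key bytes. The `Member` shape and function name are hypothetical. Comparing equal-length lowercase hex digests as strings gives the same order as comparing the hash bytes.

```typescript
import { createHash } from "node:crypto";

interface Member { userId: string; identityPublicKey: Buffer }

// Picks the member whose SHA-256(identityPublicKey) is smallest.
function electKeyHolder(members: Member[]): Member | undefined {
  let best: { member: Member; digest: string } | undefined;
  for (const member of members) {
    const digest = createHash("sha256").update(member.identityPublicKey).digest("hex");
    if (!best || digest < best.digest) best = { member, digest };
  }
  return best?.member;
}
```

Every client that sees the same member list reaches the same result, regardless of the order in which the server delivers that list.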
| Purpose | Algorithm | Standard |
|---|---|---|
| Frame encryption | AES-256-GCM | NIST SP 800-38D |
| Key exchange | ECDH P-256 | NIST SP 800-56A (curve: FIPS 186-4) |
| Key derivation | HKDF-SHA-256 | RFC 5869 |
| Identity keys | ECDSA P-256 + SHA-256 | NIST FIPS 186-4 |
| Identity key storage | Electron safeStorage (OS keychain) | Platform-specific |
| TOFU pinning | Per-server persistent pin store | SSH-style TOFU |
| All crypto | Web Crypto API + Node.js crypto | W3C / OpenSSL |
No third-party crypto libraries are used. Client-side operations run in the browser's native Web Crypto API, which typically uses hardware acceleration where available (AES-NI on x86, ARMv8 crypto extensions).
Beyond E2EE, the server implements several hardening measures:
- WebSocket auth: 10 attempts per 60 seconds per IP — prevents brute-force password/credential attacks
- Public invite endpoints: 20 requests per 60 seconds per IP — prevents invite enumeration and DoS
- Rate limiting uses `req.socket.remoteAddress` by default. Set `RADDIR_TRUST_PROXY=true` only when behind a reverse proxy that sets `X-Forwarded-For`
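The per-IP limits above could be implemented with a fixed-window counter like the one below. The text does not say which algorithm the server uses, so treat the class as a hypothetical sketch; the `now` parameter exists only to make the behaviour easy to test.

```typescript
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private hits = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request identified by `key` (e.g. the client IP) may proceed.
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { windowStart: now, count: 1 }); // start a new window
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// Limits quoted above: 10 auth attempts and 20 invite requests per 60 seconds per IP.
const wsAuthLimiter = new FixedWindowLimiter(10, 60000);
const inviteLimiter = new FixedWindowLimiter(20, 60000);
```

The key passed to `allow` matters as much as the algorithm: behind a reverse proxy, the socket address is the proxy's, which is why the proxy setting has to be explicit.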
Origins are restricted to Electron (null, file://, app://) and localhost dev servers. All other origins are rejected.
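An origin check along these lines might look as follows. It assumes that `null` refers to the literal `Origin: null` header value and that "localhost dev servers" means `localhost` or `127.0.0.1` on any port; both are interpretations, and the function name is hypothetical.

```typescript
function isAllowedOrigin(origin: string): boolean {
  // Electron origins: the opaque "null" origin and the file:// and app:// schemes.
  if (origin === "null") return true;
  if (origin.startsWith("file://") || origin.startsWith("app://")) return true;
  try {
    const url = new URL(origin);
    const isHttp = url.protocol === "http:" || url.protocol === "https:";
    // Compare the parsed hostname exactly; a prefix match would admit
    // hosts such as localhost.evil.example.
    return isHttp && (url.hostname === "localhost" || url.hostname === "127.0.0.1");
  } catch {
    return false; // unparseable origin
  }
}
```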
Admin token grants ephemeral privileges for the WebSocket session only — not persisted as a database role. When the session ends, admin access is gone. This limits the impact of a leaked token.
When an admin modifies a role's permissions, deletes a role, or changes a channel permission override, the server immediately recomputes effective permissions for every connected client who holds that role and pushes a permissions-updated message. This ensures:
- Permission changes take effect instantly without requiring clients to reconnect
- The client UI updates in real time (e.g., video buttons become disabled/enabled)
- Admin token holders still receive all-allow permissions regardless of role changes
The same mechanism fires when a role is assigned or unassigned from a user via the admin panel.
The server enforces permissions at the produce handler level — not just in the client UI:
| Media type | Required permission | Enforcement |
|---|---|---|
| Microphone | `speak` | Server rejects `produce` with `NO_PERMISSION` if denied |
| Webcam | `video` | Server rejects `produce` with `NO_PERMISSION` if denied |
| Screen share | `screenShare` | Server rejects `produce` with `NO_PERMISSION` if denied |
| Move user | `moveUsers` | Server rejects `move-user` with `NO_PERMISSION` if denied |
| Kick | `kick` | Server rejects `kick` with `NO_PERMISSION` if denied |
| Ban | `ban` | Server rejects `ban` with `NO_PERMISSION` if denied |
| Role management | `manageRoles` | Server rejects `assign-role` with `NO_PERMISSION` if denied |
Client-side UI gating (disabled buttons, tooltips) is a UX convenience. The server is the sole authority — a modified client cannot bypass permission checks.
- Invite blob v2 contains the server address as a routing hint only — the server returns its canonical address from the database, never trusting the blob
- No publicKey at redeem time: the `/api/invites/redeem` endpoint creates an unbound credential (no identity attached). This means invites work for users who don't yet have a keypair
- Identity binding on first WS auth: when the client connects via WebSocket with a credential and publicKey, the server binds the credential to that publicKey. Subsequent connections must present the same publicKey, preventing credential theft
- Stolen public keys are harmless: without the credential secret, knowing someone's public key cannot be used to impersonate them
- Invite use counts are enforced with an atomic SQL UPDATE to prevent race conditions exceeding `maxUses`
- `createdBy` metadata is set server-side, not accepted from the request body
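The bind-on-first-auth rule can be sketched with an in-memory map. A real server would keep this in its database; the class and method names here are hypothetical.

```typescript
class CredentialBindings {
  // credential -> bound public key, or null while the credential is still unbound
  private bound = new Map<string, string | null>();

  // Called when an invite is redeemed: the credential exists but has no identity yet.
  redeem(credential: string): void {
    this.bound.set(credential, null);
  }

  // Called on WebSocket auth: binds on first use, then requires the same key.
  authenticate(credential: string, publicKey: string): boolean {
    if (!this.bound.has(credential)) return false; // unknown credential
    const existing = this.bound.get(credential);
    if (existing === null) {
      this.bound.set(credential, publicKey);
      return true;
    }
    return existing === publicKey;
  }
}
```

After the first successful authentication, a stolen credential is useless without the matching identity key, and a known public key is useless without the credential.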
Targeted E2EE relay messages (key exchange, verification) require target.serverId === sender.serverId. This prevents cross-server spam via the relay mechanism.
- Max message size: 64 KB — oversized messages are rejected and the connection is closed
- Chat relay: server uses the sender's tracked `channelId`, ignoring any `channelId` in the message, which prevents cross-channel injection
A partial UNIQUE index on users.public_key (where not NULL) prevents duplicate identity rows, which could otherwise break key exchange or enable impersonation.
- No telemetry: The server sends no data to any external service
- No content scanning: No automated moderation, no AI analysis
- No global accounts: No central identity server that could be compromised
- No key escrow: Encryption keys exist only on client devices
- No backdoors: The E2EE design makes server-side content access architecturally impossible
| Threat | Mitigated? | How |
|---|---|---|
| Network eavesdropper | ✅ | DTLS-SRTP + WSS |
| Compromised server (passive) | ✅ | E2EE — server cannot decrypt content |
| Compromised server (active MITM, after first contact) | ✅ | TOFU pinning — identity key substitution is detected and rejected |
| Compromised server (active MITM, first contact) | ⚠️ | See "Known Limitations" below |
| Man-in-the-middle (after verification) | ✅ | Identity verification via safety numbers |
| Compromised client device | ❌ | Out of scope — if your device is compromised, all bets are off |
| Traffic analysis | ⚠️ | Server sees packet timing/size; mitigating this requires onion routing (out of scope) |
| Key compromise (past) | ✅ | Forward secrecy via key ratcheting on member leave |
| Key compromise (future) | ✅ | New keys on member join; periodic ratchet |
| E2EE not engaging silently | ✅ | Voice TX/RX blocked until E2EE key established; no unencrypted fallback |
Identity pinning uses Trust On First Use (TOFU), the same model as SSH. On first contact with a peer, whatever identity key is seen first gets pinned. A malicious signaling server (or active attacker controlling signaling) can still MITM the very first time two peers meet by presenting attacker keys first, which then get pinned.
This is inherent to TOFU and cannot be fixed without adding a trust anchor. Possible future mitigations:
- Out-of-band verification (safety number / QR scan) — users confirm each other's identity fingerprints before trusting the pin. This is the Signal/WhatsApp approach.
- Server-signed identity directory — trust the server as a CA that vouches for identity keys. Weaker (trusts server) but easier UX.
- Pre-shared / published identity fingerprints — users exchange fingerprints via a trusted channel before first contact.
Until one of these is implemented, E2EE protects against passive observers and post-first-contact active attackers, but not against an active MITM during the very first key exchange between two peers.
WebSocket message size is capped at 64 KB, but an authenticated client could flood messages at high frequency.
Fixed: Per-connection post-auth rate limiting is now enforced in the signaling handler. Messages are categorized and limited per second:
| Category | Message types | Limit |
|---|---|---|
| chat | `chat-message` | 5/sec |
| e2ee | `e2ee` | 10/sec |
| speaking | `speaking` | 20/sec |
| media | `create-transport`, `connect-transport`, `produce`, `consume`, `resume-consumer`, `set-preferred-layers` | 5/sec |
| general | everything else | 30/sec |
Exceeding the limit returns a RATE_LIMITED error and drops the message. Counters are per-WebSocket connection and garbage-collected on disconnect.
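The category table above maps naturally onto a small per-connection limiter. This sketch resets its counters at each whole second, which is one possible reading of "per second"; the class name and the `nowMs` parameter are hypothetical.

```typescript
type Category = "chat" | "e2ee" | "speaking" | "media" | "general";

const LIMITS_PER_SEC: Record<Category, number> = {
  chat: 5, e2ee: 10, speaking: 20, media: 5, general: 30,
};

const MEDIA_TYPES = new Set([
  "create-transport", "connect-transport", "produce",
  "consume", "resume-consumer", "set-preferred-layers",
]);

function categorize(messageType: string): Category {
  if (messageType === "chat-message") return "chat";
  if (messageType === "e2ee") return "e2ee";
  if (messageType === "speaking") return "speaking";
  if (MEDIA_TYPES.has(messageType)) return "media";
  return "general";
}

// One instance per WebSocket connection; discarded when the connection closes.
class ConnectionRateLimiter {
  private second = -1;
  private counts = new Map<Category, number>();

  // Returns false when the caller should reply RATE_LIMITED and drop the message.
  allow(messageType: string, nowMs: number = Date.now()): boolean {
    const second = Math.floor(nowMs / 1000);
    if (second !== this.second) {
      this.second = second;
      this.counts.clear();
    }
    const category = categorize(messageType);
    const used = this.counts.get(category) ?? 0;
    if (used >= LIMITS_PER_SEC[category]) return false;
    this.counts.set(category, used + 1);
    return true;
  }
}
```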
TOFU pins are stored as (serverId, userId) → identityPublicKey. The userId is assigned by the server. If the server can remap user identities (malicious or buggy) or if userId values aren't stable across sessions, pinning can be undermined — the server could assign a victim's userId to an attacker.
Possible fixes:
- Pin by a stable identifier that's not server-mutable (e.g., the peer's long-term identity key fingerprint after first contact)
- Or cryptographically bind `userId` to the identity public key at registration time
- At minimum, explicitly state in the threat model: "server assigns stable `userId` per identity/publicKey" (currently enforced by the partial UNIQUE index on `users.public_key`)
The current safety number is 12 digits derived from 5 bytes (~40 bits of entropy). This is adequate for casual visual verification but significantly weaker than Signal-style safety numbers (~220 bits).
Needed fix: Derive a longer number with more entropy (e.g., 80–128 bits worth of digits or words).
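One possible shape for a longer safety number is sketched below. It is not the current derivation, which is not described here. It hashes both identity keys in a canonical order and turns six 5-byte chunks of the SHA-256 digest into six 5-digit groups, giving 30 digits, or about 30 × log2(10) ≈ 99.7 bits. The function name and the chunking scheme are assumptions.

```typescript
import { createHash } from "node:crypto";

function longSafetyNumber(keyA: Buffer, keyB: Buffer): string {
  // Sort the keys so both parties hash them in the same order.
  const [first, second] = Buffer.compare(keyA, keyB) <= 0 ? [keyA, keyB] : [keyB, keyA];
  const digest = createHash("sha256").update(first).update(second).digest();
  const groups: string[] = [];
  for (let i = 0; i < 6; i++) {
    // 5 bytes (40 bits) fit safely in a JavaScript number.
    const chunk = digest.readUIntBE(i * 5, 5) % 100000;
    groups.push(chunk.toString().padStart(5, "0"));
  }
  return groups.join("");
}

// Approximate entropy of a decimal string of the given length, in bits.
function decimalEntropyBits(digits: number): number {
  return digits * Math.log2(10);
}
```

By the same arithmetic, the existing 12-digit number carries roughly 39.9 bits, consistent with the ~40-bit figure above.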
useAudio.ts used u.id instead of u.userId, breaking key-holder election.
Fixed: Changed to u.userId to match the server's SessionInfo shape. Key-holder election now works correctly with the actual member list.
Electron's certificate-error handler allows TLS bypass for a host set via the trust-server-host IPC call. This is intentional for self-hosted servers with self-signed certs, but it's an attack surface if:
- The renderer can be tricked into trusting a hostile host (phishing)
- The app ever loads arbitrary remote content
Hardening recommendations:
- Never load arbitrary remote content in the renderer (only the bundled app)
- Only invoke `trust-server-host` as part of an explicit "trust this server" UI flow with user confirmation
- Ideally, store and pin the certificate fingerprint, not just the hostname
| Feature | Raddir | TeamSpeak | Discord |
|---|---|---|---|
| Transport encryption | ✅ DTLS-SRTP | ✅ | ✅ |
| E2E voice encryption | ✅ AES-256-GCM | ❌ | ❌ |
| E2E video encryption | ✅ AES-256-GCM | ❌ | ❌ |
| E2E text encryption | ✅ AES-256-GCM | ❌ | ❌ |
| Server can hear audio | ❌ No | ✅ Yes | ✅ Yes |
| Server can see video | ❌ No | ✅ Yes | ✅ Yes |
| Simulcast (adaptive quality) | ✅ 3-layer | ❌ | ✅ |
| Telemetry | ❌ None | ✅ Extensive | |
| Self-hostable | ✅ | ✅ | ❌ |
| Open source | ✅ | ❌ | ❌ |