TL;DR / WebRTC enables browser-to-browser peer-to-peer communication for audio, video, and arbitrary data without plugins, using ICE for NAT traversal and SDP for session negotiation.

How It Works

 ┏━━━━━━━━━━━━┓            ┏━━━━━━━━━━━━┓            ┏━━━━━━━━━━━━┓
 ┃   Peer A   ┃ SDP Offer  ┃ Signaling  ┃ SDP Answer ┃   Peer B   ┃
 ┃  Browser   ┃───────────→┃   Server   ┃←───────────┃  Browser   ┃
 ┗━━━━━━━━━━━━┛            ┗━━━━━━━━━━━━┛            ┗━━━━━━━━━━━━┛

   ←────────────────────────────────────────────────────────────→
                       P2P Media/Data Channel

                         ┏━━━━━━━━━━━━━━━━┓
                         ┃   STUN/TURN    ┃
                         ┗━━━━━━━━━━━━━━━━┛
                           ICE Candidates

WebRTC (Web Real-Time Communication) establishes peer-to-peer connections between browsers for streaming media and arbitrary data. The architecture involves three distinct phases: signaling, ICE candidate exchange, and the peer connection itself.

Signaling is the only part WebRTC does not standardize -- you bring your own transport (WebSocket, HTTP polling, carrier pigeon). The signaling server relays Session Description Protocol (SDP) messages between peers. The offerer creates an RTCPeerConnection, calls createOffer() to generate an SDP blob describing its media capabilities (codecs, encryption parameters, ICE credentials), sets it as the local description, and sends it to the remote peer via the signaling server. The answerer receives this offer, sets it as the remote description, calls createAnswer(), sets that as its local description, and sends it back. This SDP exchange is the session negotiation handshake.
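The handshake above can be followed through the connection's signalingState. The sketch below is a simplified model of those transitions, not the browser's implementation: it omits provisional answers and rollback, and the function name nextSignalingState is hypothetical. The state names match the values RTCPeerConnection.signalingState reports.

```typescript
// Simplified model of RTCPeerConnection.signalingState for one offer/answer round.
// Assumption: provisional answers ("pranswer") and rollback are left out.
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";
type SdpType = "offer" | "answer";
type Side = "local" | "remote";

// side = "local" models setLocalDescription(), side = "remote" models setRemoteDescription().
function nextSignalingState(state: SignalingState, side: Side, sdp: SdpType): SignalingState {
  if (state === "stable" && sdp === "offer") {
    return side === "local" ? "have-local-offer" : "have-remote-offer";
  }
  if (state === "have-local-offer" && side === "remote" && sdp === "answer") return "stable";
  if (state === "have-remote-offer" && side === "local" && sdp === "answer") return "stable";
  throw new Error("invalid transition: " + side + " " + sdp + " in state " + state);
}

// Offerer: createOffer() -> setLocalDescription(offer) -> (signaling) -> setRemoteDescription(answer)
let offerer: SignalingState = "stable";
offerer = nextSignalingState(offerer, "local", "offer");   // have-local-offer
offerer = nextSignalingState(offerer, "remote", "answer"); // stable

// Answerer: setRemoteDescription(offer) -> createAnswer() -> setLocalDescription(answer)
let answerer: SignalingState = "stable";
answerer = nextSignalingState(answerer, "remote", "offer"); // have-remote-offer
answerer = nextSignalingState(answerer, "local", "answer"); // stable
```

Both peers end in stable, which is the point at which a new round of negotiation can safely begin.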

ICE (Interactive Connectivity Establishment) handles NAT traversal, which is the hard part. Most browsers sit behind NATs that prevent direct incoming connections. ICE gathers candidates -- potential network paths to reach the peer. There are three candidate types: host candidates (local IP addresses), server-reflexive candidates (public IP discovered via a STUN server), and relay candidates (traffic relayed through a TURN server). STUN is lightweight -- it simply tells you your public IP/port mapping. TURN is a bandwidth-consuming relay used as a fallback when direct connectivity is impossible (symmetric NATs, restrictive firewalls).
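To see which candidate types a connection actually gathered, the candidate string can be inspected. The following is a minimal sketch that parses the standard candidate attribute format (foundation, component, transport, priority, address, port, then "typ" and the type). The function and interface names are hypothetical, and the sample addresses come from documentation ranges. Besides the three types above, it also accepts prflx (peer-reflexive), which is discovered during connectivity checks rather than gathered up front.

```typescript
type IceCandidateType = "host" | "srflx" | "prflx" | "relay";

interface ParsedCandidate {
  foundation: string;
  component: number;   // 1 = RTP
  protocol: string;    // "udp" or "tcp"
  priority: number;
  address: string;
  port: number;
  type: IceCandidateType;
}

const KNOWN_TYPES: IceCandidateType[] = ["host", "srflx", "prflx", "relay"];

// Parses "candidate:..." (with or without a leading "a="); returns null if malformed.
function parseCandidate(line: string): ParsedCandidate | null {
  const text = line.replace(/^a=/, "").trim();
  if (text.indexOf("candidate:") !== 0) return null;
  const f = text.slice("candidate:".length).split(/\s+/);
  if (f.length < 8 || f[6] !== "typ") return null;
  const idx = KNOWN_TYPES.indexOf(f[7] as IceCandidateType);
  if (idx < 0) return null;
  return {
    foundation: f[0],
    component: Number(f[1]),
    protocol: f[2].toLowerCase(),
    priority: Number(f[3]),
    address: f[4],
    port: Number(f[5]),
    type: KNOWN_TYPES[idx],
  };
}

// Made-up sample: a server-reflexive candidate learned via STUN.
const sample = parseCandidate(
  "candidate:842163049 1 udp 1694498815 203.0.113.5 46154 typ srflx raddr 192.168.1.10 rport 46154"
);
```

If a session only ever produces relay candidates on one side, that is a hint the peer sits behind a restrictive NAT or firewall and depends on TURN.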

ICE candidates are gathered asynchronously after setting the local description. Each gathered candidate is delivered in an icecandidate event; the application must send it to the remote peer via the signaling server, where it is added with addIceCandidate(). ICE then performs connectivity checks on candidate pairs (one local candidate paired with one remote candidate), testing reachability with STUN binding requests. The highest-priority pair that passes its checks is nominated and carries the actual media/data flow.
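The priority that drives this ordering comes from a formula recommended in RFC 8445: priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID). The sketch below applies it with the RFC's recommended type preferences; the function name and the default local preference of 65535 are assumptions chosen for illustration, and real browsers may pick different local preferences per interface.

```typescript
type IceKind = "host" | "prflx" | "srflx" | "relay";

// Type preferences recommended by RFC 8445 (higher = preferred).
const TYPE_PREFERENCE: Record<IceKind, number> = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * typePref + 2^8 * localPref + (256 - componentId)
function candidatePriority(kind: IceKind, localPreference: number = 65535, componentId: number = 1): number {
  return TYPE_PREFERENCE[kind] * Math.pow(2, 24) + localPreference * Math.pow(2, 8) + (256 - componentId);
}

// Sorting by priority puts direct paths first and the TURN relay last.
const preferredOrder = (["relay", "srflx", "host", "prflx"] as IceKind[]).sort(
  (a, b) => candidatePriority(b) - candidatePriority(a)
);
// preferredOrder is ["host", "prflx", "srflx", "relay"]
```

Priority only orders the checks; a high-priority pair that fails its connectivity check is never selected, which is how connections end up on a relay.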

Media capture uses getUserMedia() (camera/microphone) or getDisplayMedia() (screen) to obtain a MediaStream, whose tracks are attached to the peer connection with addTrack(). WebRTC handles codec negotiation (VP8/H.264/VP9/AV1 for video, Opus for audio), encryption (DTLS-SRTP, mandatory -- all WebRTC media is encrypted), bandwidth estimation, and adaptive bitrate automatically. The track event (ontrack) on the receiving side delivers each remote track together with its associated MediaStream objects, which can be assigned to a <video> element's srcObject.
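When debugging codec negotiation it can help to list what an SDP actually offers. The sketch below reads the a=rtpmap lines out of an SDP string. It only reads the SDP and never modifies it. The function name listCodecs and the sample SDP fragment are made up for illustration; real SDP blobs contain many more lines.

```typescript
interface CodecEntry {
  payloadType: number;
  name: string;
  clockRate: number;
  channels?: number; // present for audio codecs such as Opus
}

// Collects every "a=rtpmap:<pt> <name>/<clock>[/<channels>]" line from an SDP.
function listCodecs(sdp: string): CodecEntry[] {
  const codecs: CodecEntry[] = [];
  const lines = sdp.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const m = /^a=rtpmap:(\d+) ([^\/]+)\/(\d+)(?:\/(\d+))?$/.exec(lines[i].trim());
    if (!m) continue;
    const entry: CodecEntry = { payloadType: Number(m[1]), name: m[2], clockRate: Number(m[3]) };
    if (m[4] !== undefined) entry.channels = Number(m[4]);
    codecs.push(entry);
  }
  return codecs;
}

// Hypothetical, heavily trimmed SDP fragment.
const sampleSdp = [
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 98",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:98 VP9/90000",
].join("\r\n");

const offered = listCodecs(sampleSdp);
```

In a browser the input would typically be the sdp field of the connection's local or remote description.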

RTCDataChannel provides an arbitrary data transport between peers. It uses SCTP over DTLS, supporting both reliable (TCP-like) and unreliable (UDP-like) delivery modes. Data channels are created via createDataChannel() on one peer and received via the ondatachannel event on the other. They support binary (ArrayBuffer/Blob) and string messages, with configurable ordering and reliability. This is the foundation for peer-to-peer file sharing, game state synchronization, and low-latency messaging.
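The delivery mode is chosen through the options passed to createDataChannel(). The sketch below builds those options for three common cases. The ChannelOptions interface mirrors the ordered, maxRetransmits, and maxPacketLifeTime fields of the browser's RTCDataChannelInit; the mode names, the optionsFor helper, and the 100 ms lifetime are assumptions made for this example.

```typescript
// Mirrors the relevant fields of RTCDataChannelInit.
interface ChannelOptions {
  ordered?: boolean;
  maxRetransmits?: number;     // give up after this many retransmissions
  maxPacketLifeTime?: number;  // give up after this many milliseconds
}

type DeliveryMode = "reliable" | "lossy" | "time-limited";

function optionsFor(mode: DeliveryMode): ChannelOptions {
  // Reliable and ordered is also what you get with no options at all (TCP-like).
  if (mode === "reliable") return { ordered: true };
  // No retransmissions, no ordering: closest to raw UDP semantics.
  if (mode === "lossy") return { ordered: false, maxRetransmits: 0 };
  // Retransmit for up to 100 ms, then drop. Never combine this with
  // maxRetransmits: browsers reject options that set both.
  return { ordered: false, maxPacketLifeTime: 100 };
}

// Browser usage (not executed here):
//   const channel = pc.createDataChannel("game-state", optionsFor("lossy"));
```

Game state that is superseded every frame usually suits the lossy mode, while file transfer wants the reliable default.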

Perfect negotiation is the modern pattern for handling renegotiation (adding/removing tracks mid-session). It assigns a polite/impolite role to each peer. When an SDP collision occurs (both peers generate offers simultaneously), the polite peer rolls back its offer and accepts the remote one, while the impolite peer ignores the incoming offer. This eliminates the state machine complexity of manual rollback handling.
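The core of the pattern is a single decision made when a remote description arrives. The sketch below isolates that decision as a pure function so it can be reasoned about without a browser. The collision test follows the commonly published perfect negotiation pattern; the interface and function names are hypothetical.

```typescript
type NegotiationState = "stable" | "have-local-offer" | "have-remote-offer";

interface NegotiationContext {
  polite: boolean;                   // role fixed per peer for the session
  makingOffer: boolean;              // true while our own offer is being created/sent
  signalingState: NegotiationState;  // pc.signalingState at the time of arrival
}

// Returns true when the incoming description must be dropped.
function shouldIgnoreOffer(ctx: NegotiationContext, remoteType: "offer" | "answer"): boolean {
  // A collision is an incoming offer while we are mid-offer ourselves.
  const offerCollision =
    remoteType === "offer" && (ctx.makingOffer || ctx.signalingState !== "stable");
  // Only the impolite peer ignores; the polite peer yields.
  return !ctx.polite && offerCollision;
}

// When this returns false for the polite peer during a collision, it proceeds to
// setRemoteDescription(offer); current browsers roll back the pending local
// offer implicitly as part of that call.
const colliding = { makingOffer: false, signalingState: "have-local-offer" as NegotiationState };
const impoliteIgnores = shouldIgnoreOffer({ polite: false, ...colliding }, "offer"); // true
const politeIgnores = shouldIgnoreOffer({ polite: true, ...colliding }, "offer");    // false
```

The roles must be assigned so that exactly one peer is polite; a common choice is to let the signaling server decide, for example by join order.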

WebRTC connections report statistics via getStats(), providing detailed metrics on packets sent/received, jitter, round-trip time, codec in use, and bandwidth estimates. These stats are essential for quality monitoring and adaptive behavior.
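Most getStats() counters are cumulative, so useful metrics come from the difference between two samples. The sketch below derives receive bitrate and packet loss from two snapshots of an inbound RTP report. The field names (timestamp in milliseconds, bytesReceived, packetsReceived, packetsLost) match the inbound-rtp stats entries; the InboundSample type and summarize function are hypothetical, and the sample numbers are invented.

```typescript
// Subset of an "inbound-rtp" stats entry, copied out of a getStats() report.
interface InboundSample {
  timestamp: number;       // milliseconds
  bytesReceived: number;   // cumulative
  packetsReceived: number; // cumulative
  packetsLost: number;     // cumulative
}

interface QualitySummary {
  bitrateKbps: number;
  lossFraction: number; // 0..1 over the sampling interval
}

function summarize(prev: InboundSample, curr: InboundSample): QualitySummary {
  const seconds = (curr.timestamp - prev.timestamp) / 1000;
  const bits = (curr.bytesReceived - prev.bytesReceived) * 8;
  const received = curr.packetsReceived - prev.packetsReceived;
  // The loss counter can step backwards when late packets arrive, so clamp at 0.
  const lost = Math.max(0, curr.packetsLost - prev.packetsLost);
  const expected = received + lost;
  return {
    bitrateKbps: seconds > 0 ? bits / seconds / 1000 : 0,
    lossFraction: expected > 0 ? lost / expected : 0,
  };
}

// Two invented samples taken 2 seconds apart.
const before: InboundSample = { timestamp: 0, bytesReceived: 0, packetsReceived: 0, packetsLost: 0 };
const after: InboundSample = { timestamp: 2000, bytesReceived: 250000, packetsReceived: 190, packetsLost: 10 };
const quality = summarize(before, after); // 1000 kbps, 5% loss
```

Polling once per second or so is usually enough for a quality indicator; shorter intervals make the numbers noisy.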

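The RTCPeerConnection also fires connectionstatechange events as its connectionState moves through the lifecycle. The typical path is new -> connecting -> connected. From connected, a connection can drop to disconnected, which is often transient and may recover to connected on its own, and then to failed if connectivity is not restored. The closed state is reached only when close() is called. Handling these transitions correctly is crucial for reconnection logic and UI feedback.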
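One way to keep this handling tidy is to map each state to a single application-level reaction. The sketch below does that with a lookup table. The state names are the real connectionState values; the action names and the policy behind them (wait on disconnected, restart ICE on failed) are assumptions for illustration, not a prescribed strategy.

```typescript
type ConnState = "new" | "connecting" | "connected" | "disconnected" | "failed" | "closed";

// Hypothetical application-level reactions.
type UiAction =
  | "none"
  | "show-connecting"
  | "show-connected"
  | "wait-for-recovery"
  | "restart-ice"
  | "teardown";

const ACTIONS: Record<ConnState, UiAction> = {
  "new": "none",
  "connecting": "show-connecting",
  "connected": "show-connected",
  // Often transient: the connection may return to "connected" by itself.
  "disconnected": "wait-for-recovery",
  // Connectivity was not restored: gather fresh candidates and renegotiate.
  "failed": "restart-ice",
  // close() was called: release media and UI resources.
  "closed": "teardown",
};

function actionFor(state: ConnState): UiAction {
  return ACTIONS[state];
}

// Browser usage (not executed here):
//   pc.onconnectionstatechange = () => {
//     if (actionFor(pc.connectionState) === "restart-ice") pc.restartIce();
//   };
```

Keeping the policy in a table makes it easy to unit test and to adjust, for example by adding a timeout before reacting to disconnected.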

Gotchas

  • Signaling is your responsibility -- WebRTC provides no built-in signaling mechanism; you must implement the SDP/ICE exchange transport yourself, and it must handle race conditions during renegotiation
  • TURN servers are expensive but necessary -- approximately 10-15% of real-world connections require TURN relay due to symmetric NATs; budget for TURN bandwidth in production deployments
  • ICE restart is required after network changes -- if a user switches from Wi-Fi to cellular, the previously gathered candidates usually stop working; call restartIce() to gather new candidates, which triggers a renegotiation with fresh ICE credentials
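  • Data channels limit both message size and buffering -- a single send() larger than the SCTP transport's maxMessageSize (commonly 256 KB) throws, and send() also fails once the send buffer is full; chunk large transfers and apply flow control by watching bufferedAmount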
  • SDP munging is fragile -- manually modifying SDP strings to force codecs or bitrates breaks across browser versions; use RTCRtpTransceiver.setCodecPreferences() instead
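
The chunking and flow control called for in the data channel gotcha can be sketched as follows. The SendQueue interface stands in for the bufferedAmount property and send() method of an RTCDataChannel so the logic can run outside a browser. The 16 KiB chunk size, the function names, and the high-water mark are assumptions for this example rather than required values.

```typescript
const CHUNK_SIZE = 16 * 1024; // conservative size, an assumption for this sketch

// Splits a payload into views of at most chunkSize bytes (no copying).
function splitIntoChunks(data: Uint8Array, chunkSize: number = CHUNK_SIZE): Uint8Array[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.byteLength; offset += chunkSize) {
    chunks.push(data.subarray(offset, Math.min(offset + chunkSize, data.byteLength)));
  }
  return chunks;
}

// The two RTCDataChannel members this sketch relies on.
interface SendQueue {
  bufferedAmount: number;
  send(chunk: Uint8Array): void;
}

// Sends chunks starting at index `start` while the buffer stays at or below
// highWaterMark. Returns the index of the first chunk that was NOT sent.
function sendWhileRoom(channel: SendQueue, chunks: Uint8Array[], start: number, highWaterMark: number): number {
  let i = start;
  while (i < chunks.length && channel.bufferedAmount + chunks[i].byteLength <= highWaterMark) {
    channel.send(chunks[i]);
    i++;
  }
  return i;
}

// Browser usage (not executed here): set channel.bufferedAmountLowThreshold and
// call sendWhileRoom() again from the "bufferedamountlow" event until every
// chunk has been sent.
```

The receiver also needs a way to know when a transfer is complete, for example a small header message carrying the total byte count before the first chunk.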