
WebRTC — Web Real-Time Communication

WebRTC (Web Real-Time Communication) is a collection of open standards (W3C APIs plus IETF protocols) that enables real-time audio, video, and data communication directly between browsers without plugins or separately installed software. The W3C published WebRTC 1.0 as a Recommendation in 2021, and WebRTC is implemented natively in Chrome, Firefox, Safari, and Edge. The browser clients of the major video conferencing platforms (Google Meet, Microsoft Teams, Zoom, Cisco Webex) rely on WebRTC or on parts of its stack. For AV integrators, understanding WebRTC's ICE/STUN/TURN infrastructure, codec behavior, and network requirements is essential for troubleshooting conferencing failures in corporate environments and designing reliable browser-based video workflows. For broader streaming protocol context, see signal-types/ip-streaming-protocols.

WebRTC Architecture

Signaling vs. Media

WebRTC separates signaling (how peers find and negotiate with each other) from media transport (how audio/video flows between peers). WebRTC defines the media layer but intentionally leaves signaling to the application, so each platform can use whatever signaling channel and message format suits it.

Signaling: The application server (Google's servers for Meet, Microsoft's for Teams) exchanges SDP (Session Description Protocol) offers and answers between peers, and relays ICE candidates. The application uses its own protocol (typically WebSocket or HTTP) for this exchange. AV integrators do not directly control signaling for cloud platforms.

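Media transport: After signaling completes, media flows peer-to-peer (ideally) or through a TURN relay (when direct P2P fails). Media is carried as SRTP (Secure RTP), normally over UDP, with keys negotiated via DTLS (Datagram Transport Layer Security). Encryption is mandatory in WebRTC, but it protects each transport hop rather than the whole path: when media passes through a conferencing server such as an SFU, that server terminates DTLS-SRTP and can access the media unless the platform adds a separate end-to-end encryption layer. A TURN relay, by contrast, only forwards the already-encrypted packets.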

ICE — Interactive Connectivity Establishment

ICE is the WebRTC framework for finding a working network path between two peers despite NAT, firewalls, and multiple network interfaces. ICE gathers candidates — potential network paths — and tests them in priority order:

  1. Host candidates: Direct local IP addresses (LAN communication without NAT)
  2. Server-reflexive candidates (srflx): The peer's public IP:port as seen by the STUN server — enables direct P2P across NAT
  3. Relay candidates (relay): Traffic routed through a TURN server — fallback when direct P2P fails

ICE tests all candidate pairs and selects the highest-priority working path. In a typical corporate network, the path is usually server-reflexive (direct UDP through NAT). Behind symmetric NAT or strict firewalls, ICE falls back to TURN relay.
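The priority ordering above can be made concrete with the candidate priority formula from the ICE specification (RFC 8445). The sketch below uses the type preference values that the RFC recommends; real implementations may choose different local preferences, so treat the exact numbers as illustrative.

```python
# Recommended type preferences from RFC 8445 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)."""
    if not 0 <= local_pref <= 65535:
        raise ValueError("local_pref must fit in 16 bits")
    if not 1 <= component_id <= 256:
        raise ValueError("component_id must be in 1..256")
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


if __name__ == "__main__":
    # Host candidates sort above server-reflexive, which sort above relay.
    for kind in ("host", "srflx", "relay"):
        print(kind, candidate_priority(kind))
```

These priority values are the large integers that appear in the candidate lines of an SDP dump or a browser's WebRTC diagnostics page, which makes them useful when reading logs during troubleshooting.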

STUN — Session Traversal Utilities for NAT

STUN (RFC 8489, which obsoletes RFC 5389) allows a WebRTC endpoint to discover its own public IP address and port as seen from outside its NAT. The endpoint sends a STUN Binding Request to a STUN server; the server reflects the source IP:port back in the response. This "server-reflexive" address is used as an ICE candidate.

Google operates public STUN servers (stun.l.google.com:19302). Enterprises may block these — a WebRTC client that cannot reach a STUN server cannot determine its public IP and may fail to establish direct P2P connections. The fallback is TURN.
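As a rough illustration of the Binding Request exchange, the sketch below builds the 20-byte STUN header and decodes an IPv4 XOR-MAPPED-ADDRESS attribute value, which is how the server returns the reflexive address. It deliberately omits the network I/O, attribute parsing, and IPv6 handling; the function names are hypothetical.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442  # fixed constant defined by the STUN specification
BINDING_REQUEST = 0x0001   # message type for a Binding Request


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Return a 20-byte STUN Binding Request with no attributes."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    if len(tid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # Header: type (2 bytes), body length (2 bytes, 0 here), magic cookie (4 bytes).
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def decode_xor_mapped_ipv4(value: bytes) -> Tuple[str, int]:
    """Decode the value of an IPv4 XOR-MAPPED-ADDRESS attribute."""
    _reserved, family, xport, xaddr = struct.unpack("!BBHI", value[:8])
    if family != 0x01:
        raise ValueError("only IPv4 (family 0x01) is handled in this sketch")
    # Port is XORed with the top 16 bits of the cookie, address with all 32 bits.
    port = xport ^ (MAGIC_COOKIE >> 16)
    addr = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
    return addr, port
```

The XOR step exists because some NAT devices rewrite any bytes in the payload that look like the client's address; obscuring the value keeps the reflexive address intact in transit.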

TURN — Traversal Using Relays around NAT

TURN (RFC 8656, which obsoletes RFC 5766) is a relay server that forwards WebRTC media when direct peer-to-peer connection fails. When ICE cannot find a working direct path (symmetric NAT on both sides, outbound UDP blocked, strict enterprise firewall), WebRTC falls back to routing media through a TURN server.

TURN adds latency (media travels source → TURN → destination instead of directly) and consumes significant bandwidth and CPU on the TURN server. For WebRTC applications whose ICE server list you control (self-hosted or custom deployments), provide a corporate TURN server rather than relying on third-party TURN infrastructure. Cloud platforms supply their own relays, and their clients use whatever the platform configures.

TURN server implementations: Coturn (open source), Twilio Network Traversal Service, Xirsys, vendor-managed (Microsoft Teams uses its own TURN infrastructure).
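For a self-hosted TURN server, a common pattern is to hand browsers short-lived credentials instead of a static password. The sketch below follows the widely used "TURN REST API" convention (username is an expiry timestamp plus a user label; password is a base64-encoded HMAC-SHA1 of that username under a shared secret). Coturn offers this scheme through its shared-secret authentication mode, but the secret, user label, and lifetime here are assumptions, and the exact server configuration should be checked against the documentation of the TURN server in use.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional, Tuple


def turn_rest_credentials(
    shared_secret: str,
    user: str,
    ttl_seconds: int = 3600,
    now: Optional[int] = None,
) -> Tuple[str, str]:
    """Return (username, password) valid until now + ttl_seconds."""
    issued_at = int(time.time()) if now is None else now
    expiry = issued_at + ttl_seconds
    # Username encodes the expiry so the server can reject stale credentials.
    username = f"{expiry}:{user}"
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode()


if __name__ == "__main__":
    # "example-secret" and "room-12" are placeholder values.
    print(turn_rest_credentials("example-secret", "room-12"))
```

The signaling server would generate these values per session and pass them to the browser as part of the ICE server configuration, so the long-term secret never reaches the client.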

Media Layer

Codecs

WebRTC mandates specific baseline codecs, ensuring interoperability between implementations:

Video:

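  • VP8: Mandatory to implement for WebRTC browsers (RFC 7742); good quality, open source (Google), supported in all major browsers, though hardware acceleration is less common than for H.264
  • VP9: Optional but widely supported; commonly cited as needing roughly 30% less bitrate than VP8 for comparable quality
  • H.264 Constrained Baseline Profile: Also mandatory to implement (RFC 7742); hardware-accelerated on most devices, including Apple hardware, and the safest choice for Safari/iOS
  • AV1: Emerging; Chrome 90+, Firefox; best quality-to-bitrate ratio but high encoding CPU usage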

Safari relies on H.264 for hardware acceleration on older iPhones/iPads. A room system using VP9 with a Safari mobile client may trigger CPU-intensive software decoding on the client device. For maximum compatibility, endpoints should prefer H.264 in the SDP negotiation when the remote peer is Safari/iOS.
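The idea of preferring a codec that both sides handle well can be sketched as a simple intersection of an ordered local preference list with the codecs the remote side offers. This is only an illustration of the selection logic: in a real session the choice is made through SDP offer/answer, and the function name and codec lists below are hypothetical.

```python
from typing import Iterable, Optional, Sequence


def pick_video_codec(local_preference: Sequence[str], remote_supported: Iterable[str]) -> Optional[str]:
    """Return the first locally preferred codec the remote side also supports."""
    remote = {codec.upper() for codec in remote_supported}
    for codec in local_preference:
        if codec.upper() in remote:
            return codec
    return None  # no common codec: the video section would be rejected


if __name__ == "__main__":
    # A room system that prefers VP9 but talks to a peer offering only H.264 and VP8.
    print(pick_video_codec(["VP9", "VP8", "H264"], ["H264", "VP8"]))
    # Putting H.264 first favors hardware decoding on older Apple devices.
    print(pick_video_codec(["H264", "VP9", "VP8"], ["H264", "VP8"]))
```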

Audio:

  • Opus: Mandatory codec; 6–510 kbps, adaptive to network conditions, ultra-low latency (20 ms frames), excellent quality at low bitrates. Used by virtually all WebRTC platforms.
  • G.711 (PCMU/PCMA): Mandatory fallback; PSTN telephone quality, 64 kbps. Negotiated for PSTN gateway interoperability.

Adaptive Bitrate and Simulcast

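WebRTC implementations commonly use GCC (Google Congestion Control), the algorithm in Google's libwebrtc stack, to dynamically adjust video bitrate based on network conditions. GCC combines a delay-based estimate with a loss-based one: under rising queuing delay or packet loss, the sender reduces bitrate; when conditions improve, bitrate increases.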
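A minimal sketch of the loss-based half of this behavior, using the thresholds described in the GCC Internet-Draft (back off above 10% loss, probe upward below 2% loss). Shipping browser implementations have evolved beyond the draft and also run a delay-based controller, so this is an approximation for intuition rather than a description of any particular browser.

```python
def loss_based_bitrate(current_bps: float, loss_fraction: float) -> float:
    """Return the next target bitrate given the reported packet loss fraction."""
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must be between 0 and 1")
    if loss_fraction > 0.10:
        # Heavy loss: reduce in proportion to the loss rate.
        return current_bps * (1.0 - 0.5 * loss_fraction)
    if loss_fraction < 0.02:
        # Negligible loss: probe for more bandwidth in small steps.
        return current_bps * 1.05
    # Moderate loss: hold the current rate.
    return current_bps


if __name__ == "__main__":
    rate = 2_500_000.0
    for loss in (0.0, 0.0, 0.15, 0.05, 0.0):
        rate = loss_based_bitrate(rate, loss)
        print(f"loss={loss:.2f} -> {rate / 1e6:.2f} Mbps")
```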

Simulcast: A sender transmits several independent encodings of the same source at different resolutions (e.g., 1080p, 540p, 180p) simultaneously. An SFU (Selective Forwarding Unit, described below) selects the appropriate encoding for each receiver based on that receiver's bandwidth. Conferencing platforms use simulcast, or the related technique of scalable video coding (SVC), to serve high-quality video to desktop clients and low-quality video to mobile clients in the same call.
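The per-receiver selection an SFU performs can be sketched as picking the highest-bitrate encoding that fits the receiver's estimated bandwidth. The layer bitrates below are hypothetical round numbers, and a production SFU would also consider display size, active speaker, and hysteresis to avoid rapid switching.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Layer:
    name: str
    bitrate_kbps: int


# Hypothetical simulcast ladder; real bitrates depend on codec and content.
LAYERS = (Layer("1080p", 3000), Layer("540p", 800), Layer("180p", 150))


def select_layer(layers: Sequence[Layer], available_kbps: float) -> Layer:
    """Pick the best layer that fits; fall back to the smallest if none fits."""
    affordable = [layer for layer in layers if layer.bitrate_kbps <= available_kbps]
    if affordable:
        return max(affordable, key=lambda layer: layer.bitrate_kbps)
    return min(layers, key=lambda layer: layer.bitrate_kbps)


if __name__ == "__main__":
    for bandwidth in (5000, 1000, 100):
        print(bandwidth, "kbps ->", select_layer(LAYERS, bandwidth).name)
```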

Data Channels

WebRTC data channels (SCTP over DTLS over UDP) allow arbitrary binary or text data to flow peer-to-peer alongside media. Uses in AV:

  • Chat and file sharing in conferencing platforms
  • Collaborative annotation in remote support tools
  • Real-time telemetry from room hardware to browser control interfaces

WebRTC in Video Conferencing Infrastructure

Peer-to-Peer vs. SFU vs. MCU

Peer-to-peer (P2P): Direct WebRTC between two endpoints. No server in the media path. Lowest latency, no server cost, but requires each sender to upload a stream for each receiver (scales poorly above 3–4 participants).

SFU (Selective Forwarding Unit): Each participant sends one stream to the SFU; the SFU forwards streams to all other participants without transcoding. Very efficient — low server CPU, supports simulcast layer selection. Used by: Zoom, Microsoft Teams, Google Meet, Jitsi, Mediasoup, LiveKit.

MCU (Multipoint Control Unit): The server decodes all incoming streams and re-encodes a composite image. Every participant receives one stream regardless of participant count. High server CPU cost; adds latency; used in legacy video conferencing bridges and some recording/transcription pipelines.

All major cloud conferencing platforms (Teams, Zoom, Meet) use SFU architecture. The "room" is an SFU cluster, not a P2P mesh.
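The scaling difference between the three topologies comes down to how many streams each participant must send and receive. A small sketch, assuming every participant sends one video source and wants to see everyone else:

```python
from typing import Tuple


def streams_per_participant(topology: str, participants: int) -> Tuple[int, int]:
    """Return (streams sent, streams received) for one participant."""
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    others = participants - 1
    if topology == "mesh":   # P2P: one upload per remote peer
        return others, others
    if topology == "sfu":    # one upload, one forwarded stream per remote peer
        return 1, others
    if topology == "mcu":    # one upload, one composited stream back
        return 1, 1
    raise ValueError(f"unknown topology: {topology}")


if __name__ == "__main__":
    for n in (2, 4, 10):
        print(n, {t: streams_per_participant(t, n) for t in ("mesh", "sfu", "mcu")})
```

At 10 participants a mesh needs 9 uploads per client, which is why P2P stops being practical beyond a handful of users, while the SFU keeps upload constant and shifts the cost to downstream bandwidth.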

WebRTC and AV Room Systems

Hardware room systems (Crestron Flex, Logitech Rally, Poly Studio) integrate with WebRTC-based platforms via:

  • Native app mode: The room system runs the platform's native app (Teams Rooms, Zoom Rooms), which uses WebRTC internally
  • USB passthrough (BYOM): The room system presents itself as a USB camera/mic/speaker to a guest laptop running a browser-based conferencing session — the browser's WebRTC session uses the room hardware via USB UAC/UVC
  • Browser-based room control: Some control systems embed a browser (Crestron Touch, Extron TLP) that runs a WebRTC conferencing session directly

Network Requirements for WebRTC

Ports and Protocols

WebRTC media travels over UDP by default, with TCP 443 (TURN over TLS) as fallback:

Traffic Type            Protocol   Port
STUN/TURN discovery     UDP        3478
TURN media relay        UDP        3478, or 49152–65535
TURN TLS fallback       TCP        443
Direct P2P media        UDP        1024–65535 (ephemeral)
Signaling (WebSocket)   TCP        443

Blocking outbound UDP entirely forces all WebRTC media to TCP 443 (TURN relay), adding 50–200 ms of additional latency and concentrating load on TURN servers. Allow outbound UDP on ports 1024–65535 for optimal WebRTC performance.

Bandwidth per Participant

Quality           Video Bitrate   Audio        Total
Low (360p30)      200–500 kbps    32–64 kbps   ~500 kbps
Medium (720p30)   1–2.5 Mbps      32–64 kbps   ~2.5 Mbps
High (1080p30)    2.5–4 Mbps      32–64 kbps   ~4 Mbps
4K (2160p30)      8–15 Mbps       32–64 kbps   ~15 Mbps

Budget upstream bandwidth for each sending participant plus downstream bandwidth for each received stream. In a 10-person meeting on an SFU, each participant sends 1 stream (e.g., 2.5 Mbps up) and receives 9 streams; the received total varies by platform, because most platforms cap received quality to fit the available bandwidth. If all 10 participants sit on the same enterprise network, upstream alone is 10 × 2.5 Mbps = 25 Mbps, so a planning figure of roughly 25–50 Mbps of WAN capacity for 10 concurrent full HD participants is reasonable.
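The budgeting rule above can be written out as a small calculation. This is a sketch under stated assumptions: every on-site participant sends one stream at the same bitrate, and the platform caps each client's total received video at a fixed figure (the 4 Mbps cap used in the example is hypothetical; actual caps vary by platform and layout).

```python
from typing import Tuple


def sfu_site_budget_mbps(
    on_site: int,
    total_participants: int,
    send_mbps: float,
    receive_cap_mbps: float,
) -> Tuple[float, float]:
    """Return (upstream, downstream) WAN bandwidth in Mbps for one site."""
    if on_site > total_participants:
        raise ValueError("on_site cannot exceed total_participants")
    upstream = on_site * send_mbps
    # Uncapped, each client would receive every other participant's stream.
    uncapped_receive = (total_participants - 1) * send_mbps
    downstream = on_site * min(uncapped_receive, receive_cap_mbps)
    return upstream, downstream


if __name__ == "__main__":
    up, down = sfu_site_budget_mbps(on_site=10, total_participants=10,
                                    send_mbps=2.5, receive_cap_mbps=4.0)
    print(f"upstream {up} Mbps, downstream {down} Mbps")
```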

QoS for WebRTC

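WebRTC can mark packets with DSCP values for QoS prioritization (RFC 8837). Whether packets are actually marked depends on the browser, the operating system, and the platform's configuration, so verify with a packet capture before relying on it. Commonly used values: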

  • Audio: DSCP 46 (EF — Expedited Forwarding)
  • Video: DSCP 34 (AF41)
  • Data channels: DSCP 0 (Best Effort)

Configure managed switches to trust these DSCP markings (or to re-mark conferencing traffic where endpoints do not mark it themselves); strip or re-mark at the WAN edge if the ISP does not honor DSCP.
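When checking these markings in a packet capture or a switch configuration, it helps to know how the DSCP value maps onto the IP header. A small sketch: the DSCP occupies the upper six bits of the former ToS byte, and Assured Forwarding code points follow the pattern 8 × class + 2 × drop precedence. The `mark_socket` helper is a hypothetical illustration for a test sender; operating systems may ignore or restrict the option.

```python
import socket

DSCP_EF = 46    # Expedited Forwarding (audio)
DSCP_AF41 = 34  # Assured Forwarding class 4, low drop precedence (video)
DSCP_BE = 0     # Best Effort (data channels)


def af_dscp(af_class: int, drop_precedence: int) -> int:
    """DSCP value for AFxy, e.g. af_dscp(4, 1) is AF41."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF class is 1..4 and drop precedence is 1..3")
    return 8 * af_class + 2 * drop_precedence


def dscp_to_tos(dscp: int) -> int:
    """Shift the 6-bit DSCP into the upper bits of the 8-bit ToS/Traffic Class field."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    return dscp << 2


def mark_socket(sock: socket.socket, dscp: int) -> None:
    """Request DSCP marking on an IPv4 socket; the OS may ignore this."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
```

In a capture, EF audio therefore shows a ToS byte of 0xB8 and AF41 video shows 0x88.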

Common Pitfalls

  • Symmetric NAT causing ICE failure and TURN fallback. Corporate networks with symmetric NAT (the NAT assigns a different external port for each destination the client contacts) can prevent server-reflexive ICE candidates from working, because the address learned from the STUN server is not the one the remote peer will see. WebRTC then falls back to TURN relay, increasing latency and consuming TURN server capacity. Fix: configure the corporate firewall for full-cone or port-restricted NAT for the conferencing traffic range; or, for applications whose ICE configuration you control, deploy a corporate TURN server to guarantee reliable fallback without depending on third-party TURN infrastructure.

  • Blocking outbound UDP, forcing TCP 443 TURN for all media. Overly strict firewall policies that block all outbound UDP except DNS and established connections force WebRTC to use TCP TURN relay for all media. This adds relay latency (on the order of 50–200 ms, and more under packet loss because TCP retransmits stall the stream), and all conference media flows through the TURN server. Fix: allow outbound UDP on the ephemeral port range (1024–65535) for conferencing platform subnets; allow outbound UDP 3478 for STUN/TURN.

  • Browser H.264 hardware acceleration disabled by enterprise policy. Some enterprise endpoint management (Intune, SCCM) applies Chrome group policies that disable hardware video acceleration. WebRTC then performs all video encode/decode in software, consuming significant CPU and causing video quality degradation on all calls. Fix: check chrome://gpu to verify hardware acceleration is enabled; exclude conferencing applications from software-rendering group policies.

  • WebRTC failing in browser due to missing HTTPS. getUserMedia() (camera/mic access) is only available in secure contexts (HTTPS or localhost). On an intranet conferencing portal served over HTTP, navigator.mediaDevices is undefined, so camera and microphone access fails and often looks like a permission or device error. Fix: serve all WebRTC applications over HTTPS; obtain a certificate even for internal sites using Let's Encrypt, an internal CA, or a reverse proxy.

  • Assuming peer-to-peer latency in cloud conferencing. Browser WebRTC has 50–200 ms peer-to-peer latency. However, cloud SFU platforms (Teams, Zoom, Meet) route media through their SFU clusters, adding server processing and geographic routing latency. Real-world Teams/Zoom latency is typically 150–400 ms end-to-end, not the theoretical P2P minimum. Applications that need the lowest possible latency (remote NDI production, interactive events) require a direct WebRTC P2P connection or a geographically co-located SFU.
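For the HTTPS pitfall above, a quick way to audit a list of intranet portal URLs is to check whether each one would count as a secure context. The sketch below is a simplified approximation of the browser rule (HTTPS or WSS, or a loopback host such as localhost or 127.0.0.1); browsers apply additional cases that are not modeled here, and the example URLs are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_potentially_secure(url: str) -> bool:
    """Approximate the browser secure-context check for a page URL."""
    parts = urlsplit(url)
    if parts.scheme in ("https", "wss"):
        return True
    host = (parts.hostname or "").lower()
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        # Loopback addresses (127.0.0.0/8, ::1) are treated as trustworthy.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a non-loopback hostname over plain HTTP


if __name__ == "__main__":
    for url in ("https://meet.example.com/room",
                "http://intranet.example/meet",
                "http://localhost:8080/"):
        print(url, "->", is_potentially_secure(url))
```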
