Transport Protocols for AV — TCP, UDP, RTP, RTSP

Transport protocols define how data moves between two points on a network: whether delivery is guaranteed, how errors are handled, and how streams are described and controlled. Every AV streaming system sits on top of one or more of these protocols, and understanding them explains why AV systems behave the way they do under network stress.

TCP — Transmission Control Protocol

TCP is a reliable, connection-oriented protocol. Before data flows, the two endpoints complete a three-way handshake (SYN, SYN-ACK, ACK) to establish a connection. During transmission, the receiver acknowledges the data it has received, and the sender automatically retransmits any segment that goes unacknowledged.

TCP properties:

  • Guaranteed delivery — every byte arrives, in order
  • Flow control — sender slows down if receiver is overwhelmed
  • Congestion control — backs off when the network is congested
  • Higher overhead due to acknowledgments and retransmissions
  • Variable latency — retransmission adds unpredictable delay

AV uses for TCP:

  • Control commands (Crestron, AMX, QSC Q-SYS APIs, Extron SIS)
  • HTTP-based APIs and web GUIs for AV devices
  • File transfers (firmware updates, project uploads to DSPs)
  • RTSP signaling (connection setup, not the actual media stream)
  • SIP signaling for conferencing systems

TCP is appropriate whenever you need guaranteed delivery and can tolerate variable latency. Control commands must arrive: a "set volume to 50%" command cannot be allowed to disappear silently. TCP's retransmission behavior, however, makes it a poor fit for real-time audio/video transport.
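As a rough illustration of the control-command case, the sketch below sends one command over TCP and waits for a reply. Everything device-specific here is an assumption: the `SET VOLUME 50` text, the CR/LF terminator, and the `OK ...` acknowledgment are hypothetical and do not represent any vendor's real syntax. `start_fake_device` is a stand-in listener on the loopback interface so the example can run without hardware.

```python
import socket
import threading


def _read_line(sock: socket.socket) -> bytes:
    """Read from a TCP socket until a newline arrives or the peer closes."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(1024)
        if not chunk:
            break
        data += chunk
    return data


def send_command(host: str, port: int, command: str, timeout: float = 2.0) -> str:
    """Open a TCP connection, send one command, and return the reply line."""
    # create_connection() performs the TCP handshake before returning.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # sendall() hands every byte to TCP, which handles acknowledgment
        # and retransmission on our behalf.
        sock.sendall(command.encode("ascii") + b"\r\n")
        return _read_line(sock).decode("ascii").strip()


def start_fake_device() -> int:
    """Hypothetical stand-in for an AV device: acknowledges one command."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)

    def serve() -> None:
        conn, _ = server.accept()
        with conn:
            command = _read_line(conn).strip()
            conn.sendall(b"OK " + command + b"\r\n")
        server.close()

    threading.Thread(target=serve, daemon=True).start()
    return server.getsockname()[1]


if __name__ == "__main__":
    port = start_fake_device()
    print(send_command("127.0.0.1", port, "SET VOLUME 50"))
```

The caller only learns that the command arrived because the device replies; TCP guarantees delivery to the remote stack, not that the device acted on it, so waiting for an application-level acknowledgment is a common design choice.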

UDP — User Datagram Protocol

UDP is a connectionless, unreliable protocol. It sends packets without establishing a connection, without acknowledgments, and without retransmission. If a packet is lost, it's gone.

UDP properties:

  • No delivery guarantee — packets may be lost, reordered, or duplicated
  • No connection setup — low overhead
  • Low, consistent protocol latency (no retransmission delay, though network jitter still applies)
  • Multicast-capable — one sender, many receivers on the same packet
  • Minimal CPU overhead

AV uses for UDP:

  • Dante audio transport
  • AES67 audio transport (RTP over UDP)
  • NDI video streaming
  • RTP/RTCP media streams
  • AV-over-IP video (SDVoE, JPEG XS)
  • SMPTE ST 2110 media transport
  • Multicast audio and video distribution

UDP's lack of retransmission is actually a feature for real-time AV. A retransmitted audio packet arriving 50 ms late is useless — the audio has already played past that point. Better to accept the glitch and move on than to delay everything waiting for a retransmit. This is why nearly all professional real-time AV transport protocols run on UDP.

The tradeoff: since UDP doesn't guarantee delivery, the network infrastructure must be reliable. Switch quality, QoS configuration, and proper IGMP snooping aren't optional on UDP-based AV networks — they're what makes the "unreliable" protocol work reliably in practice.
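The fire-and-forget behavior described in this section can be sketched with two small helpers. This is a minimal loopback example, not a media transport: the function names and the `audio-frame-1` payload are made up for illustration.

```python
import socket


def send_datagram(payload: bytes, address) -> None:
    """Send one UDP datagram. No handshake, no acknowledgment, no retry."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, address)


def receive_datagram(sock: socket.socket, timeout: float = 1.0):
    """Wait briefly for one datagram; return None if nothing arrives.

    A missing datagram is simply missing. UDP offers no way to ask the
    sender for it again.
    """
    sock.settimeout(timeout)
    try:
        data, _sender = sock.recvfrom(2048)
        return data
    except socket.timeout:
        return None


if __name__ == "__main__":
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    send_datagram(b"audio-frame-1", receiver.getsockname())
    print(receive_datagram(receiver))
    print(receive_datagram(receiver, timeout=0.1))  # nothing else was sent
    receiver.close()
```

Note that the sender never learns whether the datagram arrived. Any loss detection has to be built on top, which is the gap RTP's sequence numbers fill.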

RTP — Real-Time Transport Protocol

RTP (Real-Time Transport Protocol, RFC 3550) is an application-layer protocol that normally runs over UDP and adds structure specifically for real-time media streaming. It does not guarantee delivery; instead, it layers timing, sequencing, and payload identification on top of UDP's raw transport.

What RTP adds over raw UDP:

  • Sequence numbers — lets receivers detect lost packets and identify gaps
  • Timestamps — precise media timing for synchronization and playout buffering
  • Payload type — identifies the media format (PCM audio, H.264 video, etc.)
  • SSRC — identifies the stream source, allowing multiple streams to be demultiplexed
  • CSRC — identifies contributing sources when streams are mixed
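The fields listed above live in a 12-byte fixed header defined by RFC 3550, followed by zero or more 32-bit CSRC entries. The sketch below packs and parses that header. It ignores padding and header extensions for brevity, and the `RtpHeader`, `build_rtp_header`, and `parse_rtp_header` names are illustrative, not taken from any library.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def build_rtp_header(payload_type: int, sequence: int, timestamp: int,
                     ssrc: int, marker: bool = False, csrcs=()) -> bytes:
    """Pack an RTP header (version 2, no padding, no extension)."""
    first = (2 << 6) | (len(csrcs) & 0x0F)            # V=2, P=0, X=0, CC
    second = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", first, second, sequence & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Unpack the fixed header and CSRC list from the start of a packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP header")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = struct.unpack("!%dI" % csrc_count, packet[12:end])
    return RtpHeader(first >> 6, bool(second >> 7), second & 0x7F,
                     sequence, timestamp, ssrc, csrcs)


if __name__ == "__main__":
    raw = build_rtp_header(payload_type=96, sequence=1000,
                           timestamp=48000, ssrc=0x12345678)
    print(parse_rtp_header(raw + b"payload-bytes"))
```

Payload type 96 is used here only because it falls in the dynamic range (96 to 127), where the meaning is negotiated out of band, typically through SDP.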

RTP is the transport backbone for AES67, SMPTE ST 2110, SIP-based conferencing media, and many IP camera streams. When Dante uses AES67 mode, its audio flows are RTP.
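Because RTP sequence numbers are 16 bits wide and wrap from 65535 back to 0, gap detection has to use modular arithmetic. The following is a simplified sketch of how a receiver might count missing packets; `count_lost` is a hypothetical helper, and it deliberately treats a packet that arrives out of order as lost, which a production receiver would handle more carefully.

```python
def count_lost(sequences) -> int:
    """Count packets missing from a stream of received RTP sequence numbers.

    Simplification: duplicates and packets that arrive behind the highest
    sequence seen so far are ignored, so a reordered packet stays counted
    as lost.
    """
    lost = 0
    highest = None
    for seq in sequences:
        if highest is None:
            highest = seq
            continue
        # Distance forward from the last in-order packet, modulo 2**16.
        delta = (seq - highest) & 0xFFFF
        if 0 < delta < 0x8000:
            lost += delta - 1   # delta == 1 means no gap
            highest = seq
        # delta == 0 is a duplicate; delta >= 0x8000 is an old packet.
    return lost


if __name__ == "__main__":
    print(count_lost([10, 11, 14]))          # 12 and 13 are missing
    print(count_lost([65534, 65535, 0, 1]))  # wraparound, nothing missing
```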

RTCP — RTP Control Protocol

RTCP runs alongside an RTP session and provides out-of-band statistics, carried in sender and receiver reports: packet loss fraction, interarrival jitter, and the timing information needed to estimate round-trip delay. AV systems use RTCP data to:

  • Monitor stream quality in real time
  • Detect network degradation before it becomes audible
  • Synchronize multiple RTP streams (audio + video lip sync)

RTCP doesn't carry media — it carries the metadata about how the RTP media stream is performing.
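The jitter figure reported in RTCP receiver reports is a running estimate defined in RFC 3550: for each packet, the receiver compares how the transit time changed since the previous packet and folds 1/16 of the difference into the estimate. The class below is a minimal sketch of that calculation; the `JitterEstimator` name is illustrative, and both inputs are assumed to be expressed in RTP timestamp units.

```python
class JitterEstimator:
    """Running interarrival jitter estimate in the style of RFC 3550."""

    def __init__(self) -> None:
        self.jitter = 0.0
        self._previous_transit = None

    def update(self, arrival: int, rtp_timestamp: int) -> float:
        """Feed one packet; both arguments are in RTP timestamp units."""
        # Transit time includes an unknown clock offset, but the offset
        # cancels out because only the change in transit is used.
        transit = arrival - rtp_timestamp
        if self._previous_transit is not None:
            d = abs(transit - self._previous_transit)
            # Smooth with a gain of 1/16, as the RFC specifies.
            self.jitter += (d - self.jitter) / 16.0
        self._previous_transit = transit
        return self.jitter


if __name__ == "__main__":
    estimator = JitterEstimator()
    # Packets are stamped every 160 units; the third arrives 160 units late.
    for arrival, stamp in [(0, 0), (160, 160), (480, 320)]:
        print(estimator.update(arrival, stamp))
```

The 1/16 gain means a single late packet moves the estimate only slightly, so the reported value reflects sustained timing variation rather than one-off spikes.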

RTSP — Real-Time Streaming Protocol

RTSP (Real-Time Streaming Protocol, RFC 2326) is a signaling protocol that sets up, controls, and tears down RTP media sessions. Think of RTSP as the "remote control" for an RTP stream: it handles the negotiation, while the media itself usually flows separately via RTP over UDP.

RTSP commands:

  • DESCRIBE — ask the server what streams are available and their parameters (returns an SDP description)
  • SETUP — negotiate transport parameters (ports, multicast/unicast)
  • PLAY — start the media stream
  • PAUSE — pause the stream
  • TEARDOWN — end the session
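RTSP requests are plain text, similar in shape to HTTP: a request line, a `CSeq` header that numbers the request, optional further headers, and a blank line. The sketch below only builds the request strings for the commands listed above and does not open a connection. The camera address (from the 192.0.2.0/24 documentation range), the port numbers, and the session identifier are placeholders; a real client reads the session identifier from the server's SETUP response and usually takes the SETUP URL from the SDP returned by DESCRIBE.

```python
def build_rtsp_request(method: str, url: str, cseq: int, headers=None) -> str:
    """Format one RTSP/1.0 request as text, ending with a blank line."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


def example_session(url: str, client_rtp_port: int, session_id: str):
    """Return the requests for a simple unicast session, in order."""
    # RTP and RTCP conventionally use an adjacent even/odd port pair.
    transport = f"RTP/AVP;unicast;client_port={client_rtp_port}-{client_rtp_port + 1}"
    return [
        build_rtsp_request("DESCRIBE", url, 1, {"Accept": "application/sdp"}),
        build_rtsp_request("SETUP", url, 2, {"Transport": transport}),
        build_rtsp_request("PLAY", url, 3, {"Session": session_id}),
        build_rtsp_request("TEARDOWN", url, 4, {"Session": session_id}),
    ]


if __name__ == "__main__":
    for request in example_session("rtsp://192.0.2.10/stream", 5004, "12345678"):
        print(request)
```

The `client_port` pair in the SETUP request is where the receiver tells the server which UDP ports to send RTP and RTCP to, which is why those ports matter for firewall rules even though the RTSP connection itself succeeds.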

AV uses for RTSP:

  • IP camera streams (ONVIF cameras use RTSP for H.264/H.265 video)
  • Video surveillance and PTZ camera feeds in AV systems
  • Legacy IP streaming infrastructure
  • Video conferencing gateway protocols

Most modern IP cameras expose an RTSP URL of the form rtsp://[camera-ip]/stream, though the exact path varies by manufacturer and model. AV systems, NVRs, and video management platforms connect to this URL to receive the H.264 or H.265 stream via RTP.

Protocol Stack Summary

A typical AV stream flows through multiple protocol layers simultaneously:

Application Protocol    Transport    Network    Notes
Dante audio             UDP          IP         Proprietary framing over UDP
AES67 audio             RTP/UDP      IP         Standard RTP payload
NDI video               UDP          IP         Proprietary; also supports TCP fallback
SMPTE ST 2110           RTP/UDP      IP         Professional broadcast standard
IP camera (RTSP)        RTP/UDP      IP         RTSP setup + RTP media
Control API (HTTP)      TCP          IP         REST or WebSocket
SIP signaling           TCP/UDP      IP         SIP over TCP for reliable delivery
SIP media               RTP/UDP      IP         Audio/video after SIP negotiation

Common Pitfalls

  • Expecting UDP streams to self-heal — UDP has no retransmission. Network packet loss translates directly to audio dropouts or video artifacts. Proper switch QoS, IGMP snooping, and physical layer reliability are the only mitigations. Troubleshoot the network, not the protocol.
  • Blocking UDP in firewalls between AV VLANs — Firewall rules that allow TCP but block UDP will prevent Dante, AES67, and RTP video from flowing between subnets. Always verify that inter-VLAN firewall rules explicitly permit UDP for AV traffic on the required port ranges.
  • Confusing RTSP (signaling) with RTP (media) — An RTSP URL connection being established doesn't mean media is flowing. If RTSP connects but no video appears, the issue may be the RTP ports being blocked by a firewall or NAT.
  • Not accounting for RTP jitter buffers — RTP receivers use jitter buffers to smooth out network timing variations. Too small a buffer causes dropouts from momentary network jitter; too large a buffer adds latency. For conferencing systems, tune the jitter buffer to balance latency and stability.
  • Using TCP for real-time audio/video — Some integrators configure streaming systems to use TCP "for reliability." TCP's retransmission causes bursty latency spikes that are far more disruptive to real-time AV than the occasional dropped UDP packet.
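The jitter buffer tradeoff in the list above can be made concrete with a small sketch. This toy buffer holds up to `depth` packets and releases them in sequence order; a larger `depth` tolerates more reordering and delay variation but holds audio back longer. It is an illustration under simplifying assumptions: depth is counted in packets rather than milliseconds, 16-bit sequence wraparound is ignored, and the `JitterBuffer` class is not modeled on any particular product.

```python
class JitterBuffer:
    """Toy reordering buffer keyed by RTP sequence number."""

    def __init__(self, depth: int) -> None:
        self.depth = depth            # packets held before playout begins
        self.dropped_late = 0         # packets that arrived after their slot
        self._pending = {}
        self._last_released = None

    def push(self, sequence: int, payload: bytes):
        """Add one packet; return any packets now due for playout, in order."""
        if self._last_released is not None and sequence <= self._last_released:
            # Its playout slot has already passed, so it is useless now.
            self.dropped_late += 1
            return []
        self._pending[sequence] = payload   # a duplicate simply overwrites
        released = []
        while len(self._pending) > self.depth:
            oldest = min(self._pending)
            released.append((oldest, self._pending.pop(oldest)))
            self._last_released = oldest
        return released


if __name__ == "__main__":
    buffer = JitterBuffer(depth=2)
    # Packet 2 arrives after packet 3, but still in time to be reordered.
    for seq in [1, 3, 2, 4]:
        print(seq, buffer.push(seq, b"frame"))
    # Packet 1 arrives again far too late and is discarded.
    print(buffer.push(1, b"frame"), buffer.dropped_late)
```

With `depth=2`, packet 2 is reordered successfully because it arrives while packet 1 is still waiting. With `depth=1` the same arrival order releases packets 1, 2, and 3 in order, but at the cost of less margin: any packet delayed by more than one slot would be dropped as late.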
