Latency and Jitter — AV Network Timing

For Dante-specific latency settings, see networking/dante. For AV-over-IP latency, see control-systems/crestron-dm-nvx.

Latency is the time delay between an audio or video signal entering a system and emerging at the output. Jitter is the variation in that delay from packet to packet or frame to frame. Both matter in AV systems: excessive latency makes video conferencing conversations feel awkward and causes lip-sync errors; excessive jitter causes audio dropouts and video stuttering in networked AV systems. Understanding the sources of latency and jitter helps integrators select appropriate buffer sizes, identify causes of audio glitches, and set realistic expectations with clients.

Sources of Latency in AV Systems

Network Latency

Network latency in a well-managed LAN is typically < 1 ms between endpoints on the same switch. Latency increases with:

Number of switch hops: each switch adds 10–100 µs of store-and-forward delay
Queue delay: congestion causes packets to wait in switch queues (QoS prioritization keeps this delay small for high-priority traffic)
WAN links: internet latency adds 10–200+ ms depending on geography
Wireless: Wi-Fi adds 1–20 ms of variable latency; unsuitable for low-latency audio
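
The store-and-forward figure can be sanity-checked with simple arithmetic: a switch must receive a whole frame before forwarding it, so each hop costs at least the frame length divided by the link rate. The sketch below is illustrative only; the function names and the optional per_hop_overhead_us parameter are assumptions made for this example, and real switches add their own internal processing time on top.

```python
def store_and_forward_delay_us(frame_bytes: int, link_bps: float) -> float:
    """Time to clock one whole frame onto a link, in microseconds."""
    return frame_bytes * 8 / link_bps * 1e6


def path_delay_us(hops: int, frame_bytes: int, link_bps: float,
                  per_hop_overhead_us: float = 0.0) -> float:
    """Store-and-forward delay summed over `hops` switches.

    per_hop_overhead_us is a hypothetical allowance for switch-internal
    processing; it is not a measured value.
    """
    per_hop = store_and_forward_delay_us(frame_bytes, link_bps) + per_hop_overhead_us
    return hops * per_hop


if __name__ == "__main__":
    for rate_name, bps in (("1 Gb/s", 1e9), ("100 Mb/s", 100e6)):
        delay = path_delay_us(hops=3, frame_bytes=1500, link_bps=bps)
        print(f"3 hops at {rate_name}: {delay:.1f} us")
```

A 1500-byte frame takes about 12 µs per hop at 1 Gb/s and about 120 µs per hop at 100 Mb/s, which is one reason 100 Mb/s links call for larger receive buffers.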

Processing Latency

Every digital signal processing stage adds latency:

ADC conversion: 0.5–2 ms (analog-to-digital)
DSP processing: 1–5 ms (Q-SYS, Biamp Tesira, Shure IntelliMix)
AEC filter: 0–2 ms additional
DAC conversion: 0.5–2 ms (digital-to-analog)
Amplifier/speaker: < 1 ms

Transport Latency (Dante / AES67)

Dante's user-configurable latency (the Dante "unicast latency" setting) determines how large the receive buffer is. A larger buffer tolerates more network jitter but adds latency. Common settings and their typical use:

0.25 ms — gigabit networks with no hops (same switch)
0.5 ms — same site, 1–3 hops
1 ms — recommended minimum for most installations
4 ms — wide-area Dante or networks with variable delay

Total Dante system latency (from mic analog input to speaker analog output) is approximately 2–5 ms on a well-designed local LAN. This figure covers ADC, network transport, and DAC only; DSP processing and AEC add further delay.
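
A latency budget is just the sum of the per-stage ranges. The sketch below adds up the ranges quoted in this article for a chosen set of stages plus a Dante latency setting. The STAGES_MS table and the budget_ms function are names invented for this example, and the amplifier/speaker contribution (< 1 ms) is left out for simplicity.

```python
# (min_ms, max_ms) per stage, copied from the ranges quoted in this article.
STAGES_MS = {
    "ADC": (0.5, 2.0),
    "DSP": (1.0, 5.0),
    "AEC": (0.0, 2.0),
    "DAC": (0.5, 2.0),
}


def budget_ms(stage_names, dante_latency_ms):
    """Return (best_case_ms, worst_case_ms) for the listed stages plus Dante transport."""
    best = sum(STAGES_MS[name][0] for name in stage_names) + dante_latency_ms
    worst = sum(STAGES_MS[name][1] for name in stage_names) + dante_latency_ms
    return best, worst


if __name__ == "__main__":
    print("ADC + Dante 1 ms + DAC:", budget_ms(["ADC", "DAC"], 1.0))
    print("With DSP and AEC:", budget_ms(["ADC", "DSP", "AEC", "DAC"], 1.0))
```

With a 1 ms Dante setting, ADC plus DAC alone gives 2–5 ms; adding DSP and AEC stretches the range to 3–12 ms, so the transport setting is rarely the largest term.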

Compression Latency

Compressed AV-over-IP systems (NDI, H.264, H.265-based) add encoding and decoding latency. Typical figures, with uncompressed SDVoE listed for comparison:

NDI|HX3: 50–100 ms
H.264 hardware codec: 100–200 ms
SDVoE (uncompressed): < 1 ms
JPEG XS (near-lossless): 1–10 ms

For video conferencing and live production, compression latency must be considered when selecting AV-over-IP technology.
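
One way to make these figures concrete is to express them in video frames and compare the worst case against a project latency budget. The sketch below does both. The table copies the ranges listed above (with "< 1 ms" entered as 0 to 1 ms); the function names and any budget value passed in are hypothetical, not recommendations.

```python
# (min_ms, max_ms) per technology, copied from the list in this article.
# SDVoE is quoted as "< 1 ms", entered here as (0, 1).
CODEC_LATENCY_MS = {
    "NDI|HX3": (50, 100),
    "H.264 hardware codec": (100, 200),
    "SDVoE": (0, 1),
    "JPEG XS": (1, 10),
}


def latency_in_frames(latency_ms: float, fps: float) -> float:
    """Convert a latency in milliseconds to a number of video frames."""
    return latency_ms * fps / 1000.0


def fits_budget(codec: str, budget_ms: float) -> bool:
    """True if the codec's worst-case latency is within the budget."""
    return CODEC_LATENCY_MS[codec][1] <= budget_ms


if __name__ == "__main__":
    for name, (low, high) in CODEC_LATENCY_MS.items():
        frames = latency_in_frames(high, 60)
        print(f"{name}: up to {high} ms, about {frames:.1f} frames at 60 fps")
```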

Jitter

Jitter is the packet-to-packet variation in arrival time. A stream where every packet arrives exactly 1 ms apart has zero jitter. Real networks produce variable delay (some packets arrive early, some late) due to queue depth fluctuations, competing traffic, and switch processing variation.
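
As a rough illustration of this definition, jitter can be estimated from a list of packet arrival timestamps by comparing each inter-arrival gap with the nominal packet interval. This is a deliberately simple metric written for this example; it is not the smoothed estimator defined in RFC 3550 and not necessarily what any particular vendor tool reports.

```python
from statistics import mean


def interarrival_ms(arrivals_ms):
    """Gaps between consecutive arrival timestamps."""
    return [b - a for a, b in zip(arrivals_ms, arrivals_ms[1:])]


def jitter_stats_ms(arrivals_ms, nominal_interval_ms):
    """Mean and peak absolute deviation of the gaps from the nominal interval."""
    deviations = [abs(gap - nominal_interval_ms)
                  for gap in interarrival_ms(arrivals_ms)]
    return {"mean": mean(deviations), "peak": max(deviations)}


if __name__ == "__main__":
    steady = [0.0, 1.0, 2.0, 3.0]
    bursty = [0.0, 1.0, 2.5, 3.0, 4.0]
    print(jitter_stats_ms(steady, 1.0))
    print(jitter_stats_ms(bursty, 1.0))
```

The steady stream reports zero jitter. In the bursty stream, one packet arriving 0.5 ms late produces two disturbed gaps: a long one before it and a short one after it.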

Audio systems handle jitter with a playout buffer: incoming packets are held briefly before playout, so early packets wait and late packets still have time to arrive. The buffer depth is a tradeoff between jitter tolerance and added latency:

Dante's 0.25 ms setting can tolerate ~50 µs of jitter
Dante's 1 ms setting can tolerate ~500 µs of jitter
Excessive jitter beyond the buffer depth causes audio dropout
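
The buffer tradeoff can be illustrated with a toy playout model: each packet gets a playout deadline, and any packet that arrives after its deadline is a dropout. The sketch below schedules playout relative to the first packet's arrival, which is a simplifying assumption made for this example; Dante receivers time playout against a synchronized network clock instead. The function name and sample timestamps are hypothetical.

```python
def late_packets(arrivals_ms, interval_ms, buffer_ms):
    """Return indices of packets that miss their playout deadline.

    Packet i is due at: first arrival + buffer_ms + i * interval_ms.
    """
    start = arrivals_ms[0] + buffer_ms
    return [i for i, arrival in enumerate(arrivals_ms)
            if arrival > start + i * interval_ms]


if __name__ == "__main__":
    # Nominal 1 ms spacing; packets 2 and 4 are delayed by 0.3 ms and 0.6 ms.
    arrivals = [0.0, 1.0, 2.3, 3.0, 4.6]
    for depth in (0.25, 0.5, 1.0):
        print(f"buffer {depth} ms -> late packets: {late_packets(arrivals, 1.0, depth)}")
```

In this sample, a deeper buffer removes the dropouts at the cost of the same amount of added latency.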

Dante Controller → Network Status shows per-device jitter statistics. Jitter above 100 µs on a Dante network indicates a QoS or network design problem.

Lip Sync

Lip sync is the perceptual alignment of audio and video at the display. The human perceptual tolerance for lip sync error is approximately:

Audio leading video: ≤ 25 ms (noticeable above this)
Audio lagging video: ≤ 45 ms (noticeable above this)

Video processing (scalers, display processing, codecs) adds significant latency — often 50–200 ms for consumer displays. The audio system must add equivalent delay to compensate. DSPs provide delay blocks (Q-SYS, Biamp) that can be set to exact millisecond values. HDMI ARC/eARC includes an "audio output delay" signaling mechanism, but in installed AV, manual delay setting in the DSP is more reliable.
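
The compensation arithmetic can be sketched as follows: the audio/video offset is the video path latency minus the audio path latency, and the DSP delay needed to close the gap is that difference when it is positive. The thresholds are the approximate tolerances quoted above; the function names and example latencies are assumptions made for illustration.

```python
AUDIO_LEAD_LIMIT_MS = 25.0  # audio ahead of video, from the tolerance above
AUDIO_LAG_LIMIT_MS = 45.0   # audio behind video, from the tolerance above


def av_offset_ms(video_latency_ms, audio_latency_ms, added_audio_delay_ms=0.0):
    """Positive result: audio leads video. Negative result: audio lags video."""
    return video_latency_ms - (audio_latency_ms + added_audio_delay_ms)


def is_noticeable(offset_ms):
    """True if the offset falls outside the approximate perceptual tolerance."""
    return offset_ms > AUDIO_LEAD_LIMIT_MS or -offset_ms > AUDIO_LAG_LIMIT_MS


def required_audio_delay_ms(video_latency_ms, audio_latency_ms):
    """DSP delay that aligns audio with video (never negative)."""
    return max(0.0, video_latency_ms - audio_latency_ms)


if __name__ == "__main__":
    video_ms, audio_ms = 80.0, 5.0  # hypothetical measured path latencies
    delay = required_audio_delay_ms(video_ms, audio_ms)
    print("uncompensated offset:", av_offset_ms(video_ms, audio_ms))
    print("DSP delay to add:", delay)
    print("still noticeable:", is_noticeable(av_offset_ms(video_ms, audio_ms, delay)))
```

If the audio path is already slower than the video path, the required delay is zero and the remaining lag has to be fixed elsewhere, for example by reducing audio buffering.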

Common Pitfalls

Setting Dante latency too low for the network. A 0.25 ms Dante latency setting on a network with 3 switch hops and no QoS produces frequent audio dropouts as network jitter exceeds the buffer. Fix: set Dante latency to 1 ms as a baseline; increase to 2–4 ms for complex or multi-site networks.
Not compensating for video processing delay. A DSP is set to 0 ms audio delay; the display has 80 ms of internal processing latency. Audio arrives 80 ms early — clearly audible as audio leading lips. Fix: measure display latency with a lip-sync test signal; add equivalent audio delay in the DSP.
Assuming Dante latency = total system latency. Dante transport latency is one component; ADC, DSP processing, and DAC each add additional delay. Total analog-to-analog latency is typically 3–10 ms; do not promise < 1 ms system latency just because Dante is set to 0.25 ms.
Using Wi-Fi for Dante or real-time audio. Wi-Fi latency is variable (1–20 ms) and unpredictable under load. Dante over Wi-Fi is unsupported by Audinate. Fix: use wired Ethernet for all Dante devices; use Wi-Fi only for control system touchpanels and management interfaces.
