
DSP Fundamentals — Digital Signal Processing for AV

Digital signal processing is the mathematical manipulation of audio signals represented as sequences of numbers. Understanding the building blocks — conversion, filtering, dynamics, delay, and mixing — explains both how to configure DSP systems and how to diagnose problems when they fail.

For an overview of DSP hardware platforms used in AV, see dsp.

Analog-to-Digital and Digital-to-Analog Conversion

A microphone or line-level source produces an analog voltage that continuously varies with sound pressure. Before digital processing can occur, this must be converted to numbers.

Analog-to-Digital Converter (ADC) — Samples the analog voltage at regular intervals (the sample rate) and assigns each sample a numeric value (quantization). Two parameters define conversion quality:

Sample rate — How many samples are taken per second. The Nyquist theorem states that the sample rate must be more than twice the highest frequency to be represented accurately. Half the sample rate is called the Nyquist frequency. CD audio uses 44.1 kHz (capturing up to 22.05 kHz). Professional audio and video use 48 kHz as the standard rate (capturing up to 24 kHz, comfortably above the 20 kHz limit of human hearing). Some recording applications use 96 kHz or 192 kHz, though these offer minimal practical benefit in AV systems and significantly increase processing demands.
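A quick way to see the Nyquist limit in action: a tone above half the sample rate does not simply disappear. It folds back ("aliases") to a lower frequency. The sketch below computes where an unfiltered tone would land. The function name `alias_frequency` is hypothetical, and the sketch assumes no anti-aliasing filter, which real ADCs include precisely to prevent this folding.

```python
def alias_frequency(f_hz, fs_hz):
    """Return the frequency at which a pure tone at f_hz appears after sampling at fs_hz.

    Assumes no anti-aliasing filter: real ADCs low-pass filter the input
    before sampling so that this folding does not happen.
    """
    folded = f_hz % fs_hz          # sampling cannot distinguish f from f + k*fs
    if folded > fs_hz / 2:         # above Nyquist, the image mirrors back down
        folded = fs_hz - folded
    return folded


for f, fs in [(1_000, 48_000), (30_000, 48_000), (25_000, 44_100)]:
    print(f"{f} Hz sampled at {fs} Hz -> {alias_frequency(f, fs)} Hz")
```

With these inputs, a 30 kHz tone sampled at 48 kHz would appear at 18 kHz, well inside the audible band.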

Bit depth — How many possible amplitude values exist for each sample. 16-bit audio has 65,536 levels; 24-bit has 16,777,216 levels. Each bit adds approximately 6 dB of dynamic range, so 16-bit provides a theoretical ~96 dB and 24-bit a theoretical ~144 dB (real converters achieve less because of analog noise). Professional DSP systems typically process internally at 32-bit or 64-bit floating point to preserve headroom between processing stages, converting to 24-bit at the I/O.
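The dynamic-range figures follow directly from the number of levels: the ratio of full scale to one quantization step is 2^N, which is 20·log10(2^N) ≈ 6.02·N dB. A minimal sketch (function names are illustrative):

```python
import math


def quantization_levels(bits):
    """Number of distinct amplitude values an N-bit sample can take."""
    return 2 ** bits


def dynamic_range_db(bits):
    """Theoretical ratio of full scale to one quantization step, in dB.

    20*log10(2**N) = N * 20*log10(2), i.e. about 6.02 dB per bit.
    Real converters fall short of this because of analog noise.
    """
    return 20 * math.log10(quantization_levels(bits))


for bits in (16, 24):
    print(bits, quantization_levels(bits), round(dynamic_range_db(bits), 1))
```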

Digital-to-Analog Converter (DAC) — Reconstructs an analog waveform from the digital sample stream. Output feeds amplifiers, headphone outputs, or recording interfaces.

Filters

Filters are the core tool for shaping frequency content. EQ, crossover, and correction processing in professional DSP are all built from the following filter types:

High-Pass Filter (HPF) — Passes frequencies above a cutoff frequency and attenuates those below it. Used to remove low-frequency rumble from microphones (typically an 80-120 Hz HPF on speech mics) and, in systems with subwoofers, to keep low frequencies out of the main speakers so that the subwoofer handles that band.

Low-Pass Filter (LPF) — Passes frequencies below cutoff; attenuates above. Used in subwoofer crossovers and to remove high-frequency hiss.
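High- and low-pass filters in a DSP are commonly built from second-order "biquad" sections. The sketch below assumes the coefficient formulas from Robert Bristow-Johnson's Audio EQ Cookbook, a widely used public reference; commercial platforms may design their filters differently, and the helper names are hypothetical. It designs a 100 Hz speech-mic HPF at 48 kHz and evaluates its response.

```python
import cmath
import math


def rbj_pass_filter(kind, f0, fs, q=1 / math.sqrt(2)):
    """Second-order high- or low-pass biquad coefficients (RBJ Audio EQ Cookbook).

    Returns (b, a) normalized so that a[0] == 1. q = 0.707 gives a
    Butterworth (maximally flat) response.
    """
    w0 = 2 * math.pi * f0 / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    elif kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    else:
        raise ValueError("kind must be 'highpass' or 'lowpass'")
    a = [1 + alpha, -2 * cos_w0, 1 - alpha]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def magnitude_db(b, a, f, fs):
    """Evaluate |H(e^jw)| in dB for a biquad at frequency f."""
    z1 = cmath.exp(-1j * 2 * math.pi * f / fs)   # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 ** 2
    den = a[0] + a[1] * z1 + a[2] * z1 ** 2
    return 20 * math.log10(abs(num / den))


def biquad(b, a, samples):
    """Run the filter over a sequence of samples (Direct Form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


fs = 48_000
b, a = rbj_pass_filter("highpass", 100, fs)   # 100 Hz speech-mic HPF
for f in (25, 50, 100, 1_000):
    print(f"{f:>5} Hz: {magnitude_db(b, a, f, fs):6.1f} dB")
```

At Q = 0.707 the response is about -3 dB at the cutoff and roughly 12 dB lower for each octave further below it; `biquad` shows how the same coefficients would process a stream of samples.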

Parametric EQ — The universal tool for frequency correction. Each band is defined by three parameters: center frequency, gain (+/- dB), and Q (bandwidth — higher Q = narrower band). A parametric band at 200 Hz, -6 dB, Q=1.4 cuts a broad low-midrange buildup. A band at 1 kHz, -3 dB, Q=8 makes a narrow notch cut for a specific room resonance.
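The same cookbook recipe (an assumption here, not necessarily what any given platform uses) gives a peaking biquad controlled by exactly the three parametric settings: center frequency, gain, and Q. This sketch builds the two example bands from the paragraph above and checks their response; the function names are hypothetical.

```python
import cmath
import math


def peaking_eq(f0, gain_db, q, fs):
    """Parametric (peaking) EQ biquad from the RBJ Audio EQ Cookbook.

    Returns (b, a) with a[0] normalized to 1. The gain at f0 equals gain_db;
    a higher q narrows the band around f0.
    """
    A = 10 ** (gain_db / 40)          # square root of the linear gain at f0
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * A, -2 * math.cos(w0), 1 - alpha * A]
    a = [1 + alpha / A, -2 * math.cos(w0), 1 - alpha / A]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(b, a, f, fs):
    """Magnitude response of the biquad at frequency f, in dB."""
    z1 = cmath.exp(-2j * math.pi * f / fs)
    h = (b[0] + b[1] * z1 + b[2] * z1 ** 2) / (a[0] + a[1] * z1 + a[2] * z1 ** 2)
    return 20 * math.log10(abs(h))


fs = 48_000
broad = peaking_eq(200, -6, 1.4, fs)    # broad low-mid cut from the text
narrow = peaking_eq(1_000, -3, 8, fs)   # narrow resonance cut from the text
for f in (100, 200, 400, 1_000, 2_000):
    print(f, round(response_db(*broad, f, fs), 2), round(response_db(*narrow, f, fs), 2))
```

By construction the gain at the center frequency equals the requested gain. The Q = 8 band is back within about 0.1 dB of flat one octave above its center, while the Q = 1.4 band still cuts more than 1 dB at 400 Hz.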

Graphic EQ — Fixed-frequency bands at octave or 1/3-octave intervals, each with adjustable gain. Less precise than parametric; useful for visual display of the EQ curve. Common on installed speaker systems for field tuning.
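The 1/3-octave spacing is geometric: each band sits 2^(1/3) (about 1.26) times above the previous one. The sketch below generates exact base-2 centers referenced to 1 kHz for the common 31-band layout covering roughly 20 Hz to 20 kHz. Front panels print rounded nominal labels (for example 1.25 kHz for the exact 1259.9 Hz). The function name is hypothetical.

```python
def third_octave_centers(n_low=-17, n_high=13):
    """Exact base-2 1/3-octave center frequencies referenced to 1 kHz.

    n = -17 ... 13 gives the 31 bands of a typical 31-band graphic EQ,
    from about 20 Hz to about 20 kHz.
    """
    return [1000 * 2 ** (n / 3) for n in range(n_low, n_high + 1)]


bands = third_octave_centers()
print(len(bands), [round(f, 1) for f in bands[15:20]])
```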

High-Shelf / Low-Shelf — Boosts or cuts all frequencies above (high shelf) or below (low shelf) a turnover frequency by a set amount. Less precise than parametric; useful for broad tonal correction.

All-Pass Filter — Changes phase without changing amplitude. Used for phase alignment between drivers in a speaker system, or to time-align components in a crossover.
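A first-order all-pass illustrates the idea. Its gain is exactly 1 at every frequency, while its phase sweeps from 0° at DC through -90° at the corner frequency toward -180° at Nyquist. The sketch assumes a bilinear-transform design; the 1 kHz corner and the function names are illustrative.

```python
import cmath
import math


def first_order_allpass(fc, fs):
    """Coefficient c for H(z) = (c + z^-1) / (1 + c z^-1).

    Bilinear-transform design: unity gain at every frequency, with the
    phase passing through -90 degrees at fc.
    """
    k = math.tan(math.pi * fc / fs)
    return (k - 1) / (k + 1)


def allpass_response(c, f, fs):
    """Complex frequency response of the all-pass at frequency f."""
    z1 = cmath.exp(-2j * math.pi * f / fs)
    return (c + z1) / (1 + c * z1)


fs = 48_000
c = first_order_allpass(1_000, fs)
for f in (100, 1_000, 10_000):
    h = allpass_response(c, f, fs)
    print(f"{f:>6} Hz  gain {20 * math.log10(abs(h)):+.3f} dB  "
          f"phase {math.degrees(cmath.phase(h)):7.1f} deg")
```

Because the phase shift varies with frequency, an all-pass adjusts phase around a chosen frequency rather than acting as a uniform delay across all frequencies.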

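Filter slope — Measured in dB/octave; each filter order adds 6 dB/octave. A first-order (6 dB/octave) filter rolls off gently. A fourth-order (24 dB/octave) Linkwitz-Riley filter is common for crossovers that require steep separation between drivers. Its low-pass and high-pass outputs are each 6 dB down at the crossover frequency and sum to a flat response.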
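The Linkwitz-Riley relationship can be checked numerically. A fourth-order Linkwitz-Riley section is two cascaded second-order Butterworth sections, so each output is -6 dB at the crossover and the low and high outputs add back to unity. The sketch uses the analog (s-domain) prototype with a hypothetical 100 Hz crossover; the function names are illustrative.

```python
import math


def lr4_responses(f, fc):
    """Analog 4th-order Linkwitz-Riley low- and high-pass responses at f.

    LR4 = two cascaded 2nd-order Butterworth sections (Q = 0.707),
    which is why each output is -6 dB at fc rather than -3 dB.
    """
    s = 1j * f / fc                              # normalized s = jw / wc
    butter_den = s * s + math.sqrt(2) * s + 1    # 2nd-order Butterworth denominator
    lp = (1 / butter_den) ** 2
    hp = (s * s / butter_den) ** 2
    return lp, hp


def db(h):
    return 20 * math.log10(abs(h))


fc = 100.0
for f in (25, 50, 100, 200, 400):
    lp, hp = lr4_responses(f, fc)
    print(f"{f:>4} Hz  LP {db(lp):6.1f} dB  HP {db(hp):6.1f} dB  sum {db(lp + hp):5.2f} dB")
```

Between 400 Hz and 800 Hz the low-pass output falls by about 24 dB, matching the nominal slope.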

Dynamics Processing

Dynamics processing controls the relationship between input and output level over time.

Compressor — Reduces gain when the signal exceeds a threshold. Defined by: threshold (the level in dB where compression begins), ratio (how much compression is applied; 4:1 means the output rises 1 dB for every 4 dB the input rises above threshold), attack time (how quickly compression engages), release time (how quickly compression stops after the signal falls below threshold), and knee (hard knee = abrupt compression onset; soft knee = gradual). Compressors reduce the dynamic range of music and speech and prevent occasional loud sounds from overwhelming the system.
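Threshold, ratio, and knee together define the compressor's static input/output curve. A minimal sketch of that curve, using one common soft-knee formulation (a quadratic blend across the knee region). Attack and release, which control how quickly the gain follows this curve, are omitted, and the function name is hypothetical.

```python
def compressor_output_db(x_db, threshold_db, ratio, knee_db=0.0):
    """Static input/output curve of a feed-forward compressor, in dB.

    knee_db = 0 gives a hard knee; a positive value spreads the onset over
    a region knee_db wide, centered on the threshold (soft knee).
    Attack and release smoothing are deliberately left out.
    """
    over = x_db - threshold_db
    if knee_db > 0 and abs(over) <= knee_db / 2:
        # quadratic blend between 1:1 and 1:ratio inside the knee
        return x_db + (1 / ratio - 1) * (over + knee_db / 2) ** 2 / (2 * knee_db)
    if over <= 0:
        return x_db                          # below threshold: unity gain
    return threshold_db + over / ratio       # above threshold: slope of 1/ratio


for x in (-30, -20, -16, -8):
    hard = compressor_output_db(x, -20, 4)
    soft = compressor_output_db(x, -20, 4, knee_db=10)
    print(f"in {x:>4} dBFS -> hard {hard:6.2f}  soft {soft:6.2f}")
```

With a -20 dBFS threshold and a 4:1 ratio, an input 12 dB over threshold comes out 3 dB over (at -17 dBFS). Raising the ratio to 20:1 or higher turns the same curve into a limiter.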

Limiter — A compressor with a very high ratio (20:1 or higher, often described as "infinity:1"). A limiter acts as a ceiling — it prevents the output from exceeding a set level regardless of input. Used to protect speakers from damage and amplifiers from clipping. Every professional DSP output should have a limiter set below the power amplifier's clip point.

Noise Gate — Attenuates or mutes the signal when it falls below a threshold. Reduces microphone hiss and room noise during silence. Used in conferencing systems to suppress ambient noise on open microphone channels. Attack time should be fast (< 5 ms) to avoid cutting off consonants; release time (50-300 ms) determines how quickly the gate closes after the signal drops below threshold.
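Attack and release act like time constants on the gate's gain. The sketch below models them as one-pole smoothers, which is an assumption since platforms implement gate ballistics differently. It takes a precomputed per-sample envelope level (a hypothetical input) instead of detecting the level from audio itself.

```python
import math


def gate_gains(levels_db, threshold_db, attack_ms, release_ms, fs, floor_db=-80.0):
    """Per-sample gain of a simple noise gate.

    levels_db is an already-detected envelope level for each sample.
    The gain moves toward 1.0 with the attack time constant while the
    level is at or above threshold, and toward the floor with the
    release time constant otherwise.
    """
    floor = 10 ** (floor_db / 20)
    a_att = 1 - math.exp(-1 / (attack_ms * 1e-3 * fs))   # one-pole smoothing coefficients
    a_rel = 1 - math.exp(-1 / (release_ms * 1e-3 * fs))
    g = floor                                            # start with the gate closed
    gains = []
    for level in levels_db:
        target = 1.0 if level >= threshold_db else floor
        coeff = a_att if target > g else a_rel
        g += coeff * (target - g)
        gains.append(g)
    return gains


fs = 48_000
speech = [-20.0] * (fs // 10)      # 100 ms of talker above threshold
silence = [-70.0] * fs             # 1 s of room noise below threshold
g = gate_gains(speech + silence, threshold_db=-50, attack_ms=1, release_ms=100, fs=fs)
print(round(g[fs // 10 - 1], 4), g[-1])
```

With a 1 ms attack the gain is above 0.99 within 5 ms of the talker crossing the threshold; with a 100 ms release it decays close to the -80 dB floor over the following second.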

Automatic Gain Control (AGC) — Continuously adjusts gain to maintain a consistent output level regardless of input level. Used in conferencing systems to compensate for talkers who vary their distance from the microphone. AGC has a slow response (seconds) — slower than compression — to avoid "pumping" on normal speech dynamics.

De-esser — A frequency-selective compressor that reduces sibilant ("s", "sh", "t") sounds. Applied to vocal microphone channels in broadcast and conferencing where harsh sibilance is distracting.

Delay

Digital delay stores samples in a buffer and outputs them after a specified time. Delay in AV DSP serves two purposes:

Time alignment — In distributed speaker systems, speakers at different distances from a listener create a "double voice" effect when the same sound arrives at slightly different times. The speed of sound is approximately 1,125 feet per second (343 m/s), or 1.125 feet per millisecond. To calculate the needed delay, divide the separation between speaker positions in feet by 1.125; the result is the delay in milliseconds to add to the delay speaker (the one closer to the listener). A front speaker 40 feet from a delay speaker needs about 35.6 ms added to the delay speaker output.

Latency compensation — When mixing sources that take different processing paths (e.g., an NDI source with 5 ms of codec latency alongside an analog microphone), adding delay to the faster path synchronizes them in time.
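The time-alignment arithmetic and the delay buffer itself both fit in a few lines. This sketch assumes a 48 kHz sample rate and uses a Python `deque` as a stand-in for the circular buffer a DSP would use; all names are hypothetical.

```python
from collections import deque

SPEED_OF_SOUND_FT_PER_MS = 1.125   # about 1,125 ft/s at room temperature


def alignment_delay_ms(separation_ft):
    """Delay to add to the delay speaker so it lines up with the front speaker."""
    return separation_ft / SPEED_OF_SOUND_FT_PER_MS


def delay_samples(delay_ms, fs=48_000):
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(delay_ms * fs / 1000)


class DelayLine:
    """Fixed-length sample delay built on a FIFO buffer."""

    def __init__(self, n_samples):
        # pre-fill with silence so the first n outputs are zeros
        self.buf = deque([0.0] * n_samples)

    def process(self, x):
        self.buf.append(x)
        return self.buf.popleft()


ms = alignment_delay_ms(40)          # front speaker 40 ft ahead of the delay speaker
n = delay_samples(ms)
print(f"{ms:.1f} ms = {n} samples at 48 kHz")

line = DelayLine(3)
print([line.process(x) for x in [1.0, 2.0, 3.0, 4.0, 5.0]])
```

For the 40-foot example this gives about 35.6 ms, or 1707 samples at 48 kHz. The same delay line serves latency compensation, with the sample count taken from the measured latency difference instead of a distance.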

Acoustic Echo Cancellation (AEC)

AEC removes the far-end audio that the loudspeakers play into the room from the microphone signal, so that remote conferencing participants do not hear their own voices echo back. See aec for a detailed explanation. In the DSP signal flow:

  1. The loudspeaker output (post-mixing, pre-amplifier) must be routed to the AEC reference input
  2. The AEC processes the microphone signal, using the reference to cancel the echo component
  3. Residual echo is further reduced by Non-Linear Processing (NLP)

Critical rule: The AEC reference must be the actual signal sent to the loudspeaker — not the conferencing send, not the far-end audio, not the mixed bus. Any difference between the reference and what actually plays through the speaker degrades AEC performance.
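At its core, an echo canceller is an adaptive filter that learns the loudspeaker-to-microphone echo path from the reference signal and subtracts its estimate of the echo. The sketch below uses normalized LMS (NLMS), a standard textbook algorithm chosen here for illustration; commercial AECs use their own proprietary designs. This version omits double-talk detection and NLP, and the echo path, signals, and names are all simulated or hypothetical.

```python
import math
import random


def nlms_echo_canceller(reference, mic, taps=16, mu=0.5, eps=1e-6):
    """Cancel the echo of `reference` from `mic` with an NLMS adaptive FIR.

    Returns (residual, weights). The weights converge toward the
    loudspeaker-to-microphone echo path. Textbook NLMS only: no
    double-talk detection and no non-linear processing (NLP).
    """
    w = [0.0] * taps
    x_hist = [0.0] * taps                 # most recent reference sample first
    residual = []
    for x, d in zip(reference, mic):
        x_hist = [x] + x_hist[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x_hist))
        e = d - echo_estimate             # what is left after cancellation
        step = mu * e / (eps + sum(xi * xi for xi in x_hist))
        w = [wi + step * xi for wi, xi in zip(w, x_hist)]
        residual.append(e)
    return residual, w


# Simulated room: hypothetical echo path from loudspeaker to microphone
echo_path = [0.0, 0.0, 0.5, 0.3, -0.2, 0.1]
rng = random.Random(1)
reference = [rng.gauss(0, 1) for _ in range(4000)]            # signal sent to the speaker
noise = [rng.gauss(0, 0.001) for _ in range(len(reference))]  # low room-noise floor
mic = [sum(h * reference[n - k] for k, h in enumerate(echo_path) if n - k >= 0) + noise[n]
       for n in range(len(reference))]                         # echo only, no near-end talker

residual, w = nlms_echo_canceller(reference, mic)
tail = slice(3000, 4000)
erle_db = 10 * math.log10(sum(v * v for v in mic[tail]) /
                          (sum(v * v for v in residual[tail]) + 1e-30))
print(f"echo return loss enhancement over last 1000 samples: {erle_db:.1f} dB")
```

Once the filter converges, the residual falls tens of dB below the microphone level. It can only do this because `reference` is exactly what drove the simulated loudspeaker, which is the point of the critical rule above.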

Matrix Mixing and Automatic Mic Mixing

Matrix mixer — Routes any input to any output at any gain level. In a 16×16 matrix, each of the 16 inputs can be sent to each of the 16 outputs at a different gain (or muted). Essential for multi-zone AV systems, combining sources for recording, and routing audio to conferencing codecs.
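Mathematically, a matrix mixer is a gain matrix: each output is the sum of every input multiplied by its crosspoint gain. A minimal per-sample sketch with crosspoint levels in dB and `None` for muted crosspoints; the routing shown and the function names are a hypothetical example.

```python
def db_to_gain(db):
    """Convert a crosspoint level in dB to linear gain; None means muted."""
    return 0.0 if db is None else 10 ** (db / 20)


def matrix_mix(inputs, crosspoints_db):
    """Mix one frame of samples through a matrix mixer.

    inputs:          list of N input samples
    crosspoints_db:  N x M list; crosspoints_db[i][j] is the level (dB) of
                     input i in output j, or None if that crosspoint is off
    Returns a list of M output samples.
    """
    n_out = len(crosspoints_db[0])
    outputs = [0.0] * n_out
    for i, x in enumerate(inputs):
        for j in range(n_out):
            outputs[j] += x * db_to_gain(crosspoints_db[i][j])
    return outputs


# 3 inputs (two mics, one playback source) into 2 outputs (room speakers, codec send)
crosspoints = [
    [None, 0.0],     # mic 1: off in room speakers, full level to codec
    [None, 0.0],     # mic 2: same
    [-6.0, -6.0],    # playback: both outputs at -6 dB
]
print(matrix_mix([0.1, 0.2, 0.5], crosspoints))
```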

Automatic Microphone Mixer (AMM) — In multi-microphone conferencing systems, AMM prevents the "conference noise multiplier" effect. When multiple open microphone channels are summed, the noise floor increases: each doubling of open mics adds approximately 3 dB of noise. AMM algorithms (gate-based or gain-share-based) ensure only active talker microphones are open:

  • Gate-based AMM — Opens a mic channel when signal exceeds a threshold; closes it when signal falls below. Faster, but can miss soft talkers or chatter on quiet channels.
  • Gain-sharing AMM — Divides total gain among active channels proportionally based on level. Smoother; handles multiple simultaneous talkers more naturally. Used in Biamp Tesira, QSC Q-SYS, and Shure IntelliMix.
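Both behaviors can be sketched numerically. The noise-floor calculation assumes equal, uncorrelated noise on each open channel. The gain-sharing function is a simplified model of the idea described above (gains proportional to each channel's share of the total level), not any vendor's algorithm; the names are hypothetical.

```python
import math


def nom_noise_increase_db(open_mics):
    """Noise-floor rise relative to one open mic, assuming equal,
    uncorrelated noise on every channel (about 3 dB per doubling)."""
    return 10 * math.log10(open_mics)


def gain_sharing(levels):
    """Simplified gain-sharing automixer gains from per-channel levels.

    Each channel gets gain in proportion to its share of the total level,
    so the gains always sum to 1 and total system gain stays constant
    however many talkers are active. levels are linear, non-negative
    detector values (hypothetical input).
    """
    total = sum(levels)
    if total == 0:
        return [1 / len(levels)] * len(levels)   # silence: share equally
    return [lvl / total for lvl in levels]


for n in (1, 2, 4, 8):
    print(n, "open mics:", round(nom_noise_increase_db(n), 1), "dB")
print(gain_sharing([0.5, 0.01, 0.01, 0.01]))   # one active talker
```

Going from one to eight open mics raises the noise floor by about 9 dB, while gain sharing gives nearly all the gain (about 0.94) to the single active talker and keeps the total fixed.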

Professional DSP Platform Overview

| Platform | Manufacturer | Key strengths |
| --- | --- | --- |
| Q-SYS Core series | QSC | Native Q-LAN/AES67 network audio (Dante by license); control + DSP + conferencing in one box; Lua scripting |
| Tesira / TesiraFORTE | Biamp | Conferencing-optimized; excellent AEC; scalable Dante network audio |
| IntelliMix P300 | Shure | Compact; deep integration with MXA beamforming mics; UCC |
| Converge Pro 2 | ClearOne | Strong AEC; BMA360 beamforming mic integration |
| DMP 128 Plus | Extron | Extron ecosystem integration; simple conference rooms |
| Radius NX | Symetrix | Cost-effective; installed sound and conferencing |
| Crown DCi-DA | Crown (HARMAN) | DSP-enabled amplifier; combines DSP and amplification |
| BSS Soundweb London | BSS (HARMAN) | Flexible; HiQnet control integration with the HARMAN ecosystem |

DSP Configuration Workflow

  1. Define signal flow on paper — List every physical input, every output, and every processing stage between them before opening software.
  2. Assign I/O — Map physical ports (mic preamp 1, analog output 3, Dante channel 5) to named logical channels in the DSP software.
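  3. Build processing chain — Add blocks: AEC → EQ → gate (for microphones; gating and other nonlinear processing belong after the AEC so they do not disturb its model of the echo path); crossover → EQ → limiter (for speakers).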
  4. Route the AEC reference — Explicitly wire the loudspeaker output to the AEC reference input. Verify the reference is post-mixing, pre-amplification.
  5. Commission gain structure — Set input gains so nominal microphone level reads -18 dBFS at the ADC; set output levels to drive amplifiers to nominal power at 0 dB on the fader.
  6. Measure and tune — Use a calibrated measurement microphone and a real-time analyzer (e.g., Rational Acoustics Smaart) to set parametric EQ for a flat in-room response.
  7. Verify AEC in conference mode — With a remote caller active, verify the near-end microphone does not transmit the loudspeaker audio back to the far end. Adjust NLP aggressiveness as needed.

Common Pitfalls

  • Double AEC — Running AEC in the hardware DSP and the Teams/Zoom software client simultaneously causes severely clipped, choppy speech. Disable software AEC when hardware DSP AEC is active. See aec.
  • Insufficient bit depth on summing bus — When mixing 16+ channels, level headroom on the mixing bus must account for simultaneous peaks. Use DSP platforms that operate internally at 32-bit float.
  • EQ without measurement — Tuning parametric EQ by ear alone leaves room modes uncorrected. Use a calibrated microphone and real-time analyzer (RTA) for all speaker tuning.
  • Gate-based AMM in reverberant rooms — Gate-based AMM opens microphones based on level threshold. In reverberant rooms, room reflections trigger gate openings incorrectly. Use gain-sharing AMM in rooms with RT60 above 0.5 seconds.
  • Filter resonance (Q) too high — Parametric boost at high Q creates ringing artifacts audible on transients. Cuts can use high Q safely; boosts should use Q < 4 in most cases.
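The summing-bus pitfall is easy to quantify. When N equal-level channels peak together in phase, the bus rises by 20·log10(N) dB; uncorrelated signals rise by only 10·log10(N) dB. A sketch (function name hypothetical):

```python
import math


def bus_level_rise_db(n_channels, correlated):
    """How far a sum of n equal-level channels rises above one channel.

    Correlated (identical, in-phase) peaks add in amplitude: 20*log10(n).
    Uncorrelated signals add in power: 10*log10(n).
    """
    factor = 20 if correlated else 10
    return factor * math.log10(n_channels)


for n in (4, 16, 32):
    print(n, round(bus_level_rise_db(n, True), 1), round(bus_level_rise_db(n, False), 1))
```

Sixteen channels peaking in phase would sit about 24 dB above any single channel (about 12 dB if uncorrelated), which a fixed headroom budget has to absorb; 32-bit floating-point summing avoids internal clipping at these levels.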

