32blog by Studio Mitsu

fqmpeg Creative Audio Effects: Reverb, Echo, Chorus & More (9 Verbs)

Nine fqmpeg verbs for creative audio: reverb, echo, chorus, phaser, flanger, tremolo, vibrato, karaoke, stereo widen — source-verified defaults, dry-run output, and a frank look at what's hardcoded and why.

by omitsu, 17 min read

fqmpeg's C10 cluster is the effects-pedal box of the toolkit — nine verbs that color audio rather than fix it. Two are time-based echoes (reverb, echo-effect). Five are LFO modulation effects (chorus, phaser, flanger, tremolo, vibrato). Two are stereo-field tricks (audio-karaoke, audio-stereo-widen).

Compared with the C9 dynamics/EQ verbs, C10 is small and the implementations are short, but every one of them wraps a multi-parameter FFmpeg filter behind a surface of at most three options. This guide checks each verb against the source in src/commands/ of fqmpeg 3.0.3 and is honest about what's hardcoded and why. (Some hardcodings are sensible: they hide DSP coefficients that would only invite footguns. Others are limitations you should know about before you ship a render.)

What you'll get out of this guide

  • A decision matrix for the 9 verbs by sonic effect (time-based / modulation / stereo)
  • Exact FFmpeg invocation each verb generates (verified --dry-run output)
  • Defaults, units, output filenames — and the filter coefficients fqmpeg fixes for you
  • Three recipes — vocal warm-up, lo-fi guitar, AM-radio dialogue — and the escape hatches when the simplified surface isn't enough

The 9 Verbs at a Glance

All nine verbs preserve video with -c:v copy — drop a video file in and you get the same video with the processed audio.

Group | Verbs | What they do
--- | --- | ---
Time-based | reverb, echo-effect | Delay/reflection simulation via aecho
Modulation | chorus, phaser, flanger, tremolo, vibrato | LFO-driven pitch / time / amplitude modulation
Stereo | audio-karaoke, audio-stereo-widen | Center-channel removal, Haas-style widening

Five things to know before reading on:

  1. reverb is not a true reverb. It's a single-tap aecho filter with hardcoded in_gain=0.8 and out_gain=0.88. Real reverb (impulse-response convolution or a Schroeder network) needs afir with an impulse response, or a reverb plugin loaded through FFmpeg's ladspa or lv2 filter; for that, drop to raw FFmpeg. reverb here is "a touch of room" depth, not a cathedral.
  2. chorus ships a 3-voice preset that is not configurable. The voices have hardcoded delays 50|60|70 ms and decays 0.4|0.32|0.28. Only the modulation depth and speed are exposed. The rationale: well-tuned chorus needs intuition about voice spacing, and exposing all 6 parameters in a CLI would invite settings that sound broken. If you need a custom multi-voice arrangement, run the FFmpeg chorus filter directly (it accepts up to 32 voices).
  3. flanger --mix is internally remapped to width. FFmpeg's flanger filter uses width=0-100 for wet/dry blend, not mix=0-1. fqmpeg accepts the more conventional --mix 0.0-1.0 and multiplies by 100 before passing it in. So --mix 0.7 becomes width=70. (This was a B7 bugfix — earlier fqmpeg released a broken mix= form that the filter silently ignored.)
  4. tremolo and vibrato have identical option surfaces but completely different effects. Tremolo modulates volume (an LFO multiplies amplitude). Vibrato modulates pitch (an LFO shifts frequency). Same --freq / --depth flags, same defaults (5 Hz, 0.5), same range — but you cannot substitute one for the other.
  5. audio-karaoke only works on dead-center, dry vocals. The filter is the classic pan=stereo|c0=c0-c1|c1=c1-c0 phase-cancellation trick. It assumes the vocal is panned identically into both channels with no stereo widening, reverb, or chorus on the vocal bus. Modern pop mixes break all three of those assumptions. Expect residual artifacts.

Time-Based: Reverb & Echo

Both verbs use the same FFmpeg filter (aecho) — the difference is configuration. reverb is one short tap (typically 40 ms) used as ambience. echo-effect is a chain of decaying repeats used as a distinct musical effect.

reverb — Add reverb-like ambience to audio

A single-tap echo masquerading as reverb. Good for adding a touch of space, not for emulating a hall.

Option | Default | Notes
--- | --- | ---
--delay <ms> | 40 | Delay between dry and wet tap
--decay <n> | 0.5 | Decay factor (0.0-1.0)
-o, --output <path> | <input-stem>-reverb.<ext> | Output path
bash
$ npx fqmpeg reverb input.mp4 --dry-run

  ffmpeg -i input.mp4 -af aecho=0.8:0.88:40:0.5 -c:v copy input-reverb.mp4

What's hardcoded and why: in_gain=0.8 and out_gain=0.88 are fixed in the aecho filter string. These are dry/wet attenuation — they don't change the character of the echo, only its loudness relative to the source. fqmpeg's choice is a safe mid-blend that doesn't clip on typical inputs.
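The --decay factor is a linear amplitude ratio, which is easier to judge in decibels (20 × log10 of the factor). As a quick reference, here is a small POSIX shell sketch that does the conversion with awk; decay_db is a hypothetical helper name, not an fqmpeg command:

```bash
#!/bin/sh
# Hypothetical helper (not part of fqmpeg): convert an aecho decay factor
# (linear amplitude, 0.0-1.0) into decibels relative to the dry signal.
decay_db() {
  awk -v d="$1" 'BEGIN {
    if (d <= 0) { print "-inf"; exit }        # a zero factor means a silent tap
    printf "%.1f\n", 20 * log(d) / log(10)    # awk log() is the natural log
  }'
}

decay_db 0.5   # reverb default
decay_db 0.3   # echo-effect default (first tap)
```

For the reverb default of 0.5 it prints -6.0, so the single reflection sits about 6 dB below the dry path; 0.3 comes out at -10.5.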

When you outgrow it: real reverb is multiple decorrelated delay lines (a Schroeder network) or impulse-response convolution. For a credible hall or plate sound, load a reverb plugin through FFmpeg's ladspa or lv2 filter (if your build includes them), or convolve with an impulse-response WAV using afir:

bash
ffmpeg -i input.mp4 -i hall_ir.wav -filter_complex "[0:a][1:a]afir=dry=10:wet=10[a]" \
  -map 0:v -map "[a]" -c:v copy hall-reverb.mp4

echo-effect — Add echo / delay with multiple taps

Generates a geometric chain of echoes at delay, 2×delay, 3×delay, and so on. Tap i has gain decay^i, so each tap is quieter than the one before it by a factor of decay.

Option | Default | Notes
--- | --- | ---
--delay <ms> | 500 | Base delay; subsequent taps are 2×, 3×, ...
--decay <n> | 0.3 | Decay factor; subsequent taps decay geometrically
--repeats <n> | 3 | Number of echo taps
-o, --output <path> | <input-stem>-echo.<ext> | Output path
bash
$ npx fqmpeg echo-effect input.mp4 --dry-run

  ffmpeg -i input.mp4 -af aecho=0.8:0.88:500|1000|1500:0.300|0.090|0.027 -c:v copy input-echo.mp4

What's hardcoded and why: like reverb, the in_gain/out_gain are fixed at 0.8:0.88. The geometric decay (decay^i for tap i) is not "hardcoded" in a footgun sense — it's the natural physical model for a single reflective surface losing energy on each bounce. Exposing per-tap delays/decays would let you build pathological combs, so fqmpeg ties them together.
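To see the tap layout outside fqmpeg, the sketch below rebuilds the aecho string from --delay, --decay and --repeats. It is a POSIX shell approximation, not fqmpeg's own code: echo_effect_filter is a hypothetical name, and the fixed 0.8:0.88 gains and three-decimal decay formatting are copied from the dry-run output above.

```bash
#!/bin/sh
# Hypothetical helper: rebuild the aecho string shown by echo-effect --dry-run.
# Tap i (1-based) sits at i*delay ms with gain decay^i.
echo_effect_filter() {
  awk -v delay="$1" -v decay="$2" -v repeats="$3" 'BEGIN {
    delays = ""; decays = ""
    for (i = 1; i <= repeats; i++) {
      sep = (i > 1) ? "|" : ""                        # FFmpeg list separator
      delays = delays sep (i * delay)
      decays = decays sep sprintf("%.3f", decay ^ i)  # geometric decay
    }
    printf "aecho=0.8:0.88:%s:%s\n", delays, decays
  }'
}

echo_effect_filter 500 0.3 3
```

With 60 0.5 2 it returns aecho=0.8:0.88:60|120:0.500|0.250, which should match what echo-effect --delay 60 --decay 0.5 --repeats 2 prints, assuming the formatting stays as in the dry-run above. Quote the result when you paste it into -af, because | is a shell pipe.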

When you outgrow it: if you want irregular tap spacing (a slap-back into a long tail, or stereo ping-pong), drop straight to aecho with |-separated lists, or use adelay for an exact-millisecond stereo offset:

bash
ffmpeg -i input.mp3 -af "aecho=0.8:0.9:60|300|800:0.5|0.3|0.15" out.mp3

Modulation Effects

All five modulation verbs are driven by a low-frequency oscillator (LFO) that varies some property of the signal over time. The shared mental model: pick a speed (how fast the LFO cycles, in Hz) and a depth (how much it varies).

chorus — Add a chorus / thickening effect

Layers 3 slightly detuned, slightly delayed copies on top of the dry signal. Sounds like multiple performers playing the same line.

  • Source: src/commands/chorus.js
  • Filter: chorus=0.5:0.9:50|60|70:0.4|0.32|0.28:<depth>|<depth>|<depth>:<speed>|<speed>|<speed>
  • Output: <input-stem>-chorus.<ext>
Option | Default | Notes
--- | --- | ---
--depth <ms> | 2 | Modulation depth (sweep range, applied to all 3 voices)
--speed <Hz> | 0.5 | Modulation speed (applied to all 3 voices)
-o, --output <path> | <input-stem>-chorus.<ext> | Output path
bash
$ npx fqmpeg chorus input.mp4 --dry-run

  ffmpeg -i input.mp4 -af chorus=0.5:0.9:50|60|70:0.4|0.32|0.28:2|2|2:0.5|0.5|0.5 -c:v copy input-chorus.mp4

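What's hardcoded and why: quite a lot. The 3-voice configuration is fixed: per-voice delays 50|60|70 ms, per-voice decays 0.4|0.32|0.28, in/out gain 0.5:0.9. Only --depth and --speed are surfaced, and both are applied uniformly to all three voices. One caveat if you reuse the printed string: FFmpeg documents the chorus fields in the order in_gain:out_gain:delays:decays:speeds:depths, so read against that order, the 2|2|2 list in the dry-run above fills the speeds field and 0.5|0.5|0.5 fills the depths field. Check the --dry-run output before relying on either flag, and use named options (speeds=, depths=) when you write a chorus string by hand.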

This is a deliberate "preset" choice. Well-tuned chorus depends on the spread between voices — if all three have the same delay, you get a single thicker echo, not chorus. If the delays are too close (e.g. 50|51|52), it sounds like a comb filter. The fqmpeg preset (50|60|70 ms with descending decays) is a "warm pop chorus" that works on vocals, electric guitar, and synth pads. It will not give you ethereal pads with wide stereo spread — that needs different voice spacing and different per-voice modulation rates.
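If you want to start from the preset and tweak it by hand, writing the string with FFmpeg's named options (in_gain, out_gain, delays, decays, speeds, depths) avoids any confusion about which positional list is which. A POSIX shell sketch that keeps fqmpeg's three voices and gains and repeats one speed and one depth across them; chorus_filter and repeat_list are hypothetical helper names:

```bash
#!/bin/sh
# Join n copies (n >= 1) of a value with "|", FFmpeg's per-voice list separator.
repeat_list() {
  out=$1; n=$2
  while [ "$n" -gt 1 ]; do
    out="$out|$1"
    n=$((n - 1))
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: fqmpeg's 3-voice preset, spelled with named options.
# $1 = LFO speed in Hz, $2 = modulation depth in ms (same value for every voice).
chorus_filter() {
  speed=$1; depth=$2
  printf 'chorus=in_gain=0.5:out_gain=0.9:delays=50|60|70:decays=0.4|0.32|0.28:speeds=%s:depths=%s\n' \
    "$(repeat_list "$speed" 3)" "$(repeat_list "$depth" 3)"
}

chorus_filter 0.5 2
```

From there you can edit individual voices, for example giving each one its own speed, before passing the quoted string to -af.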

When you outgrow it: invoke the chorus filter directly. It accepts arbitrary voice counts via |-separated lists:

bash
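# named options spell out which list is speeds (Hz) and which is depths (ms)
ffmpeg -i input.wav -af "chorus=in_gain=0.6:out_gain=0.9:delays=30|45|60|80:decays=0.3|0.25|0.2|0.15:speeds=0.3|0.4|0.5|0.6:depths=1.5|2|2.5|3" wide-chorus.wav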

phaser — Apply a sweeping phaser effect

Combines the signal with a phase-shifted copy of itself, producing the classic "whoosh" sweep.

  • Source: src/commands/phaser.js
  • Filter: aphaser=speed=<speed>:decay=<decay>
  • Output: <input-stem>-phaser.<ext>
Option | Default | Notes
--- | --- | ---
--speed <Hz> | 0.5 | LFO speed
--decay <n> | 0.4 | Decay factor (0.0-1.0); controls feedback intensity
-o, --output <path> | <input-stem>-phaser.<ext> | Output path
bash
$ npx fqmpeg phaser input.mp4 --dry-run

  ffmpeg -i input.mp4 -af aphaser=speed=0.5:decay=0.4 -c:v copy input-phaser.mp4

The FFmpeg aphaser filter has additional parameters (in_gain, out_gain, delay, and type, which picks a sinusoidal or triangular LFO) that fqmpeg leaves at the filter's own defaults. Pass them directly to a raw aphaser=... invocation if you want a triangular sweep or a different base delay.

flanger — Apply a flanger effect

Like phaser but with a much shorter, modulating delay — gives the metallic "jet engine" sweep familiar from late-70s rock.

  • Source: src/commands/flanger.js
  • Filter: flanger=speed=<speed>:depth=<depth>:width=<mix×100>
  • Output: <input-stem>-flanger.<ext>
Option | Default | Notes
--- | --- | ---
--speed <Hz> | 0.5 | LFO speed
--depth <ms> | 2 | Modulation depth
--mix <n> | 0.7 | Dry/wet mix (0.0-1.0); fqmpeg multiplies by 100 internally
-o, --output <path> | <input-stem>-flanger.<ext> | Output path
bash
$ npx fqmpeg flanger input.mp4 --dry-run

  ffmpeg -i input.mp4 -af flanger=speed=0.5:depth=2:width=70 -c:v copy input-flanger.mp4

The --mix to width mapping: FFmpeg's flanger filter calls its wet/dry parameter width and accepts 0-100. fqmpeg uses the more conventional --mix 0.0-1.0 and silently multiplies by 100. This was a 3.0 bugfix: earlier fqmpeg passed mix=0.7 directly to the filter, which the filter ignored, so the effect ran at the filter's own default width instead of the value you asked for. Run with --dry-run to confirm that --mix 0.7 produces width=70.
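The conversion itself is one multiplication plus a range check. A POSIX shell sketch of the same mapping; flanger_filter is a hypothetical helper, and rejecting values outside 0.0-1.0 is our choice rather than documented fqmpeg behavior:

```bash
#!/bin/sh
# Hypothetical sketch of the --mix to width conversion: FFmpeg's flanger
# expects width as a 0-100 percentage, so a 0.0-1.0 mix is scaled by 100.
flanger_filter() {
  awk -v speed="$1" -v depth="$2" -v mix="$3" 'BEGIN {
    if (mix < 0 || mix > 1) {
      print "mix must be between 0.0 and 1.0" > "/dev/stderr"
      exit 1
    }
    printf "flanger=speed=%s:depth=%s:width=%g\n", speed, depth, mix * 100
  }'
}

flanger_filter 0.5 2 0.7
```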

tremolo — Apply tremolo (volume oscillation)

Modulates output volume by an LFO. Classic surf-rock guitar amp effect.

Option | Default | Notes
--- | --- | ---
--freq <Hz> | 5 | LFO frequency
--depth <n> | 0.5 | Modulation depth (0-1); higher = more pronounced volume swell
-o, --output <path> | <input-stem>-tremolo.<ext> | Output path
bash
$ npx fqmpeg tremolo input.mp4 --dry-run

  ffmpeg -i input.mp4 -af tremolo=f=5:d=0.5 -c:v copy input-tremolo.mp4

vibrato — Apply vibrato (pitch oscillation)

Modulates pitch (not volume) by an LFO. Same option surface as tremolo — be careful not to confuse them.

Option | Default | Notes
--- | --- | ---
--freq <Hz> | 5 | LFO frequency
--depth <n> | 0.5 | Modulation depth (0-1); higher = wider pitch swing
-o, --output <path> | <input-stem>-vibrato.<ext> | Output path
bash
$ npx fqmpeg vibrato input.mp4 --dry-run

  ffmpeg -i input.mp4 -af vibrato=f=5:d=0.5 -c:v copy input-vibrato.mp4

Tremolo vs vibrato: identical CLI, opposite effect. If you ran tremolo and the result sounds like the source went seasick (pitch wobbling) instead of swelling (volume rising and falling), you accidentally called vibrato. Quick test: at --depth 1.0 --freq 0.5, tremolo cycles between silent and loud once every 2 seconds; vibrato cycles between low and high pitch.
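The quick test can be checked on paper. Assume the common amplitude-modulation form gain(t) = (1 - d/2) + (d/2) × cos(2π f t), which matches our reading of FFmpeg's tremolo source (treat the exact phase as an assumption). The gain then swings between 1 and 1 - d and repeats every 1/f seconds. A POSIX shell sketch that evaluates it with awk; tremolo_gain is a hypothetical name:

```bash
#!/bin/sh
# Sketch of a tremolo gain envelope under the assumed model
#   gain(t) = (1 - d/2) + (d/2) * cos(2*pi*f*t)
# $1 = time in seconds, $2 = LFO frequency in Hz, $3 = depth (0-1).
tremolo_gain() {
  awk -v t="$1" -v f="$2" -v d="$3" 'BEGIN {
    pi = atan2(0, -1)
    printf "%.3f\n", (1 - d / 2) + (d / 2) * cos(2 * pi * f * t)
  }'
}

# --freq 0.5 --depth 1.0: one full cycle takes 2 seconds
for t in 0 0.5 1 1.5 2; do
  printf '%s s: %s\n' "$t" "$(tremolo_gain "$t" 0.5 1.0)"
done
```

The gain column reads 1.000, 0.500, 0.000, 0.500, 1.000: loud, half, silent, half, loud over one 2-second cycle. Vibrato has no equivalent loudness curve, since its LFO moves pitch instead.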

Stereo Manipulation

audio-karaoke — Remove center-panned vocals

Subtracts the right channel from the left and vice versa, cancelling anything panned identically into both channels. Classic karaoke trick.

  • Source: src/commands/audio-karaoke.js
  • Filter: pan=stereo|c0=c0-c1|c1=c1-c0
  • Options: none — just an input and optional -o
  • Output: <input-stem>-karaoke.<ext>
bash
$ npx fqmpeg audio-karaoke song.mp3 --dry-run

  ffmpeg -i song.mp3 -af pan=stereo|c0=c0-c1|c1=c1-c0 -c:v copy song-karaoke.mp3

The honest limitations:

  1. Works only on dead-center, dry vocals. If the vocal has reverb, doubler, chorus, or any stereo widening on its own bus, those wet components survive the subtraction.
  2. Kills anything center-panned, including kick drum, bass, and snare. Most pop mixes pan all three of those center along with the vocal, so you lose the rhythm section together with the vocal.
  3. Modern streaming masters are heavily processed, and the "center channel" assumption breaks down — you'll typically hear residual vocal at -10 to -15 dB rather than full removal.

For credible vocal isolation/removal on modern tracks, the only reliable approach is ML-based source separation (Spleeter, Demucs) — that's outside FFmpeg's scope.
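The arithmetic behind the pan expression, and behind limitation 2, is easy to check on raw sample values. This POSIX shell sketch applies c0=c0-c1 and c1=c1-c0 to left/right pairs read from stdin; karaoke_pairs is a hypothetical helper, and the sample values are made up for illustration:

```bash
#!/bin/sh
# Apply pan=stereo|c0=c0-c1|c1=c1-c0 to "left right" sample pairs on stdin.
karaoke_pairs() {
  awk '{ printf "%g %g\n", $1 - $2, $2 - $1 }'
}

# centered vocal only / centered vocal plus a left-leaning part / hard-left only
printf '0.5 0.5\n0.7 0.5\n0.2 0\n' | karaoke_pairs
```

The centered pair disappears completely, exactly as a center-panned vocal, kick or bass would. Note also that the two output channels are always polarity-inverted copies of each other (c1 = -c0), so a mono downmix of the karaoke output cancels to silence.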

audio-stereo-widen — Widen the stereo image

Adds a Haas-style short delay between channels to push perceived width out past the speakers.

Option | Default | Notes
--- | --- | ---
--delay <ms> | 20 | Inter-channel delay; higher = wider but more phasey
-o, --output <path> | <input-stem>-wide.<ext> | Output path
bash
$ npx fqmpeg audio-stereo-widen input.mp4 --dry-run

  ffmpeg -i input.mp4 -af stereowiden=delay=20 -c:v copy input-wide.mp4

Mono summation warning: the Haas trick relies on small inter-channel delays, which means that if your output is summed to mono (radio broadcast, phone speaker, Bluetooth headset in mono mode), the delay becomes a comb filter and the audio sounds thin and hollow. Check mono compatibility by rendering a short downmix and listening to it: ffmpeg -i input-wide.mp4 -ac 1 -t 10 mono-test.wav. If the widening is for an online video and you don't care about mono playback, ignore this.
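The notches are predictable. For an idealized Haas pair, where one channel is the other delayed by τ seconds at equal level, the mono sum cancels at f = (2k + 1) / (2τ) for k = 0, 1, 2 and so on. FFmpeg's stereowiden also mixes in feedback and crossfeed, so treat this as a rough guide to where the hollowness comes from rather than an exact model of the filter. A POSIX shell sketch; comb_notches is a hypothetical helper:

```bash
#!/bin/sh
# First n cancellation frequencies (Hz) when a signal is summed with a copy
# of itself delayed by delay_ms: notches at (2k + 1) / (2 * delay_s).
comb_notches() {
  awk -v ms="$1" -v n="$2" 'BEGIN {
    tau = ms / 1000
    out = ""
    for (k = 0; k < n; k++)
      out = out (k ? " " : "") sprintf("%g", (2 * k + 1) / (2 * tau))
    print out
  }'
}

comb_notches 20 4   # default --delay 20
comb_notches 10 4
```

At the default 20 ms the first notch lands at 25 Hz and they repeat every 50 Hz; at 10 ms they start at 50 Hz and repeat every 100 Hz.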

Real-World Recipes

Vocal warm-up: lift a dry voice track

A dry voice recording sounds clinical. A touch of reverb and a light chorus adds the production polish typical of podcast intros and YouTube voice-overs — without sounding processed.

bash
# Step 1: subtle chorus for body (very light depth/speed)
npx fqmpeg chorus voice.wav --depth 1.5 --speed 0.3 -o voice-chorus.wav

# Step 2: short reverb tail for room sense
npx fqmpeg reverb voice-chorus.wav --delay 60 --decay 0.3 -o voice-ready.wav

Why this order: chorus first thickens the source, then reverb places the thickened result in a small room. Reverse the order and the chorus voices each get their own reverb tail — muddier.

Lo-fi guitar layer: phaser + tremolo

For a chillhop guitar bed, lay down a phaser sweep first and put a tremolo pulse on top:

bash
# Slow phaser sweep (long cycle)
npx fqmpeg phaser guitar.wav --speed 0.2 --decay 0.5 -o guitar-phaser.wav

# Slow tremolo pulse on top (1 cycle per second)
npx fqmpeg tremolo guitar-phaser.wav --freq 1 --depth 0.4 -o guitar-lofi.wav

The phaser provides the textural movement; the tremolo provides the rhythmic pulse. Both at slow rates — fast modulation pushes this from "lo-fi" to "broken cassette."

AM-radio dialogue effect

The classic "voice through a telephone" effect needs bandpass filtering (in C9, not here) plus distortion or echo. Combining audio-bandpass with echo-effect is a credible quick approximation:

bash
# Step 1: telephone-band filter (300-3400 Hz) — C9 verb
npx fqmpeg audio-bandpass voice.wav --low 300 --high 3400 -o voice-band.wav

# Step 2: short tinny echo
npx fqmpeg echo-effect voice-band.wav --delay 60 --decay 0.5 --repeats 2 -o voice-radio.wav

Real AM-radio dialogue also adds distortion and noise. For those, you'd reach for raw FFmpeg's acrusher (bit-crush distortion) and anoisesrc (a noise source you mix in with amix). fqmpeg doesn't currently expose either.

Frequently Asked Questions

Why is reverb so different from a real DAW reverb plugin?

Because under the hood it's a single-tap aecho filter, not an impulse-response or Schroeder network reverb. With --delay 40 --decay 0.5 you get one discrete reflection at 40 ms, attenuated to 50%. That's enough to suggest a "small room" if mixed lightly, but it lacks the dense early-reflection cluster and diffuse tail that define a real space. For credible reverb, switch to convolution via afir with an impulse-response WAV, or load a reverb plugin through FFmpeg's ladspa or lv2 filter (if your build includes them).

Can I tune chorus to sound less "warm" and more "ethereal"?

Not via fqmpeg — the 3-voice configuration (delays 50|60|70 ms, decays 0.4|0.32|0.28) is hardcoded. You can change only the LFO depth and speed, which controls how much the existing 3 voices wobble, not how they're spaced. To get an ethereal/wide chorus (e.g. 8 voices spread 20-200 ms with low decay), call FFmpeg's chorus filter directly with custom |-separated lists. Use npx fqmpeg chorus input --dry-run to see the format, then edit the lists in a manual FFmpeg invocation.

flanger --mix 0.7 looks like it's producing width=70 — is that a bug?

No, that's the intended behavior. FFmpeg's underlying flanger filter expects width=0-100 for wet/dry, but the usual CLI convention for "mix" is 0-1. fqmpeg accepts the 0-1 form and multiplies by 100. Earlier (pre-v3.0) versions passed mix=0.7 directly to the filter, which silently ignored it and ran at its own default width. The multiplication is the fix.

How is vibrato different from audio-pitch (in C9)?

audio-pitch shifts pitch by a fixed number of semitones, applied uniformly to the whole track. vibrato oscillates pitch up and down around the original at a chosen rate — the average pitch is unchanged. Use audio-pitch to transpose a melody to a new key; use vibrato to make a sustained note "shimmer."

Why does audio-karaoke leave the vocal partly audible?

It assumes vocals are panned identically into both stereo channels (which makes them cancel when you subtract one from the other). Modern pop production breaks this assumption: vocals often have stereo widening, doublers, reverb, and chorus on a stereo bus — none of which cancel. Drums, bass, and other center-panned elements are also cancelled, so what's left is a thin instrumental + ghosted vocal. For real vocal removal, use ML-based source separation tools (Spleeter, Demucs, Moises) outside FFmpeg.

Will audio-stereo-widen break on mono playback?

Yes, that's its main risk. The Haas-style inter-channel delay (default 20 ms) creates phase relationships that summed to mono become a comb filter — the audio sounds hollow and notched. If your final output might be played on a single speaker (smart speakers, phone speakerphone, mono Bluetooth, AM radio simulcast), test the mono downmix first: ffmpeg -i input-wide.mp4 -ac 1 -t 10 mono-test.mp3. If it sounds significantly worse than the source, lower --delay (try 8-12 ms) or skip widening for that delivery.

Can I chain multiple C10 verbs in a single FFmpeg pass to avoid generation loss?

Not via fqmpeg directly: each verb produces its own output file and re-encodes the audio. Adding -c:a copy won't help, because a filtered stream always has to be re-encoded. Instead, either keep the intermediates in a lossless format such as WAV or FLAC, or copy the filter strings from each --dry-run and join them with commas into a single FFmpeg invocation:

bash
# Combine chorus + reverb in one pass (filters from --dry-run)
ffmpeg -i voice.wav -af "chorus=0.5:0.9:50|60|70:0.4|0.32|0.28:1.5|1.5|1.5:0.3|0.3|0.3,aecho=0.8:0.88:60:0.3" voice-warm.wav

Tremolo and vibrato have the same flags — how do I remember which is which?

A mnemonic: tremolo modulates volume (think of "trembling loudness" — a held note that pulses); vibrato modulates pitch (think of "vibrating string" — a held note that wobbles in tone). Same --freq and --depth, opposite musical sensation.

Wrapping Up

The nine C10 verbs cover the most common creative-effect operations you'd reach for between EQ/dynamics (C9) and final delivery:

  • reverb, echo-effect for time-based depth (reverb is openly a single-tap aecho rather than a true reverb; echo-effect's geometric decay follows the natural physical model)
  • chorus, phaser, flanger, tremolo, vibrato for LFO modulation (chorus has the most hidden machinery — the 3-voice preset is hardcoded and intentional; tremolo and vibrato share an option surface but do completely different things)
  • audio-karaoke, audio-stereo-widen for stereo-field tricks (karaoke only works on dry, center-panned vocals; widen breaks on mono playback)

Every verb prints its underlying FFmpeg invocation under --dry-run, so when the simplified surface isn't enough (custom 8-voice chorus, ping-pong stereo echoes, triangular phaser sweep), copy the filter, edit the parameters, and call FFmpeg directly. For the broader fqmpeg map, see the fqmpeg complete guide.