fqmpeg's C11 cluster is the "plumbing and instruments" of the audio toolkit — fourteen verbs that move audio around (channel layout, multi-track assembly), reshape its data format (sample rate, bit depth), or turn audio into pictures (visualization). Compared with C9 (levels / EQ / dynamics) and C10 (creative effects), C11 changes the container and routing of audio, rarely its sonic content.
This guide walks through each verb alongside its source in src/commands/ of fqmpeg 3.0.3 — the underlying FFmpeg filter or flag, the defaults, the output filename, and the gotchas that aren't visible from --help alone (stereo --mode 5.1 and surround are identical, pan-audio is linear attenuation rather than equal-power, bit-depth only works on PCM-compatible containers, audio-visualize --mode waves paints in a fixed green).
What you'll get out of this guide
- A decision matrix for the 14 verbs by task (channel layout / multi-track / format / visualization)
- Exact FFmpeg invocation each verb generates (verified --dry-run output)
- Defaults, units, output filenames — and what's hardcoded in the simplified surface
- Three recipes — dual-mic podcast assembly, multilingual lecture upload, audio-only YouTube upload with waveform visual
The 14 Verbs at a Glance
The cluster splits into four task groups. Pick the group, then the verb.
| Group | Verbs | What they do |
|---|---|---|
| Channel layout | stereo, surround, extract-audio-channel, pan-audio | Change channel count, isolate one side, position within stereo field |
| Multi-track assembly | multi-audio, mix-audio, replace-audio, concat-audio | Add additional tracks, mix in BGM, swap out audio, join clips |
| Format & quality | sample-rate, bit-depth | Resample to a target Hz, change PCM bit depth |
| Visualization | audio-visualize, oscilloscope, waveform, spectrum | Render animated video, overlay vectorscope, generate static PNG waveform / spectrogram |
Five things to know before reading on:
1. stereo and surround are thin wrappers around -ac. They set the channel count (1, 2, 6) and rely on FFmpeg's default downmix/upmix matrix. There's no special LFE routing, no Dolby-aware mixing — surround is literally stereo --mode 5.1 (compare the --dry-run pair just after this list). If you need a credible stereo→5.1 upmix, use the surround FFmpeg filter directly (different thing — it's an envelope-following upmixer).
2. pan-audio is a linear pan, not equal-power. At --position 0 (center) both channels are at unity gain, so the summed mono amplitude is +6 dB louder than at the extremes. If you're crossfading between center-panned dialogue and a hard-panned source, expect a midpoint volume bump.
3. bit-depth works on PCM-compatible containers only. It forces pcm_s16le / pcm_s24le / pcm_s32le as the audio codec. For MP4 or MKV with AAC, this command won't apply — you'd typically use it on WAV files. The output extension is preserved from the input, so feed it a .wav.
4. audio-visualize and the static visualizers ship fixed colors / layouts. audio-visualize --mode waves paints in 0x00FF00 (terminal green); --mode spectrum is color=intensity. The static waveform defaults to 0x00FF00 too (but exposes --color). The static spectrum exposes 14 color schemes via --color. The oscilloscope overlay is a fixed 320×320 avectorscope placed at 70% of the frame.
5. multi-audio recommends .mkv because MP4 multi-audio is shaky. MP4 technically supports multiple audio streams, but many players (Safari, mobile browsers) only play the first. MKV is universally honest about track switching.
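Both spellings in the first point emit the same underlying command; here are the verbs' own dry-runs side by side (only the output name differs):
$ npx fqmpeg stereo input.mp4 --mode 5.1 --dry-run
ffmpeg -i input.mp4 -ac 6 -c:v copy input-5.1.mp4
$ npx fqmpeg surround input.mp4 --dry-run
ffmpeg -i input.mp4 -ac 6 -c:v copy input-surround.mp4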
Channel Layout & Routing
stereo — Convert audio channels (mono ↔ stereo ↔ 5.1)
A thin wrapper around FFmpeg's -ac flag. Sets the channel count to 1, 2, or 6; the actual up/downmix matrix comes from FFmpeg's defaults.
- Source: src/commands/stereo.js
- Flag: -ac <1|2|6>
- Output: <input-stem>-<mode>.<ext>
| Argument / Option | Default | Allowed | Notes |
|---|---|---|---|
<input> | required | — | Input video/audio |
--mode <mode> | stereo | mono, stereo, 5.1 | Maps to -ac 1, -ac 2, -ac 6 |
-o, --output <path> | <input-stem>-<mode>.<ext> | — | Override output |
$ npx fqmpeg stereo input.mp4 --mode mono --dry-run
ffmpeg -i input.mp4 -ac 1 -c:v copy input-mono.mp4
$ npx fqmpeg stereo input.mp4 --mode 5.1 --dry-run
ffmpeg -i input.mp4 -ac 6 -c:v copy input-5.1.mp4
What's hardcoded and why: the up/downmix matrix is not hardcoded by fqmpeg — it comes from FFmpeg's built-in rematrixing (swresample). For stereo → mono, FFmpeg averages L and R (0.5× each, so fully correlated content can't clip). For stereo → 5.1, it applies swresample's default upmix coefficients — a reasonable default, but a plain matrix, not a creative upmix. For credible stereo-to-5.1 widening (envelope-following ambience extraction), drop to FFmpeg's surround filter directly (the filter, not this verb).
When you outgrow it: for non-default upmix logic, write the pan matrix explicitly with the pan filter:
ffmpeg -i stereo.wav -af "pan=5.1|FL=c0|FR=c1|FC=0.5*c0+0.5*c1|LFE=0.1*c0+0.1*c1|BL=c0|BR=c1" surround.wav
surround — Upmix stereo audio to 5.1 surround sound
A discoverability alias. Functionally identical to stereo --mode 5.1: both pass -ac 6 and rely on FFmpeg's default upmix matrix.
- Source: src/commands/surround.js
- Flag: -ac 6
- Output: <input-stem>-surround.<ext>
| Argument / Option | Default | Notes |
|---|---|---|
<input> | required | Input video/audio |
-o, --output <path> | <input-stem>-surround.<ext> | Override output |
$ npx fqmpeg surround input.mp4 --dry-run
ffmpeg -i input.mp4 -ac 6 -c:v copy input-surround.mp4
Why two commands for the same thing: discoverability. Users searching fqmpeg --help for "surround" find this verb directly. Users thinking in terms of channel layout reach for stereo --mode 5.1. Both produce identical output.
extract-audio-channel — Extract a single audio channel
Pulls the left or right channel out of a stereo source and outputs it as mono. Useful when one mic was recorded onto one side (interview rigs, dual-mic field recorders).
- Source: src/commands/extract-audio-channel.js
- Filter: pan=mono|c0=FL (left) / pan=mono|c0=FR (right)
- Output: <input-stem>-<channel>.<ext>
| Argument | Required | Allowed | Notes |
|---|---|---|---|
<input> | yes | — | Input video/audio |
<channel> | yes | left, right | Which side to keep |
-o, --output <path> | no | — | Override output |
$ npx fqmpeg extract-audio-channel interview.wav left --dry-run
ffmpeg -i interview.wav -af pan=mono|c0=FL -c:v copy interview-left.wav
The output is mono (pan=mono|...), not stereo with the other side silent — picking left and right separately gives you two mono files of the same length.
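If you need both sides anyway, one raw-FFmpeg pass with channelsplit decodes the source once and writes both mono files (a sketch; the [L]/[R] labels are arbitrary):
# Split stereo into two labeled mono streams, map each to its own file
ffmpeg -i interview.wav -filter_complex "channelsplit=channel_layout=stereo[L][R]" -map "[L]" interview-left.wav -map "[R]" interview-right.wav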
pan-audio — Pan audio left / right
Repositions the source within the stereo field. -1.0 = full left, 1.0 = full right, 0 = center.
- Source: src/commands/pan-audio.js
- Filter: pan=stereo|c0=<L>*c0+0*c1|c1=0*c0+<R>*c1 where L = min(1, 1-p) and R = min(1, 1+p)
- Output: <input-stem>-panned.<ext>
| Argument / Option | Required | Range | Notes |
|---|---|---|---|
<input> | yes | — | Input video/audio |
<position> | yes | -1.0 to 1.0 | 0 = center, -1 = full left, 1 = full right |
-o, --output <path> | no | — | Override output |
$ npx fqmpeg pan-audio input.mp4 0.5 --dry-run
ffmpeg -i input.mp4 -af pan=stereo|c0=0.50*c0+0*c1|c1=0*c0+1.00*c1 -c:v copy input-panned.mp4
Pan law: linear attenuation, clipped at unity. At position 0, both channels are 1.0 — the summed mono is +6 dB louder than at the extremes (position ±1, one side at 1.0 and the other at 0). This is not equal-power panning, which would attenuate each channel by 3 dB at center so the perceived loudness stays constant as the source sweeps. If you're crossfading between center-panned dialogue and a hard-panned source, you'll hear a midpoint bump.
When you need equal-power: write the pan matrix yourself with sqrt-weighted gains:
# Equal-power pan at position 0.5 (slightly right): L = sqrt((1-p)/2), R = sqrt((1+p)/2)
ffmpeg -i in.wav -af "pan=stereo|c0=0.500*c0|c1=0.866*c1" out.wav
Multi-Track Assembly
multi-audio — Add multiple audio tracks to a video
Attaches one or more audio files as additional audio streams in the output (alongside any existing tracks). Useful for multilingual exports — viewers switch audio in the player. Output defaults to .mkv because MP4 multi-audio playback is unreliable across players.
- Source: src/commands/multi-audio.js
- Flags: -map 0:v -map 0:a? -map 1:a -map 2:a ... -c copy
- Output: <video-stem>-multi-audio.mkv (forced .mkv extension)
| Argument / Option | Required | Notes |
|---|---|---|
<video> | yes | Input video file |
<audios...> | yes (≥1) | Additional audio files |
-o, --output <path> | no | Override output (.mkv recommended) |
$ npx fqmpeg multi-audio video.mp4 jp.aac es.aac --dry-run
ffmpeg -i video.mp4 -i jp.aac -i es.aac -map 0:v -map 0:a? -map 1:a -map 2:a -c copy video-multi-audio.mkv
-c copy everywhere: no re-encoding of video or audio. All inputs must already be in a format the output container (.mkv by default) accepts. The -map 0:a? uses an optional flag (?), so if the source video has no audio, the command still works.
The .mkv choice: Matroska supports unlimited audio streams cleanly, and players (VLC, mpv, modern browsers via <video> + track switching) handle it. MP4 nominally supports multi-audio but iOS Safari and some Android browsers ignore non-primary tracks.
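One thing the verb doesn't expose is track language tags, which players use to label the audio menu. A raw-FFmpeg extension of the same mapping can add them (the ISO 639-2 codes here are illustrative):
# Same mapping as multi-audio, plus language tags on the added tracks
# (a:1 / a:2 indices assume the source video contributes its own audio as a:0)
ffmpeg -i video.mp4 -i jp.aac -i es.aac -map 0:v -map 0:a? -map 1:a -map 2:a -c copy -metadata:s:a:1 language=jpn -metadata:s:a:2 language=spa video-multi-audio.mkv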
mix-audio — Mix a secondary audio track (BGM, narration)
Blends a second audio file in alongside the original. Default volume for the mix-in is 30 % — quiet enough that it works as background music without overwhelming dialogue.
- Source: src/commands/mix-audio.js
- Filter: [1:a]volume=<v>[bgm];[0:a][bgm]amix=inputs=2:duration=first
- Output: <input-stem>-mixed.<ext>
| Argument / Option | Default | Range | Notes |
|---|---|---|---|
<input> | required | — | Input video |
<audio> | required | — | Audio to mix in |
--volume <level> | 0.3 | 0.0 to 1.0 | Volume of the mixed-in audio |
--shortest | off | flag | End output when the shorter stream ends |
-o, --output <path> | <input-stem>-mixed.<ext> | — | Override output |
$ npx fqmpeg mix-audio video.mp4 bgm.mp3 --dry-run
ffmpeg -i video.mp4 -i bgm.mp3 -filter_complex [1:a]volume=0.3[bgm];[0:a][bgm]amix=inputs=2:duration=first -map 0:v -c:v copy video-mixed.mp4
duration=first: the output ends when the first (original) audio ends. If your BGM is longer than the video, it's truncated. If shorter, it stops mid-track and the rest of the video plays with only the original audio. Use --shortest to end the output when the shorter stream runs out instead.
amix normalization quirk: FFmpeg's amix filter divides the sum by the input count by default, so mixing 2 streams attenuates each by 50 % before scaling. fqmpeg's --volume 0.3 is applied to the BGM before amix, so the effective BGM level in the output is roughly 0.3 / 2 = 0.15 (−16 dB). If the BGM feels too quiet, try --volume 0.6 rather than expecting 0.3 to mean −10 dB.
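Recent FFmpeg builds let you switch that normalization off instead of compensating — a sketch, assuming your FFmpeg has amix's normalize option (added around 4.4):
# normalize=0 disables the divide-by-input-count, so volume=0.3 means a literal 30%
# (without normalization the sum can clip, so keep an eye on levels)
ffmpeg -i video.mp4 -i bgm.mp3 -filter_complex "[1:a]volume=0.3[bgm];[0:a][bgm]amix=inputs=2:duration=first:normalize=0" -map 0:v -c:v copy video-mixed.mp4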
replace-audio — Replace the audio track entirely
Discards the original audio and swaps in a new audio file. The video is stream-copied; the new audio is re-encoded to the container's default codec unless you take over with raw FFmpeg (see below).
- Source: src/commands/replace-audio.js
- Flags: -map 0:v -map 1:a -c:v copy
- Output: <input-stem>-newaudio.<ext>
| Argument / Option | Default | Notes |
|---|---|---|
<input> | required | Input video |
<audio> | required | New audio file |
--shortest | off | End at the shorter of (video, new audio) |
-o, --output <path> | <input-stem>-newaudio.<ext> | Override output |
$ npx fqmpeg replace-audio video.mp4 voiceover.aac --dry-run
ffmpeg -i video.mp4 -i voiceover.aac -map 0:v -map 1:a -c:v copy video-newaudio.mp4
No -c:a flag is passed, so FFmpeg re-encodes the new audio to the output container's default codec (AAC for .mp4) rather than stream-copying it (the same behavior as sample-rate). If you need a bit-exact copy of the new track, drop to raw FFmpeg and add -c:a copy; note that PCM (.wav audio) in an .mp4 container is non-standard and FFmpeg may refuse the mux. Convert the audio to AAC first, or change the output extension.
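A minimal pre-conversion step for that case (the 192k bitrate is a typical choice, not a fqmpeg default):
# Encode the WAV to AAC so it sits cleanly in an MP4 container
ffmpeg -i voiceover.wav -c:a aac -b:a 192k voiceover.aac
npx fqmpeg replace-audio video.mp4 voiceover.aac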
concat-audio — Concatenate multiple audio files
Joins two or more audio files end-to-end. Uses FFmpeg's concat demuxer (the listfile-based approach), which requires all inputs to share the same codec, sample rate, and channel layout (otherwise stream-copy fails). fqmpeg auto-generates the listfile in the directory of the first input, runs the concat, and cleans up on exit.
- Source: src/commands/concat-audio.js
- Flags: -f concat -safe 0 -i <listfile> -c copy
- Output: <first-input-stem>-joined.<first-input-ext>
| Argument / Option | Required | Notes |
|---|---|---|
<inputs...> | yes (≥2) | Two or more audio files in order |
-o, --output <path> | no | Override output |
$ npx fqmpeg concat-audio part1.mp3 part2.mp3 part3.mp3 --dry-run
# File list (auto-generated):
# file '/abs/path/part1.mp3'
# file '/abs/path/part2.mp3'
# file '/abs/path/part3.mp3'
ffmpeg -f concat -safe 0 -i filelist.txt -c copy part1-joined.mp3
concat demuxer requirements: all inputs must have identical codec + sample rate + channel layout. If part1 is 48 kHz stereo MP3 and part2 is 44.1 kHz mono MP3, stream-copy concat fails. Either re-encode the parts to a common format first, or use the concat filter (not demuxer) which re-encodes:
ffmpeg -i p1.mp3 -i p2.mp3 -filter_complex "[0:a][1:a]concat=n=2:v=0:a=1[out]" -map "[out]" joined.mp3
Absolute paths in the listfile: fqmpeg resolves each input to an absolute path before writing the listfile, so the concat works regardless of where the listfile sits. The listfile is created in the directory of the first input with a timestamped filename like .fqmpeg-concat-audio-1730000000000.txt and deleted on exit.
Format & Quality
sample-rate — Change audio sample rate
Resamples to a target rate via -ar. FFmpeg's default resampler (swresample; soxr only if the build includes libsoxr and you opt in) handles up- and down-sampling.
- Source: src/commands/sample-rate.js
- Flag: -ar <rate>
- Output: <input-stem>-<rate>hz.<ext> (e.g. input-48000hz.mp4)
| Argument / Option | Required | Notes |
|---|---|---|
<input> | yes | Input video/audio |
<rate> | yes (positive integer in Hz) | Common: 44100 (CD), 48000 (video/broadcast), 96000 (mastering) |
-o, --output <path> | no | Override output |
$ npx fqmpeg sample-rate input.wav 48000 --dry-run
ffmpeg -i input.wav -ar 48000 -c:v copy input-48000hz.wav
Why no enum: the description suggests 44100, 48000, 96000, but the validator only requires a positive integer. Niche rates (22050 for retro phone audio, 192000 for high-res masters) work too. Anything FFmpeg's swresample accepts.
Re-encoding is required. Stream-copy can't change sample rate. fqmpeg passes no -c:a flag, so FFmpeg picks the default codec for the output container (AAC for MP4, PCM for WAV, etc.) and re-encodes. To force a specific codec, drop to raw FFmpeg.
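If you need to pin the codec, copy the dry-run output and add the codec flags yourself (codec and bitrate below are illustrative, not fqmpeg defaults):
# Resample and force AAC at an explicit bitrate instead of the container default
ffmpeg -i input.mp4 -ar 48000 -c:v copy -c:a aac -b:a 192k input-48000hz.mp4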
bit-depth — Change audio bit depth (PCM)
Forces the audio codec to PCM at 16, 24, or 32 bit depth. Practically this means the output must be a container that supports PCM (WAV, MKV, AIFF) — not MP4 with AAC.
- Source: src/commands/bit-depth.js
- Flag: -c:a pcm_s16le / pcm_s24le / pcm_s32le
- Output: <input-stem>-<bits>bit.<ext>
| Argument | Required | Allowed | Notes |
|---|---|---|---|
<input> | yes | — | Input video/audio (typically WAV) |
<bits> | yes | 16, 24, 32 | Target bit depth |
-o, --output <path> | no | — | Override output |
$ npx fqmpeg bit-depth master.wav 24 --dry-run
ffmpeg -i master.wav -c:a pcm_s24le -c:v copy master-24bit.wav
Container compatibility: .wav accepts all three depths. .mkv accepts them too (PCM in Matroska is valid). .mp4 will fail — pcm_s24le is not a valid codec for ISO BMFF (MP4). If you feed in an .mp4, change the output extension to .mkv or .wav via -o.
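So for video sources, the practical move is to redirect into Matroska with the documented -o override; by analogy with the WAV dry-run above, the generated command should look like this:
$ npx fqmpeg bit-depth video.mp4 24 -o video-24bit.mkv
ffmpeg -i video.mp4 -c:a pcm_s24le -c:v copy video-24bit.mkv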
24-bit storage: pcm_s24le writes packed 3-byte samples to the file (FFmpeg handles them internally as 32-bit samples with 8 padding bits). The output file size scales with the packed width — 24-bit files are 1.5× the size of 16-bit, not 2× (and 32-bit files are 2× exactly).
Bit depth ≠ audio resolution. A 24-bit file converted from a 16-bit source doesn't contain more detail — the extra 8 bits are zeroed. Useful only when the source actually has the dynamic range (multi-mic field recordings, 24-bit studio masters).
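The arithmetic behind those size ratios, worked for a common 48 kHz stereo file:
# PCM data rate = sample_rate × channels × bytes_per_sample
# 16-bit: 48000 × 2 × 2 = 192,000 B/s (~0.69 GB per hour)
# 24-bit: 48000 × 2 × 3 = 288,000 B/s (~1.04 GB per hour, 1.5× the 16-bit size)
# 32-bit: 48000 × 2 × 4 = 384,000 B/s (~1.38 GB per hour, 2× exactly)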
Visualization
audio-visualize — Animated audio visualization video
Renders audio as a video stream with one of three live-updating visualizers: scrolling waveform (waves), scrolling spectrogram (spectrum), or histogram (histogram). Output is .mp4 (H.264 + yuv420p) regardless of input extension.
- Source: src/commands/audio-visualize.js
- Filter (waves): showwaves=s=<W>x<H>:mode=cline:rate=30:colors=0x00FF00
- Filter (spectrum): showspectrum=s=<W>x<H>:mode=combined:color=intensity:slide=scroll
- Filter (histogram): ahistogram=s=<W>x<H>:rheight=0.5
- Output: <input-stem>-visualize.mp4 (forced .mp4)
| Argument / Option | Default | Allowed | Notes |
|---|---|---|---|
<input> | required | — | Input audio (video also works — only audio stream is used) |
--mode <mode> | waves | waves, spectrum, histogram | Visualization style |
--size <WxH> | 1920x1080 | resolution string | Output video size |
-o, --output <path> | <input-stem>-visualize.mp4 | — | Override output |
$ npx fqmpeg audio-visualize song.mp3 --dry-run
ffmpeg -i song.mp3 -filter_complex showwaves=s=1920x1080:mode=cline:rate=30:colors=0x00FF00 -c:v libx264 -pix_fmt yuv420p song-visualize.mp4
$ npx fqmpeg audio-visualize song.mp3 --mode spectrum --dry-run
ffmpeg -i song.mp3 -filter_complex showspectrum=s=1920x1080:mode=combined:color=intensity:slide=scroll -c:v libx264 -pix_fmt yuv420p song-visualize.mp4
What's hardcoded:
- waves: color 0x00FF00 (terminal green), drawing mode cline (centered line, scrolls), refresh rate 30 fps
- spectrum: mode combined (one band per stereo pair), color scheme intensity, scroll mode slide=scroll
- histogram: rheight=0.5 (relative bar height)
- All three: video codec libx264 + yuv420p for universal playback
The choice of 0x00FF00 for waves matches the 32blog brand color, but more importantly it reads well against a black background (which showwaves provides by default). If you want a different color, drop to raw FFmpeg:
ffmpeg -i song.mp3 -filter_complex "showwaves=s=1920x1080:mode=cline:rate=30:colors=0xFF6600" \
-c:v libx264 -pix_fmt yuv420p song-orange.mp4
When you outgrow it: for stylized visualizers (3D bars, Spotify-style equalizer with peak hold, particle systems), FFmpeg alone won't cut it. Look at tools like Specterr or butterchurn (MilkDrop in the browser) for music videos. fqmpeg's visualizers are functional but plain.
oscilloscope — Oscilloscope overlay on existing video
Overlays an avectorscope (a circular phase-correlation display) onto an existing video. Useful for music videos and electronic music demos where seeing the stereo image is part of the aesthetic.
- Source: src/commands/oscilloscope.js
- Filter: avectorscope=s=320x320:zoom=1.5:draw=line,format=yuva420p[osc];[0:v][osc]overlay=W*0.7:H*0.7
- Output: <input-stem>-oscilloscope.<ext>
| Argument / Option | Default | Notes |
|---|---|---|
<input> | required | Input video with audio |
-o, --output <path> | <input-stem>-oscilloscope.<ext> | Override output |
$ npx fqmpeg oscilloscope musicvideo.mp4 --dry-run
ffmpeg -i musicvideo.mp4 -filter_complex avectorscope=s=320x320:zoom=1.5:draw=line,format=yuva420p[osc];[0:v][osc]overlay=W*0.7:H*0.7 -c:a copy musicvideo-oscilloscope.mp4
avectorscope reads stereo correlation. The dot pattern shows how the left and right channels relate moment-to-moment: a vertical line = mono (perfectly correlated), a horizontal line = out-of-phase (mono-summing will cancel), a circular cloud = wide stereo. For dual-mono dialogue, you'll see a single vertical line. For a wide synth pad, you'll see a noisy ellipse.
Audio is preserved (-c:a copy), video is re-encoded because of the filter graph.
When you outgrow it: the underlying avectorscope filter has many more parameters (mode/m for lissajous vs. polar display, draw for dot or line tracing, rc/gc/bc for trace color, zoom, scale). To customize beyond what fqmpeg exposes, copy the --dry-run filter and edit:
# Color the trace, lower the zoom, draw dots instead of a line, center the 480×480 overlay
ffmpeg -i input.mp4 -filter_complex "avectorscope=s=480x480:zoom=1.0:draw=dot:rc=255:gc=128:bc=0,format=yuva420p[osc];[0:v][osc]overlay=W*0.5-240:H*0.5-240" -c:a copy out.mp4
waveform — Static waveform image
Renders the entire audio track as a single PNG waveform — the "audiogram" look common on podcast covers and audio-to-video transcoded clips.
- Source: src/commands/waveform.js
- Filter: aformat=channel_layouts=mono,showwavespic=s=<W>x<H>:colors=<hex>
- Output: <input-stem>-waveform.png (forced .png)
| Argument / Option | Default | Notes |
|---|---|---|
<input> | required | Input audio/video |
--size <WxH> | 1920x200 | Output image size — wider for podcast covers, taller for vinyl-style displays |
--color <hex> | 0x00FF00 | Hex color (any valid FFmpeg color spec — 0xRRGGBB or Color@Alpha) |
-o, --output <path> | <input-stem>-waveform.png | — |
$ npx fqmpeg waveform song.mp3 --dry-run
ffmpeg -i song.mp3 -filter_complex aformat=channel_layouts=mono,showwavespic=s=1920x200:colors=0x00FF00 -frames:v 1 song-waveform.png
Mono summation first: aformat=channel_layouts=mono downmixes the audio before rendering. This is intentional — a stereo waveform image with two stacked traces is harder to read than a single combined trace, and the static PNG can't show stereo movement anyway. If you want a two-channel display, drop the aformat step and adjust showwavespic settings:
ffmpeg -i song.mp3 -filter_complex "showwavespic=s=1920x400:colors=0x00FF00|0xFF6600:split_channels=1" -frames:v 1 song-stereo.png
Single PNG via -frames:v 1: showwavespic produces one frame containing the full track waveform. The flag tells FFmpeg to write a single image instead of a sequence.
spectrum — Static spectrogram image
Renders a frequency-vs-time spectrogram of the audio as a single PNG. Use it to find unwanted hum (60 Hz horizontal line), high-frequency hiss (energy above 8 kHz), or visualize the spectral signature of a track for cover art.
- Source: src/commands/spectrum.js
- Filter: showspectrumpic=s=<W>x<H>:color=<scheme>
- Output: <input-stem>-spectrum.png (forced .png)
| Argument / Option | Default | Allowed | Notes |
|---|---|---|---|
<input> | required | — | Input audio/video |
--size <WxH> | 1920x512 | resolution string | Spectrum area size; the default legend adds margins, so the final PNG is larger (~2200×640 for the default 1920x512) |
--color <mode> | intensity | intensity, rainbow, moreland, nebulae, fire, fiery, fruit, cool, magma, green, viridis, plasma, cividis, terrain | Color scheme |
-o, --output <path> | <input-stem>-spectrum.png | — | — |
$ npx fqmpeg spectrum song.mp3 --dry-run
ffmpeg -i song.mp3 -filter_complex showspectrumpic=s=1920x512:color=intensity -frames:v 1 song-spectrum.png
$ npx fqmpeg spectrum song.mp3 --color viridis --dry-run
ffmpeg -i song.mp3 -filter_complex showspectrumpic=s=1920x512:color=viridis -frames:v 1 song-spectrum.png
Reading a spectrogram: time runs horizontally (left to right), frequency runs vertically (low at bottom, high at top), color = energy. A clean speech track shows energy concentrated 100 Hz–4 kHz with gaps for breaths; a kick drum is a vertical streak at 60–100 Hz; an unwanted ground-loop hum is a perfectly horizontal line at 50 or 60 Hz. For mastering work, the spectrogram is a faster diagnostic than an EQ with a spectrum analyzer.
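When you're hunting noise rather than making cover art, a wider canvas stretches the time axis and makes intermittent artifacts easier to spot (the filename is illustrative):
$ npx fqmpeg spectrum noisy-lecture.wav --size 3840x1024 -o hum-check.png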
showspectrumpic vs showspectrum: the pic suffix produces a static image of the entire file. The non-pic version produces an animated video (which is what audio-visualize --mode spectrum uses). Don't confuse them.
Real-World Recipes
Recipe 1: Dual-mic podcast — split tracks → normalize → join
A common Zoom-style podcast setup records both speakers onto one stereo file: host on left, guest on right. To master them individually (different gain, different EQ) and reassemble:
# Step 1: extract each side
npx fqmpeg extract-audio-channel raw.wav left -o host.wav
npx fqmpeg extract-audio-channel raw.wav right -o guest.wav
# Step 2: normalize each (C9 verb)
npx fqmpeg normalize host.wav -o host-n.wav
npx fqmpeg normalize guest.wav -o guest-n.wav
# Step 3: mix back into a stereo file with each on their original side
ffmpeg -i host-n.wav -i guest-n.wav \
-filter_complex "[0:a]pan=stereo|c0=c0|c1=0[hL];[1:a]pan=stereo|c0=0|c1=c0[gR];[hL][gR]amerge=inputs=2,pan=stereo|c0=c0+c2|c1=c1+c3" \
podcast-final.wav
The last step uses raw FFmpeg because fqmpeg doesn't have a "stereo-from-two-monos" verb. The pan chain places host on left and guest on right, amerge combines the two stereo streams into four channels, and a final pan collapses them back to true stereo. Both halves come from the same recording, so their lengths already match and no padding is needed.
Recipe 2: Multilingual lecture — one video, three audio tracks
You have a 50-minute lecture video and three voiceover dubs (Japanese, English, Spanish). Want a single MKV with all three audio streams for YouTube / school CMS uploads:
# All voiceovers must already be the same length as the video.
# (If they're not, trim or pad with fqmpeg trim first.)
npx fqmpeg multi-audio lecture.mp4 lecture-ja.aac lecture-en.aac lecture-es.aac
# → lecture-multi-audio.mkv
The output is an MKV with the original video, the original audio track (if any), and three additional audio streams. Players that support track selection (VLC, mpv, modern HTML5 video) let viewers switch languages without separate downloads.
Recipe 3: Audio-only YouTube upload with waveform background
For an interview podcast you want to upload to YouTube. YouTube needs video, but you only have audio. Generate an audio-reactive waveform video as the visual:
# Step 1: render an animated waveform video (full duration)
npx fqmpeg audio-visualize interview.mp3 --size 1920x1080 -o interview-visual.mp4
# Step 2: swap the visualizer's re-encoded audio for the original track
npx fqmpeg replace-audio interview-visual.mp4 interview.mp3 -o interview-final.mp4
audio-visualize produces a video with the audio embedded too, but if you wanted to layer a different track (e.g. the visual on top of a normalized version, or the original with a different intro), replace-audio swaps cleanly.
For a thumbnail of the same interview, use spectrum to generate a one-shot spectrogram PNG:
npx fqmpeg spectrum interview.mp3 --color viridis --size 1920x1080 -o interview-cover.png
Frequently Asked Questions
Are stereo --mode 5.1 and surround actually identical?
Yes — both run ffmpeg -i input -ac 6 -c:v copy output. The only difference is the output filename suffix (-5.1 vs -surround) and the command name. fqmpeg exposes two because users discover them through different mental models — "I want to change channel count" lands on stereo; "I want 5.1 surround" lands on surround.
Why does pan-audio --position 0 sound louder than --position 1?
Because the pan law is linear attenuation with unity clipping, not equal-power. At position 0, both channels are at 1.0 gain. At position 1 (full right), left = 0 and right = 1. Summed to mono, position 0 is L+R = 2.0 = +6 dB, while position 1 is just R = 1.0. To get equal-power panning, where loudness stays constant across the sweep, write the pan filter yourself with sqrt-weighted gains: c0=sqrt((1-p)/2)*c0|c1=sqrt((1+p)/2)*c1 (where p runs from -1 to 1).
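Worked at the center position (p = 0): each gain is sqrt(1/2) ≈ 0.707 (−3 dB), and the power L² + R² stays at 1 across the whole sweep. Since pan takes numeric gains, bake the values in as constants:
# Equal-power center: both channels at 0.707 (precomputed from sqrt((1±p)/2) at p = 0)
ffmpeg -i in.wav -af "pan=stereo|c0=0.707*c0|c1=0.707*c1" out.wav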
bit-depth 24 fails with my MP4 file — why?
Because MP4 (ISO BMFF) doesn't accept pcm_s24le as an audio codec. Bit-depth changes only work in containers that support PCM: .wav, .mkv, .aiff, .flac. Either convert your MP4 audio to WAV first (ffmpeg -i in.mp4 -vn -c:a pcm_s16le audio.wav), run bit-depth on the WAV, and re-mux; or change the output extension to .mkv via -o video-24bit.mkv.
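The full round-trip, sketched (assumes one video and one audio stream in the source):
# 1. Extract the audio as 16-bit PCM WAV
ffmpeg -i in.mp4 -vn -c:a pcm_s16le audio.wav
# 2. Convert bit depth (output name follows the <stem>-<bits>bit pattern)
npx fqmpeg bit-depth audio.wav 24
# 3. Re-mux the original video with the 24-bit track into Matroska
ffmpeg -i in.mp4 -i audio-24bit.wav -map 0:v -map 1:a -c copy video-24bit.mkv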
Why is the BGM so quiet at mix-audio --volume 0.3?
Because FFmpeg's amix filter divides its output by the input count (to prevent clipping). With 2 inputs, each is halved before summing, so your BGM at volume=0.3 ends up at effectively 15 % of original level (0.3 / 2 = 0.15, about −16 dB). The original audio is also halved, then summed. If the BGM feels too quiet relative to dialogue, try --volume 0.6 for "noticeable but not dominant" or --volume 1.0 for a roughly equal mix.
Can I concat audio files of different sample rates / codecs?
Not with concat-audio — it uses the concat demuxer, which requires identical codec + sample rate + channel layout for stream-copy. If your parts differ, you have two options: (1) re-encode each to a common format first (e.g. npx fqmpeg sample-rate part1.mp3 48000 -o part1-48k.mp3 for each), or (2) use the concat filter directly in raw FFmpeg, which re-encodes everything:
ffmpeg -i p1.mp3 -i p2.wav -i p3.aac -filter_complex "[0:a][1:a][2:a]concat=n=3:v=0:a=1[out]" -map "[out]" joined.mp3
Why does audio-visualize --mode waves always output green?
Because the underlying filter call hardcodes colors=0x00FF00 (terminal green, matching the 32blog brand). The --color option doesn't exist on this verb. For a different color, take the --dry-run output, edit the hex value, and invoke FFmpeg directly. The static waveform command does expose --color, so for a one-off color choice, generate a still and use it as a poster image instead of a motion visualizer.
How is audio-visualize --mode spectrum different from spectrum?
audio-visualize --mode spectrum produces an animated video (scrolling spectrogram, showspectrum filter, .mp4 output). spectrum produces a single static image (the whole track as one PNG, showspectrumpic filter, .png output). Use the animated one as a video backdrop for an audio upload; use the static one as cover art or for one-glance diagnosis.
Does multi-audio re-encode anything?
No — every stream is -c copy. Both video and all audio tracks are stream-copied. This means the input audio files must already be in a format the output container (.mkv by default) accepts. AAC, MP3, Opus, Vorbis, and PCM all work. If you try to multiplex an Apple Lossless (ALAC) into a strict .mp4, it'll work (MP4 does support ALAC), but some browsers won't play it.
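To check what actually landed in the container, ffprobe lists every stream and its codec:
ffprobe -v error -show_entries stream=index,codec_type,codec_name video-multi-audio.mkv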
Wrapping Up
The fourteen C11 verbs sit between "you have an audio track" and "it's the right shape for the deliverable":
- stereo, surround, extract-audio-channel, pan-audio for channel layout (the first two are thin -ac wrappers; pan-audio uses linear attenuation, not equal-power)
- multi-audio, mix-audio, replace-audio, concat-audio for multi-track assembly (mix-audio's amix halves each input — bump --volume if the BGM feels quiet; concat-audio requires identical codec/SR/layout for stream-copy)
- sample-rate, bit-depth for format conversion (bit-depth is PCM-only — feed it WAV or MKV, not MP4)
- audio-visualize, oscilloscope, waveform, spectrum for visualization (animated visualizers paint in fixed colors; static images are the "audiogram" cover-art workflow)
Every verb prints its underlying FFmpeg invocation under --dry-run, so when the simplified surface isn't enough (custom pan laws, multi-codec concat, two-channel spectrograms), copy the filter, edit, and call FFmpeg directly. For the broader fqmpeg map, see the fqmpeg complete guide.