fqmpeg's C11 cluster is the "plumbing and instruments" of the audio toolkit — fourteen verbs that move audio around (channel layout, multi-track assembly), reshape its data format (sample rate, bit depth), or turn audio into pictures (visualization). Compared with C9 (levels / EQ / dynamics) and C10 (creative effects), C11 changes the container and routing of audio, rarely its sonic content.
This guide walks through each verb alongside its source in src/commands/ of fqmpeg 3.0.3 — the underlying FFmpeg filter or flag, the defaults, the output filename, and the gotchas that aren't visible from --help alone (stereo --mode 5.1 and surround are identical, pan-audio is linear attenuation rather than equal-power, bit-depth only works on PCM-compatible containers, audio-visualize --mode waves paints in a fixed green).
What you'll get out of this guide
- A decision matrix for the 14 verbs by task (channel layout / multi-track / format / visualization)
- Exact FFmpeg invocation each verb generates (verified --dry-run output)
- Defaults, units, output filenames — and what's hardcoded in the simplified surface
- Three recipes — dual-mic podcast assembly, multilingual lecture upload, audio-only YouTube upload with waveform visual
The 14 Verbs at a Glance
The cluster splits into four task groups. Pick the group, then the verb.
| Group | Verbs | What they do |
|---|---|---|
| Channel layout | stereo, surround, extract-audio-channel, pan-audio | Change channel count, isolate one side, position within stereo field |
| Multi-track assembly | multi-audio, mix-audio, replace-audio, concat-audio | Add additional tracks, mix in BGM, swap out audio, join clips |
| Format & quality | sample-rate, bit-depth | Resample to a target Hz, change PCM bit depth |
| Visualization | audio-visualize, oscilloscope, waveform, spectrum | Render animated video, overlay vectorscope, generate static PNG waveform / spectrogram |
Five things to know before reading on:
1. stereo and surround are thin wrappers around -ac. They set the channel count (1, 2, 6) and rely on FFmpeg's default downmix/upmix matrix. There's no special LFE routing, no Dolby-aware mixing — surround is literally stereo --mode 5.1 (compare the --dry-run pair just after this list). If you need a credible stereo→5.1 upmix, use the surround FFmpeg filter directly (different thing — it's an envelope-following upmixer).
2. pan-audio is a linear pan, not equal-power. At --position 0 (center) both channels are at unity gain, so the summed mono amplitude is +6 dB louder than at the extremes. If you're crossfading between center-panned dialogue and a hard-panned source, expect a midpoint volume bump.
3. bit-depth works on PCM-compatible containers only. It forces pcm_s16le / pcm_s24le / pcm_s32le as the audio codec. For MP4 or MKV with AAC, this command won't apply — you'd typically use it on WAV files. The output extension is preserved from the input, so feed it a .wav.
4. audio-visualize and the static visualizers ship fixed colors / layouts. audio-visualize --mode waves paints in 0x00FF00 (terminal green); --mode spectrum is color=intensity. The static waveform defaults to 0x00FF00 too (but exposes --color). The static spectrum exposes 14 color schemes via --color. The oscilloscope overlay is a fixed 320×320 avectorscope placed at 70% of the frame.
5. multi-audio recommends .mkv because MP4 multi-audio is shaky. MP4 technically supports multiple audio streams, but many players (Safari, mobile browsers) only play the first. MKV is universally honest about track switching.
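Both spellings in the first point emit the same underlying command; here are the verbs' own dry-runs side by side (only the output name differs):
$ npx fqmpeg stereo input.mp4 --mode 5.1 --dry-run
ffmpeg -i input.mp4 -ac 6 -c:v copy input-5.1.mp4
$ npx fqmpeg surround input.mp4 --dry-run
ffmpeg -i input.mp4 -ac 6 -c:v copy input-surround.mp4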
Channel Layout & Routing
stereo — Convert audio channels (mono ↔ stereo ↔ 5.1)
A thin wrapper around FFmpeg's -ac flag. Sets the channel count to 1, 2, or 6; the actual up/downmix matrix comes from FFmpeg's defaults.
- Source: src/commands/stereo.js
- Flag: -ac <1|2|6>
- Output: <input-stem>-<mode>.<ext>
| Argument / Option | Default | Allowed | Notes |
|---|---|---|---|
<input> | required | — | Input video/audio |
--mode <mode> | stereo | mono, stereo, 5.1 | Maps to -ac 1, -ac 2, -ac 6 |
-o, --output <path> | <input-stem>-<mode>.<ext> | — | Override output |
$ npx fqmpeg stereo input.mp4 --mode mono --dry-run
ffmpeg -i input.mp4 -ac 1 -c:v copy input-mono.mp4
$ npx fqmpeg stereo input.mp4 --mode 5.1 --dry-run
ffmpeg -i input.mp4 -ac 6 -c:v copy input-5.1.mp4
What's hardcoded and why: the up/downmix matrix is not hardcoded by fqmpeg — it comes from FFmpeg's built-in rematrixing (swresample). For stereo → mono, FFmpeg averages L and R (0.5× each, so fully correlated content can't clip). For stereo → 5.1, it applies swresample's default upmix coefficients — a reasonable default, but a plain matrix, not a creative upmix. For credible stereo-to-5.1 widening (envelope-following ambience extraction), drop to FFmpeg's surround filter directly (the filter, not this verb).
When you outgrow it: for non-default upmix logic, write the pan matrix explicitly with the pan filter:
ffmpeg -i stereo.wav -af "pan=5.1|FL=c0|FR=c1|FC=0.5*c0+0.5*c1|LFE=0.1*c0+0.1*c1|BL=c0|BR=c1" surround.wav
surround — Upmix stereo audio to 5.1 surround sound
A discoverability alias. Functionally identical to stereo --mode 5.1: both pass -ac 6 and rely on FFmpeg's default upmix matrix.
- Source: src/commands/surround.js
- Flag: -ac 6
- Output: <input-stem>-surround.<ext>
| Argument / Option | Default | Notes |
|---|---|---|
<input> | required | Input video/audio |
-o, --output <path> | <input-stem>-surround.<ext> | Override output |
$ npx fqmpeg surround input.mp4 --dry-run
ffmpeg -i input.mp4 -ac 6 -c:v copy input-surround.mp4
Why two commands for the same thing: discoverability. Users searching fqmpeg --help for "surround" find this verb directly. Users thinking in terms of channel layout reach for stereo --mode 5.1. Both produce identical output.
extract-audio-channel — Extract a single audio channel
Pulls the left or right channel out of a stereo source and outputs it as mono. Useful when one mic was recorded onto one side (interview rigs, dual-mic field recorders).
- Source: src/commands/extract-audio-channel.js
- Filter: pan=mono|c0=FL (left) / pan=mono|c0=FR (right)
- Output: <input-stem>-<channel>.<ext>
| Argument | Required | Allowed | Notes |
|---|---|---|---|
<input> | yes | — | Input video/audio |
<channel> | yes | left, right | Which side to keep |
-o, --output <path> | no | — | Override output |
$ npx fqmpeg extract-audio-channel interview.wav left --dry-run
ffmpeg -i interview.wav -af pan=mono|c0=FL -c:v copy interview-left.wav
The output is mono (pan=mono|...), not stereo with the other side silent — picking left and right separately gives you two mono files of the same length.
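If you need both sides anyway, one raw-FFmpeg pass with channelsplit decodes the source once and writes both mono files (a sketch; the [L]/[R] labels are arbitrary):
# Split stereo into two labeled mono streams, map each to its own file
ffmpeg -i interview.wav -filter_complex "channelsplit=channel_layout=stereo[L][R]" -map "[L]" interview-left.wav -map "[R]" interview-right.wav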
pan-audio — Pan audio left / right
Repositions the source within the stereo field. -1.0 = full left, 1.0 = full right, 0 = center.
- Source: src/commands/pan-audio.js
- Filter: pan=stereo|c0=<L>*c0+0*c1|c1=0*c0+<R>*c1 where L = min(1, 1-p) and R = min(1, 1+p)
- Output: <input-stem>-panned.<ext>
| Argument / Option | Required | Range | Notes |
|---|---|---|---|
<input> | yes | — | Input video/audio |
<position> | yes | -1.0 to 1.0 | 0 = center, -1 = full left, 1 = full right |
-o, --output <path> | no | — | Override output |
$ npx fqmpeg pan-audio input.mp4 0.5 --dry-run
ffmpeg -i input.mp4 -af pan=stereo|c0=0.50*c0+0*c1|c1=0*c0+1.00*c1 -c:v copy input-panned.mp4
Pan law: linear attenuation, clipped at unity. At position 0, both channels are 1.0 — the summed mono is +6 dB louder than at the extremes (position ±1, one side at 1.0 and the other at 0). This is not equal-power panning, which would attenuate each channel by 3 dB at center so the perceived loudness stays constant as the source sweeps. If you're crossfading between center-panned dialogue and a hard-panned source, you'll hear a midpoint bump.
When you need equal-power: write the pan matrix yourself with sqrt-weighted gains:
# Equal-power pan at position 0.5 (slightly right): L = sqrt((1-p)/2), R = sqrt((1+p)/2)
ffmpeg -i in.wav -af "pan=stereo|c0=0.500*c0|c1=0.866*c1" out.wav
Multi-Track Assembly
multi-audio — Add multiple audio tracks to a video
Attaches one or more audio files as additional audio streams in the output (alongside any existing tracks). Useful for multilingual exports — viewers switch audio in the player. Output defaults to .mkv because MP4 multi-audio playback is unreliable across players.
- Source: src/commands/multi-audio.js
- Flags: -map 0:v -map 0:a? -map 1:a -map 2:a ... -c copy
- Output: <video-stem>-multi-audio.mkv (forced .mkv extension)
| Argument / Option | Required | Notes |
|---|---|---|
<video> | yes | Input video file |
<audios...> | yes (≥1) | Additional audio files |
-o, --output <path> | no | Override output (.mkv recommended) |
$ npx fqmpeg multi-audio video.mp4 jp.aac es.aac --dry-run
ffmpeg -i video.mp4 -i jp.aac -i es.aac -map 0:v -map 0:a? -map 1:a -map 2:a -c copy video-multi-audio.mkv
-c copy everywhere: no re-encoding of video or audio. All inputs must already be in a format the output container (.mkv by default) accepts. The -map 0:a? uses an optional flag (?), so if the source video has no audio, the command still works.
The .mkv choice: Matroska supports unlimited audio streams cleanly, and players (VLC, mpv, modern browsers via <video> + track switching) handle it. MP4 nominally supports multi-audio but iOS Safari and some Android browsers ignore non-primary tracks.
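One thing the verb doesn't expose is track language tags, which players use to label the audio menu. A raw-FFmpeg extension of the same mapping can add them (the ISO 639-2 codes here are illustrative):
# Same mapping as multi-audio, plus language tags on the added tracks
# (a:1 / a:2 indices assume the source video contributes its own audio as a:0)
ffmpeg -i video.mp4 -i jp.aac -i es.aac -map 0:v -map 0:a? -map 1:a -map 2:a -c copy -metadata:s:a:1 language=jpn -metadata:s:a:2 language=spa video-multi-audio.mkv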
mix-audio — Mix a secondary audio track (BGM, narration)
Blends a second audio file in alongside the original. Default volume for the mix-in is 30 % — quiet enough that it works as background music without overwhelming dialogue.
- Source: src/commands/mix-audio.js
- Filter: [1:a]volume=<v>[bgm];[0:a][bgm]amix=inputs=2:duration=first
- Output: <input-stem>-mixed.<ext>
| Argument / Option | Default | Range | Notes |
|---|---|---|---|
<input> | required | — | Input video |
<audio> | required | — | Audio to mix in |
--volume <level> | 0.3 | 0.0 to 1.0 | Volume of the mixed-in audio |
--shortest | off | flag | End output when the shorter stream ends |
-o, --output <path> | <input-stem>-mixed.<ext> | — | Override output |
$ npx fqmpeg mix-audio video.mp4 bgm.mp3 --dry-run
ffmpeg -i video.mp4 -i bgm.mp3 -filter_complex [1:a]volume=0.3[bgm];[0:a][bgm]amix=inputs=2:duration=first -map 0:v -c:v copy video-mixed.mp4
duration=first: the output ends when the first (original) audio ends. If your BGM is longer than the video, it's truncated. If shorter, it stops mid-track and the rest of the video plays with only the original audio. Use --shortest to end the output when the shorter stream runs out instead.
amix normalization quirk: FFmpeg's amix filter divides the sum by the input count by default, so mixing 2 streams attenuates each by 50 % before scaling. fqmpeg's --volume 0.3 is applied to the BGM before amix, so the effective BGM level in the output is roughly 0.3 / 2 = 0.15 (−16 dB). If the BGM feels too quiet, try --volume 0.6 rather than expecting 0.3 to mean −10 dB.
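Recent FFmpeg builds let you switch that normalization off instead of compensating — a sketch, assuming your FFmpeg has amix's normalize option (added around 4.4):
# normalize=0 disables the divide-by-input-count, so volume=0.3 means a literal 30%
# (without normalization the sum can clip, so keep an eye on levels)
ffmpeg -i video.mp4 -i bgm.mp3 -filter_complex "[1:a]volume=0.3[bgm];[0:a][bgm]amix=inputs=2:duration=first:normalize=0" -map 0:v -c:v copy video-mixed.mp4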
replace-audio — Replace the audio track entirely
Discards the original audio and swaps in a new audio file. The video is stream-copied; the new audio is re-encoded to the container's default codec unless you take over with raw FFmpeg (see below).
- Source: src/commands/replace-audio.js
- Flags: -map 0:v -map 1:a -c:v copy
- Output: <input-stem>-newaudio.<ext>
| Argument / Option | Default | Notes |
|---|---|---|
<input> | required | Input video |
<audio> | required | New audio file |
--shortest | off | End at the shorter of (video, new audio) |
-o, --output <path> | <input-stem>-newaudio.<ext> | Override output |
$ npx fqmpeg replace-audio video.mp4 voiceover.aac --dry-run
ffmpeg -i video.mp4 -i voiceover.aac -map 0:v -map 1:a -c:v copy video-newaudio.mp4
No -c:a flag is passed, so FFmpeg re-encodes the new audio to the output container's default codec (AAC for .mp4) rather than stream-copying it (the same behavior as sample-rate). If you need a bit-exact copy of the new track, drop to raw FFmpeg and add -c:a copy; note that PCM (.wav audio) in an .mp4 container is non-standard and FFmpeg may refuse the mux. Convert the audio to AAC first, or change the output extension.
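A minimal pre-conversion step for that case (the 192k bitrate is a typical choice, not a fqmpeg default):
# Encode the WAV to AAC so it sits cleanly in an MP4 container
ffmpeg -i voiceover.wav -c:a aac -b:a 192k voiceover.aac
npx fqmpeg replace-audio video.mp4 voiceover.aac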
concat-audio — Concatenate multiple audio files
Joins two or more audio files end-to-end. Uses FFmpeg's concat demuxer (the listfile-based approach), which requires all inputs to share the same codec, sample rate, and channel layout (otherwise stream-copy fails). fqmpeg auto-generates the listfile in the directory of the first input, runs the concat, and cleans up on exit.
- Source: src/commands/concat-audio.js
- Flags: -f concat -safe 0 -i <listfile> -c copy
- Output: <first-input-stem>-joined.<first-input-ext>
| Argument / Option | Required | Notes |
|---|---|---|
<inputs...> | yes (≥2) | Two or more audio files in order |
-o, --output <path> | no | Override output |
$ npx fqmpeg concat-audio part1.mp3 part2.mp3 part3.mp3 --dry-run
# File list (auto-generated):
# file '/abs/path/part1.mp3'
# file '/abs/path/part2.mp3'
# file '/abs/path/part3.mp3'
ffmpeg -f concat -safe 0 -i filelist.txt -c copy part1-joined.mp3
concat demuxer requirements: all inputs must have identical codec + sample rate + channel layout. If part1 is 48 kHz stereo MP3 and part2 is 44.1 kHz mono MP3, stream-copy concat fails. Either re-encode the parts to a common format first, or use the concat filter (not demuxer) which re-encodes:
ffmpeg -i p1.mp3 -i p2.mp3 -filter_complex "[0:a][1:a]concat=n=2:v=0:a=1[out]" -map "[out]" joined.mp3
Absolute paths in the listfile: fqmpeg resolves each input to an absolute path before writing the listfile, so the concat works regardless of where the listfile sits. The listfile is created in the directory of the first input with a timestamped filename like .fqmpeg-concat-audio-1730000000000.txt and deleted on exit.
Format & Quality
sample-rate — Change audio sample rate
Resamples to a target rate via -ar. FFmpeg's default resampler (swresample; soxr only if the build includes libsoxr and you opt in) handles up- and down-sampling.
- Source: src/commands/sample-rate.js
- Flag: -ar <rate>
- Output: <input-stem>-<rate>hz.<ext> (e.g. input-48000hz.mp4)
| Argument / Option | Required | Notes |
|---|---|---|
<input> | yes | Input video/audio |
<rate> | yes (positive integer in Hz) | Common: 44100 (CD), 48000 (video/broadcast), 96000 (mastering) |
-o, --output <path> | no | Override output |
$ npx fqmpeg sample-rate input.wav 48000 --dry-run
ffmpeg -i input.wav -ar 48000 -c:v copy input-48000hz.wav
Why no enum: the description suggests 44100, 48000, 96000, but the validator only requires a positive integer. Niche rates (22050 for retro phone audio, 192000 for high-res masters) work too. Anything FFmpeg's swresample accepts.
Re-encoding is required. Stream-copy can't change sample rate. fqmpeg passes no -c:a flag, so FFmpeg picks the default codec for the output container (AAC for MP4, PCM for WAV, etc.) and re-encodes. To force a specific codec, drop to raw FFmpeg.
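If you need to pin the codec, copy the dry-run output and add the codec flags yourself (codec and bitrate below are illustrative, not fqmpeg defaults):
# Resample and force AAC at an explicit bitrate instead of the container default
ffmpeg -i input.mp4 -ar 48000 -c:v copy -c:a aac -b:a 192k input-48000hz.mp4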
bit-depth — Change audio bit depth (PCM)
Forces the audio codec to PCM at 16, 24, or 32 bit depth. Practically this means the output must be a container that supports PCM (WAV, MKV, AIFF) — not MP4 with AAC.
- Source: src/commands/bit-depth.js
- Flag: -c:a pcm_s16le / pcm_s24le / pcm_s32le
- Output: <input-stem>-<bits>bit.<ext>
| Argument | Required | Allowed | Notes |
|---|---|---|---|
<input> | yes | — | Input video/audio (typically WAV) |
<bits> | yes | 16, 24, 32 | Target bit depth |
-o, --output <path> | no | — | Override output |
$ npx fqmpeg bit-depth master.wav 24 --dry-run
ffmpeg -i master.wav -c:a pcm_s24le -c:v copy master-24bit.wav
Container compatibility: .wav accepts all three depths. .mkv accepts them too (PCM in Matroska is valid). .mp4 will fail — pcm_s24le is not a valid codec for ISO BMFF (MP4). If you feed in an .mp4, change the output extension to .mkv or .wav via -o.
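So for video sources, the practical move is to redirect into Matroska with the documented -o override; by analogy with the WAV dry-run above, the generated command should look like this:
$ npx fqmpeg bit-depth video.mp4 24 -o video-24bit.mkv
ffmpeg -i video.mp4 -c:a pcm_s24le -c:v copy video-24bit.mkv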
24-bit storage: pcm_s24le writes packed 3-byte samples to the file (FFmpeg handles them internally as 32-bit samples with 8 padding bits). The output file size scales with the packed width — 24-bit files are 1.5× the size of 16-bit, not 2× (and 32-bit files are 2× exactly).
Bit depth ≠ audio resolution. A 24-bit file converted from a 16-bit source doesn't contain more detail — the extra 8 bits are zeroed. Useful only when the source actually has the dynamic range (multi-mic field recordings, 24-bit studio masters).
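The arithmetic behind those size ratios, worked for a common 48 kHz stereo file:
# PCM data rate = sample_rate × channels × bytes_per_sample
# 16-bit: 48000 × 2 × 2 = 192,000 B/s (~0.69 GB per hour)
# 24-bit: 48000 × 2 × 3 = 288,000 B/s (~1.04 GB per hour, 1.5× the 16-bit size)
# 32-bit: 48000 × 2 × 4 = 384,000 B/s (~1.38 GB per hour, 2× exactly)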
Visualization
audio-visualize — Animated audio visualization video
Renders audio as a video stream with one of three live-updating visualizers: scrolling waveform (waves), scrolling spectrogram (spectrum), or histogram (histogram). Output is .mp4 (H.264 + yuv420p) regardless of input extension.
- Source: src/commands/audio-visualize.js
- Filter (waves): showwaves=s=<W>x<H>:mode=cline:rate=30:colors=0x00FF00
- Filter (spectrum): showspectrum=s=<W>x<H>:mode=combined:color=intensity:slide=scroll
- Filter (histogram): ahistogram=s=<W>x<H>:rheight=0.5
- Output: <input-stem>-visualize.mp4 (forced .mp4)
| Argument / Option | Default | Allowed | Notes |
|---|---|---|---|
<input> | required | — | Input audio (video also works — only audio stream is used) |
--mode <mode> | waves | waves, spectrum, histogram | Visualization style |
--size <WxH> | 1920x1080 | resolution string | Output video size |
-o, --output <path> | <input-stem>-visualize.mp4 | — | Override output |
$ npx fqmpeg audio-visualize song.mp3 --dry-run
ffmpeg -i song.mp3 -filter_complex showwaves=s=1920x1080:mode=cline:rate=30:colors=0x00FF00 -c:v libx264 -pix_fmt yuv420p song-visualize.mp4
$ npx fqmpeg audio-visualize song.mp3 --mode spectrum --dry-run
ffmpeg -i song.mp3 -filter_complex showspectrum=s=1920x1080:mode=combined:color=intensity:slide=scroll -c:v libx264 -pix_fmt yuv420p song-visualize.mp4
What's hardcoded:
- waves: color 0x00FF00 (terminal green), drawing mode cline (centered line, scrolls), refresh rate 30 fps
- spectrum: mode combined (one band per stereo pair), color scheme intensity, scroll mode slide=scroll
- histogram: rheight=0.5 (relative bar height)
- All three: video codec libx264 + yuv420p for universal playback
The choice of 0x00FF00 for waves matches the 32blog brand color, but more importantly it reads well against a black background (which showwaves provides by default). If you want a different color, drop to raw FFmpeg:
ffmpeg -i song.mp3 -filter_complex "showwaves=s=1920x1080:mode=cline:rate=30:colors=0xFF6600" \
-c:v libx264 -pix_fmt yuv420p song-orange.mp4
When you outgrow it: for stylized visualizers (3D bars, Spotify-style equalizer with peak hold, particle systems), FFmpeg alone won't cut it. Look at tools like Specterr or butterchurn (MilkDrop in the browser) for music videos. fqmpeg's visualizers are functional but plain.
oscilloscope — Oscilloscope overlay on existing video
Overlays an avectorscope (a circular phase-correlation display) onto an existing video. Useful for music videos and electronic music demos where seeing the stereo image is part of the aesthetic.
- Source: src/commands/oscilloscope.js
- Filter: avectorscope=s=320x320:zoom=1.5:draw=line,format=yuva420p[osc];[0:v][osc]overlay=W*0.7:H*0.7
- Output: <input-stem>-oscilloscope.<ext>
| Argument / Option | Default | Notes |
|---|---|---|
<input> | required | Input video with audio |
-o, --output <path> | <input-stem>-oscilloscope.<ext> | Override output |
$ npx fqmpeg oscilloscope musicvideo.mp4 --dry-run
ffmpeg -i musicvideo.mp4 -filter_complex avectorscope=s=320x320:zoom=1.5:draw=line,format=yuva420p[osc];[0:v][osc]overlay=W*0.7:H*0.7 -c:a copy musicvideo-oscilloscope.mp4
avectorscope reads stereo correlation. The dot pattern shows how the left and right channels relate moment-to-moment: a vertical line = mono (perfectly correlated), a horizontal line = out-of-phase (mono-summing will cancel), a circular cloud = wide stereo. For dual-mono dialogue, you'll see a single vertical line. For a wide synth pad, you'll see a noisy ellipse.
Audio is preserved (-c:a copy), video is re-encoded because of the filter graph.
When you outgrow it: the underlying avectorscope filter has many more parameters (mode/m for lissajous vs. polar display, draw for dot or line tracing, rc/gc/bc for trace color, zoom, scale). To customize beyond what fqmpeg exposes, copy the --dry-run filter and edit:
# Color the trace, lower the zoom, draw dots instead of a line, center the 480×480 overlay
ffmpeg -i input.mp4 -filter_complex "avectorscope=s=480x480:zoom=1.0:draw=dot:rc=255:gc=128:bc=0,format=yuva420p[osc];[0:v][osc]overlay=W*0.5-240:H*0.5-240" -c:a copy out.mp4
waveform — Static waveform image
Renders the entire audio track as a single PNG waveform — the "audiogram" look common on podcast covers and audio-to-video transcoded clips.
- Source: src/commands/waveform.js
- Filter: aformat=channel_layouts=mono,showwavespic=s=<W>x<H>:colors=<hex>
- Output: <input-stem>-waveform.png (forced .png)
| Argument / Option | Default | Notes |
|---|---|---|
<input> | required | Input audio/video |
--size <WxH> | 1920x200 | Output image size — wider for podcast covers, taller for vinyl-style displays |
--color <hex> | 0x00FF00 | Hex color (any valid FFmpeg color spec — 0xRRGGBB or Color@Alpha) |
-o, --output <path> | <input-stem>-waveform.png | — |
$ npx fqmpeg waveform song.mp3 --dry-run
ffmpeg -i song.mp3 -filter_complex aformat=channel_layouts=mono,showwavespic=s=1920x200:colors=0x00FF00 -frames:v 1 song-waveform.png
Mono summation first: aformat=channel_layouts=mono downmixes the audio before rendering. This is intentional — a stereo waveform image with two stacked traces is harder to read than a single combined trace, and the static PNG can't show stereo movement anyway. If you want a two-channel display, drop the aformat step and adjust showwavespic settings:
ffmpeg -i song.mp3 -filter_complex "showwavespic=s=1920x400:colors=0x00FF00|0xFF6600:split_channels=1" -frames:v 1 song-stereo.png
Single PNG via -frames:v 1: showwavespic produces one frame containing the full track waveform. The flag tells FFmpeg to write a single image instead of a sequence.
spectrum — Static spectrogram image
Renders a frequency-vs-time spectrogram of the audio as a single PNG. Use it to find unwanted hum (60 Hz horizontal line), high-frequency hiss (energy above 8 kHz), or visualize the spectral signature of a track for cover art.
- Source: src/commands/spectrum.js
- Filter: showspectrumpic=s=<W>x<H>:color=<scheme>
- Output: <input-stem>-spectrum.png (forced .png)
| Argument / Option | Default | Allowed | Notes |
|---|---|---|---|
<input> | required | — | Input audio/video |
--size <WxH> | 1920x512 | resolution string | Spectrum area size; the default legend adds margins, so the final PNG is larger (~2200×640 for the default 1920x512) |
--color <mode> | intensity | intensity, rainbow, moreland, nebulae, fire, fiery, fruit, cool, magma, green, viridis, plasma, cividis, terrain | Color scheme |
-o, --output <path> | <input-stem>-spectrum.png | — | — |
$ npx fqmpeg spectrum song.mp3 --dry-run
ffmpeg -i song.mp3 -filter_complex showspectrumpic=s=1920x512:color=intensity -frames:v 1 song-spectrum.png
$ npx fqmpeg spectrum song.mp3 --color viridis --dry-run
ffmpeg -i song.mp3 -filter_complex showspectrumpic=s=1920x512:color=viridis -frames:v 1 song-spectrum.png
Reading a spectrogram: time runs horizontally (left to right), frequency runs vertically (low at bottom, high at top), color = energy. A clean speech track shows energy concentrated 100 Hz–4 kHz with gaps for breaths; a kick drum is a vertical streak at 60–100 Hz; an unwanted ground-loop hum is a perfectly horizontal line at 50 or 60 Hz. For mastering work, the spectrogram is a faster diagnostic than an EQ with a spectrum analyzer.
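When you're hunting noise rather than making cover art, a wider canvas stretches the time axis and makes intermittent artifacts easier to spot (the filename is illustrative):
$ npx fqmpeg spectrum noisy-lecture.wav --size 3840x1024 -o hum-check.png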
showspectrumpic vs showspectrum: the pic suffix produces a static image of the entire file. The non-pic version produces an animated video (which is what audio-visualize --mode spectrum uses). Don't confuse them.
Real-World Recipes
Recipe 1: Dual-mic podcast — split tracks → normalize → join
A common Zoom-style podcast setup records both speakers onto one stereo file: host on left, guest on right. To master them individually (different gain, different EQ) and reassemble:
# Step 1: extract each side
npx fqmpeg extract-audio-channel raw.wav left -o host.wav
npx fqmpeg extract-audio-channel raw.wav right -o guest.wav
# Step 2: normalize each (C9 verb)
npx fqmpeg normalize host.wav -o host-n.wav
npx fqmpeg normalize guest.wav -o guest-n.wav
# Step 3: mix back into a stereo file with each on their original side
ffmpeg -i host-n.wav -i guest-n.wav \
-filter_complex "[0:a]pan=stereo|c0=c0|c1=0[hL];[1:a]pan=stereo|c0=0|c1=c0[gR];[hL][gR]amerge=inputs=2,pan=stereo|c0=c0+c2|c1=c1+c3" \
podcast-final.wav
The last step uses raw FFmpeg because fqmpeg doesn't have a "stereo-from-two-monos" verb. The pan chain places host on left and guest on right, amerge combines the two stereo streams into four channels, and a final pan collapses them back to true stereo. Both halves come from the same recording, so their lengths already match and no padding is needed.
Recipe 2: Multilingual lecture — one video, three audio tracks
You have a 50-minute lecture video and three voiceover dubs (Japanese, English, Spanish). Want a single MKV with all three audio streams for YouTube / school CMS uploads:
# All voiceovers must already be the same length as the video.
# (If they're not, trim or pad with fqmpeg trim first.)
npx fqmpeg multi-audio lecture.mp4 lecture-ja.aac lecture-en.aac lecture-es.aac
# → lecture-multi-audio.mkv
The output is an MKV with the original video, the original audio track (if any), and three additional audio streams. Players that support track selection (VLC, mpv, modern HTML5 video) let viewers switch languages without separate downloads.
Recipe 3: Audio-only YouTube upload with waveform background
For an interview podcast you want to upload to YouTube. YouTube needs video, but you only have audio. Generate an audio-reactive waveform video as the visual:
# Step 1: render an animated waveform video (full duration)
npx fqmpeg audio-visualize interview.mp3 --size 1920x1080 -o interview-visual.mp4
# Step 2: swap the visualizer's re-encoded audio for the original track
npx fqmpeg replace-audio interview-visual.mp4 interview.mp3 -o interview-final.mp4
audio-visualize produces a video with the audio embedded too, but if you wanted to layer a different track (e.g. the visual on top of a normalized version, or the original with a different intro), replace-audio swaps cleanly.
For a thumbnail of the same interview, use spectrum to generate a one-shot spectrogram PNG:
npx fqmpeg spectrum interview.mp3 --color viridis --size 1920x1080 -o interview-cover.png
Frequently Asked Questions
Are stereo --mode 5.1 and surround actually identical?
Yes — both run ffmpeg -i input -ac 6 -c:v copy output. The only difference is the output filename suffix (-5.1 vs -surround) and the command name. fqmpeg exposes two because users discover them through different mental models — "I want to change channel count" lands on stereo; "I want 5.1 surround" lands on surround.
Why does pan-audio --position 0 sound louder than --position 1?
Because the pan law is linear attenuation with unity clipping, not equal-power. At position 0, both channels are at 1.0 gain. At position 1 (full right), left = 0 and right = 1. Summed to mono, position 0 is L+R = 2.0 = +6 dB, while position 1 is just R = 1.0. To get equal-power panning, where loudness stays constant across the sweep, write the pan filter yourself with sqrt-weighted gains: c0=sqrt((1-p)/2)*c0|c1=sqrt((1+p)/2)*c1 (where p runs from -1 to 1).
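Worked at the center position (p = 0): each gain is sqrt(1/2) ≈ 0.707 (−3 dB), and the power L² + R² stays at 1 across the whole sweep. Since pan takes numeric gains, bake the values in as constants:
# Equal-power center: both channels at 0.707 (precomputed from sqrt((1±p)/2) at p = 0)
ffmpeg -i in.wav -af "pan=stereo|c0=0.707*c0|c1=0.707*c1" out.wav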
bit-depth 24 fails with my MP4 file — why?
Because MP4 (ISO BMFF) doesn't accept pcm_s24le as an audio codec. Bit-depth changes only work in containers that support PCM: .wav, .mkv, .aiff, .flac. Either convert your MP4 audio to WAV first (ffmpeg -i in.mp4 -vn -c:a pcm_s16le audio.wav), run bit-depth on the WAV, and re-mux; or change the output extension to .mkv via -o video-24bit.mkv.
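The full round-trip, sketched (assumes one video and one audio stream in the source):
# 1. Extract the audio as 16-bit PCM WAV
ffmpeg -i in.mp4 -vn -c:a pcm_s16le audio.wav
# 2. Convert bit depth (output name follows the <stem>-<bits>bit pattern)
npx fqmpeg bit-depth audio.wav 24
# 3. Re-mux the original video with the 24-bit track into Matroska
ffmpeg -i in.mp4 -i audio-24bit.wav -map 0:v -map 1:a -c copy video-24bit.mkv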
Why is the BGM so quiet at mix-audio --volume 0.3?
Because FFmpeg's amix filter divides its output by the input count (to prevent clipping). With 2 inputs, each is halved before summing, so your BGM at volume=0.3 ends up at effectively 15 % of original level (0.3 / 2 = 0.15, about −16 dB). The original audio is also halved, then summed. If the BGM feels too quiet relative to dialogue, try --volume 0.6 for "noticeable but not dominant" or --volume 1.0 for a roughly equal mix.
Can I concat audio files of different sample rates / codecs?
Not with concat-audio — it uses the concat demuxer, which requires identical codec + sample rate + channel layout for stream-copy. If your parts differ, you have two options: (1) re-encode each to a common format first (e.g. npx fqmpeg sample-rate part1.mp3 48000 -o part1-48k.mp3 for each), or (2) use the concat filter directly in raw FFmpeg, which re-encodes everything:
ffmpeg -i p1.mp3 -i p2.wav -i p3.aac -filter_complex "[0:a][1:a][2:a]concat=n=3:v=0:a=1[out]" -map "[out]" joined.mp3
Why does audio-visualize --mode waves always output green?
Because the underlying filter call hardcodes colors=0x00FF00 (terminal green, matching the 32blog brand). The --color option doesn't exist on this verb. For a different color, take the --dry-run output, edit the hex value, and invoke FFmpeg directly. The static waveform command does expose --color, so for a one-off color choice, generate a still and use it as a poster image instead of a motion visualizer.
How is audio-visualize --mode spectrum different from spectrum?
audio-visualize --mode spectrum produces an animated video (scrolling spectrogram, showspectrum filter, .mp4 output). spectrum produces a single static image (the whole track as one PNG, showspectrumpic filter, .png output). Use the animated one as a video backdrop for an audio upload; use the static one as cover art or for one-glance diagnosis.
Does multi-audio re-encode anything?
No — every stream is -c copy. Both video and all audio tracks are stream-copied. This means the input audio files must already be in a format the output container (.mkv by default) accepts. AAC, MP3, Opus, Vorbis, and PCM all work. If you try to multiplex an Apple Lossless (ALAC) into a strict .mp4, it'll work (MP4 does support ALAC), but some browsers won't play it.
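To check what actually landed in the container, ffprobe lists every stream and its codec:
ffprobe -v error -show_entries stream=index,codec_type,codec_name video-multi-audio.mkv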
Wrapping Up
The fourteen C11 verbs sit between "you have an audio track" and "it's the right shape for the deliverable":
- stereo, surround, extract-audio-channel, pan-audio for channel layout (the first two are thin -ac wrappers; pan-audio uses linear attenuation, not equal-power)
- multi-audio, mix-audio, replace-audio, concat-audio for multi-track assembly (mix-audio's amix halves each input — bump --volume if the BGM feels quiet; concat-audio requires identical codec/SR/layout for stream-copy)
- sample-rate, bit-depth for format conversion (bit-depth is PCM-only — feed it WAV or MKV, not MP4)
- audio-visualize, oscilloscope, waveform, spectrum for visualization (animated visualizers paint in fixed colors; static images are the "audiogram" cover-art workflow)
Every verb prints its underlying FFmpeg invocation under --dry-run, so when the simplified surface isn't enough (custom pan laws, multi-codec concat, two-channel spectrograms), copy the filter, edit, and call FFmpeg directly. For the broader fqmpeg map, see the fqmpeg complete guide.