fqmpeg's C12 cluster covers the back-and-forth between video and individual frames — single thumbnails, periodic snapshots, contact sheets, filmstrips, scene-detection splits, slideshows from stills, side-by-side comparisons, and counting frames with ffprobe. Twelve verbs total, all working through ffmpeg (or ffprobe in one case) but exposing far simpler arguments than the underlying filter chains.
This guide walks each verb against its source in src/commands/ of fqmpeg 3.0.3 — the underlying FFmpeg filter or flag, the defaults, the output filename, and the gotchas that aren't visible from --help alone (thumbnail-grid and tile both produce contact sheets but sample differently; snapshot and video-to-frames both extract periodic stills but to different default directories; frames-to-video and slideshow both build video from images but use entirely different mechanisms).
What you'll get out of this guide
- A decision matrix for the 12 verbs by task (single still / multiple stills / contact sheets / video reconstruction / analysis)
- Exact FFmpeg invocation each verb generates (verified --dry-run output)
- Defaults, units, output filenames — and which verbs overlap (thumbnail-grid vs tile, snapshot vs video-to-frames)
- Three recipes — YouTube thumbnail workflow, time-lapse from a security camera dump, before/after filter comparison
The 12 Verbs at a Glance
The cluster splits into four task groups. Pick the group, then the verb.
| Group | Verbs | What they do |
|---|---|---|
| Single & periodic stills | thumbnail, snapshot, video-to-frames, count-frames | Pull one image at a timestamp, or many at intervals / per frame |
| Contact sheets & filmstrips | thumbnail-grid, thumbnail-strip, tile | Composite multiple frames into a single image (grid or row) |
| Frames ↔ video reconstruction | frames-to-video, slideshow | Build video from a numbered pattern or a list of images |
| Analysis & composition | scenes, preview, compare | Detect cuts, generate highlight reels, side-by-side before/after |
Five things to know before reading on:
- thumbnail-grid and tile build contact sheets with the same time-based sampling. Both use select='isnan(prev_selected_t)+gte(t-prev_selected_t, 1)' — one frame per second — which behaves predictably on long and short videos alike. The two verbs are kept as aliases for discoverability (search for "grid" or "tile" and either lands here); the only difference is the default output filename (-grid.jpg vs -tile<C>x<R>.jpg).
- snapshot outputs alongside the input; video-to-frames outputs to the current working directory. Both extract periodic stills, but their default output patterns differ — snapshot puts files in the same folder as the input (<input-dir>/<stem>-snap-%04d.jpg), while video-to-frames writes to ./frame_%04d.png from cwd. If you want predictable paths, always pass -o.
- count-frames is the only verb in C12 that uses ffprobe, not ffmpeg. It runs ffprobe -count_frames -select_streams v:0 and prints a single integer to stdout. No file is written.
- frames-to-video uses -framerate (input rate), not -r (output rate). This matters because -framerate tells FFmpeg how to interpret the still sequence (how many to feed per second), while -r would re-time the output stream. For straight image-to-video at a consistent rate, -framerate is the correct flag and what fqmpeg uses.
- scenes splits a video at detected cuts using the segmenter. Each scene becomes its own file (<stem>-scene000.mp4, <stem>-scene001.mp4, ...) — useful for breaking long recordings into clips automatically. Threshold is 0.0–1.0; lower = more sensitive = more cuts.
Single & Periodic Stills
thumbnail — One frame as a single image
The simplest verb in the cluster: seek to a timestamp and write one JPEG. Used for video thumbnails (YouTube uploads, gallery covers, OG image tags).
- Source: src/commands/thumbnail.js
- Flags: -ss <sec> -i <input> -frames:v 1 -q:v 2
- Output: <input-stem>-thumb.jpg
| Argument / Option | Default | Notes |
|---|---|---|
<input> | required | Input video file |
-s, --start <sec> | 1 | Timestamp in seconds (decimal allowed) |
-o, --output <path> | <input-stem>-thumb.jpg | Override output |
$ npx fqmpeg thumbnail input.mp4 --dry-run
ffmpeg -ss 1 -i input.mp4 -frames:v 1 -q:v 2 input-thumb.jpg
$ npx fqmpeg thumbnail input.mp4 -s 45.5 -o cover.jpg --dry-run
ffmpeg -ss 45.5 -i input.mp4 -frames:v 1 -q:v 2 cover.jpg
-ss before -i is the fast-seek form. FFmpeg jumps to the nearest keyframe before the requested timestamp and decodes forward from there, which is dramatically faster on long files than reading from the start. (Very old FFmpeg builds landed on the keyframe itself; since FFmpeg 2.1, input-side -ss is frame-accurate when re-encoding, as it is here.) Placing -ss after -i instead decodes the entire file up to the timestamp — same result, much slower — and fqmpeg doesn't expose that mode here.
-q:v 2: JPEG quality scale where 2 is near-lossless (the scale is 1–31, lower is better). This is the highest reasonable JPEG quality for a thumbnail — large file, sharp output. For smaller thumbnails (gallery tiles), generate at full quality here and downscale separately with resize.
snapshot — Frames at regular intervals
Extracts one frame every N seconds across the entire video. The output is a numbered sequence (<stem>-snap-0001.jpg, 0002.jpg, ...), saved alongside the input file.
- Source: src/commands/snapshot.js
- Filter: fps=<1/interval>
- Output: <input-dir>/<stem>-snap-%04d.<format>
| Argument / Option | Default | Allowed | Notes |
|---|---|---|---|
<input> | required | — | Input video file |
--interval <seconds> | 1 | positive number | Capture interval (so --interval 5 = one frame per 5 s) |
--format <fmt> | jpg | jpg, png | Output image format |
-o, --output <pattern> | <input-dir>/<stem>-snap-%04d.<format> | printf pattern | Override; must contain %d or %0Nd |
$ npx fqmpeg snapshot lecture.mp4 --dry-run
ffmpeg -i lecture.mp4 -vf fps=1 -q:v 2 lecture-snap-%04d.jpg
$ npx fqmpeg snapshot lecture.mp4 --interval 30 --format png --dry-run
ffmpeg -i lecture.mp4 -vf fps=0.03333333333333333 -q:v 2 lecture-snap-%04d.png
Why fps= instead of select=: the fps filter is the simplest way to enforce a constant output rate — FFmpeg drops or duplicates frames as needed to match the target. fps=0.0333... means "one frame every 30 seconds", and FFmpeg picks whichever frame is closest to each 30-second tick.
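As a sanity check of that arithmetic: the filter argument is simply the reciprocal of the interval. A standalone sketch (not fqmpeg's code; the 30-second interval is the example from the dry-run above):

```shell
# 1/interval gives the fps filter argument; a 30 s interval yields fps ≈ 0.0333.
interval=30
awk -v i="$interval" 'BEGIN { printf "fps=%s\n", 1 / i }'
```

fqmpeg prints the full JavaScript double (0.03333333333333333); awk's default conversion shows fewer digits, but the value passed to the filter is the same.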
-q:v 2 is applied to PNG too, but ignored — PNG quality is determined by compression level, not q scale. JPEG is where it matters.
Counting outputs: for a 60-minute video at --interval 30, you'll get 120 files (-snap-0001.jpg through -snap-0120.jpg). For --interval 1 on the same file, you'd get 3600 files. Plan disk space accordingly.
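The file-count estimate above can be scripted as a back-of-envelope check (a sketch, not part of fqmpeg; the variables hold the 60-minute example):

```shell
# Roughly one file per interval tick: ceil(duration / interval).
duration=3600   # 60-minute video, in seconds
interval=30
count=$(( (duration + interval - 1) / interval ))   # integer ceiling division
echo "$count files"
```

Handy before a run on a long source: multiply the count by a typical per-frame file size to estimate disk usage.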
video-to-frames — Every frame (or fps-throttled) as images
Extracts frames at the source frame rate by default, or at a throttled rate if --fps is supplied. Use this when you need every frame (frame-by-frame retouching, ML training datasets) or a known per-second sampling rate.
- Source: src/commands/video-to-frames.js
- Flag: -i <input> (+ optional -vf fps=<n>)
- Output: ./frame_%04d.<format> (in current working directory)
| Argument / Option | Default | Allowed | Notes |
|---|---|---|---|
<input> | required | — | Input video |
--fps <n> | source rate (no filter) | positive number | Throttle to this many frames per second |
--format <fmt> | png | png, jpg | Image format. PNG default (lossless) for ML / editing |
-o, --output <pattern> | ./frame_%04d.<format> | printf pattern | Override |
$ npx fqmpeg video-to-frames input.mp4 --dry-run
ffmpeg -i input.mp4 frame_%04d.png
$ npx fqmpeg video-to-frames input.mp4 --fps 5 --format jpg --dry-run
ffmpeg -i input.mp4 -vf fps=5 frame_%04d.jpg
Default is every frame. At 30 fps for 60 s, that's 1800 PNGs — easily several hundred megabytes. If you just want periodic samples, use snapshot (which has a saner default) or pass --fps.
Output goes to CWD, not the input directory. This is intentional — frame dumps are often scratch data that you want in a known working folder, not polluting the directory the video lives in. But it's a behavioral inconsistency with snapshot, so if you want them next to the input, pass -o input/dir/frame_%04d.png.
PNG vs JPG: PNG is the safe default for downstream image processing (lossless, alpha-preserving). JPG is 5–10× smaller but lossy — fine for previewing or for ML training where the model will downsample anyway. The format flag drives the extension; FFmpeg picks the codec from that.
count-frames — Total frame count via ffprobe
Returns the exact number of video frames in the stream. The only C12 verb that doesn't use ffmpeg — it runs ffprobe -count_frames and writes a single integer to stdout, no output file.
- Source: src/commands/count-frames.js
- Binary: ffprobe (not ffmpeg)
- Output: integer to stdout
| Argument / Option | Notes |
|---|---|
<input> | Input video |
$ npx fqmpeg count-frames input.mp4 --dry-run
ffprobe -v error -count_frames -select_streams v:0 -show_entries stream=nb_read_frames -of csv=p=0 input.mp4
$ npx fqmpeg count-frames input.mp4
1798
-count_frames decodes the entire stream. It's accurate but slow — for a long file, count-frames reads every byte of the video. For a faster estimate, query duration × frame rate from the header (no decoding required):
ffprobe -v error -select_streams v:0 -show_entries stream=duration,r_frame_rate -of csv=p=0 input.mp4
That returns duration and r_frame_rate (e.g. 60.5,30000/1001), and you multiply (60.5 × 29.97 ≈ 1813 frames) — close but not exact for variable-frame-rate files.
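The multiply step is scriptable. A sketch that parses a stand-in line (so no media file is needed; real usage would pipe ffprobe's output in):

```shell
# duration,num/den → estimated frame count = duration * num / den (truncated).
line='60.5,30000/1001'   # stand-in for real ffprobe output
echo "$line" | awk -F'[,/]' '{ printf "%d\n", $1 * $2 / $3 }'
```

Note the truncation: awk's %d drops the fractional part, which is fine for an estimate that's already inexact on VFR files.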
Why exact counts matter: for video editing automation (split into N equal chunks of frames), pose-estimation workflows that need per-frame indexing, or anti-cheat / forensic analysis where the exact frame count is a fingerprint. For most needs, the header-derived estimate is fine.
Contact Sheets & Filmstrips
thumbnail-grid — Contact sheet (alias of tile)
A multi-frame thumbnail composite arranged as a grid. The classic "DVD chapter selection" look. Uses the same time-based sampling as tile (one frame per second), kept under this name for users who search for "grid".
- Source: src/commands/thumbnail-grid.js
- Filter: select='isnan(prev_selected_t)+gte(t-prev_selected_t, 1)', scale=<W>:-1, tile=<C>x<R>
- Output: <input-stem>-grid.jpg
| Argument / Option | Default | Notes |
|---|---|---|
<input> | required | Input video |
--cols <n> | 4 | Grid columns |
--rows <n> | 4 | Grid rows |
--width <n> | 320 | Width of each tile in pixels |
-o, --output <path> | <input-stem>-grid.jpg | Override |
$ npx fqmpeg thumbnail-grid input.mp4 --dry-run
ffmpeg -i input.mp4 -frames:v 1 -vf select='isnan(prev_selected_t)+gte(t-prev_selected_t\,1)',scale=320:-1,tile=4x4 -q:v 2 input-grid.jpg
One frame per second, capped at cols × rows. With the default 4×4 = 16 tiles, the output covers the first 16 seconds of the video. For longer videos, increase --cols/--rows so cols × rows ≥ duration_in_seconds, or use the tile verb (which has the same behavior) and pass -o to control the filename. See the tile section below for the prev_selected_t mechanics in detail.
Same algorithm as tile, different default output name. Use thumbnail-grid when the word "grid" comes to mind; use tile when "tile" or "contact sheet" comes to mind. The tile default filename embeds the dimensions (-tile4x4.jpg), which is handy when re-running with different --cols/--rows values.
thumbnail-strip — Horizontal filmstrip (time-sampled)
A one-row variant of thumbnail-grid. Same time-based sampling, but rows=1 and a per-frame height parameter (rather than width).
- Source: src/commands/thumbnail-strip.js
- Filter: select='isnan(prev_selected_t)+gte(t-prev_selected_t, 1)', scale=-1:<H>, tile=<N>x1
- Output: <input-stem>-strip.jpg
| Argument / Option | Default | Notes |
|---|---|---|
<input> | required | Input video |
--frames <n> | 10 | Number of frames in the strip |
--height <n> | 120 | Height of each frame in pixels |
-o, --output <path> | <input-stem>-strip.jpg | Override |
$ npx fqmpeg thumbnail-strip input.mp4 --dry-run
ffmpeg -i input.mp4 -frames:v 1 -vf select='isnan(prev_selected_t)+gte(t-prev_selected_t\,1)',scale=-1:120,tile=10x1 -q:v 2 input-strip.jpg
One frame per second, capped at --frames. With the default --frames 10, the strip covers the first 10 seconds of the video. For a strip that spans a longer runtime, increase --frames so it meets or exceeds duration_in_seconds, or build the strip from snapshot output stitched with tile/montage (ImageMagick) at proportional timestamps.
Filmstrip use case: scrubber previews in video editors, video player hover previews (YouTube's WebVTT thumbnail strip), or social-media-style "scrolling timeline" art.
tile — Contact sheet (time-sampled)
Samples one frame per second (timestamp-based via prev_selected_t), so the grid spans the early portion of the video evenly regardless of source frame rate. Identical algorithm to thumbnail-grid; this verb is the one whose default filename (-tile<C>x<R>.jpg) encodes the dimensions.
- Source: src/commands/tile.js
- Filter: select='isnan(prev_selected_t)+gte(t-prev_selected_t, 1)', scale=<W>:-1, tile=<C>x<R>
- Output: <input-stem>-tile<C>x<R>.jpg
| Argument / Option | Default | Notes |
|---|---|---|
<input> | required | Input video |
--cols <n> | 4 | Grid columns |
--rows <n> | 4 | Grid rows |
--width <n> | 320 | Width of each tile in pixels |
-o, --output <path> | <input-stem>-tile<C>x<R>.jpg | Override (note the dimensions in the default name) |
$ npx fqmpeg tile input.mp4 --dry-run
ffmpeg -i input.mp4 -frames:v 1 -vf select='isnan(prev_selected_t)+gte(t-prev_selected_t\,1)',scale=320:-1,tile=4x4 -q:v 2 input-tile4x4.jpg
The prev_selected_t predicate:
- isnan(prev_selected_t) is true for the very first frame (no previous selection yet — prev_selected_t is undefined → NaN).
- gte(t - prev_selected_t, 1) is true when at least 1 second has passed since the last selected frame.
Their sum gates the select filter, producing one selection per second. For the default 4×4 = 16 tiles, the output covers the first 16 seconds. For longer videos, the tile=4x4 cap drops anything past the 16th selection, giving you the first 16 seconds. If you want a contact sheet that spans the whole runtime, increase --cols/--rows so cols × rows ≥ duration_in_seconds, or drop to raw FFmpeg with select='gte(t, X)' at proportional timestamps.
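The predicate is easy to simulate outside FFmpeg. A sketch in awk (the 25 fps source and 5-second span are assumptions for illustration): keep the first timestamp, then any timestamp at least 1 second after the last one kept.

```shell
# Frame indices 0..124 at 25 fps give timestamps 0.00..4.96; the predicate
# keeps t = 0, 1, 2, 3, 4 — one selection per second, five in total.
seq 0 124 | awk '{
  t = $1 / 25
  if (!picked || t - prev >= 1) { print t; prev = t; picked = 1 }
}'
```

The `!picked` branch plays the role of isnan(prev_selected_t): it admits the very first frame before any "previous selection" exists.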
Why two verbs (tile and thumbnail-grid): discoverability — users search for "grid" or "tile" and either lands here. The filter and sampling are identical; only the default output filename differs (-tile4x4.jpg for tile, -grid.jpg for thumbnail-grid). Pick whichever name comes to mind, and override -o if you need a stable filename across runs.
Frames ↔ Video Reconstruction
frames-to-video — Image sequence → video
The inverse of video-to-frames. Takes a printf-style numbered sequence (or glob pattern) of still images and produces a single video file.
- Source: src/commands/frames-to-video.js
- Flags: -framerate <fps> -i <pattern> -c:v <codec> -pix_fmt yuv420p
- Output: <pattern-base>-video.mp4
| Argument / Option | Default | Notes |
|---|---|---|
<pattern> | required | printf pattern (frame_%04d.png) or shell glob (img_*.jpg) |
--fps <n> | 30 | Frames per second |
--codec <name> | libx264 | Video codec |
-o, --output <path> | <pattern-base>-video.mp4 | Override (default strips %d and extension from pattern) |
$ npx fqmpeg frames-to-video frame_%04d.png --dry-run
ffmpeg -framerate 30 -i frame_%04d.png -c:v libx264 -pix_fmt yuv420p frame-video.mp4
$ npx fqmpeg frames-to-video 'img_*.jpg' --fps 60 --dry-run
ffmpeg -framerate 60 -i img_*.jpg -c:v libx264 -pix_fmt yuv420p img_*-video.mp4
-framerate vs -r: -framerate sets the input image-sequence rate (how many to consume per second), not the output frame rate. For most image-to-video cases this is what you want — the output frame rate matches by default. If you wanted a different output rate (e.g. interpolate or duplicate frames), you'd add -r after the input.
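One consequence worth internalizing: with -framerate, output duration is simply frame count divided by rate. A trivial sketch (the numbers are illustrative, not from any particular run):

```shell
frames=1800   # e.g. a 60 s dump at 30 fps from video-to-frames
rate=30       # the --fps value passed to frames-to-video
echo "$(( frames / rate )) seconds of output"
```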
yuv420p for compatibility: raw image files (especially PNG) decode to yuva420p or rgb24 formats that some players reject in MP4 containers. Forcing yuv420p guarantees playback on QuickTime, mobile browsers, and embedded players. For higher-quality archival output (10-bit, full RGB), pass --codec libx264rgb or drop to raw FFmpeg.
Pattern matching: printf patterns (frame_%04d.png) require sequential numbering starting from 0001 (or 0000 with -start_number 0). Glob patterns (img_*.jpg) work too, but quote them in scripts so the shell doesn't expand the pattern before FFmpeg sees it. Mixed-extension globs (.jpg and .png together) don't work because FFmpeg's image2 demuxer expects uniform format.
slideshow — Multiple images → video with per-image durations
Builds a video from a list of stills where each image is held on screen for a configurable duration. Uses FFmpeg's concat demuxer with an auto-generated listfile (cleaned up on exit).
- Source: src/commands/slideshow.js
- Flags: -f concat -safe 0 -i <listfile> -vf fps=<n>,format=yuv420p -c:v libx264 -pix_fmt yuv420p
- Output: <input-dir>/slideshow.mp4
| Argument / Option | Default | Notes |
|---|---|---|
<images...> | required (≥2) | Two or more image paths in order |
--duration <sec> | 3 | Seconds per image (uniform across all) |
--fps <n> | 30 | Output frame rate |
-o, --output <path> | <input-dir>/slideshow.mp4 | Override |
$ npx fqmpeg slideshow img1.jpg img2.jpg img3.jpg --dry-run
# Image list (auto-generated):
# file '/abs/path/img1.jpg'
# duration 3
# file '/abs/path/img2.jpg'
# duration 3
# file '/abs/path/img3.jpg'
# duration 3
# file '/abs/path/img3.jpg'
ffmpeg -f concat -safe 0 -i imagelist.txt -vf fps=30,format=yuv420p -c:v libx264 -pix_fmt yuv420p slideshow.mp4
concat demuxer with per-entry duration: the listfile syntax file '<path>' \n duration <sec> lets FFmpeg stitch stills end-to-end at exact times. The cleanup is automatic (fqmpeg uses process.on("exit") to delete the timestamped listfile).
Why the last image is listed twice: a concat-demuxer quirk — each duration line applies to the file entry before it, and the final entry's duration isn't reliably honored, so the last image would otherwise flash for only an instant. fqmpeg repeats the last image as a trailing entry (with no duration line) so it actually holds for --duration seconds and the output is images.length × duration seconds long, matching what the option promises.
Absolute paths in the listfile: fqmpeg resolves each image to an absolute path before writing, so the concat works regardless of where you invoke it from. Single-quoted with internal quotes escaped ('\\''), so paths with apostrophes (e.g. O'Brien.jpg) work too.
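Put together, the listfile matches the dry-run comment shown earlier. A plain-shell sketch of generating one (an illustration of the format, not fqmpeg's actual source; path resolution and quote escaping are omitted here):

```shell
dur=3
for f in img1.jpg img2.jpg img3.jpg; do
  printf "file '%s'\nduration %s\n" "$f" "$dur"   # one file/duration pair per image
  last=$f
done > imagelist.txt
printf "file '%s'\n" "$last" >> imagelist.txt     # repeat the last image, no duration
cat imagelist.txt
```

Seven lines for three images: three pairs plus the repeated final entry.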
Single-image edge case: the command requires ≥ 2 images and errors out with one. For a "still image as video" of arbitrary length, use raw FFmpeg: ffmpeg -loop 1 -i img.jpg -t 10 -c:v libx264 -pix_fmt yuv420p out.mp4.
No built-in transitions — hard cuts only. slideshow uses FFmpeg's concat demuxer, which stitches images end-to-end without crossfades. For fades between images, drop to raw FFmpeg's xfade filter (filter_complex chaining each pair with xfade=transition=fade:duration=1:offset=…), or run slideshow then post-process with a separate fade pass. Adding xfade chains as a built-in would replace the simple concat flow with per-image segment plumbing — out of scope for the "quick" surface.
Analysis & Composition
scenes — Split a video at detected scene cuts
Detects scene changes via the scene metadata variable (a per-frame difference score, 0–1) and uses the segmenter to write each scene to its own file.
- Source: src/commands/scenes.js
- Filter: select='gt(scene,<threshold>)',setpts=N/FRAME_RATE/TB + -f segment -reset_timestamps 1
- Output: <input-dir>/<stem>-scene%03d<ext>
| Argument / Option | Default | Range | Notes |
|---|---|---|---|
<input> | required | — | Input video |
--threshold <n> | 0.3 | 0.0–1.0 | Lower = more sensitive = more cuts |
-o, --output <pattern> | <input-dir>/<stem>-scene%03d<ext> | printf pattern | Override |
$ npx fqmpeg scenes movie.mp4 --dry-run
ffmpeg -i movie.mp4 -filter_complex select='gt(scene,0.3)',setpts=N/FRAME_RATE/TB -f segment -reset_timestamps 1 movie-scene%03d.mp4
scene metadata variable: FFmpeg computes a 0–1 score per frame representing how different it is from the previous frame (color histogram difference). Values above the threshold are treated as cuts. Typical values:
- 0.1–0.2: aggressive (catches dissolves and crossfades too, often false positives on motion)
- 0.3: balanced (the default — most hard cuts in narrative video)
- 0.4–0.5: conservative (only sharp transitions; misses some legitimate cuts)
- 0.7+: only catches the very hardest cuts (chapter boundaries in scripted content)
-reset_timestamps 1: each output segment starts at PTS 0 rather than continuing the source timeline. This is what makes the segments individually playable.
Stream-copy isn't possible. Because the filter touches every frame, the output is re-encoded. For a fast cut list without re-encoding, use raw FFmpeg with -ss/-to after detection — first run scenes --dry-run to read the threshold, then use ffprobe to extract timestamps, then segment via ffmpeg -ss <t1> -to <t2> -c copy.
preview — Generate a short highlight reel
Samples N short clips evenly distributed across the source and concatenates them into a short preview video — the "social media trailer" style.
- Source: src/commands/preview.js
- Filter: select N clips of <clip-duration> seconds at evenly spaced positions, concat with -t <clips × clip-duration>
- Output: <input-stem>-preview.<ext>
| Argument / Option | Default | Notes |
|---|---|---|
<input> | required | Input video |
--clips <n> | 5 | Number of sample clips |
--clip-duration <sec> | 2 | Duration of each clip in seconds |
-o, --output <path> | <input-stem>-preview.<ext> | Override |
$ npx fqmpeg preview input.mp4 --dry-run
# Note: could not probe input.mp4 for duration. Using placeholder total=60s.
# Run on a real file (or after creating input.mp4) to get exact clip offsets.
ffmpeg -i input.mp4 -vf select='between(t,0,2)+between(t,12,14)+between(t,24,26)+between(t,36,38)+between(t,48,50)',setpts=N/FRAME_RATE/TB -af aselect='between(t,0,2)+between(t,12,14)+between(t,24,26)+between(t,36,38)+between(t,48,50)',asetpts=N/SR/TB -t 10 input-preview.mp4
Output length is deterministic: clips × clip-duration seconds, regardless of source length. The defaults (5 × 2 = 10 s) produce a 10-second highlight — short enough for Twitter / Instagram, long enough to convey the gist of a 30-minute talk.
Even distribution across the source: preview runs ffprobe first to read the source duration T, then selects --clips segments at evenly spaced offsets — clip 1 starts at 0 × T/clips, clip 2 at 1 × T/clips, ..., clip N at (N−1) × T/clips. Each runs for clip-duration seconds. For a 60-minute video with the defaults, that's clips at minute 0, 12, 24, 36, 48 — 2 seconds each. The dry-run output above used T = 60 (placeholder) because input.mp4 doesn't exist on disk; the actual offsets are computed from the real file at run time.
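The offset arithmetic is easy to check by hand. A sketch with the placeholder numbers from the dry-run above (T = 60 s, N = 5 clips):

```shell
T=60; N=5   # placeholder duration and default clip count
i=0
while [ "$i" -lt "$N" ]; do
  echo "clip $(( i + 1 )) starts at $(( i * T / N ))s"
  i=$(( i + 1 ))
done
```

The printed offsets (0, 12, 24, 36, 48) line up with the between(t,…) windows in the select expression shown earlier.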
No audio fade between clips. The output cuts hard between sampled clips, so audio pops are likely on music-heavy content. For social-media-grade previews, you'd want crossfades between clips and a music bed; for that you're better off scripting the workflow with trim + crossfade + audio-fade manually.
compare — Side-by-side before/after
Stacks two videos horizontally (default) or vertically into a single output for visual comparison. The canonical "look how much better this filter is" demo.
- Source: src/commands/compare.js
- Filter (horizontal): [0:v]scale=iw/2:ih[left];[1:v]scale=iw/2:ih[right];[left][right]hstack
- Filter (vertical): [0:v]scale=iw:ih/2[top];[1:v]scale=iw:ih/2[bottom];[top][bottom]vstack
- Output: <input1-stem>-compare.<ext>
| Argument / Option | Default | Allowed | Notes |
|---|---|---|---|
<input1> | required | — | First (left / top) video |
<input2> | required | — | Second (right / bottom) video |
--direction <dir> | horizontal | horizontal, vertical | Layout |
-o, --output <path> | <input1-stem>-compare.<ext> | — | Override |
$ npx fqmpeg compare before.mp4 after.mp4 --dry-run
ffmpeg -i before.mp4 -i after.mp4 -filter_complex [0:v]scale=iw/2:ih[left];[1:v]scale=iw/2:ih[right];[left][right]hstack -c:a copy before-compare.mp4
$ npx fqmpeg compare original.mp4 stabilized.mp4 --direction vertical --dry-run
ffmpeg -i original.mp4 -i stabilized.mp4 -filter_complex [0:v]scale=iw:ih/2[top];[1:v]scale=iw:ih/2[bottom];[top][bottom]vstack -c:a copy original-compare.mp4
Each input is halved before stacking: the output canvas keeps the original width (horizontal) or height (vertical), with each input scaled to half. This preserves the overall aspect ratio. If the two inputs differ in dimensions, the scale step normalizes them to the same half-size — but content distortion can result. For best output, the two inputs should already match in resolution and duration.
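A quick sanity check of that canvas math for a pair of 1920×1080 inputs (the dimensions are illustrative):

```shell
w=1920; h=1080
half=$(( w / 2 ))   # each input becomes iw/2 x ih before hstack
echo "pane ${half}x${h}, canvas $(( half * 2 ))x${h}"
```

The stacked canvas comes out at the original 1920×1080, which is why the overall aspect ratio survives.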
Audio is copied from input 1. -c:a copy takes the first input's audio track stream-copied; input 2's audio is discarded. This matches the usual before/after framing (compare visuals, keep one audio).
Duration mismatch: if the two videos differ in length, the output ends when the shorter one ends (FFmpeg's hstack/vstack default), with the remaining frames of the longer one truncated. Trim both to matching lengths first if you need exact alignment.
No built-in labels. For "Left"/"Right" or "Before"/"After" text overlays, dry-run the filter, then drop to raw FFmpeg and splice drawtext into each scale step — see Recipe 2 below for a concrete template. Adding a --label option would force a fontfile-dependency surface (drawtext needs libfreetype + a font path) that's out of scope for the quick surface.
Real-World Recipes
Recipe 1: YouTube thumbnail workflow
You have a finished 12-minute video and need a custom thumbnail. Goal: pick the best frame, scale to YouTube's spec (1280×720), and check the contact sheet to confirm the choice.
# Step 1: contact sheet to pick the best moment (time-based sampling)
npx fqmpeg tile video.mp4 --cols 6 --rows 8 --width 480
# → video-tile6x8.jpg, 48 frames spanning first 48 seconds
# For longer coverage of a 12-minute (720-second) video, you'd need cols×rows ≥ 720,
# so a 30×24 grid (720 tiles). At width 200 that's a 6000 px wide sheet
# (roughly 2700 px tall for 16:9 sources).
npx fqmpeg tile video.mp4 --cols 30 --rows 24 --width 200
# Step 2: extract the chosen frame at its exact timestamp
npx fqmpeg thumbnail video.mp4 -s 374 -o thumbnail-raw.jpg
# Step 3: resize to YouTube's recommended thumbnail spec
npx fqmpeg resize thumbnail-raw.jpg 1280x720 -o thumbnail-final.jpg
The tile step is the slow part — it has to decode through the source. Once you've picked your timestamp, thumbnail is near-instant thanks to keyframe-aligned -ss.
Recipe 2: Time-lapse from a security camera dump
You have a folder of 86,400 JPEGs from a security camera (one per second, 24 hours of footage). You want a 60-second time-lapse video at 30 fps.
# 86400 input frames at 30 fps output = 86400/30 = 2880 seconds = 48 minutes.
# To compress to 60 seconds at 30 fps, we need 1800 output frames.
# Sample 1 input frame per 48 input frames: 86400 / 1800 = 48.
# Step 1: select every 48th frame to a renamed sequence
i=1
ls *.jpg | awk 'NR%48==1' | while read -r f; do
  printf -v new "frame_%04d.jpg" "$i"
  ln -s "$(realpath "$f")" "$new"
  ((i++))
done
# Step 2: stitch into a 30 fps time-lapse
npx fqmpeg frames-to-video frame_%04d.jpg --fps 30
# → frame-video.mp4
Alternative without symlinks: use slideshow with --duration 0.0333 (1/30 s per image) — but slideshow re-encodes through concat demuxer, which is slower than frames-to-video's direct -i pattern. For datasets this large, frames-to-video is the right tool; slideshow is for ≤ tens of images with per-image durations.
Recipe 3: Before/after filter comparison for portfolio
You're documenting the effect of stabilize on a shaky drone clip. Build a side-by-side comparison video with labels for your portfolio:
# Step 1: stabilize the original
npx fqmpeg stabilize drone-raw.mp4 -o drone-stable.mp4
# Step 2: side-by-side comparison
npx fqmpeg compare drone-raw.mp4 drone-stable.mp4 \
-o drone-comparison.mp4
# Step 3: extract a single-frame thumbnail for the case study cover image
npx fqmpeg thumbnail drone-comparison.mp4 -s 3 -o drone-cover.jpg
compare itself doesn't add labels — for a polished case study, dry-run the filter and re-render with raw FFmpeg, splicing drawtext into each scale step:
ffmpeg -i drone-raw.mp4 -i drone-stable.mp4 -filter_complex \
"[0:v]scale=iw/2:ih,drawtext=text='Original':x=20:y=20:fontsize=36:fontcolor=white:box=1:boxcolor=black@0.5[left];[1:v]scale=iw/2:ih,drawtext=text='Stabilized':x=20:y=20:fontsize=36:fontcolor=white:box=1:boxcolor=black@0.5[right];[left][right]hstack" \
-c:a copy drone-portfolio.mp4
Frequently Asked Questions
Should I use thumbnail-grid or tile?
Either — they're aliases. Both use the same time-based sampling (select='isnan(prev_selected_t)+gte(t-prev_selected_t, 1)') and produce the same contact sheet for a given --cols/--rows/--width. The only difference is the default output filename: thumbnail-grid writes <stem>-grid.jpg, tile writes <stem>-tile<C>x<R>.jpg. Pick tile when the filename should record the grid size; pick thumbnail-grid when a fixed -grid.jpg filename is more convenient. Either way, pass -o to override.
What's the difference between snapshot and video-to-frames?
Both extract periodic stills, but with different defaults and output locations:
- snapshot defaults to one frame per second, JPEG format, output alongside the input (<input-dir>/<stem>-snap-%04d.jpg).
- video-to-frames defaults to every frame at the source rate, PNG format, output to the current working directory (./frame_%04d.png).
If you want "occasional reference stills next to the video," snapshot is the right tool. If you want "every frame for ML or editing," video-to-frames is the right tool. Pass -o if you want either to write somewhere else.
Why is count-frames slow on long videos?
Because -count_frames decodes the entire video stream. The flag tells ffprobe to actually walk every packet and count successfully decoded frames — necessary for exact accuracy (especially on variable-frame-rate files), but it does cost a full decode pass. For a 4-hour 4K source, that can take minutes. If an estimate is fine, query duration and r_frame_rate from the header and multiply — no decoding required:
ffprobe -v error -select_streams v:0 -show_entries stream=duration,r_frame_rate -of csv=p=0 input.mp4
Can I use frames-to-video with a glob like img_*.jpg?
Yes, but quote it. FFmpeg's image2 demuxer accepts both printf patterns (frame_%04d.png) and shell globs ('img_*.jpg'). Unquoted globs get expanded by the shell before FFmpeg sees them, which usually breaks the command — quote with single quotes so the literal pattern reaches FFmpeg. Glob mode requires uniform extension across all files (no mixed .jpg + .png).
Why does tile only cover the first 16 seconds of my long video?
Because the default 4×4 = 16 tiles, combined with the 1-second time-based sampling, naturally covers exactly 16 seconds. To span a longer video, increase --cols and --rows so cols × rows ≥ duration_in_seconds. For a 5-minute (300-second) video evenly sampled, you'd need a roughly 17×18 grid (306 tiles). For coarser sampling on long videos, drop to raw FFmpeg and adjust the select predicate — e.g. select='gte(t - prev_selected_t, 60)' for one frame per minute.
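The grid sizing from this answer can be sketched as ceiling division: pick a column count, then compute the minimum row count for full coverage (values below are the 5-minute example; this is a helper sketch, not an fqmpeg feature):

```shell
duration=300   # 5-minute video, in seconds
cols=17
rows=$(( (duration + cols - 1) / cols ))   # ceil(duration / cols)
echo "${cols}x${rows} grid = $(( cols * rows )) tiles (covers ${duration}s)"
```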
How is frames-to-video different from slideshow?
frames-to-video consumes a numbered or globbed image sequence (frame_%04d.png) and produces a video at a single frame rate — every input image becomes one frame, uniformly. slideshow consumes an explicit list of images (img1.jpg img2.jpg img3.jpg) and lets you set a per-image duration (each image held for N seconds), with hard cuts between images (no built-in crossfade — see the slideshow section for how to add fades via raw FFmpeg xfade). Use frames-to-video for time-lapses, ML output reconstruction, and image sequences from rendering. Use slideshow for photo presentations where each image needs to linger for a few seconds.
What threshold should I use with scenes for a typical narrative video?
Start with the default 0.3. If you're getting too few cuts (missing hard transitions), drop to 0.2 or 0.15. If you're getting too many cuts (false positives on camera motion, dissolves), raise to 0.4 or 0.5. Threshold sensitivity depends on content — music videos with fast motion often need higher thresholds (0.5+) to ignore in-shot movement, while talking-head interviews can use lower (0.2) because the only changes are real cuts.
Can compare handle audio from both inputs simultaneously?
No. The implementation uses -c:a copy which takes only input 1's audio stream. Input 2's audio is discarded. This matches the usual workflow (before/after visual comparison with one common audio track). For dual-audio comparison (e.g. comparing two different audio mixes side by side), use raw FFmpeg with amerge or amix:
ffmpeg -i a.mp4 -i b.mp4 -filter_complex \
"[0:v]scale=iw/2:ih[L];[1:v]scale=iw/2:ih[R];[L][R]hstack;[0:a][1:a]amerge=inputs=2[a]" \
-map "[a]" -ac 2 compare-dual-audio.mp4
Wrapping Up
The twelve C12 verbs cover the round-trip between video and individual frames:
- thumbnail, snapshot, video-to-frames, count-frames for single or periodic stills (-q:v 2 is the JPEG quality default; count-frames is the only ffprobe verb in the cluster)
- thumbnail-grid, thumbnail-strip, tile for contact sheets and filmstrips (all three use time-based sampling — one frame per second — with --cols × --rows (grid/tile) or --frames (strip) as the cap)
- frames-to-video, slideshow for rebuilding video from images (frames-to-video for uniform-rate sequences, slideshow for per-image durations with hard cuts — add fades via raw FFmpeg xfade if needed)
- scenes, preview, compare for analysis and composition (scenes segments at detected cuts; preview builds an evenly-sampled highlight reel; compare stacks two videos for before/after demos)
Every verb prints its underlying FFmpeg invocation under --dry-run, so when the simplified surface isn't enough (custom contact-sheet sampling, dual-audio compare, frame-accurate seeks instead of keyframe-aligned), copy the filter, edit, and call FFmpeg directly. For the broader fqmpeg map, see the fqmpeg complete guide.