32blog by Studio Mitsu

fqmpeg Thumbnails, Frame Extraction & Slideshows: 12 Verbs Explained

Twelve fqmpeg verbs for thumbnails, contact sheets, frame extraction, scene splitting, slideshows, and side-by-side comparison — source-verified defaults, dry-run output, and the differences between near-identical verbs.

by omitsu · 27 min read
FFmpeg · fqmpeg · CLI · thumbnails

fqmpeg's C12 cluster covers the back-and-forth between video and individual frames — single thumbnails, periodic snapshots, contact sheets, filmstrips, scene-detection splits, slideshows from stills, side-by-side comparisons, and counting frames with ffprobe. Twelve verbs total, all working through ffmpeg (or ffprobe in one case) but exposing far simpler arguments than the underlying filter chains.

This guide checks each verb against its source in src/commands/ of fqmpeg 3.0.3 — the underlying FFmpeg filter or flag, the defaults, the output filename, and the gotchas that aren't visible from --help alone (thumbnail-grid and tile both produce contact sheets and differ only in their default output filename; snapshot and video-to-frames both extract periodic stills but to different default directories; frames-to-video and slideshow both build video from images but use entirely different mechanisms).

What you'll get out of this guide

  • A decision matrix for the 12 verbs by task (single still / multiple stills / contact sheets / video reconstruction / analysis)
  • Exact FFmpeg invocation each verb generates (verified --dry-run output)
  • Defaults, units, output filenames — and which verbs overlap (thumbnail-grid vs tile, snapshot vs video-to-frames)
  • Three recipes — YouTube thumbnail workflow, time-lapse from a security camera dump, before/after filter comparison

The 12 Verbs at a Glance

The cluster splits into four task groups. Pick the group, then the verb.

Group | Verbs | What they do
Single & periodic stills | thumbnail, snapshot, video-to-frames, count-frames | Pull one image at a timestamp, or many at intervals / per frame
Contact sheets & filmstrips | thumbnail-grid, thumbnail-strip, tile | Composite multiple frames into a single image (grid or row)
Frames ↔ video reconstruction | frames-to-video, slideshow | Build video from a numbered pattern or a list of images
Analysis & composition | scenes, preview, compare | Detect cuts, generate highlight reels, side-by-side before/after

Five things to know before reading on:

  1. thumbnail-grid and tile build contact sheets with the same time-based sampling. Both use select='isnan(prev_selected_t)+gte(t-prev_selected_t, 1)' — one frame per second — which behaves predictably on long and short videos alike. The two verbs are kept as aliases for discoverability (search for "grid" or "tile" and either lands here); the only difference is the default output filename (-grid.jpg vs -tile<C>x<R>.jpg).
  2. snapshot outputs alongside the input; video-to-frames outputs to the current working directory. Both extract periodic stills, but their default output patterns differ — snapshot puts files in the same folder as the input (<input-dir>/<stem>-snap-%04d.jpg), while video-to-frames writes to ./frame_%04d.png from cwd. If you want predictable paths, always pass -o.
  3. count-frames is the only verb in C12 that uses ffprobe, not ffmpeg. It runs ffprobe -count_frames -select_streams v:0 and prints a single integer to stdout. No file is written.
  4. frames-to-video uses -framerate (input rate), not -r (output rate). This matters because -framerate tells FFmpeg how to interpret the still sequence (how many to feed per second), while -r would re-time the output stream. For straight image-to-video at a consistent rate, -framerate is the correct flag and what fqmpeg uses.
  5. scenes splits a video at detected cuts using the segmenter. Each scene becomes its own file (<stem>-scene000.mp4, <stem>-scene001.mp4, ...) — useful for breaking long recordings into clips automatically. The threshold ranges from 0.0 to 1.0 (default 0.3); lower = more sensitive = more cuts.

Single & Periodic Stills

thumbnail — One frame as a single image

The simplest verb in the cluster: seek to a timestamp and write one JPEG. Used for video thumbnails (YouTube uploads, gallery covers, OG image tags).

Argument / Option | Default | Notes
<input> | required | Input video file
-s, --start <sec> | 1 | Timestamp in seconds (decimal allowed)
-o, --output <path> | <input-stem>-thumb.jpg | Override output
bash
$ npx fqmpeg thumbnail input.mp4 --dry-run

  ffmpeg -ss 1 -i input.mp4 -frames:v 1 -q:v 2 input-thumb.jpg
bash
$ npx fqmpeg thumbnail input.mp4 -s 45.5 -o cover.jpg --dry-run

  ffmpeg -ss 45.5 -i input.mp4 -frames:v 1 -q:v 2 cover.jpg

-ss before -i is the fast-seek form. FFmpeg uses keyframe-aligned seeking when -ss precedes -i, which is dramatically faster on long files but can land on a nearby keyframe rather than the exact requested timestamp. For a content thumbnail this is fine; keyframes in web-encoded video are typically no more than a few seconds apart, so the drift is small. For frame-accurate extraction, you'd put -ss after -i, but that decodes from the start and is much slower (and fqmpeg doesn't expose that mode here).

-q:v 2: JPEG quality scale where 2 is near-lossless (the scale is 1–31, lower is better). This is the highest reasonable JPEG quality for a thumbnail — large file, sharp output. For smaller thumbnails (gallery tiles), generate at full quality here and downscale separately with resize.

snapshot — Frames at regular intervals

Extracts one frame every N seconds across the entire video. The output is a numbered sequence (<stem>-snap-0001.jpg, 0002.jpg, ...), saved alongside the input file.

Argument / Option | Default | Allowed | Notes
<input> | required | — | Input video file
--interval <seconds> | 1 | positive number | Capture interval (so --interval 5 = one frame per 5 s)
--format <fmt> | jpg | jpg, png | Output image format
-o, --output <pattern> | <input-dir>/<stem>-snap-%04d.<format> | printf pattern | Override; must contain %d or %0Nd
bash
$ npx fqmpeg snapshot lecture.mp4 --dry-run

  ffmpeg -i lecture.mp4 -vf fps=1 -q:v 2 lecture-snap-%04d.jpg
bash
$ npx fqmpeg snapshot lecture.mp4 --interval 30 --format png --dry-run

  ffmpeg -i lecture.mp4 -vf fps=0.03333333333333333 -q:v 2 lecture-snap-%04d.png

Why fps= instead of select=: the fps filter is the simplest way to enforce a constant output rate — FFmpeg drops or duplicates frames as needed to match the target. fps=0.0333... means "one frame every 30 seconds", and FFmpeg picks whichever frame is closest to each 30-second tick.

-q:v 2 is applied to PNG too, but ignored — PNG quality is determined by compression level, not q scale. JPEG is where it matters.

Counting outputs: for a 60-minute video at --interval 30, you'll get 120 files (-snap-0001.jpg through -snap-0120.jpg). For --interval 1 on the same file, you'd get 3600 files. Plan disk space accordingly.
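
If you want to sanity-check that count before committing the disk space, the estimate is just duration / interval. A quick shell sketch, assuming ffprobe is on PATH and the container header carries a duration:

bash
# Estimate the snapshot count from the header (no decode pass needed).
interval=30
duration=$(ffprobe -v error -show_entries format=duration -of csv=p=0 lecture.mp4)
echo "expect ~$(awk -v d="$duration" -v i="$interval" 'BEGIN { printf "%.0f", d/i }') files"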

video-to-frames — Every frame (or fps-throttled) as images

Extracts frames at the source frame rate by default, or at a throttled rate if --fps is supplied. Use this when you need every frame (frame-by-frame retouching, ML training datasets) or a known per-second sampling rate.

Argument / Option | Default | Allowed | Notes
<input> | required | — | Input video
--fps <n> | source rate (no filter) | positive number | Throttle to this many frames per second
--format <fmt> | png | png, jpg | Image format. PNG default (lossless) for ML / editing
-o, --output <pattern> | ./frame_%04d.<format> | printf pattern | Override
bash
$ npx fqmpeg video-to-frames input.mp4 --dry-run

  ffmpeg -i input.mp4 frame_%04d.png
bash
$ npx fqmpeg video-to-frames input.mp4 --fps 5 --format jpg --dry-run

  ffmpeg -i input.mp4 -vf fps=5 frame_%04d.jpg

Default is every frame. At 30 fps for 60 s, that's 1800 PNGs — easily several hundred megabytes. If you just want periodic samples, use snapshot (which has a saner default) or pass --fps.

Output goes to CWD, not the input directory. This is intentional — frame dumps are often scratch data that you want in a known working folder, not polluting the directory the video lives in. But it's a behavioral inconsistency with snapshot, so if you want them next to the input, pass -o input/dir/frame_%04d.png.

PNG vs JPG: PNG is the safe default for downstream image processing (lossless, alpha-preserving). JPG is 5–10× smaller but lossy — fine for previewing or for ML training where the model will downsample anyway. The format flag drives the extension; FFmpeg picks the codec from that.

count-frames — Total frame count via ffprobe

Returns the exact number of video frames in the stream. The only C12 verb that doesn't use ffmpeg — it runs ffprobe -count_frames and writes a single integer to stdout, no output file.

Argument / Option | Notes
<input> | Input video
bash
$ npx fqmpeg count-frames input.mp4 --dry-run

  ffprobe -v error -count_frames -select_streams v:0 -show_entries stream=nb_read_frames -of csv=p=0 input.mp4
bash
$ npx fqmpeg count-frames input.mp4

1798

-count_frames decodes the entire stream. It's accurate but slow — for a long file, count-frames reads every byte of the video. For a faster estimate, query duration × frame rate from the header (no decoding required):

bash
ffprobe -v error -select_streams v:0 -show_entries stream=duration,r_frame_rate -of csv=p=0 input.mp4

That returns duration and r_frame_rate (e.g. 60.5,30000/1001), and you multiply (60.5 × 29.97 ≈ 1813 frames) — close but not exact for variable-frame-rate files.
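
To turn that CSV pair into a number in one step, a small awk sketch; it treats whichever field contains a slash as the frame rate, so it works regardless of the field order ffprobe picks:

bash
# Header-only frame estimate: duration × r_frame_rate, no decoding.
ffprobe -v error -select_streams v:0 \
  -show_entries stream=duration,r_frame_rate -of csv=p=0 input.mp4 |
awk -F, '{ for (j = 1; j <= NF; j++) if ($j ~ "/") { split($j, f, "/"); fps = f[1] / f[2] } else dur = $j
           printf "~%.0f frames\n", dur * fps }'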

Why exact counts matter: for video editing automation (split into N equal chunks of frames), pose-estimation workflows that need per-frame indexing, or anti-cheat / forensic analysis where the exact frame count is a fingerprint. For most needs, the header-derived estimate is fine.

Contact Sheets & Filmstrips

thumbnail-grid — Contact sheet (alias of tile)

A multi-frame thumbnail composite arranged as a grid. The classic "DVD chapter selection" look. Uses the same time-based sampling as tile (one frame per second), kept under this name for users who search for "grid".

  • Source: src/commands/thumbnail-grid.js
  • Filter: select='isnan(prev_selected_t)+gte(t-prev_selected_t, 1)', scale=<W>:-1, tile=<C>x<R>
  • Output: <input-stem>-grid.jpg
Argument / Option | Default | Notes
<input> | required | Input video
--cols <n> | 4 | Grid columns
--rows <n> | 4 | Grid rows
--width <n> | 320 | Width of each tile in pixels
-o, --output <path> | <input-stem>-grid.jpg | Override
bash
$ npx fqmpeg thumbnail-grid input.mp4 --dry-run

  ffmpeg -i input.mp4 -frames:v 1 -vf select='isnan(prev_selected_t)+gte(t-prev_selected_t\,1)',scale=320:-1,tile=4x4 -q:v 2 input-grid.jpg

One frame per second, capped at cols × rows. With the default 4×4 = 16 tiles, the output covers the first 16 seconds of the video. For longer videos, increase --cols/--rows so cols × rows ≥ duration_in_seconds, or use the tile verb (which has the same behavior) and pass -o to control the filename. See the tile section below for the prev_selected_t mechanics in detail.

Same algorithm as tile, different default output name. Use thumbnail-grid when the word "grid" comes to mind; use tile when "tile" or "contact sheet" comes to mind. The tile default filename embeds the dimensions (-tile4x4.jpg), which is handy when re-running with different --cols/--rows values.

thumbnail-strip — Horizontal filmstrip (time-sampled)

A one-row variant of thumbnail-grid. Same time-based sampling, but rows=1 and a per-frame height parameter (rather than width).

  • Source: src/commands/thumbnail-strip.js
  • Filter: select='isnan(prev_selected_t)+gte(t-prev_selected_t, 1)', scale=-1:<H>, tile=<N>x1
  • Output: <input-stem>-strip.jpg
Argument / Option | Default | Notes
<input> | required | Input video
--frames <n> | 10 | Number of frames in the strip
--height <n> | 120 | Height of each frame in pixels
-o, --output <path> | <input-stem>-strip.jpg | Override
bash
$ npx fqmpeg thumbnail-strip input.mp4 --dry-run

  ffmpeg -i input.mp4 -frames:v 1 -vf select='isnan(prev_selected_t)+gte(t-prev_selected_t\,1)',scale=-1:120,tile=10x1 -q:v 2 input-strip.jpg

One frame per second, capped at --frames. With the default --frames 10, the strip covers the first 10 seconds of the video. For a strip that spans a longer runtime, increase --frames so it meets or exceeds duration_in_seconds, build the strip from snapshot output stitched with tile/montage (ImageMagick) at proportional timestamps, or adapt the duration-probing sketch in the tile section below (swap tile=4x4 for tile=10x1 and scale=320:-1 for scale=-1:120).

Filmstrip use case: scrubber previews in video editors, video player hover previews (YouTube's WebVTT thumbnail strip), or social-media-style "scrolling timeline" art.

tile — Contact sheet (time-sampled)

Samples one frame per second (timestamp-based via prev_selected_t), so the grid spans the early portion of the video evenly regardless of source frame rate. Identical algorithm to thumbnail-grid; this verb is the one whose default filename (-tile<C>x<R>.jpg) encodes the dimensions.

  • Source: src/commands/tile.js
  • Filter: select='isnan(prev_selected_t)+gte(t-prev_selected_t, 1)', scale=<W>:-1, tile=<C>x<R>
  • Output: <input-stem>-tile<C>x<R>.jpg
Argument / Option | Default | Notes
<input> | required | Input video
--cols <n> | 4 | Grid columns
--rows <n> | 4 | Grid rows
--width <n> | 320 | Width of each tile in pixels
-o, --output <path> | <input-stem>-tile<C>x<R>.jpg | Override (note the dimensions in the default name)
bash
$ npx fqmpeg tile input.mp4 --dry-run

  ffmpeg -i input.mp4 -frames:v 1 -vf select='isnan(prev_selected_t)+gte(t-prev_selected_t\,1)',scale=320:-1,tile=4x4 -q:v 2 input-tile4x4.jpg

The prev_selected_t predicate:

  • isnan(prev_selected_t) is true for the very first frame (no previous selection yet — prev_selected_t is undefined → NaN).
  • gte(t - prev_selected_t, 1) is true when at least 1 second has passed since the last selected frame.

Their sum gates the select filter, producing one selection per second. For the default 4×4 = 16 tiles, the output covers the first 16 seconds. For longer videos, the tile=4x4 cap drops anything past the 16th selection, giving you the first 16 seconds. If you want a contact sheet that spans the whole runtime, increase --cols/--rows so cols × rows ≥ duration_in_seconds, or drop to raw FFmpeg with select='gte(t, X)' at proportional timestamps.
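
If you'd rather keep the grid small and stretch the sampling across the whole runtime, here's a raw-FFmpeg sketch of that workaround (not a fqmpeg flag): probe the duration, then widen the per-selection gap to duration/16 so 16 tiles span the full video.

bash
# One frame every duration/16 seconds, so a 4x4 sheet covers the whole file.
dur=$(ffprobe -v error -show_entries format=duration -of csv=p=0 input.mp4)
step=$(awk -v d="$dur" 'BEGIN { print d / 16 }')
ffmpeg -i input.mp4 -frames:v 1 \
  -vf "select='isnan(prev_selected_t)+gte(t-prev_selected_t,$step)',scale=320:-1,tile=4x4" \
  -q:v 2 input-full-tile4x4.jpg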

Why two verbs (tile and thumbnail-grid): discoverability — users search for "grid" or "tile" and either lands here. The filter and sampling are identical; only the default output filename differs (-tile4x4.jpg for tile, -grid.jpg for thumbnail-grid). Pick whichever name comes to mind, and override -o if you need a stable filename across runs.

Frames ↔ Video Reconstruction

frames-to-video — Image sequence → video

The inverse of video-to-frames. Takes a printf-style numbered sequence (or glob pattern) of still images and produces a single video file.

Argument / Option | Default | Notes
<pattern> | required | printf pattern (frame_%04d.png) or shell glob (img_*.jpg)
--fps <n> | 30 | Frames per second
--codec <name> | libx264 | Video codec
-o, --output <path> | <pattern-base>-video.mp4 | Override (default strips %d and extension from pattern)
bash
$ npx fqmpeg frames-to-video frame_%04d.png --dry-run

  ffmpeg -framerate 30 -i frame_%04d.png -c:v libx264 -pix_fmt yuv420p frame-video.mp4
bash
$ npx fqmpeg frames-to-video 'img_*.jpg' --fps 60 --dry-run

  ffmpeg -framerate 60 -i img_*.jpg -c:v libx264 -pix_fmt yuv420p img_*-video.mp4

-framerate vs -r: -framerate sets the input image-sequence rate (how many to consume per second), not the output frame rate. For most image-to-video cases this is what you want — the output frame rate matches by default. If you wanted a different output rate (e.g. interpolate or duplicate frames), you'd add -r after the input.
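
For illustration, a raw-FFmpeg sketch of the "different output rate" case: stills consumed at 10 per second, output re-timed to 30 fps, so each still lands on three output frames.

bash
# -framerate 10 (input side): consume ten stills per second of timeline.
# -r 30 (output side): duplicate frames up to a 30 fps stream.
ffmpeg -framerate 10 -i frame_%04d.png -r 30 -c:v libx264 -pix_fmt yuv420p out.mp4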

yuv420p for compatibility: raw image files (especially PNG) decode to yuva420p or rgb24 formats that some players reject in MP4 containers. Forcing yuv420p guarantees playback on QuickTime, mobile browsers, and embedded players. For higher-quality archival output (10-bit, full RGB), pass --codec libx264rgb or drop to raw FFmpeg.

Pattern matching: printf patterns (frame_%04d.png) require sequential numbering starting from 00001 (or 00000 with -start_number 0). Glob patterns (img_*.jpg) work too, but quote them in scripts so the literal pattern reaches the tool instead of being expanded by the shell. Mixed-extension globs (.jpg and .png together) don't work because FFmpeg's image2 demuxer expects uniform format.
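
For reference, the raw-FFmpeg form of glob input; plain ffmpeg needs -pattern_type glob for this, and the quotes keep the shell from expanding the pattern first.

bash
# Glob input in raw FFmpeg: image2's glob mode plus a quoted pattern.
ffmpeg -framerate 30 -pattern_type glob -i 'img_*.jpg' \
  -c:v libx264 -pix_fmt yuv420p out.mp4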

slideshow — Multiple images → video with per-image durations

Builds a video from a list of stills where each image is held on screen for a configurable duration. Uses FFmpeg's concat demuxer with an auto-generated listfile (cleaned up on exit).

  • Source: src/commands/slideshow.js
  • Flags: -f concat -safe 0 -i <listfile> -vf fps=<n>,format=yuv420p -c:v libx264 -pix_fmt yuv420p
  • Output: <input-dir>/slideshow.mp4
Argument / Option | Default | Notes
<images...> | required (≥2) | Two or more image paths in order
--duration <sec> | 3 | Seconds per image (uniform across all)
--fps <n> | 30 | Output frame rate
-o, --output <path> | <input-dir>/slideshow.mp4 | Override
bash
$ npx fqmpeg slideshow img1.jpg img2.jpg img3.jpg --dry-run

  # Image list (auto-generated):
  # file '/abs/path/img1.jpg'
  # duration 3
  # file '/abs/path/img2.jpg'
  # duration 3
  # file '/abs/path/img3.jpg'
  # duration 3
  # file '/abs/path/img3.jpg'

  ffmpeg -f concat -safe 0 -i imagelist.txt -vf fps=30,format=yuv420p -c:v libx264 -pix_fmt yuv420p slideshow.mp4

concat demuxer with per-entry duration: the listfile syntax file '<path>' \n duration <sec> lets FFmpeg stitch stills end-to-end at exact times. The cleanup is automatic (fqmpeg uses process.on("exit") to delete the timestamped listfile).

Why the last image is listed twice: in the concat demuxer, the duration line specifies the transition time to the next file, so the final image would otherwise display for zero seconds. fqmpeg repeats the last image as a trailing entry (with no duration line) so it actually holds for --duration seconds and the output is images.length × duration seconds long, matching what the option promises.

Absolute paths in the listfile: fqmpeg resolves each image to an absolute path before writing, so the concat works regardless of where you invoke it from. Single-quoted with internal quotes escaped ('\\''), so paths with apostrophes (e.g. O'Brien.jpg) work too.

Single-image edge case: the command requires ≥ 2 images and errors out with one. For a "still image as video" of arbitrary length, use raw FFmpeg: ffmpeg -loop 1 -i img.jpg -t 10 -c:v libx264 -pix_fmt yuv420p out.mp4.

No built-in transitions — hard cuts only. slideshow uses FFmpeg's concat demuxer, which stitches images end-to-end without crossfades. For fades between images, drop to raw FFmpeg's xfade filter (filter_complex chaining each pair with xfade=transition=fade:duration=1:offset=…), or run slideshow then post-process with a separate fade pass. Adding xfade chains as a built-in would replace the simple concat flow with per-image segment plumbing — out of scope for the "quick" surface.
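
As a starting point, a minimal two-still crossfade in raw FFmpeg, assuming 3-second holds and a 1-second fade (so the fade offset is hold minus fade = 2 s; both inputs must share a resolution):

bash
# -loop 1 -t 3 turns each still into a 3 s clip; xfade starts fading at t=2.
ffmpeg -loop 1 -t 3 -i img1.jpg -loop 1 -t 3 -i img2.jpg -filter_complex \
  "[0:v][1:v]xfade=transition=fade:duration=1:offset=2,format=yuv420p" \
  -c:v libx264 faded.mp4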

Analysis & Composition

scenes — Split a video at detected scene cuts

Detects scene changes via the scene metadata variable (a per-frame difference score, 0–1) and uses the segmenter to write each scene to its own file.

  • Source: src/commands/scenes.js
  • Filter: select='gt(scene,<threshold>)',setpts=N/FRAME_RATE/TB + -f segment -reset_timestamps 1
  • Output: <input-dir>/<stem>-scene%03d<ext>
Argument / Option | Default | Range | Notes
<input> | required | — | Input video
--threshold <n> | 0.3 | 0.0–1.0 | Lower = more sensitive = more cuts
-o, --output <pattern> | <input-dir>/<stem>-scene%03d<ext> | printf pattern | Override
bash
$ npx fqmpeg scenes movie.mp4 --dry-run

  ffmpeg -i movie.mp4 -filter_complex select='gt(scene,0.3)',setpts=N/FRAME_RATE/TB -f segment -reset_timestamps 1 movie-scene%03d.mp4

scene metadata variable: FFmpeg computes a 0–1 score per frame representing how different it is from the previous frame (color histogram difference). Values above the threshold are treated as cuts. Typical values:

  • 0.1–0.2: aggressive (catches dissolves and crossfades too, often false positives on motion)
  • 0.3: balanced (the default — most hard cuts in narrative video)
  • 0.4–0.5: conservative (only sharp transitions; misses some legitimate cuts)
  • 0.7+: only catches the very hardest cuts (chapter boundaries in scripted content)

-reset_timestamps 1: each output segment starts at PTS 0 rather than continuing the source timeline. This is what makes the segments individually playable.

Stream-copy isn't possible. Because the filter touches every frame, the output is re-encoded. For a fast cut list without re-encoding, use raw FFmpeg with -ss/-to after detection — first run scenes --dry-run to read the threshold, then use ffprobe to extract timestamps, then segment via ffmpeg -ss <t1> -to <t2> -c copy.
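
A sketch of that two-pass flow; pass 1 logs cut timestamps with the standard select + metadata=print trick, pass 2 stream-copies between two consecutive timestamps (the 12.345/34.567 values are placeholders you'd read from the log):

bash
# Pass 1: log scene-change timestamps; -f null - discards the decoded output.
ffmpeg -i movie.mp4 -vf "select='gt(scene,0.3)',metadata=print:file=cuts.log" -f null -
grep -o 'pts_time:[0-9.]*' cuts.log | cut -d: -f2   # one timestamp per cut

# Pass 2: stream-copy one scene between consecutive cuts (keyframe-aligned).
ffmpeg -ss 12.345 -to 34.567 -i movie.mp4 -c copy scene01.mp4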

preview — Generate a short highlight reel

Samples N short clips evenly distributed across the source and concatenates them into a short preview video — the "social media trailer" style.

  • Source: src/commands/preview.js
  • Filter: select N clips of <clip-duration> seconds at evenly spaced positions, concat with -t <clips × clip-duration>
  • Output: <input-stem>-preview.<ext>
Argument / Option | Default | Notes
<input> | required | Input video
--clips <n> | 5 | Number of sample clips
--clip-duration <sec> | 2 | Duration of each clip in seconds
-o, --output <path> | <input-stem>-preview.<ext> | Override
bash
$ npx fqmpeg preview input.mp4 --dry-run

  # Note: could not probe input.mp4 for duration. Using placeholder total=60s.
  # Run on a real file (or after creating input.mp4) to get exact clip offsets.

  ffmpeg -i input.mp4 -vf select='between(t,0,2)+between(t,12,14)+between(t,24,26)+between(t,36,38)+between(t,48,50)',setpts=N/FRAME_RATE/TB -af aselect='between(t,0,2)+between(t,12,14)+between(t,24,26)+between(t,36,38)+between(t,48,50)',asetpts=N/SR/TB -t 10 input-preview.mp4

Output length is deterministic: clips × clip-duration seconds, regardless of source length. The defaults (5 × 2 = 10 s) produce a 10-second highlight — short enough for Twitter / Instagram, long enough to convey the gist of a 30-minute talk.

Even distribution across the source: preview runs ffprobe first to read the source duration T, then selects --clips segments at evenly spaced offsets — clip 1 starts at 0 × T/clips, clip 2 at 1 × T/clips, ..., clip N at (N−1) × T/clips. Each runs for clip-duration seconds. For a 60-minute video with the defaults, that's clips at minute 0, 12, 24, 36, 48 — 2 seconds each. The dry-run output above used T = 60 (placeholder) because input.mp4 doesn't exist on disk; the actual offsets are computed from the real file at run time.
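
To see the offsets a real file would get, the arithmetic is k × T/clips; a small sketch assuming ffprobe and the default 5 clips × 2 s:

bash
# Print each clip's start/end for the real duration T.
T=$(ffprobe -v error -show_entries format=duration -of csv=p=0 talk.mp4)
awk -v T="$T" -v n=5 -v d=2 \
  'BEGIN { for (k = 0; k < n; k++) printf "clip %d: %.1f-%.1f s\n", k + 1, k * T / n, k * T / n + d }'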

No audio fade between clips. The output cuts hard between sampled clips, so audio pops are likely on music-heavy content. For social-media-grade previews, you'd want crossfades between clips and a music bed; for that you're better off scripting the workflow with trim + crossfade + audio-fade manually.

compare — Side-by-side before/after

Stacks two videos horizontally (default) or vertically into a single output for visual comparison. The canonical "look how much better this filter is" demo.

  • Source: src/commands/compare.js
  • Filter (horizontal): [0:v]scale=iw/2:ih[left];[1:v]scale=iw/2:ih[right];[left][right]hstack
  • Filter (vertical): [0:v]scale=iw:ih/2[top];[1:v]scale=iw:ih/2[bottom];[top][bottom]vstack
  • Output: <input1-stem>-compare.<ext>
Argument / Option | Default | Allowed | Notes
<input1> | required | — | First (left / top) video
<input2> | required | — | Second (right / bottom) video
--direction <dir> | horizontal | horizontal, vertical | Layout
-o, --output <path> | <input1-stem>-compare.<ext> | — | Override
bash
$ npx fqmpeg compare before.mp4 after.mp4 --dry-run

  ffmpeg -i before.mp4 -i after.mp4 -filter_complex [0:v]scale=iw/2:ih[left];[1:v]scale=iw/2:ih[right];[left][right]hstack -c:a copy before-compare.mp4
bash
$ npx fqmpeg compare original.mp4 stabilized.mp4 --direction vertical --dry-run

  ffmpeg -i original.mp4 -i stabilized.mp4 -filter_complex [0:v]scale=iw:ih/2[top];[1:v]scale=iw:ih/2[bottom];[top][bottom]vstack -c:a copy original-compare.mp4

Each input is squeezed to half before stacking: the output canvas keeps the original width (horizontal) or height (vertical), so the overall canvas aspect matches the source, but each pane is compressed 2:1 along the stacking axis. And because scale=iw/2:ih uses each input's own dimensions, mismatched inputs aren't normalized to a common size; hstack still requires equal heights (vstack equal widths) and fails otherwise. For best output, the two inputs should already match in resolution and duration.

Audio is copied from input 1. -c:a copy takes the first input's audio track stream-copied; input 2's audio is discarded. This matches the usual before/after framing (compare visuals, keep one audio).

Duration mismatch: if the two videos differ in length, hstack/vstack keep going until the longer input ends (their default is shortest=0), holding the shorter input's last frame frozen on screen for the remainder. Add shortest=1 in a raw-FFmpeg variant to end at the shorter input, or trim both to matching lengths first if you need exact alignment.

No built-in labels. For "Left"/"Right" or "Before"/"After" text overlays, dry-run the filter, then drop to raw FFmpeg and splice drawtext into each scale step — see Recipe 2 below for a concrete template. Adding a --label option would force a fontfile-dependency surface (drawtext needs libfreetype + a font path) that's out of scope for the quick surface.

Real-World Recipes

Recipe 1: YouTube thumbnail workflow

You have a finished 12-minute video and need a custom thumbnail. Goal: pick the best frame, scale to YouTube's spec (1280×720), and check the contact sheet to confirm the choice.

bash
# Step 1: contact sheet to pick the best moment (time-based sampling)
npx fqmpeg tile video.mp4 --cols 6 --rows 8 --width 480
# → video-tile6x8.jpg, 48 frames spanning first 48 seconds

# For longer coverage of a 12-minute (720-second) video, you'd need cols×rows ≥ 720,
# so a 30×24 grid (720 tiles). At width 200 that's a 6000 px wide sheet,
# roughly 2700 px tall for a 16:9 source.
npx fqmpeg tile video.mp4 --cols 30 --rows 24 --width 200

# Step 2: extract the chosen frame at its exact timestamp
npx fqmpeg thumbnail video.mp4 -s 374 -o thumbnail-raw.jpg

# Step 3: resize to YouTube's recommended thumbnail spec
npx fqmpeg resize thumbnail-raw.jpg 1280x720 -o thumbnail-final.jpg

The tile step is the slow part — it has to decode through the source. Once you've picked your timestamp, thumbnail is near-instant thanks to keyframe-aligned -ss.

Recipe 2: Time-lapse from a security camera dump

You have a folder of 86,400 JPEGs from a security camera (one per second, 24 hours of footage). You want a 60-second time-lapse video at 30 fps.

bash
# 86400 input frames at 30 fps output = 86400/30 = 2880 seconds = 48 minutes.
# To compress to 60 seconds at 30 fps, we need 1800 output frames.
# Sample 1 input frame per 48 input frames: 86400 / 1800 = 48.

# Step 1: symlink every 48th frame into a sequentially numbered sequence
i=1
ls *.jpg | awk 'NR % 48 == 1' | while read -r f; do
  printf -v new "frame_%04d.jpg" "$i"   # frame_0001.jpg, frame_0002.jpg, ...
  ln -s "$(realpath "$f")" "$new"
  ((i++))
done

# Step 2: stitch into a 30 fps time-lapse
npx fqmpeg frames-to-video frame_%04d.jpg --fps 30
# → frame-video.mp4

Alternative without symlinks: use slideshow with --duration 0.0333 (1/30 s per image) — but slideshow re-encodes through concat demuxer, which is slower than frames-to-video's direct -i pattern. For datasets this large, frames-to-video is the right tool; slideshow is for ≤ tens of images with per-image durations.

Recipe 3: Before/after filter comparison for portfolio

You're documenting the effect of stabilize on a shaky drone clip. Build a side-by-side comparison video with labels for your portfolio:

bash
# Step 1: stabilize the original
npx fqmpeg stabilize drone-raw.mp4 -o drone-stable.mp4

# Step 2: side-by-side comparison
npx fqmpeg compare drone-raw.mp4 drone-stable.mp4 \
  -o drone-comparison.mp4

# Step 3: extract a single-frame thumbnail for the case study cover image
npx fqmpeg thumbnail drone-comparison.mp4 -s 3 -o drone-cover.jpg

compare itself doesn't add labels — for a polished case study, dry-run the filter and re-render with raw FFmpeg, splicing drawtext into each scale step:

bash
ffmpeg -i drone-raw.mp4 -i drone-stable.mp4 -filter_complex \
  "[0:v]scale=iw/2:ih,drawtext=text='Original':x=20:y=20:fontsize=36:fontcolor=white:box=1:boxcolor=black@0.5[left];[1:v]scale=iw/2:ih,drawtext=text='Stabilized':x=20:y=20:fontsize=36:fontcolor=white:box=1:boxcolor=black@0.5[right];[left][right]hstack" \
  -c:a copy drone-portfolio.mp4

Frequently Asked Questions

Should I use thumbnail-grid or tile?

Either — they're aliases. Both use the same time-based sampling (select='isnan(prev_selected_t)+gte(t-prev_selected_t, 1)') and produce the same contact sheet for a given --cols/--rows/--width. The only difference is the default output filename: thumbnail-grid writes <stem>-grid.jpg, tile writes <stem>-tile<C>x<R>.jpg. Pick tile when the filename should record the grid size; pick thumbnail-grid when a fixed -grid.jpg filename is more convenient. Either way, pass -o to override.

What's the difference between snapshot and video-to-frames?

Both extract periodic stills, but with different defaults and output locations:

  • snapshot defaults to one frame per second, JPEG format, output alongside the input (<input-dir>/<stem>-snap-%04d.jpg).
  • video-to-frames defaults to every frame at the source rate, PNG format, output to the current working directory (./frame_%04d.png).

If you want "occasional reference stills next to the video," snapshot is the right tool. If you want "every frame for ML or editing," video-to-frames is the right tool. Pass -o if you want either to write somewhere else.

Why is count-frames slow on long videos?

Because -count_frames decodes the entire video stream. The flag tells ffprobe to actually walk every packet and count successfully decoded frames — necessary for exact accuracy (especially on variable-frame-rate files), but it does cost a full decode pass. For a 4-hour 4K source, that can take minutes. If an estimate is fine, query duration and r_frame_rate from the header and multiply — no decoding required:

bash
ffprobe -v error -select_streams v:0 -show_entries stream=duration,r_frame_rate -of csv=p=0 input.mp4

Can I use frames-to-video with a glob like img_*.jpg?

Yes, but quote it. FFmpeg's image2 demuxer accepts both printf patterns (frame_%04d.png) and globs ('img_*.jpg'); in raw FFmpeg, glob matching is enabled with -pattern_type glob. Unquoted globs get expanded by the shell before FFmpeg sees them, which usually breaks the command, so quote with single quotes to make sure the literal pattern reaches FFmpeg. Glob mode requires uniform extension across all files (no mixed .jpg + .png).

Why does tile only cover the first 16 seconds of my long video?

Because the default 4×4 = 16 tiles, combined with the 1-second time-based sampling, naturally covers exactly 16 seconds. To span a longer video, increase --cols and --rows so cols × rows ≥ duration_in_seconds. For a 5-minute (300-second) video evenly sampled, you'd need a roughly 17×18 grid (306 tiles). For coarser sampling on long videos, drop to raw FFmpeg and adjust the select predicate — e.g. select='isnan(prev_selected_t)+gte(t-prev_selected_t,60)' for one frame per minute (keep the isnan term: without it the first frame is never selected, prev_selected_t stays NaN, and nothing is ever picked).

How is frames-to-video different from slideshow?

frames-to-video consumes a numbered or globbed image sequence (frame_%04d.png) and produces a video at a single frame rate — every input image becomes one frame, uniformly. slideshow consumes an explicit list of images (img1.jpg img2.jpg img3.jpg) and lets you set a per-image duration (each image held for N seconds), with hard cuts between images (no built-in crossfade — see the slideshow section for how to add fades via raw FFmpeg xfade). Use frames-to-video for time-lapses, ML output reconstruction, and image sequences from rendering. Use slideshow for photo presentations where each image needs to linger for a few seconds.

What threshold should I use with scenes for a typical narrative video?

Start with the default 0.3. If you're getting too few cuts (missing hard transitions), drop to 0.2 or 0.15. If you're getting too many cuts (false positives on camera motion, dissolves), raise to 0.4 or 0.5. Threshold sensitivity depends on content — music videos with fast motion often need higher thresholds (0.5+) to ignore in-shot movement, while talking-head interviews can use lower (0.2) because the only changes are real cuts.

Can compare handle audio from both inputs simultaneously?

No. The implementation uses -c:a copy which takes only input 1's audio stream. Input 2's audio is discarded. This matches the usual workflow (before/after visual comparison with one common audio track). For dual-audio comparison (e.g. comparing two different audio mixes side by side), use raw FFmpeg with amerge or amix:

bash
ffmpeg -i a.mp4 -i b.mp4 -filter_complex \
  "[0:v]scale=iw/2:ih[L];[1:v]scale=iw/2:ih[R];[L][R]hstack[v];[0:a][1:a]amerge=inputs=2[a]" \
  -map "[v]" -map "[a]" -ac 2 compare-dual-audio.mp4

Wrapping Up

The twelve C12 verbs cover the round-trip between video and individual frames:

  • thumbnail, snapshot, video-to-frames, count-frames for single or periodic stills (-q:v 2 is the JPEG quality default; count-frames is the only ffprobe verb in the cluster)
  • thumbnail-grid, thumbnail-strip, tile for contact sheets and filmstrips (all three use time-based sampling — one frame per second — with --cols × --rows (grid/tile) or --frames (strip) as the cap)
  • frames-to-video, slideshow for rebuilding video from images (frames-to-video for uniform-rate sequences, slideshow for per-image durations with hard cuts — add fades via raw FFmpeg xfade if needed)
  • scenes, preview, compare for analysis and composition (scenes segments at detected cuts; preview builds an evenly-sampled highlight reel; compare stacks two videos for before/after demos)

Every verb prints its underlying FFmpeg invocation under --dry-run, so when the simplified surface isn't enough (custom contact-sheet sampling, dual-audio compare, frame-accurate seeks instead of keyframe-aligned), copy the filter, edit, and call FFmpeg directly. For the broader fqmpeg map, see the fqmpeg complete guide.