32blog by Studio Mitsu

fqmpeg Thumbnails, Frame Extraction & Slideshows: 12 Verbs Explained

Twelve fqmpeg verbs for thumbnails, contact sheets, frame extraction, scene splitting, slideshows, and side-by-side comparison — source-verified defaults, dry-run output, and the differences between near-identical verbs.

by omitsu · 27 min read
FFmpeg · fqmpeg · CLI · thumbnails

fqmpeg's C12 cluster covers the back-and-forth between video and individual frames — single thumbnails, periodic snapshots, contact sheets, filmstrips, scene-detection splits, slideshows from stills, side-by-side comparisons, and counting frames with ffprobe. Twelve verbs total, all working through ffmpeg (or ffprobe in one case) but exposing far simpler arguments than the underlying filter chains.

This guide checks each verb against its source in src/commands/ of fqmpeg 3.0.3 — the underlying FFmpeg filter or flag, the defaults, the output filename, and the gotchas that aren't visible from --help alone (thumbnail-grid and tile both produce contact sheets and differ only in their default output filename; snapshot and video-to-frames both extract periodic stills but to different default directories; frames-to-video and slideshow both build video from images but use entirely different mechanisms).

What you'll get out of this guide

  • A decision matrix for the 12 verbs by task (single still / multiple stills / contact sheets / video reconstruction / analysis)
  • Exact FFmpeg invocation each verb generates (verified --dry-run output)
  • Defaults, units, output filenames — and which verbs overlap (thumbnail-grid vs tile, snapshot vs video-to-frames)
  • Three recipes — YouTube thumbnail workflow, time-lapse from a security camera dump, before/after filter comparison

The 12 Verbs at a Glance

The cluster splits into four task groups. Pick the group, then the verb.

Group | Verbs | What they do
Single & periodic stills | thumbnail, snapshot, video-to-frames, count-frames | Pull one image at a timestamp, or many at intervals / per frame
Contact sheets & filmstrips | thumbnail-grid, thumbnail-strip, tile | Composite multiple frames into a single image (grid or row)
Frames ↔ video reconstruction | frames-to-video, slideshow | Build video from a numbered pattern or a list of images
Analysis & composition | scenes, preview, compare | Detect cuts, generate highlight reels, side-by-side before/after

Five things to know before reading on:

  1. thumbnail-grid and tile build contact sheets with the same time-based sampling. Both use select='isnan(prev_selected_t)+gte(t-prev_selected_t, 1)' — one frame per second — which behaves predictably on long and short videos alike. The two verbs are kept as aliases for discoverability (search for "grid" or "tile" and either lands here); the only difference is the default output filename (-grid.jpg vs -tile<C>x<R>.jpg).
  2. snapshot outputs alongside the input; video-to-frames outputs to the current working directory. Both extract periodic stills, but their default output patterns differ — snapshot puts files in the same folder as the input (<input-dir>/<stem>-snap-%04d.jpg), while video-to-frames writes to ./frame_%04d.png from cwd. If you want predictable paths, always pass -o.
  3. count-frames is the only verb in C12 that uses ffprobe, not ffmpeg. It runs ffprobe -count_frames -select_streams v:0 and prints a single integer to stdout. No file is written.
  4. frames-to-video uses -framerate (input rate), not -r (output rate). This matters because -framerate tells FFmpeg how to interpret the still sequence (how many to feed per second), while -r would re-time the output stream. For straight image-to-video at a consistent rate, -framerate is the correct flag and what fqmpeg uses.
  5. scenes splits a video at detected cuts using the segmenter. Each scene becomes its own file (<stem>-scene000.mp4, <stem>-scene001.mp4, ...) — useful for breaking long recordings into clips automatically. The threshold ranges from 0.0 to 1.0 (default 0.3); lower = more sensitive = more cuts.

Single & Periodic Stills

thumbnail — One frame as a single image

The simplest verb in the cluster: seek to a timestamp and write one JPEG. Used for video thumbnails (YouTube uploads, gallery covers, OG image tags).

Argument / Option | Default | Notes
<input> | required | Input video file
-s, --start <sec> | 1 | Timestamp in seconds (decimal allowed)
-o, --output <path> | <input-stem>-thumb.jpg | Override output
bash
$ npx fqmpeg thumbnail input.mp4 --dry-run

  ffmpeg -ss 1 -i input.mp4 -frames:v 1 -q:v 2 input-thumb.jpg
bash
$ npx fqmpeg thumbnail input.mp4 -s 45.5 -o cover.jpg --dry-run

  ffmpeg -ss 45.5 -i input.mp4 -frames:v 1 -q:v 2 cover.jpg

-ss before -i is the fast-seek form. FFmpeg uses keyframe-aligned seeking when -ss precedes -i, which is dramatically faster on long files but can land on a nearby keyframe rather than the exact requested timestamp. For a content thumbnail this is fine; keyframes in web-encoded video are typically no more than a few seconds apart, so the drift is small. For frame-accurate extraction, you'd put -ss after -i, but that decodes from the start and is much slower (and fqmpeg doesn't expose that mode here).

-q:v 2: JPEG quality scale where 2 is near-lossless (the scale is 1–31, lower is better). This is the highest reasonable JPEG quality for a thumbnail — large file, sharp output. For smaller thumbnails (gallery tiles), generate at full quality here and downscale separately with resize.

snapshot — Frames at regular intervals

Extracts one frame every N seconds across the entire video. The output is a numbered sequence (<stem>-snap-0001.jpg, 0002.jpg, ...), saved alongside the input file.

Argument / Option | Default | Allowed | Notes
<input> | required | — | Input video file
--interval <seconds> | 1 | positive number | Capture interval (so --interval 5 = one frame per 5 s)
--format <fmt> | jpg | jpg, png | Output image format
-o, --output <pattern> | <input-dir>/<stem>-snap-%04d.<format> | printf pattern | Override; must contain %d or %0Nd
bash
$ npx fqmpeg snapshot lecture.mp4 --dry-run

  ffmpeg -i lecture.mp4 -vf fps=1 -q:v 2 lecture-snap-%04d.jpg
bash
$ npx fqmpeg snapshot lecture.mp4 --interval 30 --format png --dry-run

  ffmpeg -i lecture.mp4 -vf fps=0.03333333333333333 -q:v 2 lecture-snap-%04d.png

Why fps= instead of select=: the fps filter is the simplest way to enforce a constant output rate — FFmpeg drops or duplicates frames as needed to match the target. fps=0.0333... means "one frame every 30 seconds", and FFmpeg picks whichever frame is closest to each 30-second tick.

-q:v 2 is applied to PNG too, but ignored — PNG quality is determined by compression level, not q scale. JPEG is where it matters.

Counting outputs: for a 60-minute video at --interval 30, you'll get 120 files (-snap-0001.jpg through -snap-0120.jpg). For --interval 1 on the same file, you'd get 3600 files. Plan disk space accordingly.
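
If you want to sanity-check that count before committing the disk space, the estimate is just duration / interval. A quick shell sketch, assuming ffprobe is on PATH and the container header carries a duration:

bash
# Estimate the snapshot count from the header (no decode pass needed).
interval=30
duration=$(ffprobe -v error -show_entries format=duration -of csv=p=0 lecture.mp4)
echo "expect ~$(awk -v d="$duration" -v i="$interval" 'BEGIN { printf "%.0f", d/i }') files"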

video-to-frames — Every frame (or fps-throttled) as images

Extracts frames at the source frame rate by default, or at a throttled rate if --fps is supplied. Use this when you need every frame (frame-by-frame retouching, ML training datasets) or a known per-second sampling rate.

Argument / Option | Default | Allowed | Notes
<input> | required | — | Input video
--fps <n> | source rate (no filter) | positive number | Throttle to this many frames per second
--format <fmt> | png | png, jpg | Image format. PNG default (lossless) for ML / editing
-o, --output <pattern> | ./frame_%04d.<format> | printf pattern | Override
bash
$ npx fqmpeg video-to-frames input.mp4 --dry-run

  ffmpeg -i input.mp4 frame_%04d.png
bash
$ npx fqmpeg video-to-frames input.mp4 --fps 5 --format jpg --dry-run

  ffmpeg -i input.mp4 -vf fps=5 frame_%04d.jpg

Default is every frame. At 30 fps for 60 s, that's 1800 PNGs — easily several hundred megabytes. If you just want periodic samples, use snapshot (which has a saner default) or pass --fps.

Output goes to CWD, not the input directory. This is intentional — frame dumps are often scratch data that you want in a known working folder, not polluting the directory the video lives in. But it's a behavioral inconsistency with snapshot, so if you want them next to the input, pass -o input/dir/frame_%04d.png.

PNG vs JPG: PNG is the safe default for downstream image processing (lossless, alpha-preserving). JPG is 5–10× smaller but lossy — fine for previewing or for ML training where the model will downsample anyway. The format flag drives the extension; FFmpeg picks the codec from that.

count-frames — Total frame count via ffprobe

Returns the exact number of video frames in the stream. The only C12 verb that doesn't use ffmpeg — it runs ffprobe -count_frames and writes a single integer to stdout, no output file.

Argument / Option | Notes
<input> | Input video
bash
$ npx fqmpeg count-frames input.mp4 --dry-run

  ffprobe -v error -count_frames -select_streams v:0 -show_entries stream=nb_read_frames -of csv=p=0 input.mp4
bash
$ npx fqmpeg count-frames input.mp4

1798

-count_frames decodes the entire stream. It's accurate but slow — for a long file, count-frames reads every byte of the video. For a faster estimate, query duration × frame rate from the header (no decoding required):

bash
ffprobe -v error -select_streams v:0 -show_entries stream=duration,r_frame_rate -of csv=p=0 input.mp4

That returns duration and r_frame_rate (e.g. 60.5,30000/1001), and you multiply (60.5 × 29.97 ≈ 1813 frames) — close but not exact for variable-frame-rate files.
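
To turn that CSV pair into a number in one step, a small awk sketch; it treats whichever field contains a slash as the frame rate, so it works regardless of the field order ffprobe picks:

bash
# Header-only frame estimate: duration × r_frame_rate, no decoding.
ffprobe -v error -select_streams v:0 \
  -show_entries stream=duration,r_frame_rate -of csv=p=0 input.mp4 |
awk -F, '{ for (j = 1; j <= NF; j++) if ($j ~ "/") { split($j, f, "/"); fps = f[1] / f[2] } else dur = $j
           printf "~%.0f frames\n", dur * fps }'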

Why exact counts matter: for video editing automation (split into N equal chunks of frames), pose-estimation workflows that need per-frame indexing, or anti-cheat / forensic analysis where the exact frame count is a fingerprint. For most needs, the header-derived estimate is fine.

Contact Sheets & Filmstrips

thumbnail-grid — Contact sheet (alias of tile)

A multi-frame thumbnail composite arranged as a grid. The classic "DVD chapter selection" look. Uses the same time-based sampling as tile (one frame per second), kept under this name for users who search for "grid".

  • Source: src/commands/thumbnail-grid.js
  • Filter: select='isnan(prev_selected_t)+gte(t-prev_selected_t, 1)', scale=<W>:-1, tile=<C>x<R>
  • Output: <input-stem>-grid.jpg
Argument / Option | Default | Notes
<input> | required | Input video
--cols <n> | 4 | Grid columns
--rows <n> | 4 | Grid rows
--width <n> | 320 | Width of each tile in pixels
-o, --output <path> | <input-stem>-grid.jpg | Override
bash
$ npx fqmpeg thumbnail-grid input.mp4 --dry-run

  ffmpeg -i input.mp4 -frames:v 1 -vf select='isnan(prev_selected_t)+gte(t-prev_selected_t\,1)',scale=320:-1,tile=4x4 -q:v 2 input-grid.jpg

One frame per second, capped at cols × rows. With the default 4×4 = 16 tiles, the output covers the first 16 seconds of the video. For longer videos, increase --cols/--rows so cols × rows ≥ duration_in_seconds, or use the tile verb (which has the same behavior) and pass -o to control the filename. See the tile section below for the prev_selected_t mechanics in detail.

Same algorithm as tile, different default output name. Use thumbnail-grid when the word "grid" comes to mind; use tile when "tile" or "contact sheet" comes to mind. The tile default filename embeds the dimensions (-tile4x4.jpg), which is handy when re-running with different --cols/--rows values.

thumbnail-strip — Horizontal filmstrip (time-sampled)

A one-row variant of thumbnail-grid. Same time-based sampling, but rows=1 and a per-frame height parameter (rather than width).

  • Source: src/commands/thumbnail-strip.js
  • Filter: select='isnan(prev_selected_t)+gte(t-prev_selected_t, 1)', scale=-1:<H>, tile=<N>x1
  • Output: <input-stem>-strip.jpg
Argument / Option | Default | Notes
<input> | required | Input video
--frames <n> | 10 | Number of frames in the strip
--height <n> | 120 | Height of each frame in pixels
-o, --output <path> | <input-stem>-strip.jpg | Override
bash
$ npx fqmpeg thumbnail-strip input.mp4 --dry-run

  ffmpeg -i input.mp4 -frames:v 1 -vf select='isnan(prev_selected_t)+gte(t-prev_selected_t\,1)',scale=-1:120,tile=10x1 -q:v 2 input-strip.jpg

One frame per second, capped at --frames. With the default --frames 10, the strip covers the first 10 seconds of the video. For a strip that spans a longer runtime, increase --frames so it meets or exceeds duration_in_seconds, build the strip from snapshot output stitched with tile/montage (ImageMagick) at proportional timestamps, or adapt the duration-probing sketch in the tile section below (swap tile=4x4 for tile=10x1 and scale=320:-1 for scale=-1:120).

Filmstrip use case: scrubber previews in video editors, video player hover previews (YouTube's WebVTT thumbnail strip), or social-media-style "scrolling timeline" art.

tile — Contact sheet (time-sampled)

Samples one frame per second (timestamp-based via prev_selected_t), so the grid spans the early portion of the video evenly regardless of source frame rate. Identical algorithm to thumbnail-grid; this verb is the one whose default filename (-tile<C>x<R>.jpg) encodes the dimensions.

  • Source: src/commands/tile.js
  • Filter: select='isnan(prev_selected_t)+gte(t-prev_selected_t, 1)', scale=<W>:-1, tile=<C>x<R>
  • Output: <input-stem>-tile<C>x<R>.jpg
Argument / Option | Default | Notes
<input> | required | Input video
--cols <n> | 4 | Grid columns
--rows <n> | 4 | Grid rows
--width <n> | 320 | Width of each tile in pixels
-o, --output <path> | <input-stem>-tile<C>x<R>.jpg | Override (note the dimensions in the default name)
bash
$ npx fqmpeg tile input.mp4 --dry-run

  ffmpeg -i input.mp4 -frames:v 1 -vf select='isnan(prev_selected_t)+gte(t-prev_selected_t\,1)',scale=320:-1,tile=4x4 -q:v 2 input-tile4x4.jpg

The prev_selected_t predicate:

  • isnan(prev_selected_t) is true for the very first frame (no previous selection yet — prev_selected_t is undefined → NaN).
  • gte(t - prev_selected_t, 1) is true when at least 1 second has passed since the last selected frame.

Their sum gates the select filter, producing one selection per second. For the default 4×4 = 16 tiles, the output covers the first 16 seconds. For longer videos, the tile=4x4 cap drops anything past the 16th selection, giving you the first 16 seconds. If you want a contact sheet that spans the whole runtime, increase --cols/--rows so cols × rows ≥ duration_in_seconds, or drop to raw FFmpeg with select='gte(t, X)' at proportional timestamps.
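
If you'd rather keep the grid small and stretch the sampling across the whole runtime, here's a raw-FFmpeg sketch of that workaround (not a fqmpeg flag): probe the duration, then widen the per-selection gap to duration/16 so 16 tiles span the full video.

bash
# One frame every duration/16 seconds, so a 4x4 sheet covers the whole file.
dur=$(ffprobe -v error -show_entries format=duration -of csv=p=0 input.mp4)
step=$(awk -v d="$dur" 'BEGIN { print d / 16 }')
ffmpeg -i input.mp4 -frames:v 1 \
  -vf "select='isnan(prev_selected_t)+gte(t-prev_selected_t,$step)',scale=320:-1,tile=4x4" \
  -q:v 2 input-full-tile4x4.jpg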

Why two verbs (tile and thumbnail-grid): discoverability — users search for "grid" or "tile" and either lands here. The filter and sampling are identical; only the default output filename differs (-tile4x4.jpg for tile, -grid.jpg for thumbnail-grid). Pick whichever name comes to mind, and override -o if you need a stable filename across runs.

Frames ↔ Video Reconstruction

frames-to-video — Image sequence → video

The inverse of video-to-frames. Takes a printf-style numbered sequence (or glob pattern) of still images and produces a single video file.

Argument / Option | Default | Notes
<pattern> | required | printf pattern (frame_%04d.png) or shell glob (img_*.jpg)
--fps <n> | 30 | Frames per second
--codec <name> | libx264 | Video codec
-o, --output <path> | <pattern-base>-video.mp4 | Override (default strips %d and extension from pattern)
bash
$ npx fqmpeg frames-to-video frame_%04d.png --dry-run

  ffmpeg -framerate 30 -i frame_%04d.png -c:v libx264 -pix_fmt yuv420p frame-video.mp4
bash
$ npx fqmpeg frames-to-video 'img_*.jpg' --fps 60 --dry-run

  ffmpeg -framerate 60 -i img_*.jpg -c:v libx264 -pix_fmt yuv420p img_*-video.mp4

-framerate vs -r: -framerate sets the input image-sequence rate (how many to consume per second), not the output frame rate. For most image-to-video cases this is what you want — the output frame rate matches by default. If you wanted a different output rate (e.g. interpolate or duplicate frames), you'd add -r after the input.
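
For illustration, a raw-FFmpeg sketch of the "different output rate" case: stills consumed at 10 per second, output re-timed to 30 fps, so each still lands on three output frames.

bash
# -framerate 10 (input side): consume ten stills per second of timeline.
# -r 30 (output side): duplicate frames up to a 30 fps stream.
ffmpeg -framerate 10 -i frame_%04d.png -r 30 -c:v libx264 -pix_fmt yuv420p out.mp4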

yuv420p for compatibility: raw image files (especially PNG) decode to yuva420p or rgb24 formats that some players reject in MP4 containers. Forcing yuv420p guarantees playback on QuickTime, mobile browsers, and embedded players. For higher-quality archival output (10-bit, full RGB), pass --codec libx264rgb or drop to raw FFmpeg.

Pattern matching: printf patterns (frame_%04d.png) require sequential numbering starting from 00001 (or 00000 with -start_number 0). Glob patterns (img_*.jpg) work too, but quote them in scripts so the literal pattern reaches the tool instead of being expanded by the shell. Mixed-extension globs (.jpg and .png together) don't work because FFmpeg's image2 demuxer expects uniform format.
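
For reference, the raw-FFmpeg form of glob input; plain ffmpeg needs -pattern_type glob for this, and the quotes keep the shell from expanding the pattern first.

bash
# Glob input in raw FFmpeg: image2's glob mode plus a quoted pattern.
ffmpeg -framerate 30 -pattern_type glob -i 'img_*.jpg' \
  -c:v libx264 -pix_fmt yuv420p out.mp4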

slideshow — Multiple images → video with per-image durations

Builds a video from a list of stills where each image is held on screen for a configurable duration. Uses FFmpeg's concat demuxer with an auto-generated listfile (cleaned up on exit).

  • Source: src/commands/slideshow.js
  • Flags: -f concat -safe 0 -i <listfile> -vf fps=<n>,format=yuv420p -c:v libx264 -pix_fmt yuv420p
  • Output: <input-dir>/slideshow.mp4
Argument / Option | Default | Notes
<images...> | required (≥2) | Two or more image paths in order
--duration <sec> | 3 | Seconds per image (uniform across all)
--fps <n> | 30 | Output frame rate
-o, --output <path> | <input-dir>/slideshow.mp4 | Override
bash
$ npx fqmpeg slideshow img1.jpg img2.jpg img3.jpg --dry-run

  # Image list (auto-generated):
  # file '/abs/path/img1.jpg'
  # duration 3
  # file '/abs/path/img2.jpg'
  # duration 3
  # file '/abs/path/img3.jpg'
  # duration 3
  # file '/abs/path/img3.jpg'

  ffmpeg -f concat -safe 0 -i imagelist.txt -vf fps=30,format=yuv420p -c:v libx264 -pix_fmt yuv420p slideshow.mp4

concat demuxer with per-entry duration: the listfile syntax file '<path>' \n duration <sec> lets FFmpeg stitch stills end-to-end at exact times. The cleanup is automatic (fqmpeg uses process.on("exit") to delete the timestamped listfile).

Why the last image is listed twice: in the concat demuxer, the duration line specifies the transition time to the next file, so the final image would otherwise display for zero seconds. fqmpeg repeats the last image as a trailing entry (with no duration line) so it actually holds for --duration seconds and the output is images.length × duration seconds long, matching what the option promises.

Absolute paths in the listfile: fqmpeg resolves each image to an absolute path before writing, so the concat works regardless of where you invoke it from. Single-quoted with internal quotes escaped ('\\''), so paths with apostrophes (e.g. O'Brien.jpg) work too.

Single-image edge case: the command requires ≥ 2 images and errors out with one. For a "still image as video" of arbitrary length, use raw FFmpeg: ffmpeg -loop 1 -i img.jpg -t 10 -c:v libx264 -pix_fmt yuv420p out.mp4.

No built-in transitions — hard cuts only. slideshow uses FFmpeg's concat demuxer, which stitches images end-to-end without crossfades. For fades between images, drop to raw FFmpeg's xfade filter (filter_complex chaining each pair with xfade=transition=fade:duration=1:offset=…), or run slideshow then post-process with a separate fade pass. Adding xfade chains as a built-in would replace the simple concat flow with per-image segment plumbing — out of scope for the "quick" surface.
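
As a starting point, a minimal two-still crossfade in raw FFmpeg, assuming 3-second holds and a 1-second fade (so the fade offset is hold minus fade = 2 s; both inputs must share a resolution):

bash
# -loop 1 -t 3 turns each still into a 3 s clip; xfade starts fading at t=2.
ffmpeg -loop 1 -t 3 -i img1.jpg -loop 1 -t 3 -i img2.jpg -filter_complex \
  "[0:v][1:v]xfade=transition=fade:duration=1:offset=2,format=yuv420p" \
  -c:v libx264 faded.mp4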

Analysis & Composition

scenes — Split a video at detected scene cuts

Detects scene changes via the scene metadata variable (a per-frame difference score, 0–1) and uses the segmenter to write each scene to its own file.

  • Source: src/commands/scenes.js
  • Filter: select='gt(scene,<threshold>)',setpts=N/FRAME_RATE/TB + -f segment -reset_timestamps 1
  • Output: <input-dir>/<stem>-scene%03d<ext>
Argument / Option | Default | Range | Notes
<input> | required | — | Input video
--threshold <n> | 0.3 | 0.0–1.0 | Lower = more sensitive = more cuts
-o, --output <pattern> | <input-dir>/<stem>-scene%03d<ext> | printf pattern | Override
bash
$ npx fqmpeg scenes movie.mp4 --dry-run

  ffmpeg -i movie.mp4 -filter_complex select='gt(scene,0.3)',setpts=N/FRAME_RATE/TB -f segment -reset_timestamps 1 movie-scene%03d.mp4

scene metadata variable: FFmpeg computes a 0–1 score per frame representing how different it is from the previous frame (color histogram difference). Values above the threshold are treated as cuts. Typical values:

  • 0.1–0.2: aggressive (catches dissolves and crossfades too, often false positives on motion)
  • 0.3: balanced (the default — most hard cuts in narrative video)
  • 0.4–0.5: conservative (only sharp transitions; misses some legitimate cuts)
  • 0.7+: only catches the very hardest cuts (chapter boundaries in scripted content)

-reset_timestamps 1: each output segment starts at PTS 0 rather than continuing the source timeline. This is what makes the segments individually playable.

Stream-copy isn't possible. Because the filter touches every frame, the output is re-encoded. For a fast cut list without re-encoding, use raw FFmpeg with -ss/-to after detection — first run scenes --dry-run to read the threshold, then use ffprobe to extract timestamps, then segment via ffmpeg -ss <t1> -to <t2> -c copy.
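
A sketch of that two-pass flow; pass 1 logs cut timestamps with the standard select + metadata=print trick, pass 2 stream-copies between two consecutive timestamps (the 12.345/34.567 values are placeholders you'd read from the log):

bash
# Pass 1: log scene-change timestamps; -f null - discards the decoded output.
ffmpeg -i movie.mp4 -vf "select='gt(scene,0.3)',metadata=print:file=cuts.log" -f null -
grep -o 'pts_time:[0-9.]*' cuts.log | cut -d: -f2   # one timestamp per cut

# Pass 2: stream-copy one scene between consecutive cuts (keyframe-aligned).
ffmpeg -ss 12.345 -to 34.567 -i movie.mp4 -c copy scene01.mp4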

preview — Generate a short highlight reel

Samples N short clips evenly distributed across the source and concatenates them into a short preview video — the "social media trailer" style.

  • Source: src/commands/preview.js
  • Filter: select N clips of <clip-duration> seconds at evenly spaced positions, concat with -t <clips × clip-duration>
  • Output: <input-stem>-preview.<ext>
Argument / Option | Default | Notes
<input> | required | Input video
--clips <n> | 5 | Number of sample clips
--clip-duration <sec> | 2 | Duration of each clip in seconds
-o, --output <path> | <input-stem>-preview.<ext> | Override
bash
$ npx fqmpeg preview input.mp4 --dry-run

  # Note: could not probe input.mp4 for duration. Using placeholder total=60s.
  # Run on a real file (or after creating input.mp4) to get exact clip offsets.

  ffmpeg -i input.mp4 -vf select='between(t,0,2)+between(t,12,14)+between(t,24,26)+between(t,36,38)+between(t,48,50)',setpts=N/FRAME_RATE/TB -af aselect='between(t,0,2)+between(t,12,14)+between(t,24,26)+between(t,36,38)+between(t,48,50)',asetpts=N/SR/TB -t 10 input-preview.mp4

Output length is deterministic: clips × clip-duration seconds, regardless of source length. The defaults (5 × 2 = 10 s) produce a 10-second highlight — short enough for Twitter / Instagram, long enough to convey the gist of a 30-minute talk.

Even distribution across the source: preview runs ffprobe first to read the source duration T, then selects --clips segments at evenly spaced offsets — clip 1 starts at 0 × T/clips, clip 2 at 1 × T/clips, ..., clip N at (N−1) × T/clips. Each runs for clip-duration seconds. For a 60-minute video with the defaults, that's clips at minute 0, 12, 24, 36, 48 — 2 seconds each. The dry-run output above used T = 60 (placeholder) because input.mp4 doesn't exist on disk; the actual offsets are computed from the real file at run time.
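
To see the offsets a real file would get, the arithmetic is k × T/clips; a small sketch assuming ffprobe and the default 5 clips × 2 s:

bash
# Print each clip's start/end for the real duration T.
T=$(ffprobe -v error -show_entries format=duration -of csv=p=0 talk.mp4)
awk -v T="$T" -v n=5 -v d=2 \
  'BEGIN { for (k = 0; k < n; k++) printf "clip %d: %.1f-%.1f s\n", k + 1, k * T / n, k * T / n + d }'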

No audio fade between clips. The output cuts hard between sampled clips, so audio pops are likely on music-heavy content. For social-media-grade previews, you'd want crossfades between clips and a music bed; for that you're better off scripting the workflow with trim + crossfade + audio-fade manually.

compare — Side-by-side before/after

Stacks two videos horizontally (default) or vertically into a single output for visual comparison. The canonical "look how much better this filter is" demo.

  • Source: src/commands/compare.js
  • Filter (horizontal): [0:v]scale=iw/2:ih[left];[1:v]scale=iw/2:ih[right];[left][right]hstack
  • Filter (vertical): [0:v]scale=iw:ih/2[top];[1:v]scale=iw:ih/2[bottom];[top][bottom]vstack
  • Output: <input1-stem>-compare.<ext>
Argument / Option | Default | Allowed | Notes
<input1> | required | — | First (left / top) video
<input2> | required | — | Second (right / bottom) video
--direction <dir> | horizontal | horizontal, vertical | Layout
-o, --output <path> | <input1-stem>-compare.<ext> | — | Override
bash
$ npx fqmpeg compare before.mp4 after.mp4 --dry-run

  ffmpeg -i before.mp4 -i after.mp4 -filter_complex [0:v]scale=iw/2:ih[left];[1:v]scale=iw/2:ih[right];[left][right]hstack -c:a copy before-compare.mp4
bash
$ npx fqmpeg compare original.mp4 stabilized.mp4 --direction vertical --dry-run

  ffmpeg -i original.mp4 -i stabilized.mp4 -filter_complex [0:v]scale=iw:ih/2[top];[1:v]scale=iw:ih/2[bottom];[top][bottom]vstack -c:a copy original-compare.mp4

Each input is squeezed to half before stacking: the output canvas keeps the original width (horizontal) or height (vertical), so the overall canvas aspect matches the source, but each pane is compressed 2:1 along the stacking axis. And because scale=iw/2:ih uses each input's own dimensions, mismatched inputs aren't normalized to a common size; hstack still requires equal heights (vstack equal widths) and fails otherwise. For best output, the two inputs should already match in resolution and duration.

Audio is copied from input 1. -c:a copy takes the first input's audio track stream-copied; input 2's audio is discarded. This matches the usual before/after framing (compare visuals, keep one audio).

Duration mismatch: if the two videos differ in length, hstack/vstack keep going until the longer input ends (their default is shortest=0), holding the shorter input's last frame frozen on screen for the remainder. Add shortest=1 in a raw-FFmpeg variant to end at the shorter input, or trim both to matching lengths first if you need exact alignment.

No built-in labels. For "Left"/"Right" or "Before"/"After" text overlays, dry-run the filter, then drop to raw FFmpeg and splice drawtext into each scale step — see Recipe 2 below for a concrete template. Adding a --label option would force a fontfile-dependency surface (drawtext needs libfreetype + a font path) that's out of scope for the quick surface.

Real-World Recipes

Recipe 1: YouTube thumbnail workflow

You have a finished 12-minute video and need a custom thumbnail. Goal: pick the best frame, scale to YouTube's spec (1280×720), and check the contact sheet to confirm the choice.

bash
# Step 1: contact sheet to pick the best moment (time-based sampling)
npx fqmpeg tile video.mp4 --cols 6 --rows 8 --width 480
# → video-tile6x8.jpg, 48 frames spanning first 48 seconds

# For longer coverage of a 12-minute (720-second) video, you'd need cols×rows ≥ 720,
# so a 30×24 grid (720 tiles). At width 200 that's a 6000 px wide sheet,
# roughly 2700 px tall for a 16:9 source.
npx fqmpeg tile video.mp4 --cols 30 --rows 24 --width 200

# Step 2: extract the chosen frame at its exact timestamp
npx fqmpeg thumbnail video.mp4 -s 374 -o thumbnail-raw.jpg

# Step 3: resize to YouTube's recommended thumbnail spec
npx fqmpeg resize thumbnail-raw.jpg 1280x720 -o thumbnail-final.jpg

The tile step is the slow part — it has to decode through the source. Once you've picked your timestamp, thumbnail is near-instant thanks to keyframe-aligned -ss.

Recipe 2: Time-lapse from a security camera dump

You have a folder of 86,400 JPEGs from a security camera (one per second, 24 hours of footage). You want a 60-second time-lapse video at 30 fps.

bash
# 86400 input frames at 30 fps output = 86400/30 = 2880 seconds = 48 minutes.
# To compress to 60 seconds at 30 fps, we need 1800 output frames.
# Sample 1 input frame per 48 input frames: 86400 / 1800 = 48.

# Step 1: symlink every 48th frame into a sequentially numbered sequence
i=1
ls *.jpg | awk 'NR % 48 == 1' | while read -r f; do
  printf -v new "frame_%04d.jpg" "$i"   # frame_0001.jpg, frame_0002.jpg, ...
  ln -s "$(realpath "$f")" "$new"
  ((i++))
done

# Step 2: stitch into a 30 fps time-lapse
npx fqmpeg frames-to-video frame_%04d.jpg --fps 30
# → frame-video.mp4

Alternative without symlinks: use slideshow with --duration 0.0333 (1/30 s per image) — but slideshow re-encodes through concat demuxer, which is slower than frames-to-video's direct -i pattern. For datasets this large, frames-to-video is the right tool; slideshow is for ≤ tens of images with per-image durations.

Recipe 3: Before/after filter comparison for portfolio

You're documenting the effect of stabilize on a shaky drone clip. Build a side-by-side comparison video with labels for your portfolio:

bash
# Step 1: stabilize the original
npx fqmpeg stabilize drone-raw.mp4 -o drone-stable.mp4

# Step 2: side-by-side comparison
npx fqmpeg compare drone-raw.mp4 drone-stable.mp4 \
  -o drone-comparison.mp4

# Step 3: extract a single-frame thumbnail for the case study cover image
npx fqmpeg thumbnail drone-comparison.mp4 -s 3 -o drone-cover.jpg

compare itself doesn't add labels — for a polished case study, dry-run the filter and re-render with raw FFmpeg, splicing drawtext into each scale step:

bash
ffmpeg -i drone-raw.mp4 -i drone-stable.mp4 -filter_complex \
  "[0:v]scale=iw/2:ih,drawtext=text='Original':x=20:y=20:fontsize=36:fontcolor=white:box=1:boxcolor=black@0.5[left];[1:v]scale=iw/2:ih,drawtext=text='Stabilized':x=20:y=20:fontsize=36:fontcolor=white:box=1:boxcolor=black@0.5[right];[left][right]hstack" \
  -c:a copy drone-portfolio.mp4

Frequently Asked Questions

Should I use thumbnail-grid or tile?

Either — they're aliases. Both use the same time-based sampling (select='isnan(prev_selected_t)+gte(t-prev_selected_t, 1)') and produce the same contact sheet for a given --cols/--rows/--width. The only difference is the default output filename: thumbnail-grid writes <stem>-grid.jpg, tile writes <stem>-tile<C>x<R>.jpg. Pick tile when the filename should record the grid size; pick thumbnail-grid when a fixed -grid.jpg filename is more convenient. Either way, pass -o to override.

What's the difference between snapshot and video-to-frames?

Both extract periodic stills, but with different defaults and output locations:

  • snapshot defaults to one frame per second, JPEG format, output alongside the input (<input-dir>/<stem>-snap-%04d.jpg).
  • video-to-frames defaults to every frame at the source rate, PNG format, output to the current working directory (./frame_%04d.png).

If you want "occasional reference stills next to the video," snapshot is the right tool. If you want "every frame for ML or editing," video-to-frames is the right tool. Pass -o if you want either to write somewhere else.

Why is count-frames slow on long videos?

Because -count_frames decodes the entire video stream. The flag tells ffprobe to actually walk every packet and count successfully decoded frames — necessary for exact accuracy (especially on variable-frame-rate files), but it does cost a full decode pass. For a 4-hour 4K source, that can take minutes. If an estimate is fine, query duration and r_frame_rate from the header and multiply — no decoding required:

bash
ffprobe -v error -select_streams v:0 -show_entries stream=duration,r_frame_rate -of csv=p=0 input.mp4

Can I use frames-to-video with a glob like img_*.jpg?

Yes, but quote it. FFmpeg's image2 demuxer accepts both printf patterns (frame_%04d.png) and globs ('img_*.jpg'); in raw FFmpeg, glob matching is enabled with -pattern_type glob. Unquoted globs get expanded by the shell before FFmpeg sees them, which usually breaks the command, so quote with single quotes to make sure the literal pattern reaches FFmpeg. Glob mode requires uniform extension across all files (no mixed .jpg + .png).

Why does tile only cover the first 16 seconds of my long video?

Because the default 4×4 = 16 tiles, combined with the 1-second time-based sampling, naturally covers exactly 16 seconds. To span a longer video, increase --cols and --rows so cols × rows ≥ duration_in_seconds. For a 5-minute (300-second) video evenly sampled, you'd need a roughly 17×18 grid (306 tiles). For coarser sampling on long videos, drop to raw FFmpeg and adjust the select predicate — e.g. select='isnan(prev_selected_t)+gte(t-prev_selected_t,60)' for one frame per minute (keep the isnan term: without it the first frame is never selected, prev_selected_t stays NaN, and nothing is ever picked).

How is frames-to-video different from slideshow?

frames-to-video consumes a numbered or globbed image sequence (frame_%04d.png) and produces a video at a single frame rate — every input image becomes one frame, uniformly. slideshow consumes an explicit list of images (img1.jpg img2.jpg img3.jpg) and lets you set a per-image duration (each image held for N seconds), with hard cuts between images (no built-in crossfade — see the slideshow section for how to add fades via raw FFmpeg xfade). Use frames-to-video for time-lapses, ML output reconstruction, and image sequences from rendering. Use slideshow for photo presentations where each image needs to linger for a few seconds.

What threshold should I use with scenes for a typical narrative video?

Start with the default 0.3. If you're getting too few cuts (missing hard transitions), drop to 0.2 or 0.15. If you're getting too many cuts (false positives on camera motion, dissolves), raise to 0.4 or 0.5. Threshold sensitivity depends on content — music videos with fast motion often need higher thresholds (0.5+) to ignore in-shot movement, while talking-head interviews can use lower (0.2) because the only changes are real cuts.

Can compare handle audio from both inputs simultaneously?

No. The implementation uses -c:a copy which takes only input 1's audio stream. Input 2's audio is discarded. This matches the usual workflow (before/after visual comparison with one common audio track). For dual-audio comparison (e.g. comparing two different audio mixes side by side), use raw FFmpeg with amerge or amix:

bash
ffmpeg -i a.mp4 -i b.mp4 -filter_complex \
  "[0:v]scale=iw/2:ih[L];[1:v]scale=iw/2:ih[R];[L][R]hstack[v];[0:a][1:a]amerge=inputs=2[a]" \
  -map "[v]" -map "[a]" -ac 2 compare-dual-audio.mp4

Wrapping Up

The twelve C12 verbs cover the round-trip between video and individual frames:

  • thumbnail, snapshot, video-to-frames, count-frames for single or periodic stills (-q:v 2 is the JPEG quality default; count-frames is the only ffprobe verb in the cluster)
  • thumbnail-grid, thumbnail-strip, tile for contact sheets and filmstrips (all three use time-based sampling — one frame per second — with --cols × --rows (grid/tile) or --frames (strip) as the cap)
  • frames-to-video, slideshow for rebuilding video from images (frames-to-video for uniform-rate sequences, slideshow for per-image durations with hard cuts — add fades via raw FFmpeg xfade if needed)
  • scenes, preview, compare for analysis and composition (scenes segments at detected cuts; preview builds an evenly-sampled highlight reel; compare stacks two videos for before/after demos)

Every verb prints its underlying FFmpeg invocation under --dry-run, so when the simplified surface isn't enough (custom contact-sheet sampling, dual-audio compare, frame-accurate seeks instead of keyframe-aligned), copy the filter, edit, and call FFmpeg directly. For the broader fqmpeg map, see the fqmpeg complete guide.