tar bundles files into a single archive. gzip, bzip2, xz, and zstd compress that archive (or any single file) to save space. zip does both in one step. Understanding which tool to reach for — and when — saves you from bloated backups, painfully slow transfers, and "works on my machine" headaches.
In short: use tar czf archive.tar.gz dir/ to create a gzip-compressed archive, and tar xf archive.tar.gz to extract it. For better speed-to-ratio balance, switch to zstd with tar --zstd -cf archive.tar.zst dir/. Use zip only when sharing with Windows or macOS users who don't have tar.
How Archiving and Compression Actually Work
This distinction tripped me up when I first started using Linux, so let me spell it out: archiving and compression are two separate operations.
Archiving means combining multiple files and directories into a single file. That's what tar does — the name literally stands for "tape archive." The result is a .tar file (often called a "tarball"), and it's the exact same size as the original files combined. No space savings.
Compression means reducing the size of data using an algorithm. Tools like gzip, bzip2, xz, and zstd each implement a different compression algorithm. They only work on a single input stream — one file in, one compressed file out.
The Unix philosophy combines them: tar creates one stream from many files, then a compressor shrinks that stream:
# Two-step approach (you'll rarely do this manually)
tar cf project.tar project/
gzip project.tar
# Result: project.tar.gz
# One-step approach (this is what you'll actually use)
tar czf project.tar.gz project/
Why does this matter? Because zip takes a fundamentally different approach — it compresses each file individually, then bundles them. When tar.gz compresses all files as one stream, it can exploit similarities across files. That's why tar.gz archives are typically 5–15% smaller than equivalent .zip files, especially on codebases with many similar source files.
tar approach: [file1][file2][file3] → tar → [one stream] → gzip → .tar.gz
zip approach: [file1→gz][file2→gz][file3→gz] → bundle → .zip
The flip side: you can extract a single file from a .zip without decompressing the whole thing, while .tar.gz requires decompressing the entire stream first.
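You can see the cross-file effect with a quick self-contained experiment. One caveat: gzip's DEFLATE back-reference window is only 32 KB, so the files need to be small for the duplicate to stay within reach (the sizes in the comments are approximate):

```shell
# A duplicated file compresses away in one stream, but not when
# each file is compressed separately.
cd "$(mktemp -d)"
head -c 20000 /dev/urandom > a.bin   # 20 KB of incompressible data
cp a.bin b.bin                       # exact duplicate

# tar-style: one stream, so gzip can back-reference a.bin while encoding b.bin
tar cf - a.bin b.bin | gzip -c | wc -c        # roughly 20 KB

# zip-style: each file compressed alone -- the duplicate costs full price
{ gzip -c a.bin; gzip -c b.bin; } | wc -c     # roughly 40 KB
```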
tar Essentials: Create, Extract, and List Archives
The core flags you'll use every day:
| Flag | Meaning |
|---|---|
| c | Create a new archive |
| x | Extract from an archive |
| t | List contents without extracting |
| f | Specify the filename (almost always required) |
| v | Verbose — print each file as it's processed |
| z | Compress/decompress with gzip |
| j | Compress/decompress with bzip2 |
| J | Compress/decompress with xz (uppercase) |
Creating archives
# Basic tarball (no compression)
tar cf backup.tar documents/
# gzip-compressed archive
tar czf backup.tar.gz documents/
# xz-compressed archive (smallest size, slowest)
tar cJf backup.tar.xz documents/
# zstd-compressed archive (fast + good compression)
tar --zstd -cf backup.tar.zst documents/
Extracting archives
GNU tar auto-detects the compression format since version 1.15, so you don't need to specify -z, -j, or -J when extracting:
# Extract any compressed tar archive (GNU tar auto-detects)
tar xf backup.tar.gz
tar xf backup.tar.xz
tar xf backup.tar.zst
# Extract to a specific directory
tar xf backup.tar.gz -C /opt/restore/
# Extract a single file
tar xf backup.tar.gz documents/report.pdf
Listing contents
Always check what's inside before extracting — especially archives from the internet. I've seen tarballs that extract files directly into the current directory instead of a subdirectory, overwriting whatever's there:
# List all files in the archive
tar tf backup.tar.gz
# Show detailed info (permissions, sizes, dates)
tar tvf backup.tar.gz
# Check if the archive has a top-level directory
tar tf backup.tar.gz | head -5
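That check is easy to automate. A sketch: count the distinct top-level entries, where anything greater than one suggests a "tarbomb" that will scatter files into your current directory:

```shell
# Count distinct top-level entries; a well-behaved tarball has exactly one
tar tf backup.tar.gz | cut -d/ -f1 | sort -u | wc -l
```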
Excluding files
This is where tar really shines for project backups. When I'm archiving a Node.js project, the last thing I want is node_modules in my tarball:
# Exclude specific directories
tar czf 32blog-backup.tar.gz \
--exclude='node_modules' \
--exclude='.next' \
--exclude='.git' \
32blog/
# Exclude by pattern
tar czf source-only.tar.gz \
--exclude='*.log' \
--exclude='*.tmp' \
project/
Adding files to an existing archive
You can append to uncompressed .tar archives (not compressed ones):
# Add a file to an existing tar
tar rf backup.tar newfile.txt
# This does NOT work with .tar.gz — you'll get an error
tar rf backup.tar.gz newfile.txt # Error!
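If you really need to add a file to a compressed archive, the workaround is to round-trip through the uncompressed form (a sketch):

```shell
# Decompress, append, recompress
gunzip backup.tar.gz           # backup.tar.gz -> backup.tar
tar rf backup.tar newfile.txt  # append works on the plain tar
gzip backup.tar                # backup.tar -> backup.tar.gz
```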
Choosing a Compression Algorithm: gzip, bzip2, xz, and zstd
Each algorithm makes different trade-offs between speed and compression ratio. Here's what you need to know:
gzip — The universal default
GNU gzip has been the default compression tool on Unix systems for over 30 years. Version 1.14 (February 2025) added faster decompression on x86-64 CPUs using PCLMUL instructions.
# Compress a single file (replaces original)
gzip access.log
# Result: access.log.gz
# Keep the original file
gzip -k access.log
# Decompress
gunzip access.log.gz
# or
gzip -d access.log.gz
# Adjust compression level (1=fastest, 9=smallest, default=6)
gzip -9 access.log
When to use: default choice for scripts, log rotation, and anywhere compatibility matters. Every Unix system has gzip.
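A related perk of gzip's ubiquity: the z-prefixed companion tools that ship alongside it (zcat, zgrep, zless) let you read and search compressed logs in place, without extracting them first. A quick sketch:

```shell
# Read and search gzip-compressed logs without decompressing to disk
zcat access.log.gz | tail -20    # stream the contents
zgrep "ERROR" access.log.gz      # grep straight into the compressed file
```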
bzip2 — Better ratio, much slower
tar cjf archive.tar.bz2 project/
Bzip2 compresses about 10–15% better than gzip but runs significantly slower. It's been largely superseded by xz (better ratio) and zstd (better speed). You'll still encounter .tar.bz2 files in the wild, but there's little reason to create new ones.
xz — Maximum compression
XZ Utils consistently produces the smallest archives. The Linux kernel tarballs, Debian packages, and many open-source projects ship as .tar.xz:
# Compress with xz
tar cJf archive.tar.xz project/
# Multi-threaded compression (xz is single-threaded by default)
tar -cf - project/ | xz -T0 > archive.tar.xz
# Decompress
tar xf archive.tar.xz
When to use: distributing software, long-term archival where storage matters more than time.
zstd — The modern choice
Zstandard (zstd) was developed at Facebook and has quickly become the go-to compression tool. Version 1.5.7 (February 2025) defaults to multi-threaded compression, capped at 4 threads.
# Create a zstd-compressed archive
tar --zstd -cf archive.tar.zst project/
# Standalone zstd compression
zstd -T0 large-file.sql
# Result: large-file.sql.zst
# Adjust level (1-19 default range, --ultra for 20-22)
zstd -19 large-file.sql
zstd --ultra -22 large-file.sql # Maximum compression
# Decompress
zstd -d large-file.sql.zst
unzstd large-file.sql.zst
I switched from gzip to zstd for backing up 32blog's content directory and saw compression time drop by about 60% with slightly better compression ratio. The only downside is that older systems (pre-2019) might not have zstd installed.
When to use: any new project where you control both ends. Backups, CI/CD artifacts, database dumps, container images.
zip and unzip: The Cross-Platform Option
zip is the format that works everywhere — Windows, macOS, and Linux all handle it natively. Unlike the tar + compressor approach, zip archives and compresses in a single step:
# Create a zip archive
zip -r project.zip project/
# Add compression level (0=store, 9=maximum)
zip -r -9 project.zip project/
# Exclude patterns
zip -r project.zip project/ -x "*.git*" "*/node_modules/*"
# List contents
unzip -l project.zip
# Extract
unzip project.zip
# Extract to specific directory
unzip project.zip -d /opt/restore/
# Extract a single file
unzip project.zip "project/config.json"
Password-protected archives
# Create encrypted zip (prompts for password)
zip -er sensitive.zip contracts/
# Extract (prompts for password)
unzip sensitive.zip
When to use zip over tar.gz
- Sharing files with Windows or macOS users who won't touch a terminal
- Java .jar and .war files (they're actually zip archives)
- Email attachments (zip is universally understood)
- When you need random access to individual files without decompressing everything
For everything else — backups, deployments, data archival — stick with tar + your compressor of choice.
Real-World Patterns: Backups, Deployments, and Pipelines
Timestamped backups
A pattern I use for 32blog's content directory — each backup gets a timestamp so they don't overwrite each other:
#!/usr/bin/env bash
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR="/backups"
SOURCE_DIR="/var/www/32blog"
tar --zstd -cf "${BACKUP_DIR}/32blog-${TIMESTAMP}.tar.zst" \
--exclude='node_modules' \
--exclude='.next' \
--exclude='.git' \
"${SOURCE_DIR}"
# Clean up backups older than 30 days
find "${BACKUP_DIR}" -name "32blog-*.tar.zst" -mtime +30 -delete
echo "Backup created: 32blog-${TIMESTAMP}.tar.zst"
Remote transfer via SSH (no intermediate file)
This is one of my favorite tar patterns — piping directly through SSH to transfer a directory to another server without creating a temporary archive. You can find more SSH patterns in the ssh and rsync guide:
# Copy a directory to a remote server via SSH
tar czf - project/ | ssh user@server "tar xzf - -C /opt/deploy/"
# Copy from remote to local
ssh user@server "tar czf - /var/log/app/" | tar xzf - -C ./logs/
# With progress indicator (requires pv)
tar cf - project/ | pv | ssh user@server "tar xf - -C /opt/deploy/"
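The same stream-through-a-pipe idea works locally too, which is handy for copying a tree while preserving permissions and symlinks, and for testing the pattern without a remote host (a sketch; the destination path is illustrative):

```shell
# Local equivalent of the SSH pipe: stream project/ into /tmp/deploy/
mkdir -p /tmp/deploy
tar cf - project/ | tar xf - -C /tmp/deploy/
```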
Splitting large archives
When you need to transfer a massive archive but have size limits (email, USB drives, or unreliable network connections):
# Create and split into 100MB chunks
tar czf - large-project/ | split -b 100M - backup-part-
# Reassemble and extract
cat backup-part-* | tar xzf -
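Before deleting the source, it's worth confirming the parts reassemble byte-for-byte. A sketch that keeps a reference copy while splitting (it needs disk space for the full archive):

```shell
# Keep a reference copy of the stream, then compare it to the reassembled parts
tar czf - large-project/ | tee full.tar.gz | split -b 100M - backup-part-
cat backup-part-* | cmp - full.tar.gz && echo "parts intact"
```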
Selective extraction with find
Combine tar with find and xargs for powerful workflows:
# Find all .tar.gz files and extract them
find /downloads -name "*.tar.gz" -exec tar xzf {} -C /opt/extracted/ \;
# Archive only files modified in the last 24 hours
find project/ -mtime -1 -type f -print0 | \
tar czf recent-changes.tar.gz --null -T -
# Archive only specific file types
find src/ -name "*.ts" -o -name "*.tsx" | \
tar czf typescript-source.tar.gz -T -
Incremental backups with tar
GNU tar supports incremental backups using snapshot files. The first run creates a full backup; subsequent runs back up only changed files:
# Full backup (creates snapshot file)
tar --listed-incremental=snapshot.snar \
-czf backup-full.tar.gz project/
# Incremental backup (only files changed since last run)
tar --listed-incremental=snapshot.snar \
-czf backup-incr-$(date +%Y%m%d).tar.gz project/
# Restore: apply full backup first, then incrementals in order
tar --listed-incremental=/dev/null -xzf backup-full.tar.gz
tar --listed-incremental=/dev/null -xzf backup-incr-20260323.tar.gz
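One caveat worth sketching: tar rewrites the snapshot file on every run, so each incremental is relative to the previous one, forming a chain where a single corrupt link breaks the restore. If you'd rather have every incremental relative to the full backup, work from a copy of the snapshot each time:

```shell
# Keep the post-full-backup snapshot pristine; each run copies it first,
# so every incremental captures all changes since the full backup
cp snapshot.snar snapshot-run.snar
tar --listed-incremental=snapshot-run.snar \
    -czf backup-incr-$(date +%Y%m%d).tar.gz project/
```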
Performance Comparison: Speed vs. Compression Ratio
I benchmarked these algorithms on a typical web project (32blog's source — about 180MB of source code, MDX content, and assets):
| Algorithm | Command flag | Compressed size | Compression speed | Decompression speed |
|---|---|---|---|---|
| gzip -1 | tar czf (fast) | ~45MB | Very fast | Fast |
| gzip -6 | tar czf (default) | ~38MB | Fast | Fast |
| gzip -9 | tar czf (best) | ~37MB | Moderate | Fast |
| bzip2 | tar cjf | ~33MB | Slow | Slow |
| xz -6 | tar cJf | ~28MB | Very slow | Moderate |
| zstd -3 | tar --zstd (default) | ~36MB | Very fast | Very fast |
| zstd -19 | tar --zstd (high) | ~29MB | Slow | Very fast |
| zip -6 | zip -r (default) | ~40MB | Fast | Fast |
Key takeaways:
- zstd default gets close to gzip's compression with 3–5× faster speed
- xz wins on size but you'll feel the compression time on large datasets
- zstd decompression is consistently the fastest regardless of compression level
- zip always loses on ratio because it compresses files individually
For most day-to-day work, zstd at default settings is the best all-around choice. Use xz when you're creating archives that will be downloaded thousands of times (the extra compression time pays for itself in bandwidth). Use gzip when you need guaranteed compatibility with any system.
FAQ
What's the difference between .tar.gz and .tgz?
They're the same format. .tgz is a shorthand that exists because old DOS and Windows systems couldn't handle double extensions. You can use either name — tar doesn't care.
Can I extract a single file from a .tar.gz without decompressing everything?
You can specify the file path during extraction (tar xzf archive.tar.gz path/to/file), but tar still has to decompress the entire stream up to that point. For true random access, use zip instead.
How do I see progress while compressing a large archive?
Use pv (pipe viewer) between tar and the compressor: tar cf - bigdir/ | pv | gzip > archive.tar.gz. This shows a progress bar with speed and ETA.
Can tar handle files larger than 4GB?
Yes. GNU tar uses POSIX extended headers and has no practical file size limit. Old zip implementations (before Zip64) couldn't handle files over 4GB — another reason to prefer tar for large archives.
Should I use gzip or zstd for new projects?
Use zstd if you control both the compression and decompression environments. It's faster and compresses better at comparable speed settings. Fall back to gzip only when you need compatibility with systems that might not have zstd installed (pre-2019 distributions, embedded systems, minimal Docker images).
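For scripts that have to run on both old and new systems, you can probe for zstd and fall back to gzip (a sketch; the archive names and directory are illustrative):

```shell
# Prefer zstd when the binary is present, otherwise fall back to gzip
if command -v zstd >/dev/null 2>&1; then
    tar --zstd -cf backup.tar.zst project/
else
    tar czf backup.tar.gz project/
fi
```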
How do I compress an entire disk or partition?
Use dd piped into your compressor: dd if=/dev/sda bs=4M | zstd -T0 > disk-image.zst. For filesystem-level backups, tar with --one-file-system is safer: tar --one-file-system -czf root-backup.tar.gz /.
What does the -p flag do in tar?
The -p (--preserve-permissions) flag keeps the original file permissions when extracting, instead of letting your umask strip bits from them. GNU tar behaves as if -p were given when run as root, but use it explicitly when restoring backups so restored files keep their exact modes regardless of who runs the restore.
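A self-contained demo of the difference (assumes GNU tar and coreutils stat; the filenames are illustrative):

```shell
# Without -p, the umask is applied to extracted files; with -p it isn't
umask 022
echo secret > config.txt && chmod 664 config.txt
tar cf perms.tar config.txt

mkdir -p plain exact
tar xf perms.tar -C plain/    # as non-root: mode becomes 644 (umask stripped it)
tar xpf perms.tar -C exact/   # -p: mode stays 664
stat -c '%a' plain/config.txt exact/config.txt
```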
How do I verify an archive is not corrupted?
Use tar tzf archive.tar.gz > /dev/null — this reads the entire archive and will error on corruption. For gzip files specifically, gzip -t file.gz runs an integrity check. Zstd has zstd -t file.zst.
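To see the check actually catch something, here's a self-contained sketch that deliberately corrupts a few bytes in the middle of a gzip file:

```shell
# Overwrite bytes mid-file and watch gzip -t catch the damage
cd "$(mktemp -d)"
head -c 5000 /dev/urandom > file.bin
gzip file.bin
gzip -t file.bin.gz && echo "intact"

printf '\377\377\377\377' | dd of=file.bin.gz bs=1 seek=2000 conv=notrunc 2>/dev/null
gzip -t file.bin.gz || echo "corruption detected"
```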
Wrapping Up
The core pattern is simple: tar bundles, compressors shrink, and you almost always want both. For day-to-day work, tar --zstd -cf and tar xf cover 90% of what you need. Reach for zip when cross-platform sharing matters, and xz when you're distributing software and every kilobyte counts.
The commands that'll serve you best:
# Create (zstd, modern systems)
tar --zstd -cf archive.tar.zst directory/
# Create (gzip, universal compatibility)
tar czf archive.tar.gz directory/
# Extract (any format, auto-detected)
tar xf archive.tar.gz
# List contents before extracting
tar tf archive.tar.gz
# Cross-platform sharing
zip -r archive.zip directory/
For more CLI tools that pair well with tar, check the CLI tools map. The find guide covers file selection patterns, and the ssh and rsync guide has more on remote transfer workflows.