tar bundles files into a single archive. gzip, bzip2, xz, and zstd compress that archive (or any single file) to save space. zip does both in one step. Understanding which tool to reach for — and when — saves you from bloated backups, painfully slow transfers, and "works on my machine" headaches.
In short: use tar czf archive.tar.gz dir/ to create a gzip-compressed archive, and tar xf archive.tar.gz to extract it. For better speed-to-ratio balance, switch to zstd with tar --zstd -cf archive.tar.zst dir/. Use zip only when sharing with Windows or macOS users who don't have tar.
How Archiving and Compression Actually Work
This distinction tripped me up when I first started using Linux, so let me spell it out: archiving and compression are two separate operations.
Archiving means combining multiple files and directories into a single file. That's what tar does — the name literally stands for "tape archive." The result is a .tar file (often called a "tarball"), and it's the exact same size as the original files combined. No space savings.
Compression means reducing the size of data using an algorithm. Tools like gzip, bzip2, xz, and zstd each implement a different compression algorithm. They only work on a single input stream — one file in, one compressed file out.
The Unix philosophy combines them: tar creates one stream from many files, then a compressor shrinks that stream:
# Two-step approach (you'll rarely do this manually)
tar cf project.tar project/
gzip project.tar
# Result: project.tar.gz
# One-step approach (this is what you'll actually use)
tar czf project.tar.gz project/
Why does this matter? Because zip takes a fundamentally different approach — it compresses each file individually, then bundles them. When tar.gz compresses all files as one stream, it can exploit similarities across files. That's why tar.gz archives are typically 5–15% smaller than equivalent .zip files, especially on codebases with many similar source files.
tar approach: [file1][file2][file3] → tar → [one stream] → gzip → .tar.gz
zip approach: [file1→gz][file2→gz][file3→gz] → bundle → .zip
The flip side: you can extract a single file from a .zip without decompressing the whole thing, while .tar.gz requires decompressing the entire stream first.
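You can see the cross-file effect with a quick self-contained experiment. One caveat: gzip's DEFLATE back-reference window is only 32 KB, so the files need to be small for the duplicate to stay within reach (the sizes in the comments are approximate):

```shell
# A duplicated file compresses away in one stream, but not when
# each file is compressed separately.
cd "$(mktemp -d)"
head -c 20000 /dev/urandom > a.bin   # 20 KB of incompressible data
cp a.bin b.bin                       # exact duplicate

# tar-style: one stream, so gzip can back-reference a.bin while encoding b.bin
tar cf - a.bin b.bin | gzip -c | wc -c        # roughly 20 KB

# zip-style: each file compressed alone -- the duplicate costs full price
{ gzip -c a.bin; gzip -c b.bin; } | wc -c     # roughly 40 KB
```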
tar Essentials: Create, Extract, and List Archives
The core flags you'll use every day:
| Flag | Meaning |
|---|---|
| c | Create a new archive |
| x | Extract from an archive |
| t | List contents without extracting |
| f | Specify the filename (almost always required) |
| v | Verbose — print each file as it's processed |
| z | Compress/decompress with gzip |
| j | Compress/decompress with bzip2 |
| J | Compress/decompress with xz (uppercase) |
Creating archives
# Basic tarball (no compression)
tar cf backup.tar documents/
# gzip-compressed archive
tar czf backup.tar.gz documents/
# xz-compressed archive (smallest size, slowest)
tar cJf backup.tar.xz documents/
# zstd-compressed archive (fast + good compression)
tar --zstd -cf backup.tar.zst documents/
Extracting archives
GNU tar auto-detects the compression format since version 1.15, so you don't need to specify -z, -j, or -J when extracting:
# Extract any compressed tar archive (GNU tar auto-detects)
tar xf backup.tar.gz
tar xf backup.tar.xz
tar xf backup.tar.zst
# Extract to a specific directory
tar xf backup.tar.gz -C /opt/restore/
# Extract a single file
tar xf backup.tar.gz documents/report.pdf
Listing contents
Always check what's inside before extracting — especially archives from the internet. I've seen tarballs that extract files directly into the current directory instead of a subdirectory, overwriting whatever's there:
# List all files in the archive
tar tf backup.tar.gz
# Show detailed info (permissions, sizes, dates)
tar tvf backup.tar.gz
# Check if the archive has a top-level directory
tar tf backup.tar.gz | head -5
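That check is easy to automate. A sketch: count the distinct top-level entries, where anything greater than one suggests a "tarbomb" that will scatter files into your current directory:

```shell
# Count distinct top-level entries; a well-behaved tarball has exactly one
tar tf backup.tar.gz | cut -d/ -f1 | sort -u | wc -l
```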
Excluding files
This is where tar really shines for project backups. When I'm archiving a Node.js project, the last thing I want is node_modules in my tarball:
# Exclude specific directories
tar czf 32blog-backup.tar.gz \
--exclude='node_modules' \
--exclude='.next' \
--exclude='.git' \
32blog/
# Exclude by pattern
tar czf source-only.tar.gz \
--exclude='*.log' \
--exclude='*.tmp' \
project/
Adding files to an existing archive
You can append to uncompressed .tar archives (not compressed ones):
# Add a file to an existing tar
tar rf backup.tar newfile.txt
# This does NOT work with .tar.gz — you'll get an error
tar rf backup.tar.gz newfile.txt # Error!
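If you really need to add a file to a compressed archive, the workaround is to round-trip through the uncompressed form (a sketch):

```shell
# Decompress, append, recompress
gunzip backup.tar.gz           # backup.tar.gz -> backup.tar
tar rf backup.tar newfile.txt  # append works on the plain tar
gzip backup.tar                # backup.tar -> backup.tar.gz
```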
Choosing a Compression Algorithm: gzip, bzip2, xz, and zstd
Each algorithm makes different trade-offs between speed and compression ratio. Here's what you need to know:
gzip — The universal default
GNU gzip has been the default compression tool on Unix systems for over 30 years. Version 1.14 (February 2025) added faster decompression on x86-64 CPUs using PCLMUL instructions.
# Compress a single file (replaces original)
gzip access.log
# Result: access.log.gz
# Keep the original file
gzip -k access.log
# Decompress
gunzip access.log.gz
# or
gzip -d access.log.gz
# Adjust compression level (1=fastest, 9=smallest, default=6)
gzip -9 access.log
When to use: default choice for scripts, log rotation, and anywhere compatibility matters. Every Unix system has gzip.
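A related perk of gzip's ubiquity: the z-prefixed companion tools that ship alongside it (zcat, zgrep, zless) let you read and search compressed logs in place, without extracting them first. A quick sketch:

```shell
# Read and search gzip-compressed logs without decompressing to disk
zcat access.log.gz | tail -20    # stream the contents
zgrep "ERROR" access.log.gz      # grep straight into the compressed file
```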
bzip2 — Better ratio, much slower
tar cjf archive.tar.bz2 project/
Bzip2 compresses about 10–15% better than gzip but runs significantly slower. It's been largely superseded by xz (better ratio) and zstd (better speed). You'll still encounter .tar.bz2 files in the wild, but there's little reason to create new ones.
xz — Maximum compression
XZ Utils consistently produces the smallest archives. The Linux kernel tarballs, Debian packages, and many open-source projects ship as .tar.xz:
# Compress with xz
tar cJf archive.tar.xz project/
# Multi-threaded compression (xz is single-threaded by default)
tar -cf - project/ | xz -T0 > archive.tar.xz
# Decompress
tar xf archive.tar.xz
When to use: distributing software, long-term archival where storage matters more than time.
zstd — The modern choice
Zstandard (zstd) was developed at Facebook and has quickly become the go-to compression tool. Version 1.5.7 (February 2025) defaults to multi-threaded compression, capped at 4 threads.
# Create a zstd-compressed archive
tar --zstd -cf archive.tar.zst project/
# Standalone zstd compression
zstd -T0 large-file.sql
# Result: large-file.sql.zst
# Adjust level (1-19 default range, --ultra for 20-22)
zstd -19 large-file.sql
zstd --ultra -22 large-file.sql # Maximum compression
# Decompress
zstd -d large-file.sql.zst
unzstd large-file.sql.zst
I switched from gzip to zstd for backing up 32blog's content directory and saw compression time drop by about 60% with slightly better compression ratio. The only downside is that older systems (pre-2019) might not have zstd installed.
When to use: any new project where you control both ends. Backups, CI/CD artifacts, database dumps, container images.
zip and unzip: The Cross-Platform Option
zip is the format that works everywhere — Windows, macOS, and Linux all handle it natively. Unlike the tar + compressor approach, zip archives and compresses in a single step:
# Create a zip archive
zip -r project.zip project/
# Add compression level (0=store, 9=maximum)
zip -r -9 project.zip project/
# Exclude patterns
zip -r project.zip project/ -x "*.git*" "*/node_modules/*"
# List contents
unzip -l project.zip
# Extract
unzip project.zip
# Extract to specific directory
unzip project.zip -d /opt/restore/
# Extract a single file
unzip project.zip "project/config.json"
Password-protected archives
# Create encrypted zip (prompts for password)
zip -er sensitive.zip contracts/
# Extract (prompts for password)
unzip sensitive.zip
When to use zip over tar.gz
- Sharing files with Windows or macOS users who won't touch a terminal
- Java .jar and .war files (they're actually zip archives)
- Email attachments (zip is universally understood)
- When you need random access to individual files without decompressing everything
For everything else — backups, deployments, data archival — stick with tar + your compressor of choice.
Real-World Patterns: Backups, Deployments, and Pipelines
Timestamped backups
A pattern I use for 32blog's content directory — each backup gets a timestamp so they don't overwrite each other:
#!/usr/bin/env bash
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR="/backups"
SOURCE_DIR="/var/www/32blog"
tar --zstd -cf "${BACKUP_DIR}/32blog-${TIMESTAMP}.tar.zst" \
--exclude='node_modules' \
--exclude='.next' \
--exclude='.git' \
"${SOURCE_DIR}"
# Clean up backups older than 30 days
find "${BACKUP_DIR}" -name "32blog-*.tar.zst" -mtime +30 -delete
echo "Backup created: 32blog-${TIMESTAMP}.tar.zst"
Remote transfer via SSH (no intermediate file)
This is one of my favorite tar patterns — piping directly through SSH to transfer a directory to another server without creating a temporary archive. You can find more SSH patterns in the ssh and rsync guide:
# Copy a directory to a remote server via SSH
tar czf - project/ | ssh user@server "tar xzf - -C /opt/deploy/"
# Copy from remote to local
ssh user@server "tar czf - /var/log/app/" | tar xzf - -C ./logs/
# With progress indicator (requires pv)
tar cf - project/ | pv | ssh user@server "tar xf - -C /opt/deploy/"
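The same stream-through-a-pipe idea works locally too, which is handy for copying a tree while preserving permissions and symlinks, and for testing the pattern without a remote host (a sketch; the destination path is illustrative):

```shell
# Local equivalent of the SSH pipe: stream project/ into /tmp/deploy/
mkdir -p /tmp/deploy
tar cf - project/ | tar xf - -C /tmp/deploy/
```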
Splitting large archives
When you need to transfer a massive archive but have size limits (email, USB drives, or unreliable network connections):
# Create and split into 100MB chunks
tar czf - large-project/ | split -b 100M - backup-part-
# Reassemble and extract
cat backup-part-* | tar xzf -
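Before deleting the source, it's worth confirming the parts reassemble byte-for-byte. A sketch that keeps a reference copy while splitting (it needs disk space for the full archive):

```shell
# Keep a reference copy of the stream, then compare it to the reassembled parts
tar czf - large-project/ | tee full.tar.gz | split -b 100M - backup-part-
cat backup-part-* | cmp - full.tar.gz && echo "parts intact"
```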
Selective extraction with find
Combine tar with find and xargs for powerful workflows:
# Find all .tar.gz files and extract them
find /downloads -name "*.tar.gz" -exec tar xzf {} -C /opt/extracted/ \;
# Archive only files modified in the last 24 hours
find project/ -mtime -1 -type f -print0 | \
tar czf recent-changes.tar.gz --null -T -
# Archive only specific file types
find src/ -name "*.ts" -o -name "*.tsx" | \
tar czf typescript-source.tar.gz -T -
Incremental backups with tar
GNU tar supports incremental backups using snapshot files. The first run creates a full backup; subsequent runs back up only changed files:
# Full backup (creates snapshot file)
tar --listed-incremental=snapshot.snar \
-czf backup-full.tar.gz project/
# Incremental backup (only files changed since last run)
tar --listed-incremental=snapshot.snar \
-czf backup-incr-$(date +%Y%m%d).tar.gz project/
# Restore: apply full backup first, then incrementals in order
tar --listed-incremental=/dev/null -xzf backup-full.tar.gz
tar --listed-incremental=/dev/null -xzf backup-incr-20260323.tar.gz
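One caveat worth sketching: tar rewrites the snapshot file on every run, so each incremental is relative to the previous one, forming a chain where a single corrupt link breaks the restore. If you'd rather have every incremental relative to the full backup, work from a copy of the snapshot each time:

```shell
# Keep the post-full-backup snapshot pristine; each run copies it first,
# so every incremental captures all changes since the full backup
cp snapshot.snar snapshot-run.snar
tar --listed-incremental=snapshot-run.snar \
    -czf backup-incr-$(date +%Y%m%d).tar.gz project/
```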
Performance Comparison: Speed vs. Compression Ratio
I benchmarked these algorithms on a typical web project (32blog's source — about 180MB of source code, MDX content, and assets):
| Algorithm | Command flag | Compressed size | Compression speed | Decompression speed |
|---|---|---|---|---|
| gzip -1 | tar czf (fast) | ~45MB | Very fast | Fast |
| gzip -6 | tar czf (default) | ~38MB | Fast | Fast |
| gzip -9 | tar czf (best) | ~37MB | Moderate | Fast |
| bzip2 | tar cjf | ~33MB | Slow | Slow |
| xz -6 | tar cJf | ~28MB | Very slow | Moderate |
| zstd -3 | tar --zstd (default) | ~36MB | Very fast | Very fast |
| zstd -19 | tar --zstd (high) | ~29MB | Slow | Very fast |
| zip -6 | zip -r (default) | ~40MB | Fast | Fast |
Key takeaways:
- zstd default gets close to gzip's compression with 3–5× faster speed
- xz wins on size but you'll feel the compression time on large datasets
- zstd decompression is consistently the fastest regardless of compression level
- zip always loses on ratio because it compresses files individually
For most day-to-day work, zstd at default settings is the best all-around choice. Use xz when you're creating archives that will be downloaded thousands of times (the extra compression time pays for itself in bandwidth). Use gzip when you need guaranteed compatibility with any system.
FAQ
What's the difference between .tar.gz and .tgz?
They're the same format. .tgz is a shorthand that exists because old DOS and Windows systems couldn't handle double extensions. You can use either name — tar doesn't care.
Can I extract a single file from a .tar.gz without decompressing everything?
You can specify the file path during extraction (tar xzf archive.tar.gz path/to/file), but tar still has to decompress the entire stream up to that point. For true random access, use zip instead.
How do I see progress while compressing a large archive?
Use pv (pipe viewer) between tar and the compressor: tar cf - bigdir/ | pv | gzip > archive.tar.gz. This shows a progress bar with speed and ETA.
Can tar handle files larger than 4GB?
Yes. GNU tar uses POSIX extended headers and has no practical file size limit. Old zip implementations (before Zip64) couldn't handle files over 4GB — another reason to prefer tar for large archives.
Should I use gzip or zstd for new projects?
Use zstd if you control both the compression and decompression environments. It's faster and compresses better at comparable speed settings. Fall back to gzip only when you need compatibility with systems that might not have zstd installed (pre-2019 distributions, embedded systems, minimal Docker images).
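For scripts that have to run on both old and new systems, you can probe for zstd and fall back to gzip (a sketch; the archive names and directory are illustrative):

```shell
# Prefer zstd when the binary is present, otherwise fall back to gzip
if command -v zstd >/dev/null 2>&1; then
    tar --zstd -cf backup.tar.zst project/
else
    tar czf backup.tar.gz project/
fi
```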
How do I compress an entire disk or partition?
Use dd piped into your compressor: dd if=/dev/sda bs=4M | zstd -T0 > disk-image.zst. For filesystem-level backups, tar with --one-file-system is safer: tar --one-file-system -czf root-backup.tar.gz /.
What does the -p flag do in tar?
The -p (--preserve-permissions) flag keeps the original file permissions when extracting, instead of letting your umask strip bits from them. GNU tar behaves as if -p were given when run as root, but use it explicitly when restoring backups so restored files keep their exact modes regardless of who runs the restore.
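A self-contained demo of the difference (assumes GNU tar and coreutils stat; the filenames are illustrative):

```shell
# Without -p, the umask is applied to extracted files; with -p it isn't
umask 022
echo secret > config.txt && chmod 664 config.txt
tar cf perms.tar config.txt

mkdir -p plain exact
tar xf perms.tar -C plain/    # as non-root: mode becomes 644 (umask stripped it)
tar xpf perms.tar -C exact/   # -p: mode stays 664
stat -c '%a' plain/config.txt exact/config.txt
```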
How do I verify an archive is not corrupted?
Use tar tzf archive.tar.gz > /dev/null — this reads the entire archive and will error on corruption. For gzip files specifically, gzip -t file.gz runs an integrity check. Zstd has zstd -t file.zst.
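To see the check actually catch something, here's a self-contained sketch that deliberately corrupts a few bytes in the middle of a gzip file:

```shell
# Overwrite bytes mid-file and watch gzip -t catch the damage
cd "$(mktemp -d)"
head -c 5000 /dev/urandom > file.bin
gzip file.bin
gzip -t file.bin.gz && echo "intact"

printf '\377\377\377\377' | dd of=file.bin.gz bs=1 seek=2000 conv=notrunc 2>/dev/null
gzip -t file.bin.gz || echo "corruption detected"
```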
Wrapping Up
The core pattern is simple: tar bundles, compressors shrink, and you almost always want both. For day-to-day work, tar --zstd -cf and tar xf cover 90% of what you need. Reach for zip when cross-platform sharing matters, and xz when you're distributing software and every kilobyte counts.
The commands that'll serve you best:
# Create (zstd, modern systems)
tar --zstd -cf archive.tar.zst directory/
# Create (gzip, universal compatibility)
tar czf archive.tar.gz directory/
# Extract (any format, auto-detected)
tar xf archive.tar.gz
# List contents before extracting
tar tf archive.tar.gz
# Cross-platform sharing
zip -r archive.zip directory/
For more CLI tools that pair well with tar, check the CLI tools map. The find guide covers file selection patterns, and the ssh and rsync guide has more on remote transfer workflows.