32blog by Studio Mitsu

wget Complete Guide: File Downloads, Batch Processing, and Automation

Learn wget from the basics to advanced usage: batch downloads, resuming interrupted transfers, site mirroring, scripting, and when to use curl instead.

by omitsu · 12 min read

wget is a CLI tool for downloading files over HTTP, HTTPS, and FTP. Run wget URL to download a file, use -c to resume interrupted transfers, -i for batch downloads from a URL list, and --mirror to clone entire websites for offline viewing.

Whether it's a single file, a batch of hundreds, or mirroring an entire website, wget handles it from the terminal. No GUI needed. Works over SSH. Can run in the background, resume interrupted downloads, and be scripted for automation.

This guide covers everything from the basics to real-world use cases, with ready-to-run examples throughout.

What is wget?

wget is a command-line utility for downloading files over HTTP, HTTPS, and FTP. It's part of the GNU Project and comes pre-installed on virtually every Linux distribution and available on macOS via Homebrew.

Key characteristics:

  • Non-interactive: runs completely unattended, perfect for scripts and cron jobs
  • Resumable: pick up where you left off after an interrupted download
  • Recursive: can crawl and download entire websites
  • Proxy-aware: works through HTTP proxies
  • Background-capable: detach from the terminal and download continues

Installation check

Verify wget is installed:

bash
wget --version

If it's missing:

bash
# Debian / Ubuntu
sudo apt install wget

# CentOS / RHEL
sudo yum install wget

# Fedora
sudo dnf install wget

# macOS (Homebrew)
brew install wget

Basic usage

Download a single file

bash
wget https://example.com/file.zip

The file saves to the current directory. A progress bar shows download speed, amount downloaded, and estimated time remaining.

Specify the output filename or directory

bash
# Save with a different name
wget -O myfile.zip https://example.com/file.zip

# Save to a specific directory
wget -P ~/downloads/ https://example.com/file.zip

# Custom directory and filename (use full path with -O)
wget -O ~/downloads/setup.zip https://example.com/file.zip

Run in the background

When downloading large files, detach from the terminal so you can keep working:

bash
wget -b https://example.com/largefile.iso

Output goes to wget-log. Monitor progress with:

bash
tail -f wget-log

Common options reference

Option                   What it does
-O FILE                  Save as FILE
-P DIR                   Save into directory DIR
-b                       Background mode
-c                       Continue/resume interrupted download
-q                       Quiet mode (no output)
--limit-rate=RATE        Limit speed (e.g. --limit-rate=1m)
-r                       Recursive download
-l DEPTH                 Set recursion depth
--no-check-certificate   Skip SSL verification
-i FILE                  Download URLs listed in FILE
--user-agent=STRING      Set custom User-Agent
--header=STRING          Add HTTP header
-N                       Only download if newer than local copy
--tries=N                Number of retry attempts
--timeout=SECONDS        Set connection timeout

Real-world use cases

Batch download from a URL list

Create a text file with one URL per line, then pass it to wget with -i:

bash
cat > urls.txt << EOF
https://releases.ubuntu.com/24.04.4/ubuntu-24.04.4-desktop-amd64.iso
https://example.com/data/january.csv
https://example.com/data/february.csv
https://example.com/data/march.csv
EOF

wget -i urls.txt -P ~/downloads/

Each URL is downloaded in sequence. Combine with -b to run the whole batch in the background.
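When the URLs follow a predictable pattern, you can generate the list with a short loop instead of typing it by hand. A minimal sketch (the base URL and month names are made up for illustration):

```bash
#!/bin/bash
# Build a URL list programmatically, then hand it to wget -i.
BASE="https://example.com/data"   # hypothetical base URL

> urls.txt
for month in january february march; do
  echo "${BASE}/${month}.csv" >> urls.txt
done

cat urls.txt
# → https://example.com/data/january.csv
#   https://example.com/data/february.csv
#   https://example.com/data/march.csv

# wget -i urls.txt -P ~/downloads/   # uncomment to run the batch
```

Uncomment the final line to actually fetch the batch once the list looks right.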

Resume an interrupted download

If a large download gets cut off by a network hiccup, just add -c and re-run the same command:

bash
# Original download (interrupted)
wget https://example.com/bigfile.iso

# Resume from where it stopped
wget -c https://example.com/bigfile.iso

wget checks the local file size and asks the server for only the remaining bytes. This relies on the server supporting HTTP Range requests (wget --server-response --spider URL will show an Accept-Ranges: bytes header if it does). If no partial file exists, it starts fresh.

Throttle the download speed

Avoid saturating your connection or being rate-limited by the server:

bash
# Limit to 1 MB/s
wget --limit-rate=1m https://example.com/file.iso

# Limit to 500 KB/s
wget --limit-rate=500k https://example.com/file.iso

Authenticate with username and password

For HTTP Basic Auth:

bash
wget --user=myusername --password=mypassword https://example.com/protected/file.zip

Passwords passed on the command line end up in shell history. If that's a concern, replace --password with --ask-password and wget will prompt for it interactively:

bash
wget --user=myusername --ask-password https://example.com/protected/file.zip
# wget prompts for the password before connecting

For FTP:

bash
wget ftp://ftp.example.com/pub/file.tar.gz
wget --ftp-user=user --ftp-password=pass ftp://ftp.example.com/private/file.tar.gz
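A third option that keeps credentials out of shell history entirely is a ~/.netrc file, which wget consults by default when no credentials are given on the command line. The entry below is illustrative; restrict the file with chmod 600 since it holds a plaintext password.

```text
# ~/.netrc (read automatically by wget; also used by curl -n and ftp)
machine example.com
login myusername
password mypassword
```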

Mirror an entire website

Download a complete copy of a site for offline viewing or archiving:

bash
wget --mirror \
     --convert-links \
     --adjust-extension \
     --page-requisites \
     --no-parent \
     https://example.com/

What each flag does:

  • --mirror (or -m): recursive download + timestamps + infinite depth
  • --convert-links: rewrite links to work offline (absolute → relative)
  • --adjust-extension: add .html to pages that need it
  • --page-requisites: fetch CSS, images, JS — everything needed to render the page
  • --no-parent: don't go above the specified path

The downloaded site will be in a directory named after the domain.

Only download if the file has changed

Poll a URL and only download when the server's version is newer than your local copy:

bash
wget -N https://example.com/data.csv

This is great for keeping local data files in sync. Pair it with cron for scheduled updates:

bash
# crontab -e: run daily at 3 AM
0 3 * * * wget -N -q -P /var/data/ https://example.com/data.csv

Set a custom User-Agent

Some servers block requests from wget. Impersonate a browser:

bash
wget --user-agent="Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0" \
     https://example.com/file.zip

Add custom HTTP headers

Useful for APIs that require an Authorization token:

bash
wget --header="Authorization: Bearer YOUR_API_TOKEN" \
     --header="Accept: application/json" \
     https://api.example.com/export/data.json

wget vs curl: which one to use?

Both wget and curl download content from URLs, but they have different strengths:

Use case                       wget             curl
Simple file download           Great            Fine
Recursive / site mirroring     Great            Not supported
Resume interrupted downloads   Great            Great
API requests                   Limited          Great
Handling response data         Limited          Great
Background download            Built-in (-b)    Needs & and disown
Batch from URL list            Built-in (-i)    Needs a script
HTTP method control            Limited          Full control

Rule of thumb:

  • Downloading files → wget
  • Calling APIs, inspecting responses, complex HTTP → curl
  • Scripting with flexible output handling → curl

Advanced techniques

Check links with --spider

Verify URLs without actually downloading anything. This is useful for finding broken links on a site.

bash
# Check a single URL
wget --spider https://example.com/page.html
bash
# Batch check from a URL list
wget --spider -i urls.txt 2>&1 | grep -E "broken|200|404"
bash
# Recursively check an entire site (no downloads)
wget --spider --recursive --no-directories --level=2 \
     https://example.com/ 2>&1 | grep -B1 "broken"

--spider sends HEAD requests where possible (falling back to GET when a server doesn't support HEAD), so it uses almost no bandwidth. Useful in CI/CD pipelines for automated broken-link detection on documentation sites.
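Because wget's exit status reflects whether the URL was reachable, --spider drops neatly into shell logic. A minimal link-checker sketch (check_url and the sample list are made up; example.invalid is a reserved domain that never resolves, so this particular check is guaranteed to fail):

```bash
#!/bin/bash
# Report each URL in a list as OK or BROKEN based on wget --spider's exit status.
check_url() {
  wget --spider -q --tries=1 --timeout=5 "$1"
}

cat > check-urls.txt << 'EOF'
https://example.invalid/missing.html
EOF

while read -r url; do
  if check_url "$url"; then
    echo "OK      $url"
  else
    echo "BROKEN  $url"
  fi
done < check-urls.txt
```

Against the sample list this prints a BROKEN line; swap in your own URL file for real checks.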

Persist settings with wgetrc

Stop typing the same options every time by saving defaults in ~/.wgetrc.

text
# ~/.wgetrc
# Preserve timestamps
timestamping = on

# Retry count
tries = 3

# Connection timeout (seconds)
timeout = 30

# Default bandwidth limit
limit_rate = 2m

# Log file
logfile = /tmp/wget.log

# Respect robots.txt
robots = on
bash
# wgetrc settings apply automatically
wget https://example.com/file.zip

# Temporarily override settings
wget --no-config https://example.com/file.zip
wget --limit-rate=0 https://example.com/file.zip

Use --no-config to ignore wgetrc entirely. For team-wide settings, use /etc/wgetrc.

Log rejected URLs with --rejected-log

When downloading many files, track what failed for later review.

bash
# Log rejected URLs
wget --recursive --level=1 \
     --rejected-log=rejected.log \
     https://example.com/downloads/

# Retry only the failed URLs (skip the header row; the URL is the second tab-separated column)
awk -F'\t' 'NR > 1 { print $2 }' rejected.log | wget -i -
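The rejected log is tab-separated with a header row, and the URL sits in the second column (layout assumed from wget 1.x; check the header of your own log before relying on column positions). A synthetic sample, with made-up REASON values, to demonstrate extracting the URLs:

```bash
#!/bin/bash
# Build a tiny file in the rejected-log layout, then extract just the URLs.
printf 'REASON\tU_URL\tU_SCHEME\n'                            >  sample-rejected.log
printf 'BLACKLIST\thttps://example.com/a.zip\tSCHEME_HTTPS\n' >> sample-rejected.log
printf 'SUPPRESS\thttps://example.com/b.zip\tSCHEME_HTTPS\n'  >> sample-rejected.log

# Skip the header (NR > 1) and print the second tab-separated field
awk -F'\t' 'NR > 1 { print $2 }' sample-rejected.log
# → https://example.com/a.zip
#   https://example.com/b.zip
```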

Download large files in the background

bash
# -b runs in background (log goes to wget-log)
wget -b https://example.com/large-file.iso

# Monitor progress
tail -f wget-log

# Specify a custom log file
wget -b -o download.log https://example.com/large-file.iso

If you need downloads to survive SSH disconnections, combine with tmux or use nohup.

bash
nohup wget https://example.com/large-file.iso &

Scripting examples

wget pairs naturally with shell scripts. Here are some practical patterns.

Download multiple versions of a file

bash
#!/bin/bash

BASE_URL="https://example.com/releases"
VERSIONS=("1.0.0" "1.1.0" "1.2.0" "2.0.0")
DEST_DIR="./downloads"

mkdir -p "$DEST_DIR"

for VERSION in "${VERSIONS[@]}"; do
  FILE="myapp-${VERSION}.tar.gz"
  URL="${BASE_URL}/${VERSION}/${FILE}"

  echo "Downloading ${FILE}..."
  wget -q --show-progress -P "$DEST_DIR" "$URL"

  if [ $? -eq 0 ]; then
    echo "  OK: ${FILE}"
  else
    echo "  FAILED: ${FILE}"
  fi
done

echo "Done."

Download and verify checksum

bash
#!/bin/bash

# Check the official Ubuntu release page for the current version and SHA256 hash:
# https://releases.ubuntu.com/
URL="https://releases.ubuntu.com/24.04.4/ubuntu-24.04.4-desktop-amd64.iso"
EXPECTED_SHA256="<get the current hash from the official Ubuntu site>"

echo "Downloading..."
wget -q --show-progress -O ubuntu.iso "$URL"

echo "Verifying checksum..."
# Linux: sha256sum / macOS: shasum -a 256
if command -v sha256sum &>/dev/null; then
  ACTUAL_SHA256=$(sha256sum ubuntu.iso | awk '{print $1}')
else
  ACTUAL_SHA256=$(shasum -a 256 ubuntu.iso | awk '{print $1}')
fi

if [ "$ACTUAL_SHA256" = "$EXPECTED_SHA256" ]; then
  echo "Checksum OK — file is valid"
else
  echo "Checksum mismatch! File may be corrupted."
  exit 1
fi

Troubleshooting

SSL certificate errors

bash
# Skip verification (fine for testing, avoid in production)
wget --no-check-certificate https://example.com/file.zip

Connection timeouts or flaky servers

bash
# 30 second timeout, retry up to 5 times with a 10 second wait between retries
wget --timeout=30 --tries=5 --waitretry=10 https://example.com/file.zip

# Retry indefinitely (useful for very large downloads on unreliable connections)
wget --tries=0 https://example.com/file.zip

Check redirect chain without downloading

Inspect headers and see where a URL redirects to, without actually downloading anything:

bash
wget --server-response --spider https://example.com/file.zip

Download stalls at 0 bytes

Sometimes servers send a response but no data. Try adding a User-Agent or check if the URL requires authentication.

Security Considerations

When downloading files from external sources, security matters.

wget2 vulnerability (CVE-2025-69194): A high-severity vulnerability (CVSS 8.8) was found in GNU Wget2's Metalink document processing. When using --force-metalink, a path traversal flaw allows remote attackers to overwrite local files. If you use wget2, update to 2.2.1 or later.

Best practices:

  • Never use --no-check-certificate in production — limit it to testing with self-signed certs
  • Always verify checksums for files downloaded from untrusted sources (see the script example above)
  • User-Agent spoofing with --user-agent may violate a site's terms of service — check before scraping
  • When using cron for automated downloads, keep logs and monitor them periodically

FAQ

When should I use wget instead of curl?

Use wget for straightforward file downloads, batch processing with URL lists (-i), recursive site mirroring, and background downloads. Use curl when you need fine-grained control over HTTP methods, response handling, or API interactions.

How do I resume an interrupted wget download?

Run wget -c URL with the same URL. wget checks the local file size and requests only the remaining bytes from the server. This requires the server to support HTTP Range requests.

How do I mirror an entire website with wget?

Use wget --mirror --convert-links --adjust-extension --page-requisites --no-parent URL. The --mirror flag enables recursive downloading with infinite depth and timestamp checking, while --convert-links rewrites URLs for offline browsing.

What's the difference between wget2 and wget 1.x?

wget2 is the next-generation rewrite of GNU Wget. It adds HTTP/2 support, parallel downloads, multi-threading, and improved performance. However, it's not fully compatible with wget 1.x scripts, so test before migrating.

How do I use wget through a proxy?

Set the http_proxy and https_proxy environment variables, or add http_proxy / https_proxy directives to ~/.wgetrc. Use --no-proxy to temporarily bypass proxy settings.

How do I limit wget download speed?

Use --limit-rate=1m (1 MB/s) or --limit-rate=500k (500 KB/s). To set a default limit, add limit_rate = 2m to your ~/.wgetrc file.

What's the difference between wget -O and -P?

-O filename specifies the exact output filename. -P directory specifies the destination directory, and the filename is derived from the URL automatically.

Wrapping Up

wget is one of those tools you reach for daily once you're comfortable with it. The basics take five minutes to learn, and the advanced features are there when you need them. For the full option reference, check the official GNU Wget manual.

Quick summary of the most useful options:

  • Basic download: wget URL
  • Rename output: wget -O filename URL
  • Save to directory: wget -P /path/ URL
  • Background: wget -b URL
  • Resume: wget -c URL
  • Batch from list: wget -i urls.txt
  • Mirror a site: wget --mirror --convert-links --page-requisites URL
  • Conditional update: wget -N URL

Start with the basics, then reach for the advanced flags as specific needs come up. And when you find yourself repeatedly downloading files in a workflow, consider wrapping wget in a short shell script — it compounds nicely.