32blog by Studio Mitsu

sed & awk Practical Guide: The Classic Text Processing Duo

Master sed substitution, deletion, and insertion alongside awk column extraction, aggregation, and conditional processing with real examples.

by omitsu · 12 min read

sed handles line-oriented text transformation (substitution, deletion, insertion) while awk excels at column-oriented data extraction, calculation, and aggregation. Pipe them together and you can process everything from bulk config updates to CSV aggregation and log analysis in one-liners.

"I need to bulk-replace values across config files." "I want to extract specific columns from a CSV and calculate totals." "I need to strip certain lines from a log file."

These are classic text processing scenarios where sed and awk shine. Both are filter commands designed for pipelines, but they excel at distinctly different things.

This article walks you through sed and awk fundamentals, practical patterns, and how to combine them into powerful workflows.

sed and awk: What's the difference?

Here is the quick breakdown.

| | sed | awk |
|---|---|---|
| Full name | Stream Editor | Pattern scanning and processing language |
| Strength | Line-oriented text transformation (substitution, deletion, insertion) | Column-oriented data processing (extraction, calculation, aggregation) |
| Mental model | "Rewrite text" | "Extract data from text" |
| Typical use | Config file updates, log cleanup | CSV/TSV aggregation, access log analysis |

Both are pipeline filter commands connected with |. sed excels at transformation, awk excels at extraction and calculation. Keep this distinction in mind and you will always know which tool to reach for.
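To make the contrast concrete, here is a tiny hypothetical inventory file (name and quantity) processed both ways — sed rewrites the text, awk extracts a column and computes with it:

```shell
# Sample data: name and quantity, space-separated (hypothetical file)
printf 'apples 10\nbananas 4\n' > /tmp/fruit.txt

# sed transforms text: rename a value (result goes to stdout)
sed 's/apples/pears/' /tmp/fruit.txt
# → pears 10
# → bananas 4

# awk extracts and computes: sum the second column
awk '{sum += $2} END {print sum}' /tmp/fruit.txt
# → 14
```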

sed basics

sed — the "stream editor" — reads text line by line, applies transformations, and outputs the result.

Substitution

The most common operation is the s (substitute) command.

bash
# Replace the first match on each line
sed 's/old/new/' file.txt

# Replace all matches on each line (g flag)
sed 's/old/new/g' file.txt

# Edit the file in place (-i option)
sed -i 's/old/new/g' file.txt

Line and range selection

bash
# Substitute only on line 3
sed '3s/old/new/' file.txt

# Substitute on lines 2 through 5
sed '2,5s/old/new/g' file.txt

# Substitute only on the last line
sed '$s/old/new/' file.txt

Deletion, insertion, and printing

bash
# Delete lines matching a pattern
sed '/pattern/d' file.txt

# Insert a line after the match
sed '/pattern/a\new line' file.txt

# Insert a line before the match
sed '/pattern/i\new line' file.txt

# Print only a specific range of lines (-n suppresses default output)
sed -n '10,20p' file.txt

Multiple operations and custom delimiters

bash
# Run multiple substitutions in one pass
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file.txt

# Use an alternative delimiter (handy for paths and URLs)
sed 's|/usr/local|/opt|g' config.txt

The delimiter can be any character — |, #, @, and so on. This eliminates the need to escape / when working with file paths.
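For comparison, here is the same path replacement written both ways — with the default / delimiter every slash in the pattern and replacement must be escaped:

```shell
# Default / delimiter: every slash needs a backslash escape
echo "/usr/local/bin" | sed 's/\/usr\/local/\/opt/'
# → /opt/bin

# Alternative | delimiter: the same command, far more readable
echo "/usr/local/bin" | sed 's|/usr/local|/opt|'
# → /opt/bin
```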

sed in practice

Bulk config file updates

A common scenario during server migrations or deployments.

bash
# Update DB_HOST in a .env file
NEW_HOST="db-prod.example.com"
sed -i "s/DB_HOST=.*/DB_HOST=${NEW_HOST}/" .env

# Comment out a line
sed -i 's/^PermitRootLogin yes/# PermitRootLogin yes/' /etc/ssh/sshd_config

# Uncomment a line
sed -i 's/^# *PermitRootLogin/PermitRootLogin/' /etc/ssh/sshd_config

Log file cleanup

bash
# Remove empty lines
sed '/^$/d' file.txt

# Strip leading and trailing whitespace
sed 's/^[[:space:]]*//;s/[[:space:]]*$//' file.txt

# Remove ANSI escape sequences (color codes)
sed 's/\x1b\[[0-9;]*m//g' colored-output.txt

Batch operations with find

bash
# Replace the year across all .txt files in the project
find . -name "*.txt" -exec sed -i 's/2025/2026/g' {} +

# Generate .env from .env.example while replacing values
sed 's/DB_PASSWORD=changeme/DB_PASSWORD=s3cur3P@ss/' .env.example > .env

awk basics

awk processes text at the field (column) level. Each line is automatically split on whitespace, and fields are accessible as $1, $2, and so on.

Note that gawk 5.4.0 switched the default regex engine to MinRX, which is fully POSIX-compliant. Patterns relying on GNU-specific regex extensions may behave differently. To use the legacy engine, set the environment variable GAWK_GNU_MATCHERS=1.

Column extraction

bash
# Print the 1st and 3rd columns
awk '{print $1, $3}' data.tsv

# Specify a delimiter (for CSV)
awk -F',' '{print $1, $2}' data.csv

# Print the entire line ($0 is the whole line)
awk '{print $0}' file.txt

Pattern matching and conditions

bash
# Print only lines containing "error"
awk '/error/ {print $0}' logfile.txt

# Print lines where the 3rd column exceeds 100
awk '$3 > 100 {print $1, $3}' data.txt

# Combine multiple conditions
awk '$3 > 100 && $2 == "active" {print $1, $3}' data.txt

Built-in variables

awk provides several useful built-in variables.

| Variable | Meaning |
|---|---|
| NR | Current line number (Number of Records) |
| NF | Number of fields on the current line (Number of Fields) |
| FS | Input field separator (Field Separator) |
| OFS | Output field separator |
bash
# Print with line numbers
awk '{print NR": "$0}' file.txt

# Show the field count for each line
awk '{print NR": "NF" fields"}' data.txt

# Print the last field
awk '{print $NF}' data.txt

BEGIN / END blocks and aggregation

bash
# Add a header and footer
awk 'BEGIN {print "Name,Score"} {print $1","$3} END {print "---done---"}' data.txt

# Sum the 2nd column
awk '{sum += $2} END {print "Total:", sum}' sales.txt

# Calculate the average
awk '{sum += $2; count++} END {print "Average:", sum/count}' sales.txt

Formatted output with printf

bash
# Left-align 20 chars, right-align 10 chars with 2 decimal places
awk '{printf "%-20s %10.2f\n", $1, $3}' data.txt

printf uses the same format specifiers as C. It is invaluable when you need neatly aligned tabular output.

awk in practice

CSV/TSV data aggregation

bash
# Sales totals by category (using associative arrays)
# Input: category,product,amount CSV
awk -F',' '{
    sales[$1] += $3
}
END {
    for (cat in sales)
        printf "%-15s %10.0f\n", cat, sales[cat]
}' sales.csv
bash
# Calculate max, min, and average in one pass
awk 'BEGIN {max = -999999; min = 999999}
{
    sum += $2
    count++
    if ($2 > max) max = $2
    if ($2 < min) min = $2
}
END {
    printf "Max: %.2f  Min: %.2f  Avg: %.2f\n", max, min, sum/count
}' data.txt

Access log analysis

bash
# Top 10 IP addresses by request count
awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' access.log | sort -rn | head -10
bash
# HTTP status code breakdown
# CLF format: IP - - [date] "method path proto" status size
awk '{print $9}' access.log | sort | uniq -c | sort -rn

Formatted output

bash
# Display user information in table format
awk -F':' 'BEGIN {
    printf "%-20s %-6s %-6s %s\n", "USER", "UID", "GID", "HOME"
    printf "%-20s %-6s %-6s %s\n", "----", "---", "---", "----"
}
$3 >= 1000 && $3 < 65534 {
    printf "%-20s %-6s %-6s %s\n", $1, $3, $4, $6
}' /etc/passwd

Combining sed and awk

sed and awk reach their full potential when piped together. The typical pattern is sed for preprocessing (formatting, filtering) followed by awk for aggregation.

bash
# Remove the header row from a CSV, then sum the 3rd column
sed '1d' sales.csv | awk -F',' '{sum += $3} END {print "Total:", sum}'
bash
# Extract ERROR lines from a log, then display only the timestamp and message
sed -n '/ERROR/p' app.log | awk '{print $1, $2, substr($0, index($0,$5))}'
bash
# Extract bash users from /etc/passwd and format the output
grep '/bash$' /etc/passwd | awk -F':' '{printf "%-15s UID=%-6s %s\n", $1, $3, $6}'
bash
# Count today's errors in syslog by service
sed -n "/$(date '+%b %e')/p" /var/log/syslog | awk '/error|fail/ {count[$5]++} END {for (s in count) print count[s], s}' | sort -rn

Modern alternative: sd

sd is a Rust-powered sed alternative. Compared to sed's s/old/new/g syntax, sd requires less escaping and reads more naturally.

Installation

bash
# In a WSL environment, same as Linux
cargo install sd

Basic usage

bash
# Replace via stdin
echo "hello world" | sd 'world' 'earth'

# Edit a file in place
sd 'old' 'new' file.txt

sed vs sd comparison

| Operation | sed | sd |
|---|---|---|
| Basic replacement | sed 's/foo/bar/g' | sd 'foo' 'bar' |
| Path replacement | sed 's\|/usr/local\|/opt\|g' | sd '/usr/local' '/opt' |
| Regex groups | sed 's/\\(foo\\)/[\1]/g' | sd '(foo)' '[$1]' |
| In-place edit | sed -i 's/foo/bar/g' file | sd 'foo' 'bar' file |

sd uses PCRE-style regex by default, so there is no need for \( \) escaping. Path replacements also work without changing the delimiter.

bash
# Multi-line matching (v1.1.0+)
sd --across 'start\n.*\nend' 'replaced' file.txt

If you are already comfortable with sed, there is no need to switch. However, for one-liners where regex escaping gets messy, sd significantly reduces the chance of errors. A practical rule of thumb: use sd for simple replacements, stick with sed for line addressing and complex scripts.

Advanced techniques

Grouping aggregation with awk associative arrays

Awk's associative arrays are powerful for data grouping. You can replicate SQL's GROUP BY in a one-liner.

bash
# Total response size by status code
awk '{status[$9]++; size[$9]+=$10} END {
    for (s in status)
        printf "%s: %d requests, %.2f MB\n", s, status[s], size[s]/1024/1024
}' access.log

Extracting specific blocks with sed range addresses

Use range addresses to extract content between two patterns.

bash
# Extract blocks between BEGIN and END
sed -n '/^BEGIN$/,/^END$/p' config.txt

# Extract a function definition (simple version, no brace matching)
sed -n '/^function setup/,/^}/p' script.sh

Field delimiter conversion with awk OFS

Different input and output delimiters come up often. Control this with OFS (Output Field Separator).

bash
# TSV → CSV conversion
awk 'BEGIN {FS="\t"; OFS=","} {$1=$1; print}' data.tsv > data.csv

# CSV → pipe-delimited
awk -F',' 'BEGIN {OFS="|"} {$1=$1; print}' data.csv

$1=$1 is a trick that forces awk to rebuild the record. Without it, OFS is not applied.
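A quick demonstration of why the rebuild matters — without assigning to any field, awk leaves the record untouched and OFS never kicks in:

```shell
# Without $1=$1: the record is printed as-is, original tabs survive
printf 'a\tb\tc\n' | awk 'BEGIN {FS="\t"; OFS=","} {print}'
# tabs preserved in the output

# With $1=$1: the record is rebuilt, so OFS is applied
printf 'a\tb\tc\n' | awk 'BEGIN {FS="\t"; OFS=","} {$1=$1; print}'
# → a,b,c
```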

sed hold space

sed has two buffers: the "pattern space" (current line) and the "hold space" (temporary buffer). Manipulate them with h/H (save), g/G (restore), and x (exchange).

bash
# Print the line before each blank line (get section-ending lines)
sed -n '/^$/!{h;d}; /^$/{x;p}' file.txt

This is an advanced technique, but the hold space can be the only solution for complex text transformations.
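As a classic illustration, the hold space can reverse a file's lines, emulating tac in three commands:

```shell
# 1!G appends the hold space to every line after the first,
# h saves the accumulated (reversed) text back to the hold space,
# $p prints the result on the last line
printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p'
# → three
# → two
# → one
```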

FAQ

When should I use sed vs awk?

Use sed for line-oriented rewriting (substitution, deletion, insertion). Use awk for column extraction, calculation, and aggregation. If you're asking "do I want to transform text or extract data from it?" — the answer tells you which tool to use.

What's the difference between macOS sed and Linux sed?

macOS ships BSD sed, which handles -i differently. GNU sed uses sed -i 's/...' for in-place editing, but BSD sed requires sed -i '' 's/...' with an explicit empty backup extension. Install GNU sed on macOS with brew install gnu-sed (available as gsed).
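One portable workaround, if you can tolerate a temporary backup file: attach a backup suffix directly to -i, which both GNU and BSD sed accept, then delete the backup. A minimal sketch using a hypothetical /tmp file:

```shell
# Works on both GNU and BSD sed: write a .bak backup, then discard it
printf 'old value\n' > /tmp/demo.txt
sed -i.bak 's/old/new/' /tmp/demo.txt
rm /tmp/demo.txt.bak

cat /tmp/demo.txt
# → new value
```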

What's the difference between awk and gawk?

awk is the POSIX-specified command. gawk is the GNU implementation with extensions such as gensub(), BEGINFILE/ENDFILE blocks, true multidimensional arrays, and @include. (Associative arrays and for...in are part of POSIX awk, so they work in any implementation.) On Linux, awk is typically a symlink to gawk.

Can sed handle multi-byte characters (UTF-8)?

Yes, GNU sed processes multi-byte characters according to your locale settings. With LC_ALL=en_US.UTF-8 set, . (any character) matches a single character, not a single byte. This works correctly for CJK characters, accented Latin characters, and emoji.

Can awk handle very large files (multi-GB)?

Yes. awk processes input as a stream, line by line, without loading the entire file into memory. Multi-GB log files are no problem. However, if you accumulate many keys in associative arrays, memory usage grows — in those cases, consider sort | uniq -c as an alternative.
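To illustrate the trade-off, here is duplicate-line counting done both ways on a small sample — the awk version holds every distinct key in memory, while the sort-based pipeline keeps memory flat at the cost of a sort pass:

```shell
# awk: counts in an in-memory associative array
printf 'a\nb\na\na\n' | awk '{c[$0]++} END {for (k in c) print c[k], k}' | sort -rn
# → 3 a
# → 1 b

# Memory-light alternative: sort first, then count adjacent duplicates
printf 'a\nb\na\na\n' | sort | uniq -c | sort -rn
```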

Should I use sd or sed?

For simple replacements (especially paths and URLs), sd requires less escaping and reads more cleanly. For line addressing, hold space manipulation, and complex scripting, sed is the only option. Install both and use whichever fits the task.

How do I preview sed changes without modifying the file?

Run sed 's/old/new/g' file.txt without -i and the result goes to stdout. Combine with diff: sed 's/old/new/g' file.txt | diff file.txt - to see exactly what would change.

Wrapping Up

sed and awk are the classic duo for text processing.

  • sed — Line-oriented substitution, deletion, and insertion. Perfect for config updates and log cleanup
  • awk — Column-oriented extraction, calculation, and aggregation. Ideal for CSV/TSV processing and access log analysis
  • sed + awk — Pipe them together for preprocessing followed by aggregation. sed shapes, awk computes
  • sd — A modern alternative for when sed syntax gets unwieldy. Less escaping, cleaner one-liners

Both tools have nearly 50 years of history and are installed on every Linux system. Start with sed's s command and awk's {print $1}, then build your pattern repertoire from there. For related tools, check the grep and ripgrep guide, jq guide, Rust CLI tools guide, and the CLI tools map.