Unix Philosophy & Pipes / Redirection
How the Unix design philosophy turns small, single-purpose tools into powerful pipelines — stdin, stdout, stderr as file descriptors, | pipes, > >> < 2> redirection, here-doc, and command substitution.
Unix Philosophy: do one thing well
Imagine a factory assembly line. Each worker at each station does exactly one job — cut the metal, drill the hole, paint the surface, attach the label. No worker tries to do everything. The piece travels along a conveyor belt from station to station. Unix pipelines work identically: each program is a specialist that reads text from a conveyor belt (stdin), does one transformation, and passes the result on. The pipe | is the conveyor belt.
The Unix philosophy, articulated by Doug McIlroy at Bell Labs in the 1970s, can be summarised in three rules:
Rule 1 — Do one thing well. ls lists files. grep searches for patterns. wc counts lines, words, and bytes. sort sorts lines. uniq removes adjacent duplicate lines. None of them tries to do another's job. Each program is small, fast, and easy to understand.
Rule 2 — Use text streams as the universal interface. A filter program reads plain text lines from standard input and writes plain text lines to standard output. Because the interface is universal, any tool can be connected to any other tool without either tool knowing anything about the other.
Rule 3 — Compose programs together. Complex tasks are solved by chaining simple programs, not by writing one giant program. cat access.log | grep "404" | sort | uniq -c | sort -rn | head -10 is a complete log analyser built from six pipeline stages (five distinct tools, with sort used twice).
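A minimal runnable sketch of the Rule 3 pipeline. It assumes a hypothetical, simplified access.log with one "METHOD PATH STATUS" record per line and no timestamps; in a real log every line is unique, so you would normally cut out a single field (such as the path) before counting. The file lives in a throwaway directory created with mktemp -d.

```shell
#!/usr/bin/env bash
# Throwaway directory so the example never touches real files.
dir=$(mktemp -d)

# Hypothetical simplified log: "METHOD PATH STATUS", no timestamps.
printf '%s\n' \
  'GET /index.html 200' \
  'GET /old-page 404' \
  'GET /favicon.ico 404' \
  'GET /old-page 404' \
  'GET /about.html 200' \
  'GET /old-page 404' > "$dir/access.log"

# Six stages: read, keep 404 lines, group identical lines,
# count each group, rank by count, keep the top 10.
top=$(cat "$dir/access.log" | grep "404" | sort | uniq -c | sort -rn | head -10)
echo "$top"
```

With this input the first line reports 3 hits for /old-page and the second 1 hit for /favicon.ico (uniq -c pads each count with leading spaces).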
The Three Standard Streams
Every Unix process normally starts with three file descriptors already open, inherited from its parent (usually the shell). A file descriptor is just a small non-negative integer that the kernel uses to identify an open file or I/O channel. The standard streams are not special to the kernel: they are ordinary open files that, by convention, use the reserved numbers 0, 1, and 2:
| FD number | Name | Direction | Default connection (interactive shell) | C macro |
|---|---|---|---|---|
| 0 | stdin | Input | Keyboard (terminal) | STDIN_FILENO |
| 1 | stdout | Output | Terminal screen | STDOUT_FILENO |
| 2 | stderr | Output (errors) | Terminal screen | STDERR_FILENO |
Why two output streams? Separating normal output (stdout, FD 1) from error messages (stderr, FD 2) means you can redirect program output to a file without losing the error messages on screen — or vice versa. When you run ls -l 2>/dev/null, errors are silenced but normal output still goes to the terminal.
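A small sketch of the split. The function speak is hypothetical: it writes one line to stdout and one to stderr (>&2 sends echo's output to wherever FD 2 points). Each stream can then be routed independently.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)

# Hypothetical helper: one line on each output stream.
speak() {
  echo "normal output"          # FD 1 (stdout)
  echo "error message" >&2      # FD 2 (stderr)
}

# Route each stream to its own file.
speak > "$dir/out.txt" 2> "$dir/err.txt"

# Silence only the errors; normal output is still captured.
quiet=$(speak 2>/dev/null)
echo "$quiet"
```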
stdin as a file. From the program's point of view, reading from the keyboard is identical to reading from a file — it just reads bytes from FD 0. This is why you can redirect a file into a program's stdin with ./myprog < data.txt and the program does not need any changes.
Pipes and redirection do not change programs. They change where FDs 0, 1, and 2 point before the program starts. The program just reads from FD 0 and writes to FD 1 as normal — it has no idea whether it's talking to a keyboard/screen or to a file/pipe.
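To see that redirection happens outside the program, here is one hypothetical filter, shout (a wrapper around tr that uppercases its input), fed first from a file and then from a pipe. It only ever reads FD 0 and writes FD 1, and it produces the same result both times.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$dir/data.txt"

# Hypothetical filter: reads FD 0, writes FD 1, nothing else.
shout() { tr '[:lower:]' '[:upper:]'; }

from_file=$(shout < "$dir/data.txt")          # FD 0 is the file
from_pipe=$(printf 'alpha\nbeta\n' | shout)   # FD 0 is a pipe
# Run interactively as plain `shout`, FD 0 would be the keyboard (end input with Ctrl-D).

echo "$from_file"
```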
The pipe | connects two processes. When you write cmd1 | cmd2, the shell creates an in-kernel buffer (the pipe), redirects cmd1's stdout (FD 1) to the pipe's write end, and redirects cmd2's stdin (FD 0) to the pipe's read end. Both processes run simultaneously. cmd1 produces data; the kernel buffers it; cmd2 consumes it. Data never touches the disk.
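Concurrency is easy to observe with yes, which prints "y" forever and never finishes on its own. If the shell ran yes to completion and stored its output before starting head, this would hang. Because both processes run at once, head takes three lines and exits, and yes is stopped the next time it writes to the pipe that nobody reads any more (SIGPIPE, or a write error if that signal is ignored).

```shell
#!/usr/bin/env bash
# yes writes into the pipe while head reads from it; neither waits for the other to finish.
first=$(yes | head -n 3)
echo "$first"
```

This prints three lines, each containing y.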
Pipes connect two processes in memory. Redirection connects a process to a file on disk. That is the only real difference. Both work by changing where FD 0, 1, or 2 points before the program runs.
Operators, redirections, and special forms
Pipe and Redirection Operators
# ── PIPE ─────────────────────────────────────────────────────────────────
cmd1 | cmd2          # connects cmd1's stdout (FD 1) to cmd2's stdin (FD 0);
                     # both processes run concurrently in the same shell job
cmd1 | cmd2 | cmd3   # chain as many stages as needed; each stage reads from the previous one
# ── OUTPUT REDIRECTION ────────────────────────────────────────────────────
cmd > file.txt       # stdout → file.txt (OVERWRITE: creates or truncates)
cmd >> file.txt      # stdout → file.txt (APPEND: creates or adds to the end)
# ── INPUT REDIRECTION ─────────────────────────────────────────────────────
cmd < file.txt       # stdin ← file.txt (read input from the file instead of the keyboard)
# ── STDERR REDIRECTION ────────────────────────────────────────────────────
cmd 2> err.txt       # stderr (FD 2) → err.txt (overwrite)
cmd 2>> err.txt      # stderr → err.txt (append)
cmd 2> /dev/null     # discard all errors; /dev/null is the "black hole" device
# ── COMBINING STDOUT AND STDERR ───────────────────────────────────────────
cmd > file.txt 2>&1  # stdout → file.txt, THEN point FD 2 wherever FD 1 now points
                     # ORDER MATTERS: > file.txt must come before 2>&1
cmd &> file.txt      # bash shorthand: both stdout and stderr → file.txt
# ── HERE-DOC ──────────────────────────────────────────────────────────────
# feeds the literal lines between the two markers to cmd's stdin
cmd << MARKER
line1
line2
MARKER
# ── COMMAND SUBSTITUTION ──────────────────────────────────────────────────
var=$(cmd)           # captures cmd's stdout as a string (trailing newlines removed)
result=$(date)       # stores the current date and time in the variable 'result'
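In general, N>file redirects file descriptor N, where N is the FD number; a bare > is shorthand for 1>, and a bare < is shorthand for 0<. 2>&1 means "make FD 2 a duplicate of wherever FD 1 currently points".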
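A sketch of the general N>file form: 1> behaves exactly like >, and any other descriptor number works the same way. Here FD 3 is opened on a file for a single command group, and echo ... >&3 writes to it; the file names are made up for the example.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)

# 1> is the explicit spelling of >.
echo "to stdout" 1> "$dir/one.txt"

# Open FD 3 on a file for this group only; >&3 sends echo's stdout to FD 3.
{ echo "via fd 3" >&3; } 3> "$dir/three.txt"

cat "$dir/one.txt" "$dir/three.txt"
```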
The 2>&1 Order Trap
cmd > file 2>&1 — CORRECT: first redirect stdout to file, then redirect stderr to wherever stdout now points (the file). Both end up in file.
cmd 2>&1 > file — WRONG ORDER: first redirect stderr to wherever stdout currently points (the terminal), then redirect stdout to file. Stderr stays on terminal; only stdout goes to file.
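Both orders side by side, using a hypothetical function both that writes "out" to stdout and "err" to stderr. The wrong-order call is wrapped in $(...) so that whatever it would have sent to the terminal is captured in a variable instead.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)

# Hypothetical helper: one line on each stream.
both() { echo "out"; echo "err" >&2; }

# Correct: FD 1 -> file first, then FD 2 copies FD 1 (the same file).
both > "$dir/right.txt" 2>&1

# Wrong: FD 2 copies FD 1 while FD 1 is still the "terminal"
# (here, the $(...) capture), and only then does FD 1 move to the file.
leaked=$(both 2>&1 > "$dir/wrong.txt")

cat "$dir/right.txt"   # out, then err
cat "$dir/wrong.txt"   # out
echo "$leaked"         # err
```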
tee — write to both file and pipe simultaneously
cmd | tee logfile.txt | next_cmd
# tee reads from stdin and writes to BOTH logfile.txt AND stdout,
# so next_cmd still receives the data, AND a copy is saved
tee -a logfile.txt   # -a appends instead of overwriting
Use tee to log pipeline data mid-stream without breaking the pipeline.
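A sketch with made-up file names standing in for real pipeline data: tee saves a copy while wc -l still receives every line, and tee -a later appends instead of overwriting. Sending tee's own stdout to /dev/null keeps only the file copy.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)

# tee copies the stream to log.txt and passes it on to wc -l.
count=$(printf 'a.c\nb.c\nc.c\n' | tee "$dir/log.txt" | wc -l)

# -a appends; the copy that would go to the terminal is discarded here.
printf 'd.c\n' | tee -a "$dir/log.txt" > /dev/null

echo "$count"           # number of lines that reached wc
cat "$dir/log.txt"      # a.c, b.c, c.c, d.c, one per line
```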
xargs — build command arguments from stdin lines
find . -name "*.c" | xargs wc -l
# find prints the matching .c files, one per line;
# xargs reads those names and runs: wc -l file1.c file2.c ...
echo "foo bar baz" | xargs -n1 echo
# -n1 = pass one argument at a time; prints each word on its own line
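Some commands expect their inputs as command-line arguments rather than on stdin. Piping find straight into wc -l would only count the file names; to count the lines inside each file, wc needs the names as arguments. xargs bridges that gap by converting the whitespace-separated words it reads from stdin into command-line arguments.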
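A sketch that builds a throwaway directory with two hypothetical .c files, one with a space in its name. Plain find | xargs would split "my file.c" into two bogus arguments; find -print0 together with xargs -0 separates names with NUL bytes instead, which no file name can contain. When wc gets several files it ends with a total line, which tail -n 1 picks out.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/a.c"
printf 'one\n' > "$dir/my file.c"       # name containing a space

# NUL-separated names survive spaces; wc -l receives both files as arguments.
total=$(find "$dir" -name '*.c' -print0 | xargs -0 wc -l | tail -n 1 | awk '{print $1}')
echo "$total"                           # 3

# -n1: one argument per echo invocation, so one word per line.
words=$(echo "foo bar baz" | xargs -n1 echo)
echo "$words"
```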
Redirection Quick Reference Table
| Operator | What it does | FD affected | Creates file? |
|---|---|---|---|
| > file | Redirect stdout to file (overwrite) | 1 | Yes (truncates an existing file) |
| >> file | Redirect stdout to file (append) | 1 | Yes (creates if needed) |
| < file | Redirect stdin from file | 0 | No (file must exist) |
| 2> file | Redirect stderr to file | 2 | Yes (truncates an existing file) |
| 2>&1 | Redirect stderr to wherever stdout points | 2 duplicates 1 | No (duplicates an existing FD) |
| &> file | Redirect both stdout and stderr to file (bash) | 1 + 2 | Yes |
| << MARKER | Here-document: feed literal text to stdin | 0 | No (the text comes from the script itself) |
| $(cmd) | Command substitution: capture cmd's stdout | captures 1 | No |
| cmd1 \| cmd2 | Pipe: cmd1's stdout feeds cmd2's stdin | 1 → 0 | No (in-kernel buffer) |
Pipeline composition and data flow
# Goal: list all file extensions in /usr/include and count how many files
# use each one, sorted from most to least common.
ls /usr/include | grep '\.' | sed 's/.*\.//' | sort | uniq -c | sort -rn | head -5
# Step-by-step data flow:
#
# ls /usr/include
# -> lists all filenames, one per line (ls prints one name per line when its output is a pipe):
# stdio.h
# stdlib.h
# string.h
# ...
#
# | grep '\.'
# -> keeps only lines that contain a dot (filters out directories with no dot)
#
# | sed 's/.*\.//'
# -> strips everything up to and including the last dot, leaving just the extension:
# h
# h
# ...
#
# | sort
# -> alphabetically sorts the extension names so identical ones are adjacent
#
# | uniq -c
# -> collapses consecutive identical lines, prepending a count (illustrative numbers):
# 158 h
# 2 hpp
#
# | sort -rn
# -> sorts numerically (-n) in reverse (-r) order — highest count first
#
# | head -5
# -> shows only the top 5 results
# Sample output (partial; exact counts depend on the system):
4 hpp
2 tcc
1 conf
1 inc
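Because /usr/include differs from machine to machine, here is the same pipeline run on a throwaway directory whose made-up contents are known in advance, so the result can be checked: three .h files, one .hpp, one .conf, and a dot-less subdirectory that grep drops.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
mkdir "$dir/sys"                                   # no dot: filtered out by grep
touch "$dir/stdio.h" "$dir/stdlib.h" "$dir/string.h" \
      "$dir/vector.hpp" "$dir/gcc.conf"

top=$(ls "$dir" | grep '\.' | sed 's/.*\.//' | sort | uniq -c | sort -rn | head -5)
echo "$top"
```

The h line comes first with a count of 3; the hpp and conf lines both have a count of 1, so their relative order may vary.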
# Scenario: sort a file and save the result, capturing any errors too.
# Input file: numbers.txt containing one number per line. Plain sort compares lines
# as text, so odd or non-numeric lines are sorted like any other, not reported as errors.
sort < numbers.txt > sorted.txt 2> sort_errors.txt
# Data flow:
# FD 0 (stdin) <- numbers.txt (keyboard is NOT used)
# FD 1 (stdout) -> sorted.txt (terminal is NOT written)
# FD 2 (stderr) -> sort_errors.txt (error messages go here)
#
# 'sort' just reads FD 0 and writes FD 1 as always — it has no idea
# those are files rather than a keyboard and screen.
# ── Variant: merge stdout and stderr into one file ──────────────────────
sort < numbers.txt > all_output.txt 2>&1
# ^-- first: FD 1 → all_output.txt
# ^-- then: FD 2 → wherever FD 1 points (the file)
# ── Variant: send errors to /dev/null (discard silently) ─────────────────
sort < numbers.txt > sorted.txt 2>/dev/null
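# Any error messages sort writes to stderr disappear completely.
# (If numbers.txt does not exist, the shell reports that while processing < numbers.txt,
# before 2>/dev/null is applied, so that message still reaches the terminal.)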
# ── Variant: append sorted output to an existing file ────────────────────
sort < more_numbers.txt >> sorted.txt
# >> does NOT truncate sorted.txt; it adds to the end.
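A sketch of the redirection example with a made-up numbers.txt. It also shows why plain sort is a trap for numbers (it compares text, so 100 sorts before 9), which sort -n fixes, and that >> appends to sorted.txt rather than replacing it. The last command covers the missing-input case: the shell, not sort, reports the error while processing < before 2>/dev/null takes effect, so the outer 2>&1 captures it.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
printf '10\n9\n100\n' > "$dir/numbers.txt"

# FD 0 <- numbers.txt, FD 1 -> sorted.txt, FD 2 -> sort_errors.txt
sort < "$dir/numbers.txt" > "$dir/sorted.txt" 2> "$dir/sort_errors.txt"
cat "$dir/sorted.txt"            # 10, 100, 9 (text order)

sort -n < "$dir/numbers.txt" > "$dir/sorted_n.txt"
cat "$dir/sorted_n.txt"          # 9, 10, 100 (numeric order)

# >> adds to the end of sorted.txt instead of truncating it.
printf '2\n' > "$dir/more_numbers.txt"
sort < "$dir/more_numbers.txt" >> "$dir/sorted.txt"

# Missing input: the "< file" redirection fails before 2>/dev/null is applied,
# so the shell's error message escapes to the outer stderr (captured here).
msg=$( { sort < "$dir/missing.txt" 2>/dev/null; } 2>&1 )
echo "$msg"
```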
# ── tee: log pipeline data without breaking the flow ──────────────────────
# Count the .c files AND save the list of their names to disk at the same time:
find . -name "*.c" | tee found_files.txt | wc -l
# tee writes every found filename to found_files.txt
# AND passes those same lines downstream to wc -l
# Terminal shows the count; found_files.txt has the full list.
# ── Here-document: provide multi-line stdin without a file ─────────────────
cat << END_MSG
Dear student,
Welcome to COMP2017.
Good luck!
END_MSG
# Everything between << END_MSG and the closing END_MSG word is fed
# to cat's stdin. The marker word is arbitrary (conventionally: EOF, END, HEREDOC).
# Here-doc with a program that reads stdin:
wc -w << WORDS
the quick brown fox
jumped over the lazy dog
WORDS
# Output: 9 (nine words in total)
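One detail the here-doc examples above do not show: with an unquoted marker, the shell expands $variables and $(...) inside the body; quoting the marker ('END_MSG') passes the text through literally. A sketch, with a made-up variable name:

```shell
#!/usr/bin/env bash
name="student"

# Unquoted marker: $name is expanded before cat sees the text.
expanded=$(cat << END_MSG
Dear $name,
END_MSG
)

# Quoted marker: the body is passed through exactly as written.
literal=$(cat << 'END_MSG'
Dear $name,
END_MSG
)

echo "$expanded"   # Dear student,
echo "$literal"    # Dear $name,
```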
# ── Command substitution: capture output as a variable value ───────────────
today=$(date +%Y-%m-%d)
echo "Today is $today"
# Output (the date depends on when it runs), e.g.: Today is 2026-06-10
# Capture the number of .c files in this directory tree (find also searches subdirectories):
count=$(find . -name "*.c" | wc -l)
echo "Found $count C files"
# Nest command substitution inside another command:
echo "Kernel: $(uname -r), Host: $(hostname)"
# Example output of the last two commands (values depend on the machine):
Found 7 C files
Kernel: 5.15.0-91-generic, Host: lab-machine
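Two behaviours worth knowing, shown in a small sketch: command substitution strips all trailing newlines from the captured output (internal ones are kept), and a substitution can sit inside another one, in which case the inner command runs first. The directory name project is made up.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
mkdir "$dir/project"

# Trailing newlines are removed; the newline between the two names stays.
files=$(printf 'a.c\nb.c\n\n\n')
echo "${#files}"        # 7 characters: a.c + newline + b.c

# Nested: the inner $(...) runs first and its output becomes basename's argument.
base=$(basename "$(cd "$dir/project" && pwd)")
echo "$base"            # project
```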
Practice problems with solutions
Write a shell pipeline that reads from essay.txt, converts all text to one word per line (use tr -s ' ' '\n' to split on spaces), converts to lowercase (use tr '[:upper:]' '[:lower:]'), then sorts, deduplicates with a count, sorts by count descending, and shows the top 3. Only use tr, sort, uniq, and head.
tr -s ' ' '\n' < essay.txt | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn | head -3
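tr -s ' ' '\n' < essay.txt: read essay.txt on stdin and turn every run of spaces into a single newline, giving one word per line.
| tr '[:upper:]' '[:lower:]': lowercase everything so "The" and "the" count as the same word.
| sort: bring identical words next to each other.
| uniq -c: collapse adjacent duplicates; the -c flag prepends a count to each unique line.
| sort -rn: sort by the count field, numerically (-n) and in descending order (-r), so the highest count comes first.
| head -3: show only the top 3 results.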
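To check the solution, here it is run against a tiny made-up essay.txt in a throwaway directory. Once case is folded, "the" appears three times, "cat" twice, and every other word once.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
printf 'The cat and the dog\nThe cat sat\n' > "$dir/essay.txt"

top3=$(tr -s ' ' '\n' < "$dir/essay.txt" | tr '[:upper:]' '[:lower:]' |
       sort | uniq -c | sort -rn | head -3)
echo "$top3"
```

The first two lines are the (3) and cat (2); the third is one of the words that appear once.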
Without running it, describe exactly what each stage of this pipeline does and what the final output represents:
ps aux | grep -v "^USER" | awk '{print $1}' | sort | uniq -c | sort -rn | head -5
ps aux: list every process from all users (a), in user-oriented format (u), including processes without a controlling terminal (x). One process per line; the first column is the username.
| grep -v "^USER": remove the header line. -v inverts the match, keeping lines that do NOT start with "USER" (the column header).
| awk '{print $1}': extract only the first field (column 1 = the username) from each line. awk splits on whitespace by default.
| sort: alphabetically sort usernames so identical names are adjacent.
| uniq -c: count consecutive duplicate lines; each unique username gets a count of how many processes it owns.
| sort -rn: sort numerically in descending order, so the user with the most processes appears first.
| head -5: show the top 5 results.
Final output: The five users running the most processes on this system, with their process counts.
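Real ps aux output changes from moment to moment, so this sketch feeds the same pipeline a fixed, hypothetical snapshot through a here-doc. It is simplified to five columns (real ps aux prints eleven), but the username is still field 1, which is all the pipeline uses.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified ps aux snapshot (header + one process per line).
snapshot=$(cat << 'PS'
USER    PID %CPU %MEM COMMAND
root      1  0.0  0.1 /sbin/init
alice   200  0.0  0.2 bash
alice   201  1.0  0.5 vim
bob     300  0.0  0.2 bash
PS
)

top=$(printf '%s\n' "$snapshot" | grep -v "^USER" | awk '{print $1}' |
      sort | uniq -c | sort -rn | head -5)
echo "$top"
```

alice comes first with 2 processes; bob and root tie at 1.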
Each line below has a redirection mistake. Identify the bug and write the corrected version.
# Bug 1: wants to capture both stdout and stderr into combined.txt
./build.sh 2>&1 > combined.txt
# Bug 2: wants to append stdout to log.txt without overwriting it
./run.sh > log.txt
# Bug 3: wants to read input from data.txt AND save stdout to out.txt
./process > out.txt < out.txt
# Fix 1: stdout must be redirected BEFORE 2>&1
./build.sh > combined.txt 2>&1
# Explanation: the shell processes redirections left-to-right.
# Writing 2>&1 first duplicates FD2 to wherever FD1 currently points (the terminal).
# Then > combined.txt moves FD1 to the file, but FD2 is still on the terminal.
# The fix: redirect FD1 first, THEN duplicate FD2 from FD1.
# Fix 2: use >> to append instead of > which truncates
./run.sh >> log.txt
# > truncates log.txt to zero bytes before writing — all previous content is lost.
# >> opens the file in append mode, adding new output after existing content.
# Fix 3: never use the same file as both the < input and the > output
./process < data.txt > out.txt
# Using the same file for both > and < is dangerous: > truncates the file to zero
# bytes as soon as the shell opens it, before the program has even started, so the
# program reads an empty file. Use separate source and destination files.
Bug 1 summary: the order of > file and 2>&1 is reversed; always redirect stdout to the file first, then duplicate stderr from stdout.
Bug 2 summary: > overwrites; >> appends.
Bug 3 summary: never use the same file as both the < source and the > destination; the shell truncates it on open, destroying the input before it can be read.
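Bugs 2 and 3 can be reproduced safely in a throwaway directory. This sketch uses cat as a stand-in for the hypothetical ./process, since cat simply copies stdin to stdout.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)

# Bug 2: > truncates on every run, >> appends.
echo "run 1" > "$dir/log.txt"
echo "run 2" > "$dir/log.txt"        # log.txt now holds only "run 2"
echo "run 1" > "$dir/log2.txt"
echo "run 2" >> "$dir/log2.txt"      # log2.txt holds both lines

# Bug 3: the shell truncates out.txt for > before cat starts,
# so cat reads an already-empty file.
printf 'keep me\n' > "$dir/out.txt"
cat > "$dir/out.txt" < "$dir/out.txt"
wc -c < "$dir/out.txt"               # byte count: nothing is left
```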
Write a shell script that: (a) uses a here-doc to create a temporary file greeting.txt containing three lines of text; (b) uses command substitution to store the line count in a variable; (c) runs cat greeting.txt piped through tee backup.txt to display the content AND save a copy. Print the line count at the end.
#!/bin/bash
# (a) Create greeting.txt using a here-document fed into cat with redirection
cat << GREET > greeting.txt
Hello, world!
Welcome to COMP2017.
Unix philosophy in action.
GREET
# The here-doc is cat's stdin; > greeting.txt redirects cat's stdout to the file.
# (b) Command substitution: capture wc -l's output as a variable
line_count=$(wc -l < greeting.txt)
# wc -l reads greeting.txt through the stdin redirect, so it prints just the count
# (no file name, because wc never sees one)
# $(...) captures that stdout value and assigns it to line_count
# (c) Display AND save a copy using tee
cat greeting.txt | tee backup.txt
# cat reads greeting.txt; tee writes to backup.txt AND passes data to stdout (screen)
echo "Line count: $line_count"
The here-doc (<< GREET ... GREET) feeds multi-line text to cat's stdin without needing an existing file.
The > greeting.txt captures cat's stdout into the file.
Command substitution $(wc -l < greeting.txt) runs wc and captures its output as a string.
tee backup.txt duplicates the data stream — the user sees it on screen AND it is saved to backup.txt.
Test your understanding
True or false: cmd > file 2>&1 and cmd 2>&1 > file produce the same result, with both stdout and stderr ending up in file.
Fill in the blank: ___ connects the stdout of one process to the stdin of the next process (type the single character).
What goes wrong when you run ./myprogram < data.txt > data.txt?
What does tee logfile.txt do when used in a pipeline like cmd | tee logfile.txt | next?