Unix Philosophy: do one thing well

Assembly Line Analogy

Imagine a factory assembly line. Each worker at each station does exactly one job — cut the metal, drill the hole, paint the surface, attach the label. No worker tries to do everything. The piece travels along a conveyor belt from station to station. Unix pipelines work identically: each program is a specialist that reads text from a conveyor belt (stdin), does one transformation, and passes the result on. The pipe | is the conveyor belt.

The Unix philosophy, articulated by Doug McIlroy at Bell Labs in the 1970s, can be summarised in three rules:

Rule 1 — Do one thing well. ls lists files. grep searches for patterns. wc counts lines/words/bytes. sort sorts lines. uniq removes adjacent duplicate lines (which is why it is usually run after sort). None of them tries to do the other's job. Each program is small, fast, and easy to understand.

Rule 2 — Use text streams as the universal interface. Every program reads plain text lines from standard input and writes plain text lines to standard output. Because the interface is universal, any tool can be connected to any other tool — without either tool knowing anything about the other.

Rule 3 — Compose programs together. Complex tasks are solved by chaining simple programs, not by writing one giant program. cat access.log | grep "404" | sort | uniq -c | sort -rn | head -10 is a complete log analyser built from six single-purpose stages (sort appears twice, with different options).
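To make the composition concrete, here is a minimal sketch that runs the same pipeline on a tiny sample log. The log contents and the temporary file are assumptions made up for illustration; a real access log has many more fields per line.

```shell
# Hypothetical sample data: request path followed by an HTTP status code.
log=$(mktemp)
printf '%s\n' \
  '/index.html 200' \
  '/missing.png 404' \
  '/old-page 404' \
  '/missing.png 404' \
  '/about.html 200' > "$log"

# Same stages as in Rule 3: filter, group, count, rank, trim.
result=$(cat "$log" | grep "404" | sort | uniq -c | sort -rn | head -10)
echo "$result"

rm -f "$log"
```

The most frequent 404 line comes out first because sort -rn ranks by the count that uniq -c prepends.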

The Three Standard Streams

Every Unix process is born with three file descriptors already open. A file descriptor is just a small non-negative integer that the kernel uses to identify an open file or I/O channel. Standard streams are not special — they are just files with reserved numbers:

FD number   Name     Direction         Default destination    C macro
0           stdin    Input             Keyboard (terminal)    STDIN_FILENO
1           stdout   Output            Terminal screen        STDOUT_FILENO
2           stderr   Output (errors)   Terminal screen        STDERR_FILENO

Why two output streams? Separating normal output (stdout, FD 1) from error messages (stderr, FD 2) means you can redirect program output to a file without losing the error messages on screen — or vice versa. When you run ls -l 2>/dev/null, errors are silenced but normal output still goes to the terminal.
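The separation can be seen in a short sketch. The function report is a hypothetical stand-in for any program that writes to both streams, and the file names are arbitrary.

```shell
# report is a made-up example program: one normal line, one diagnostic line.
report() {
  echo "result: 42"                # normal output goes to stdout (FD 1)
  echo "warning: low disk" >&2     # diagnostics go to stderr (FD 2)
}

tmp=$(mktemp -d)

# Send each stream to its own file.
report > "$tmp/out.txt" 2> "$tmp/err.txt"

out_text=$(cat "$tmp/out.txt")
err_text=$(cat "$tmp/err.txt")
echo "stdout file holds: $out_text"
echo "stderr file holds: $err_text"

rm -rf "$tmp"
```

Because the two streams are separate descriptors, the result line and the warning never get mixed unless you ask for it.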

stdin as a file. From the program's point of view, reading from the keyboard is identical to reading from a file — it just reads bytes from FD 0. This is why you can redirect a file into a program's stdin with ./myprog < data.txt and the program does not need any changes.

Key insight

Pipes and redirection do not change programs. They change where FDs 0, 1, and 2 point before the program starts. The program just reads from FD 0 and writes to FD 1 as normal — it has no idea whether it's talking to a keyboard/screen or to a file/pipe.

The pipe | connects two processes. When you write cmd1 | cmd2, the shell creates an in-kernel buffer (the pipe), redirects cmd1's stdout (FD 1) to the pipe's write end, and redirects cmd2's stdin (FD 0) to the pipe's read end. Both processes run simultaneously. cmd1 produces data; the kernel buffers it; cmd2 consumes it. Data never touches the disk.

Mental model check

Pipes connect two processes in memory. Redirection connects a process to a file on disk. That is the only real difference. Both work by changing where FD 0, 1, or 2 points before the program runs.
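As a quick check of this mental model, the sketch below feeds the same three lines to wc -l once through a file redirect and once through a pipe. The sample data is an assumption for illustration; wc is unchanged in both cases and only the source behind FD 0 differs.

```shell
data=$(mktemp)
printf 'one\ntwo\nthree\n' > "$data"

from_file=$(wc -l < "$data")        # FD 0 points at a file on disk
from_pipe=$(cat "$data" | wc -l)    # FD 0 points at the read end of a pipe

echo "via redirect: $from_file"
echo "via pipe:     $from_pipe"

rm -f "$data"
```

Both runs report the same count, since wc simply reads bytes from FD 0 either way.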

Operators, redirections, and special forms

Pipe and Redirection Operators

# ── PIPE ─────────────────────────────────────────────────────────────────
cmd1 | cmd2
#     ^── connects cmd1's stdout (FD 1) to cmd2's stdin (FD 0)
#          both processes run concurrently in the same shell job

cmd1 | cmd2 | cmd3
#     chain as many stages as needed; each stage reads from the previous

# ── OUTPUT REDIRECTION ────────────────────────────────────────────────────
cmd >  file.txt       # stdout → file.txt  (OVERWRITE — creates or truncates)
cmd >> file.txt       # stdout → file.txt  (APPEND — creates or adds to end)

# ── INPUT REDIRECTION ─────────────────────────────────────────────────────
cmd <  file.txt       # stdin ← file.txt   (read input from file instead of keyboard)

# ── STDERR REDIRECTION ───────────────────────────────────────────────────
cmd 2>  err.txt       # stderr (FD 2) → err.txt
cmd 2>> err.txt       # stderr → err.txt   (append)
cmd 2>  /dev/null     # discard all errors — /dev/null is the "black hole" device

# ── COMBINING STDOUT AND STDERR ──────────────────────────────────────────
cmd > file.txt 2>&1  # stdout → file.txt, THEN redirect FD 2 to wherever FD 1 now points
#                      ORDER MATTERS: >file must come before 2>&1
cmd &> file.txt       # bash shorthand: both stdout and stderr → file.txt

# ── HERE-DOC ─────────────────────────────────────────────────────────────
cmd << MARKER
line1
line2
MARKER                 # feeds literal multi-line text to cmd's stdin

# ── COMMAND SUBSTITUTION ─────────────────────────────────────────────────
$(cmd)                 # captures cmd's stdout as a string value
result=$(date)        # stores the current date and time in variable 'result'
FD numbers: You can redirect any file descriptor, not just 0/1/2. The general form is N>file where N is the FD number. 2>&1 means "make FD 2 a duplicate of wherever FD 1 currently points".
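A minimal sketch of redirecting a descriptor other than 0, 1, or 2: the script opens FD 3 on a log file, writes to it, and closes it. The choice of FD 3 and the log file are assumptions for illustration.

```shell
log=$(mktemp)

exec 3> "$log"            # open FD 3 for writing, pointing at the log file
echo "step 1 done" >&3    # >&3 sends this command's stdout to FD 3
echo "step 2 done" >&3
exec 3>&-                 # close FD 3

lines=$(wc -l < "$log")
echo "log has $lines lines"

rm -f "$log"
```

Here >&3 follows the same N>&M pattern as 2>&1: it makes FD 1 of that one command a duplicate of FD 3.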

The 2>&1 Order Trap

Order of redirections matters — this is a classic exam question

cmd > file 2>&1 — CORRECT: first redirect stdout to file, then redirect stderr to wherever stdout now points (the file). Both end up in file.

cmd 2>&1 > file — WRONG ORDER: first redirect stderr to wherever stdout currently points (the terminal), then redirect stdout to file. Stderr stays on terminal; only stdout goes to file.
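The difference can be observed without a terminal. In this sketch, emit is a hypothetical command that writes one line to each stream, and the brace group's own redirect to leaked.txt plays the role of "the terminal", meaning wherever stdout pointed before the inner redirections ran.

```shell
emit() {
  echo "out"        # goes to stdout (FD 1)
  echo "err" >&2    # goes to stderr (FD 2)
}

tmp=$(mktemp -d)

# Correct order: FD 1 goes to the file first, then FD 2 copies FD 1.
emit > "$tmp/right.txt" 2>&1

# Wrong order: FD 2 copies the OLD FD 1 (leaked.txt here), then FD 1 moves.
{ emit 2>&1 > "$tmp/wrong.txt"; } > "$tmp/leaked.txt"

right=$(cat "$tmp/right.txt")
wrong=$(cat "$tmp/wrong.txt")
leaked=$(cat "$tmp/leaked.txt")

echo "right.txt:  $right"
echo "wrong.txt:  $wrong"
echo "leaked.txt: $leaked"

rm -rf "$tmp"
```

right.txt ends up with both lines, wrong.txt with only the stdout line, and the stderr line escapes to leaked.txt.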

tee — write to both file and pipe simultaneously

cmd | tee logfile.txt | next_cmd
#       ^── tee reads from stdin, writes to BOTH logfile.txt AND stdout
#            so next_cmd still receives the data, AND a copy is saved

tee -a logfile.txt    # -a flag appends instead of overwriting
Named after the T-pipe fitting: water flows in from one direction, exits in two. Use tee to log pipeline data mid-stream without breaking the pipeline.
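A small sketch of the difference between tee and tee -a, using a throwaway log file (an assumption for illustration):

```shell
log=$(mktemp)

# First run overwrites the log and still passes the data downstream.
printf 'a\nb\n' | tee "$log" | wc -l

# Second run appends; its stdout copy is discarded here.
printf 'c\n' | tee -a "$log" > /dev/null

log_lines=$(wc -l < "$log")
echo "log now holds $log_lines lines"

rm -f "$log"
```

Without -a, the second run would have replaced the first two lines instead of adding a third.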

xargs — build command arguments from stdin lines

find . -name "*.c" | xargs wc -l
#   ^ finds .c files, one per line  ^ xargs reads those lines and
#                                     runs: wc -l file1.c file2.c ...

echo "foo bar baz" | xargs -n1 echo
# -n1 = pass one argument at a time; prints each word on its own line
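Why xargs? Many commands do not read from stdin — they only accept file arguments. xargs bridges that gap by converting the whitespace-separated items it reads on stdin into command-line arguments. Because it splits on spaces as well as newlines, filenames that contain spaces need NUL-delimited input (find -print0 with xargs -0).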
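The sketch below shows the NUL-delimited form on a hypothetical directory containing a filename with a space. The file names and contents are assumptions for illustration; -print0 and -0 are supported by GNU and BSD find and xargs.

```shell
dir=$(mktemp -d)
printf 'int main(void) { return 0; }\n' > "$dir/my file.c"
printf 'int x;\nint y;\n' > "$dir/plain.c"

# NUL-delimited names survive spaces; plain `find | xargs` would
# split "my file.c" into two separate arguments.
total=$(find "$dir" -name '*.c' -print0 | xargs -0 cat | wc -l)
echo "total lines across .c files: $total"

rm -rf "$dir"
```

The files are concatenated with cat here so that the result is a single number, rather than the per-file table that wc -l prints when given several arguments.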

Redirection Quick Reference Table

Operator      What it does                                     FD affected      Creates file?
> file        Redirect stdout to file (overwrite)              1                Yes (truncates existing)
>> file       Redirect stdout to file (append)                 1                Yes (creates if needed)
< file        Redirect stdin from file                         0                No (file must exist)
2> file       Redirect stderr to file                          2                Yes (truncates existing)
2>&1          Redirect stderr to wherever stdout points        2 duplicates 1   No (dup of existing fd)
&> file       Redirect both stdout and stderr to file (bash)   1 + 2            Yes
<< MARKER     Here-document: feed literal text to stdin        0                No (in-memory)
$(cmd)        Command substitution: capture cmd's stdout       captures 1       No
cmd1 | cmd2   Pipe: cmd1 stdout feeds cmd2 stdin               1 → 0            No (in-kernel buffer)

Pipeline composition and data flow

Example 1 — Classic pipeline: count unique extensions in a directory LO3 — pipeline composition
# Goal: list all file extensions in /usr/include and count how many files
# use each one, sorted from most to least common.

ls /usr/include | grep '\.' | sed 's/.*\.//' | sort | uniq -c | sort -rn | head -5

# Step-by-step data flow:
#
#  ls /usr/include
#    -> lists all filenames, one per line:
#       stdio.h
#       stdlib.h
#       string.h
#       ...
#
#  | grep '\.'
#    -> keeps only lines that contain a dot (filters out directories with no dot)
#
#  | sed 's/.*\.//'
#    -> strips everything up to and including the last dot, leaving just the extension:
#       h
#       h
#       ...
#
#  | sort
#    -> alphabetically sorts the extension names so identical ones are adjacent
#
#  | uniq -c
#    -> collapses consecutive identical lines, prepending a count:
#       158 h
#         4 hpp
#
#  | sort -rn
#    -> sorts numerically (-n) in reverse (-r) order — highest count first
#
#  | head -5
#    -> shows only the top 5 results
Typical output
158 h
4 hpp
2 tcc
1 conf
1 inc
Example 2 — Input and output redirection with stderr capture LO3 — redirection operators
# Scenario: sort a file and save the result, capturing any errors too.
# Input file: numbers.txt containing one number per line (possibly with bad lines).

sort < numbers.txt > sorted.txt 2> sort_errors.txt

# Data flow:
#   FD 0 (stdin)  <- numbers.txt   (keyboard is NOT used)
#   FD 1 (stdout)  -> sorted.txt   (terminal is NOT written)
#   FD 2 (stderr)  -> sort_errors.txt  (error messages go here)
#
# 'sort' just reads FD 0 and writes FD 1 as always — it has no idea
# those are files rather than a keyboard and screen.

# ── Variant: merge stdout and stderr into one file ──────────────────────
sort < numbers.txt > all_output.txt 2>&1
#                    ^-- first: FD 1 → all_output.txt
#                              ^-- then: FD 2 → wherever FD 1 points (the file)

# ── Variant: send errors to /dev/null (discard silently) ─────────────────
sort < numbers.txt > sorted.txt 2>/dev/null
# Anything sort itself writes to stderr is discarded. Note that a missing
# numbers.txt is reported by the shell, which fails to open it before sort starts.

# ── Variant: append sorted output to an existing file ────────────────────
sort < more_numbers.txt >> sorted.txt
# >> does NOT truncate sorted.txt; it adds to the end.
Result in sorted.txt (if numbers.txt contained 3, 1, 2)
1
2
3
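One caveat worth checking with real data: plain sort compares lines as text, so multi-digit numbers need sort -n. A minimal sketch with made-up input (paste -sd ' ' - only joins the lines for display):

```shell
# Text comparison: "10" sorts before "2" because '1' < '2'.
text_order=$(printf '10\n9\n2\n' | sort | paste -sd ' ' -)

# Numeric comparison with -n.
num_order=$(printf '10\n9\n2\n' | sort -n | paste -sd ' ' -)

echo "sort:    $text_order"    # 10 2 9
echo "sort -n: $num_order"     # 2 9 10
```

For the single-digit input 3, 1, 2 used above, both orderings happen to coincide.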
Example 3 — tee, here-doc, and command substitution LO3 — advanced shell tools
# ── tee: log pipeline data without breaking the flow ──────────────────────
# Count the .c files found AND save the list of filenames to disk simultaneously:
find . -name "*.c" | tee found_files.txt | wc -l
# tee writes every found filename to found_files.txt
# AND passes those same lines downstream to wc -l
# Terminal shows the count; found_files.txt has the full list.

# ── Here-document: provide multi-line stdin without a file ─────────────────
cat << END_MSG
Dear student,
Welcome to COMP2017.
Good luck!
END_MSG
# Everything between << END_MSG and the closing END_MSG word is fed
# to cat's stdin. The marker word is arbitrary (conventionally: EOF, END, HEREDOC).

# Here-doc with a program that reads stdin:
wc -w << WORDS
the quick brown fox
jumped over the lazy dog
WORDS
# Output: 9  (nine words in total)

# ── Command substitution: capture output as a variable value ───────────────
today=$(date +%Y-%m-%d)
echo "Today is $today"
# Output: Today is 2026-06-10

# Capture the number of C files in this directory:
count=$(find . -name "*.c" | wc -l)
echo "Found $count C files"

# Nest command substitution inside another command:
echo "Kernel: $(uname -r), Host: $(hostname)"
Command substitution demo output
Today is 2026-06-10
Found 7 C files
Kernel: 5.15.0-91-generic, Host: lab-machine

Practice problems with solutions

P1 — Write a pipeline: find the 3 most common words in a file LO3 — pipeline composition

Write a shell pipeline that reads from essay.txt, converts all text to one word per line (use tr -s ' ' '\n' to split on spaces), converts to lowercase (use tr '[:upper:]' '[:lower:]'), then sorts, deduplicates with a count, sorts by count descending, and shows the top 3. Only use tr, sort, uniq, and head.

tr -s ' ' '\n' < essay.txt | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn | head -3
Step by step:

tr -s ' ' '\n' < essay.txt — read essay.txt via stdin redirection, translate every space to a newline, and squeeze runs of repeated newlines into one (-s), so consecutive spaces do not produce empty lines. The result is one word per line.

| tr '[:upper:]' '[:lower:]' — convert all uppercase letters to lowercase so "The" and "the" are counted as the same word.

| sort — sort alphabetically so identical words become adjacent.

| uniq -c — collapse adjacent duplicates; the -c flag prepends a count to each unique line.

| sort -rn — sort by the count field: -n numeric, -r descending (highest count first).

| head -3 — show only the top 3 results.
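A quick way to sanity-check the solution is to run it on a one-line sample. The sentence below is an assumption for illustration, and the temporary file stands in for essay.txt.

```shell
essay=$(mktemp)
printf 'The cat and the dog and the bird\n' > "$essay"

top=$(tr -s ' ' '\n' < "$essay" | tr '[:upper:]' '[:lower:]' \
        | sort | uniq -c | sort -rn | head -1)
echo "most common word: $top"

rm -f "$essay"
```

"The" and "the" are merged by the lowercase step, so "the" leads with a count of 3. Punctuation is not stripped, which the problem statement does not ask for.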
P2 — Explain what this command does, step by step LO3 — reading pipelines

Without running it, describe exactly what each stage of this pipeline does and what the final output represents:

ps aux | grep -v "^USER" | awk '{print $1}' | sort | uniq -c | sort -rn | head -5
ps aux — list all running processes with all users (a), in user-oriented format (u), including processes without a terminal (x). One process per line; first column is the username.

| grep -v "^USER" — remove the header line. -v inverts the match: keep lines that do NOT start with "USER" (the column header).

| awk '{print $1}' — extract only the first field (column 1 = the username) from each line. awk splits on whitespace by default.

| sort — alphabetically sort usernames so identical names are adjacent.

| uniq -c — count consecutive duplicate lines; each unique username gets a count of how many processes it owns.

| sort -rn — sort numerically in descending order — the user with the most processes appears first.

| head -5 — show the top 5 results.

Final output: The five users running the most processes on this system, with their process counts.
P3 — Fix the broken redirections (three bugs) LO3 — redirection correctness

Each line below has a redirection mistake. Identify the bug and write the corrected version.

# Bug 1: wants to capture both stdout and stderr into combined.txt
./build.sh 2>&1 > combined.txt

# Bug 2: wants to append stdout to log.txt without overwriting it
./run.sh > log.txt

# Bug 3: wants to read input from data.txt AND save stdout to out.txt
./process > out.txt < out.txt
# Fix 1: stdout must be redirected BEFORE 2>&1
./build.sh > combined.txt 2>&1
# Explanation: the shell processes redirections left-to-right.
# Writing 2>&1 first duplicates FD2 to wherever FD1 currently points (the terminal).
# Then > combined.txt moves FD1 to the file, but FD2 is still on the terminal.
# The fix: redirect FD1 first, THEN duplicate FD2 from FD1.

# Fix 2: use >> to append instead of > which truncates
./run.sh >> log.txt
# > truncates log.txt to zero bytes before writing — all previous content is lost.
# >> opens the file in append mode, adding new output after existing content.

# Fix 3: input and output cannot be the same file with < and >
./process < data.txt > out.txt
# Using the same file for both > and < is dangerous: > truncates the file to zero
# bytes IMMEDIATELY when the shell opens it, before < reads it. The program reads
# an empty file. Use separate source and destination files.
Bug 1 summary: Order of > file and 2>&1 is reversed — always redirect stdout to file first, then duplicate stderr from stdout.
Bug 2 summary: > overwrites; >> appends.
Bug 3 summary: Never use the same file as both < source and > destination — the shell truncates on open, destroying the input before it can be read.
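The truncation in Bug 3 is easy to reproduce safely on a throwaway file. This sketch uses sort as a stand-in for ./process, and then shows one common workaround (write to a temporary file, then rename); both choices are assumptions for illustration.

```shell
f=$(mktemp)
printf 'b\na\n' > "$f"

# Unsafe: the shell truncates $f when it opens it for writing,
# so sort reads an empty file and writes nothing back.
sort < "$f" > "$f"
after_unsafe=$(wc -c < "$f")
echo "bytes left after unsafe run: $after_unsafe"

# Safer pattern: write to a separate file, then replace the original.
printf 'b\na\n' > "$f"
sort < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
first_line=$(head -1 "$f")
echo "first line after safe run: $first_line"

rm -f "$f"
```

The && ensures the original is only replaced if the command succeeded.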
P4 — Combine tee, command substitution, and here-doc in a script LO3 — advanced shell features

Write a shell script that: (a) uses a here-doc to create a temporary file greeting.txt containing three lines of text; (b) uses command substitution to store the line count in a variable; (c) runs cat greeting.txt piped through tee backup.txt to display the content AND save a copy. Print the line count at the end.

#!/bin/bash

# (a) Create greeting.txt using a here-document fed into cat with redirection
cat << GREET > greeting.txt
Hello, world!
Welcome to COMP2017.
Unix philosophy in action.
GREET
# The here-doc is cat's stdin; > greeting.txt redirects cat's stdout to the file.

# (b) Command substitution: capture wc -l's output as a variable
line_count=$(wc -l < greeting.txt)
# wc -l reads from greeting.txt via stdin redirect, prints just the count to stdout
# $(...) captures that stdout value and assigns it to line_count

# (c) Display AND save a copy using tee
cat greeting.txt | tee backup.txt
# cat reads greeting.txt; tee writes to backup.txt AND passes data to stdout (screen)

echo "Line count: $line_count"
Key techniques used: The here-doc (<< GREET ... GREET) feeds multi-line text to cat's stdin without needing an existing file. The > greeting.txt captures cat's stdout into the file. Command substitution $(wc -l < greeting.txt) runs wc and captures its output as a string. tee backup.txt duplicates the data stream — the user sees it on screen AND it is saved to backup.txt.

Test your understanding

Topic 18 Quiz — Unix Philosophy & Pipes

1. What is the file descriptor number for stderr? (LO3, multiple choice)

2. True or False: cmd > file 2>&1 and cmd 2>&1 > file produce the same result — both stdout and stderr end up in file. (LO3, true / false)

3. Which redirection operator appends stdout to a file without overwriting existing content? (LO3, multiple choice)

4. Fill in the blank: the shell operator ___ connects the stdout of one process to the stdin of the next process (type the single character). (LO3, fill in the blank)

5. Spot the bug: what is wrong with this command? (LO3, multiple choice)
./myprogram < data.txt > data.txt

6. What does tee logfile.txt do when used in a pipeline like cmd | tee logfile.txt | next? (LO3, multiple choice)