From source code to executable: the full pipeline

Real-World Analogy

Think of building a house. Architects draw blueprints (source files), contractors build individual rooms from those blueprints (object files), and a general contractor (the linker) connects all the rooms into a coherent house. Some rooms come pre-built from a catalogue, and you just slot them in. That catalogue is a library. A static library is like buying a prefabricated room that is permanently built into your house. A dynamic library is like a utility service you connect to rather than build: the code stays in a separate file and is hooked up at runtime, when the program starts, and many houses can share the same supply.

When you run gcc hello.c -o hello, four separate stages run behind the scenes. You rarely invoke them individually, but understanding each one is essential for debugging build errors, working with multiple files, and creating your own libraries.

Step 1: Preprocessor (.c + .h → .i)
Step 2: Compiler (.i → .s)
Step 3: Assembler (.s → .o)
Step 4: Linker (.o + libs → executable)

Step 1 — Preprocessor: Text substitution only. Expands #include (pastes header files in), #define macros, and #ifdef conditionals. Produces a pure C file with no preprocessor directives left (only # line markers that record which file and line each piece of text came from). Run gcc -E hello.c to see this output.
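To see what "text substitution only" means in practice, here is a small sketch of a hypothetical file, macros.c (the file name and macro names are illustrative, not from the course). Running gcc -E on it shows each macro replaced by its text and only one branch of the #ifdef kept; adding -DDEBUG_BUILD to the command would keep the other branch instead.

```c
/* macros.c: a hypothetical file for exploring the preprocessor with gcc -E */

/* Object-like macro: every SIDE token is replaced with the text 5 */
#define SIDE 5

/* Function-like macro: the argument is pasted in as text, so each use
   is wrapped in parentheses to keep operator precedence intact */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: only one of the two definitions survives.
   Compile with -DDEBUG_BUILD to select the first branch. */
#ifdef DEBUG_BUILD
#define BUILD_NAME "debug"
#else
#define BUILD_NAME "release"
#endif

int square_area(void) {
    /* After gcc -E this line reads: return ((5) * (5)); */
    return SQUARE(SIDE);
}

const char *build_name(void) {
    /* After gcc -E (without -DDEBUG_BUILD) this reads: return "release"; */
    return BUILD_NAME;
}
```

At the end of the gcc -E macros.c output, the two function bodies appear with the macros already expanded, and none of the #define or #ifdef lines remain.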

Step 2 — Compiler: Translates preprocessed C into assembly language (human-readable CPU instructions). Run gcc -S hello.c to produce hello.s.

Step 3 — Assembler: Converts assembly language into binary machine code, producing an object file (.o). Object files contain the compiled code but with placeholder references for symbols defined in other files (e.g., a call to printf is just a placeholder until the linker fills it in). Run gcc -c hello.c to stop at this stage.

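Step 4 — Linker: Combines all .o files and resolves all symbol references. It looks up each unresolved name in the other object files and in the listed libraries. For a static library it copies the needed code in and fills in the addresses; for a shared library (such as libc.so, which provides printf) it records the dependency, and the dynamic linker fills in the final address at runtime. The output is a complete executable.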

Why compile to .o files separately?

In a large project with dozens of .c files, recompiling everything every time you change one file would be slow. By compiling each file to its own .o, you only need to recompile the changed file and then re-link. This is exactly what Makefiles automate.

Multi-file projects and modules: In C, a module is a .c file paired with a .h header file. The .c file contains the definitions (the actual code). The .h file contains the declarations (the function signatures and types). Other files #include the .h to know what they can call — but the actual code only lives once, in the .c file.

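extern declarations tell the compiler "this symbol exists somewhere else; trust me and let the linker find it." Function prototypes are extern by default, so the keyword matters mostly for global variables: the header declares extern int count; and exactly one .c file defines int count;. Without extern, every .c file that included the header would define its own copy of the variable, typically leading to "multiple definition" errors at link time.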
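A minimal sketch of the extern pattern for a global variable, assuming a hypothetical counter module (counter.h, counter.c, counter and counter_increment are illustrative names). Both files are shown in one block so it compiles on its own, with comments marking which lines belong where. An extern declaration followed by one definition in the same file is legal C, and it is exactly what the compiler sees after counter.c includes counter.h.

```c
/* ===== counter.h (hypothetical): declarations only ===== */
#ifndef COUNTER_H
#define COUNTER_H

extern int counter;            /* declaration: defined in some other file */
void counter_increment(void);  /* function prototypes are extern by default */

#endif /* COUNTER_H */

/* ===== counter.c (hypothetical): the single definition ===== */
/* In the real file this section starts with #include "counter.h" */
int counter = 0;               /* definition: storage is allocated here, once */

void counter_increment(void) {
    counter++;
}
```

Any other .c file that includes counter.h can read or modify counter, and the linker connects every reference to the one definition in counter.o. If counter.h said int counter = 0; instead, every file including it would carry its own initialized definition and the link would fail with "multiple definition of 'counter'".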

Header guards prevent a header file from being included more than once in the same translation unit, which would cause duplicate type definitions. The standard pattern is #ifndef MYHEADER_H / #define MYHEADER_H / ... / #endif. Every header file you write should have one.

Static vs Dynamic libraries — the key difference: A static library (.a) is an archive of .o files. When you link against it, the linker copies the needed object code directly into your executable. Your binary is self-contained but larger. A dynamic (shared) library (.so on Linux, .dylib on macOS) is loaded at runtime when the program starts. Multiple processes can share one copy in memory. Your binary is smaller, but the .so file must be present at runtime.

Undefined symbol errors — how to diagnose them

The most common linker error is "undefined reference to 'foo'". This means the linker found a call to foo but couldn't find its definition in any of the .o or library files you provided. Fix: either add the missing .c file to the compile command, or add the correct -l flag for the library that provides it.

Python (what you know)
# Python imports happen at runtime
import math          # standard library
import mymodule      # your own file

# No separate compile step
# No header files needed
# No linking step
# Just run: python3 main.py
C (what you are learning)
/* C includes happen at preprocessor time */
#include <math.h>     /* standard library header */
#include "mymodule.h" /* your own header */

/* Must compile AND link:
   gcc main.c mymodule.c -lm -o prog
   -lm links the math library (libm.a/.so) */
Mental model locked in!

Source files compile to object files. Object files get linked together. Header files are declarations, not definitions. Use -c to compile only, -L for library path, -l for library name, -I for include path. Static = baked in. Dynamic = loaded at runtime.

Commands, flags, and file formats annotated

Compiling and linking commands

# Compile a single file to object code (stop before linking)
gcc -c util.c
#    ^   ^
#    |   └── source file
#    └────── -c flag: compile only, produce util.o

# Compile multiple .c files and link them
gcc main.c util.c -o myprog
#   ^     ^         ^
#   |     |         └── output executable named 'myprog'
#   |     └──────────── second source file
#   └────────────────── first source file

# Link existing object files
gcc main.o util.o -o myprog

# Compile with custom include and library paths
gcc -I./include -L./lib main.c -lmylib -o myprog
#   ^           ^       ^      ^
#   |           |       |      └── link against libmylib.a or libmylib.so (must come after main.c)
#   |           |       └───────── source file that calls the library's functions
#   |           └───────────────── search ./lib for library files
#   └───────────────────────────── search ./include for header files
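Rule: -l strips the lib prefix and the .a/.so suffix. So -lm links libm.so, -lpthread links libpthread.so, and -lmylib links libmylib.a or libmylib.so. If both files exist in the search path, the GNU linker picks the .so unless you pass -static. Put each -l after the source or object files that use it.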

Creating static libraries with ar

# Step 1: compile source files to object code
gcc -c util.c    # produces util.o
gcc -c mylib.c  # produces mylib.o

# Step 2: pack them into an archive (static library)
ar rcs libmylib.a util.o mylib.o
# ^  ^   ^          ^
# |  |   |          └── object files to include
# |  |   └──────────── output archive name (must be lib<name>.a so that -l<name> can find it)
# |  └──────────────── flags: r=insert/replace, c=create, s=write index
# └─────────────────── the 'ar' archiver tool

# Step 3: link the static library into a program
gcc -o myprog main.c -L. -lmylib
#                      ^   ^
#                      |   └── link libmylib.a (or .so)
#                      └────── look for libraries in current dir (.)

Creating dynamic (shared) libraries

# Step 1: compile with Position Independent Code flag
gcc -c -fpic util.c   # -fpic = position-independent code, required for shared libs
gcc -c -fpic mylib.c

# Step 2: create shared object file
gcc -shared -o libmylib.so util.o mylib.o
#   ^           ^
#   |           └── output shared object (convention: lib*.so)
#   └─────────────── -shared: produce a shared library, not an executable

# Step 3: link against it (same -L -l flags as static)
gcc -o myprog main.c -L. -lmylib

# At runtime: tell the loader where to find the .so
LD_LIBRARY_PATH=. ./myprog
# Or install .so to /usr/local/lib and run ldconfig

Header guards — protecting against double inclusion

/* mylib.h — every header file should look like this */
#ifndef MYLIB_H        /* if MYLIB_H is not yet defined... */
#define MYLIB_H        /* ...define it now (marks this file as "seen") */

/* All your declarations go here */
int  add(int a, int b);
void print_result(int n);

#endif  /* MYLIB_H */

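/* Without header guards, if one .c file includes "mylib.h" twice
   (for example directly and again through another header), the compiler
   sees its contents twice. Repeated prototypes are harmless, but any
   struct or type definitions in the header cause "redefinition" errors.
   The guard ensures the body is only processed once per .c file. */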
Convention: use the filename in uppercase with dots replaced by underscores, so mylib.h → MYLIB_H. Many compilers also support #pragma once as a simpler alternative, but #ifndef guards are portable and standard.
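The guard's effect can be checked without any extra files. The sketch below pastes the body of a hypothetical point.h twice, which is what the compiler sees after preprocessing when one .c file includes the same header twice. Defining struct Point twice would be a redefinition error; with the guard in place, the second copy is skipped.

```c
/* First #include "point.h" (hypothetical header), as pasted by the preprocessor */
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif /* POINT_H */

/* Second #include "point.h": POINT_H is already defined, so this body is skipped.
   Remove both guards and the compiler reports a redefinition of struct Point. */
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif /* POINT_H */

int point_sum(struct Point p) {
    return p.x + p.y;
}
```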

Diagnostic tools: nm and ldd

nm myprog: lists all symbols (functions, global variables) in an object file or executable, marked T (defined, in the text/code section), U (undefined), or D (initialized data). Example: nm util.o | grep ' U ' shows only the unresolved symbols.
ldd myprog: lists the dynamic libraries (shared objects) a program depends on at runtime, with their resolved paths. Example: ldd ./myprog shows entries such as libpthread.so and libc.so.
ar t libmylib.a: lists the object files inside a static library archive. Example: for the archive built above with ar rcs, it lists util.o and mylib.o.
objdump -d myprog: disassembles an object file or executable, showing the machine code as assembly. Example: objdump -d util.o
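To connect the nm letters to source code, here is a hypothetical symbols.c (the file and symbol names are illustrative). After gcc -c symbols.c, running nm symbols.o on Linux with GCC should list counter as D, report_value as T, and printf as U. Addresses and the exact symbol list vary by compiler and platform; on macOS, for example, names carry a leading underscore.

```c
/* symbols.c: a hypothetical file for exploring nm output */
#include <stdio.h>

int counter = 42;            /* initialized global: nm shows D (data section) */

int report_value(void) {     /* function defined here: nm shows T (text/code) */
    /* printf is only declared by stdio.h, not defined in this file,
       so the object file records an undefined reference: nm shows U */
    printf("counter = %d\n", counter);
    return counter;
}
```

The U entry for printf is the placeholder described in Step 3: it stays unresolved in symbols.o until the linker matches it against the C library.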

Complete multi-file project examples

Example 1 — Three-file project: mathlib.h, mathlib.c, main.c Week 6 Lecture
/* ===== mathlib.h ===== */
/* Header guard — prevents double inclusion */
#ifndef MATHLIB_H
#define MATHLIB_H

/* Declaration only — no function body here */
/* extern is implicit for function declarations */
int add(int a, int b);
int multiply(int a, int b);
double power(double base, int exp);

#endif /* MATHLIB_H */


/* ===== mathlib.c ===== */
/* This file provides the DEFINITIONS (actual code) */
#include "mathlib.h"  /* include our own header */

int add(int a, int b) {
    return a + b;
}

int multiply(int a, int b) {
    return a * b;
}

/* Assumes exp >= 0: for a negative exp the loop never runs and 1.0 is returned */
double power(double base, int exp) {
    double result = 1.0;
    for (int i = 0; i < exp; i++) {
        result *= base;
    }
    return result;
}


/* ===== main.c ===== */
#include <stdio.h>
#include "mathlib.h"  /* get declarations so compiler checks our calls */

int main(void) {
    printf("add(3, 4)        = %d\n",   add(3, 4));
    printf("multiply(6, 7)   = %d\n",   multiply(6, 7));
    printf("power(2.0, 10)   = %.0f\n", power(2.0, 10));
    return 0;
}
Build and run
$ gcc -c mathlib.c # produces mathlib.o
$ gcc -c main.c # produces main.o
$ gcc main.o mathlib.o -o calc # link both .o files
$ ./calc
add(3, 4)        = 7
multiply(6, 7)   = 42
power(2.0, 10)   = 1024
Example 2 — Static library: building and using libmathlib.a Week 6 Lecture
# Build object files for library components
gcc -c mathlib.c -o mathlib.o
gcc -c strutils.c -o strutils.o

# Create static library archive (convention: name starts with 'lib', ends with '.a')
ar rcs libmathlib.a mathlib.o strutils.o
# r = insert/replace members
# c = create archive if it doesn't exist
# s = write an object-file index (speeds up linking)

# Inspect the archive
ar t libmathlib.a
# Output:
# mathlib.o
# strutils.o

# Link the library into a program
# -L.        = look for libraries in the current directory (.)
# -lmathlib  = link against libmathlib.a (strips 'lib' prefix and '.a' suffix)
gcc -o calc main.c -L. -lmathlib

# Run the program
./calc

# View symbols inside the library
nm libmathlib.a
# T = defined symbol (Text section = code)
# U = undefined symbol (must be resolved by the linker)
# Output includes lines like:
# 0000000000000000 T add
# 0000000000000020 T multiply
Note
Static library: the code from libmathlib.a is physically copied into the 'calc' executable at link time. The .a file is not needed at runtime.
Example 3 — Dynamic shared library: libmathlib.so + LD_LIBRARY_PATH Week 6 Lecture
# Step 1: compile with -fpic (position-independent code — required for shared libs)
gcc -c -fpic mathlib.c -o mathlib.o

# Step 2: create shared library from the PIC object
gcc -shared -o libmathlib.so mathlib.o

# Step 3: link the program against the shared library
gcc -o calc main.c -L. -lmathlib

# Step 4a: run — if libmathlib.so is NOT in a standard library path, set env var
LD_LIBRARY_PATH=. ./calc

# Step 4b: alternatively, install to system library path (needs root) and refresh cache
# sudo cp libmathlib.so /usr/local/lib/
# sudo ldconfig

# Check what shared libraries 'calc' depends on
ldd ./calc
# Output (example):
#   linux-vdso.so.1 => (0x00007ffce6bfe000)
#   libmathlib.so => ./libmathlib.so (0x00007f4a3c123000)
#   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4a3bf00000)
Key difference from static linking
Dynamic: libmathlib.so must exist at runtime. The code is NOT copied into 'calc' — it is loaded by the dynamic linker (ld.so) when the program starts. Multiple programs can share one .so in memory, saving RAM.
Example 4 — Makefile for a multi-file project with a static library Week 6 + Week 23 Makefile patterns
CC      = gcc
CFLAGS  = -Wall -Wextra -g -I./include
LDFLAGS = -L./lib
LIBS    = -lmathlib

# Default target: build the executable
all: calc

# Link: combine main.o with the library
calc: main.o lib/libmathlib.a
	$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS) $(LIBS)

# Compile main.c to main.o
main.o: src/main.c include/mathlib.h
	$(CC) $(CFLAGS) -c src/main.c -o $@

# Compile library source files
lib/mathlib.o: src/mathlib.c include/mathlib.h
	$(CC) $(CFLAGS) -c src/mathlib.c -o $@

# Build static library from object files
lib/libmathlib.a: lib/mathlib.o
	ar rcs $@ $^

# Clean all generated files
clean:
	rm -f main.o lib/*.o lib/libmathlib.a calc

.PHONY: all clean
Usage
$ make # builds everything
$ make clean # removes all generated files
$ make calc # builds only the executable target

Practice problems with solutions

P1 — What stage of compilation does each command invoke? Week 6 Lecture

For each command, name which stage(s) of the compilation pipeline run, and what file is produced:

gcc -E hello.c
gcc -S hello.c
gcc -c hello.c
gcc hello.c -o hello
gcc -E hello.c — runs the preprocessor only. Expands #include and #define. Output: preprocessed C source, printed to stdout by default (add -o hello.i to save it; .i is the conventional extension). No compilation, no object code.

gcc -S hello.c — runs preprocessor + compiler. Output: hello.s (assembly language). Stops before the assembler.

gcc -c hello.c — runs preprocessor + compiler + assembler. Output: hello.o (machine code object file). Stops before the linker. Symbols referencing other files remain as unresolved placeholders.

gcc hello.c -o hello — runs all four stages: preprocessor, compiler, assembler, linker. Output: executable named hello. Links against the C standard library automatically.
P2 — Spot the error: build command for a library project Week 6 Tutorial

You have main.c, a static library libutil.a in ./lib/, and headers in ./include/. The following build command produces an "undefined reference" error. Find and fix it:

Broken:
gcc -I./include main.c -o myprog -lutil
Fixed:
gcc -I./include main.c -L./lib -lutil -o myprog
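Missing -L./lib. Without it, the linker searches only the default system library paths (like /usr/lib) for libutil. If it finds nothing there, the error is "cannot find -lutil". On glibc-based Linux systems, which typically ship their own libutil, the linker silently picks up the system library instead; that library does not contain your functions, so every call into your library produces an "undefined reference" error.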

Also note: -l flags should generally come after the object files that need them (GCC processes arguments left-to-right, and the linker resolves symbols from objects listed before libraries). The fix shows the correct order: source files, then -L, then -l.
P3 — Write a complete header file for a stack module Week 6 Tutorial

Write stack.h — a header file for a simple integer stack. It should include: header guards, a struct definition for the stack (capacity 100), and declarations for push, pop, peek, and is_empty.

/* stack.h */
#ifndef STACK_H       /* header guard — start */
#define STACK_H

#define STACK_CAPACITY 100

/* Stack data structure — defined in the header so users know its layout */
typedef struct {
    int data[STACK_CAPACITY];
    int top;           /* index of the top element (-1 if empty) */
} Stack;

/* Function declarations (extern is implicit for function prototypes) */
void stack_init(Stack *s);
int  stack_push(Stack *s, int value);  /* returns 1 on success, 0 if full */
int  stack_pop(Stack *s, int *out);    /* returns 1 on success, 0 if empty */
int  stack_peek(Stack *s, int *out);   /* returns 1 on success, 0 if empty */
int  stack_is_empty(Stack *s);         /* returns 1 if empty, 0 otherwise */

#endif /* STACK_H */  /* header guard — end */
Key points: Header guards prevent double-inclusion. The struct is defined in the header (not just declared) so users can allocate a Stack on the stack. Functions take a pointer (Stack *s) because C passes by value — we need to modify the caller's stack. The actual function bodies go in stack.c, which #include "stack.h".
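A minimal sketch of the matching stack.c, assuming the return-value conventions stated in the comments of stack.h above. So that the block compiles by itself, it repeats the header's contents at the top; in the real project that section is replaced by #include "stack.h".

```c
/* ===== contents of stack.h (normally: #include "stack.h") ===== */
#ifndef STACK_H
#define STACK_H

#define STACK_CAPACITY 100

typedef struct {
    int data[STACK_CAPACITY];
    int top;                  /* index of the top element (-1 if empty) */
} Stack;

void stack_init(Stack *s);
int  stack_push(Stack *s, int value);
int  stack_pop(Stack *s, int *out);
int  stack_peek(Stack *s, int *out);
int  stack_is_empty(Stack *s);

#endif /* STACK_H */

/* ===== stack.c: the definitions ===== */
void stack_init(Stack *s) {
    s->top = -1;              /* no elements yet */
}

int stack_push(Stack *s, int value) {
    if (s->top >= STACK_CAPACITY - 1) {
        return 0;             /* full */
    }
    s->data[++s->top] = value;
    return 1;
}

int stack_pop(Stack *s, int *out) {
    if (stack_is_empty(s)) {
        return 0;             /* nothing to pop */
    }
    *out = s->data[s->top--]; /* read the top element, then shrink */
    return 1;
}

int stack_peek(Stack *s, int *out) {
    if (stack_is_empty(s)) {
        return 0;
    }
    *out = s->data[s->top];   /* read without removing */
    return 1;
}

int stack_is_empty(Stack *s) {
    return s->top == -1;
}
```

It builds like mathlib.c in Example 1: gcc -c stack.c produces stack.o, which is then linked with whichever program includes stack.h and calls these functions.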
P4 — static vs dynamic: which to choose? Week 6 Lecture

For each scenario, say whether you would prefer a static library (.a) or a dynamic library (.so) and explain why:

(a) A command-line tool that must run on systems where the library may not be installed.
(b) A set of utility functions shared by 20 server processes running simultaneously.
(c) A security library where you want to push bugfix updates without recompiling all consumers.
(d) A small embedded system with no dynamic linker.

(a) Static (.a) — the executable is self-contained. No runtime dependency on the library being installed on the target system. Portable and deployable as a single binary.

(b) Dynamic (.so) — all 20 processes share one copy of the library in memory (the OS maps the same .so pages into each process's address space). With a static library, each process would have its own copy of the code, wasting memory.

(c) Dynamic (.so) — you can replace the .so file with a patched version. As long as the fix leaves the library's binary interface (ABI) unchanged, every program that links against it picks up the fix on its next launch without recompilation. With static linking, every consumer would need to be recompiled and redeployed.

(d) Static (.a) — embedded systems often lack a dynamic linker (ld.so), so static linking is the only option. The linker copies in only the archive members (.o files) that resolve a needed symbol, so unused parts of the library do not enlarge the image.
P5 — Trace the linking process: what does the linker do with these object files? Week 6 Lecture + Tutorial

You have three files. main.o has an undefined reference to foo and bar. foo.o defines foo and has an undefined reference to helper. util.o defines helper and bar.
What is the correct linker command? What happens if you only provide main.o and foo.o? What if you define bar in both foo.o and util.o?

gcc main.o foo.o util.o -o myprog
Correct command: All three object files must be provided. The linker resolves: main.o's foo from foo.o, main.o's bar from util.o, foo.o's helper from util.o. All symbols are satisfied.

If only main.o + foo.o: The linker finds foo (in foo.o), but bar and helper remain unresolved. Error: "undefined reference to 'bar'" and "undefined reference to 'helper'".

If bar is defined in both foo.o and util.o: Error: "multiple definition of 'bar'". Each external symbol may be defined only once across all linked object files (C++ calls this the One Definition Rule; C has an equivalent rule for external definitions). Solution: remove the duplicate definition. If one of the two copies is only a private helper for its own file, declare it static so the linker no longer sees it.
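The problem only states which object file defines or needs each symbol, so the function bodies below are hypothetical. The sketch keeps the same dependency structure; the sections are combined into one block so it compiles as a unit, but in the exercise each section is its own .c file compiled with gcc -c.

```c
/* ===== util.c: defines helper and bar ===== */
int helper(int x) {
    return x * 2;
}

int bar(void) {
    return 7;
}

/* ===== foo.c: defines foo, needs helper (undefined in foo.o) ===== */
int helper(int x);            /* declaration only; resolved from util.o */

int foo(void) {
    return helper(21);
}

/* ===== main.c: needs foo and bar (both undefined in main.o) =====
   int foo(void);
   int bar(void);
   int main(void) { return foo() + bar() == 49 ? 0 : 1; }
*/
```

With the sections split into separate files, nm foo.o should list foo as T and helper as U, and nm main.o should list both foo and bar as U; gcc main.o foo.o util.o -o myprog then resolves all three.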


Test your understanding

Topic 25 Quiz: Linking & Libraries (6 questions)

1. Which gcc flag compiles a source file to an object file (.o) without linking?
2. True or False: A static library (.a) must be present at runtime for the program to execute.
3. You create a static library called libmathutils.a. What is the correct gcc flag to link against it?
4. Fill in the blank: Header guards use #ifndef, #define, and ___ to prevent double inclusion.
5. Spot the bug: the following build command produces "undefined reference to 'sqrt'". Why?
gcc main.c -o myprog
/* main.c includes math.h and calls sqrt() */
6. Which command shows the dynamic libraries (shared objects) a compiled program depends on at runtime?