Security Vulnerabilities
Buffer overflows, stack smashing, format string attacks, integer overflow, use-after-free — and the defenses that stop them: stack canaries, ASLR, NX bit, and safe string functions.
How C programs get exploited — and why
Imagine a bookshelf with exactly five slots. You put books in slots 1 to 5 — that is fine. Now imagine you try to push a sixth book in: it falls off the end of the shelf and lands on whatever is below it, smashing something important. A buffer overflow is exactly this: a fixed-size array on the stack has N bytes allocated. If you write more than N bytes into it, the excess bytes spill over into adjacent memory — overwriting other variables, saved registers, or even the return address that tells the program where to go next after the function returns. An attacker who controls what spills over controls where the program jumps.
Why does C let this happen? C is a low-level language that gives you direct access to memory. Functions like gets(), strcpy(), and scanf("%s") copy bytes into a buffer without checking how big the buffer is. There is no automatic bounds checking. The C philosophy is "trust the programmer to know what they are doing" — a philosophy that has produced decades of security vulnerabilities in real software.
Stack-based buffer overflow and return address overwrite. Every function call pushes a stack frame onto the call stack. That frame contains local variables (including buffers), the saved frame pointer, and the saved return address — the address the CPU will jump to when the function returns. If a local buffer overflows far enough, it overwrites the saved return address. When the function returns, the CPU jumps to the attacker's chosen address instead of the legitimate caller. This is called stack smashing.
Format string vulnerabilities. printf is a powerful function that interprets a format string to decide what to print. When user-controlled data is used directly as the format string — printf(user_input) instead of printf("%s", user_input) — the attacker can embed format specifiers like %x (read memory as hex) or %n (write the number of bytes printed so far to an address). The format string is a double-edged sword: incredibly useful for legitimate formatting, but a serious attack vector when misused.
Integer overflow. C integers have fixed sizes: a uint8_t holds 0–255. Add 1 to 255 and store the result back in a uint8_t, and it wraps to 0. If code uses such a wrapped result to calculate a buffer size or array index, a check may pass while the allocation is far too small, leading to an overflow downstream. The SSH1 CRC-32 attack detector bug (CVE-2001) is the classic case: a size computed in a wider type was stored in a 16-bit variable, the truncated value came out as zero, the allocation was zero-sized, and the code then wrote data into that buffer. Game over.
Use-after-free. After calling free(ptr), the memory at ptr is returned to the allocator. Using ptr again after freeing it is undefined behavior — the memory may have been reallocated for a completely different purpose. An attacker who can influence what data gets placed in the recycled memory can manipulate program state through the dangling pointer. Best practice: always set pointers to NULL after freeing.
Defense Mechanisms Overview
Stack canaries. A random secret value (the "canary") is placed on the stack between the local buffers and the saved return address. Before the function returns, compiler-inserted code checks it: if the canary has been modified, a buffer has overflowed and the program is terminated immediately. Enabled with -fstack-protector in GCC.
ASLR. Address Space Layout Randomization randomizes the base addresses of the stack, heap, and shared libraries every time a program runs. An attacker who needs to jump to a specific address (e.g., injected shellcode or a libc function) cannot predict where that address will be, making exploitation much harder without a separate information-leak vulnerability.
NX bit. The No-Execute bit marks memory pages (such as the stack and heap) as non-executable. Even if an attacker injects shellcode into a buffer, the CPU refuses to execute it. Modern CPUs and operating systems support this; Windows calls the feature DEP (Data Execution Prevention). Attackers work around it using ROP (Return-Oriented Programming), chaining existing code gadgets.
Safe string functions. Replace unsafe functions with size-limited alternatives: gets() → fgets(buf, n, stdin); strcpy(dst, src) → strncpy(dst, src, n-1) followed by null termination; sprintf(buf, fmt, ...) → snprintf(buf, n, fmt, ...). Always pass the buffer size. Always null-terminate after strncpy.
Stack canaries do not stop heap overflows. ASLR can be bypassed if there is an information leak. NX is defeated by ROP chains. Safe string functions only help if used correctly. Defense in depth — combining all of these — is the industry standard. And all defenses are undermined if the logic of the program is wrong.
Think of memory as a precisely laid-out map. Any write that goes outside its designated region corrupts the map. Defenses either detect the corruption (canary), make the map unpredictable (ASLR), make regions non-navigable (NX), or prevent the overflow from happening in the first place (safe functions). The best C programmers apply all four layers.
Vulnerable patterns and safe replacements — annotated
gets() — The Most Dangerous Function in C
/* VULNERABLE: gets() reads until newline with NO size limit */
char buf[64];
gets(buf);
/* DANGER: no size argument. If the input is longer than 63 bytes,
   the excess overflows into adjacent stack memory.
   gets() was removed from the C11 standard entirely. */

/* SAFE: fgets() takes an explicit size limit */
fgets(buf, sizeof(buf), stdin);
/* buf         = destination buffer
   sizeof(buf) = max bytes to store (incl. \0)
   stdin       = source stream
   Safe: fgets stops after size-1 chars. */
warning: 'gets' is deprecated. Always use fgets(). Note that fgets() includes the newline character in the buffer if there is room — strip it with buf[strcspn(buf, "\n")] = '\0'; if needed.
strcpy() vs strncpy()
/* VULNERABLE: strcpy copies until \0, no bounds check */
char dst[8];
strcpy(dst, src);
/* If strlen(src) >= 8, bytes overflow into adjacent memory */

/* SAFE: strncpy limits the copy to n-1 chars */
strncpy(dst, src, sizeof(dst) - 1);   /* size - 1 leaves room for the \0 we add */
dst[sizeof(dst) - 1] = '\0';          /* MUST manually null-terminate! */
/* Always null-terminate after strncpy:
   strncpy does NOT guarantee a \0 if src is long. */
strncpy does not null-terminate the destination if the source is n or more characters long. You must add dst[n-1] = '\0' yourself. Forgetting this turns a "safe" function into a time bomb that causes string operations to run off the end of the buffer.
printf() Format String Vulnerability
/* VULNERABLE: user controls the format string */
char user_input[128];
fgets(user_input, sizeof(user_input), stdin);
printf(user_input);
/* DANGER: if user types "%x %x %x", printf reads values off the stack
   and prints them as hex. If user types "%n", printf WRITES to memory. */

/* SAFE: use a fixed format string, pass data as argument */
printf("%s", user_input);
/* The format string is a string LITERAL, not user data.
   user_input goes in as an argument and is treated as a plain string. */

/* What %n does: WRITE attack */
int written;
printf("hello%n", &written);
/* %n stores the count of characters printed SO FAR (5 for "hello")
   into the int pointed to by the next argument.
   In a format string attack, the attacker provides the argument address
   by embedding values in the format string itself. */
snprintf() — Safe Formatted Output to a Buffer
/* VULNERABLE: sprintf has no size limit */
char out[32];
sprintf(out, "Hello, %s!", username);
/* "Hello, " and "!" take 8 bytes and the \0 takes 1,
   so any username longer than 23 chars overflows out */

/* SAFE: snprintf limits output to n-1 chars + null byte */
snprintf(out, sizeof(out), "Hello, %s!", username);
/* sizeof(out) = max bytes to write, including the null terminator.
   User data is safe as an ARGUMENT; the format string stays a literal. */
snprintf always null-terminates the output (unlike strncpy). It returns the number of characters that would have been written if the buffer were large enough — if that value is >= sizeof(out), the output was truncated. Check the return value in security-critical code.
Safe vs Unsafe Functions — Quick Reference
| Unsafe Function | Problem | Safe Replacement | Key Parameter |
|---|---|---|---|
| gets(buf) | No size limit at all. Removed from C11. | fgets(buf, n, stdin) | n = buffer size |
| strcpy(dst, src) | No bounds check; overflows if src is too long. | strncpy(dst, src, n-1); dst[n-1]='\0'; | n = sizeof(dst) |
| strcat(dst, src) | No bounds check on destination. | strncat(dst, src, n - strlen(dst) - 1) | n = sizeof(dst) |
| sprintf(buf, fmt, ...) | No output size limit. | snprintf(buf, n, fmt, ...) | n = buffer size |
| scanf("%s", buf) | No width limit; overflow possible. | scanf("%19s", buf) or fgets() | width = n-1 |
| printf(user_input) | Format string attack via %x, %n. | printf("%s", user_input) | Fix format string |
Real-World Disasters — Type Safety Failures
Ariane 5 Flight 501 (1996). Cost: ~$370 million + payload. Rocket exploded 37 seconds after launch.
Root cause: A 64-bit floating-point value (horizontal velocity) was cast to a 16-bit signed integer. On the older Ariane 4, this value never exceeded 16-bit range. Ariane 5 was faster, so it did. The value overflowed and the conversion raised an exception that the software did not handle. The inertial reference system shut down. The flight computer received diagnostic output, interpreted it as flight data, and steered the rocket off course.
int16_t v = (int16_t)velocity; /* ← overflow → exception */
Lesson: NEVER assume reused code handles all value ranges. Reuse without revalidation of preconditions is a type safety disaster.
Patriot missile failure (1991). Cost: 28 US soldiers killed. The battery failed to intercept an Iraqi Scud in Dhahran, Saudi Arabia.
Root cause: The system clock counted time in units of 1/10 second using a 24-bit fixed-point register. The value 0.1 cannot be represented exactly in binary (like 1/3 in decimal), so every tick carried a tiny truncation error. After 100 hours of operation, the accumulated error was 0.34 seconds. A Scud travelling at Mach 5 moves ~500 metres in 0.34 seconds, enough to leave the intercept window entirely.
/* After 3,600,000 ticks (100 hours at 10 ticks per second): error ≈ 0.34 seconds */
Lesson: Rounding errors in inexact binary fractions accumulate. Never use floats or truncated fractions for long-running clocks. Use integer arithmetic with a known unit (e.g., microseconds as uint64_t).
The -O2 Overflow Optimization Trap
Compiler optimizations interact with undefined behaviour in counterintuitive ways. Signed integer overflow is undefined behaviour in C, and so is pointer arithmetic that moves outside the underlying object. The compiler assumes neither can ever happen and optimizes accordingly.
/* DEVELOPER'S INTENT: guard against buffer overflow if ptr+len wraps around */
void copy_data(char *ptr, size_t len, char *dest) {
    if (ptr + len < ptr) {   /* the "overflow check" */
        return;              /* SILENTLY REMOVED by gcc -O2 ! */
    }
    memcpy(dest, ptr, len);
}
With the check removed, nothing validates len. If an attacker controls it, memcpy copies whatever length they choose.
The fix: three options.
/* Option 1: check BEFORE adding (no overflow possible) */
if (len > UINTPTR_MAX - (uintptr_t)ptr) { return; }
/* Option 2: use compiler built-in (GCC/Clang) */
uintptr_t result;
if (__builtin_add_overflow((uintptr_t)ptr, len, &result)) { return; }
/* Option 3: compile with wrapping semantics.
   -fwrapv makes signed integer overflow wrap; for pointer arithmetic
   such as ptr + len, GCC 8 and later provide a separate -fwrapv-pointer. */
/* gcc -O2 -fwrapv -fwrapv-pointer: disables some optimizations but makes overflow defined */
Option 2 typically compiles to an add followed by a jump-on-overflow (add + jo on x86). Option 3 is a build-system setting that applies to the whole translation unit: useful for legacy code, but it may prevent some legitimate optimizations.
Always compile with -Wall -Wextra AND test with both -O0 (debug) and -O2 (release) builds. A bug that only appears at one optimization level is a sign you have undefined behaviour in your code — the compiler is legally allowed to exploit it in any way it chooses. Sanitizers (-fsanitize=undefined) catch these at runtime.
Complete vulnerable programs — and how to fix them
#include <stdio.h>
#include <string.h>
/* Simulates a login check — in reality, any function with a local buffer */
void greet_user(const char *name) {
char buf[16]; /* only 16 bytes on the stack */
strcpy(buf, name); /* NO size check — copies until \0 */
printf("Hello, %s!\n", buf);
}
int main(void) {
/* Safe call — 5 bytes + \0 fits in 16 */
greet_user("Alice");
/* OVERFLOW — 40-char string smashes the stack */
/* The 24+ extra bytes overwrite buf's neighbors:
saved frame pointer, then saved return address */
greet_user("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");
return 0;
}
Hello, AAAAAAA... [then crash / segfault / control hijack]
Segmentation fault (core dumped)
Stack layout during greet_user() — visualising the overflow:
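On a typical x86-64 frame with no padding between the slots, when strcpy writes byte 17, it overflows past buf[15] and starts corrupting the saved frame pointer. By byte 25, it reaches the saved return address. The attacker fills that position with the address of their malicious code. The exact offsets vary with the compiler, its flags, and the platform.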
#include <stdio.h>
#include <string.h>
void greet_user(const char *name) {
char buf[16];
/* strncpy copies at most sizeof(buf)-1 characters */
strncpy(buf, name, sizeof(buf) - 1);
/* strncpy does NOT null-terminate if source is too long — do it manually */
buf[sizeof(buf) - 1] = '\0';
printf("Hello, %s!\n", buf);
}
int main(void) {
greet_user("Alice");
/* Now safe — long input is silently truncated to 15 chars */
greet_user("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");
return 0;
}
Hello, AAAAAAAAAAAAAAA! (truncated to 15 chars, no crash)
#include <stdio.h>
#include <string.h>   /* strcspn */
int secret = 0xDEADBEEF; /* sensitive value an attacker wants to read */
void handle_input(char *user_input) {
/* BUG: user_input is used directly as the format string */
printf(user_input); /* NEVER DO THIS */
printf("\n");
}
int main(void) {
char input[128];
fgets(input, sizeof(input), stdin);
/* Strip newline */
input[strcspn(input, "\n")] = '\0';
handle_input(input);
return 0;
}
(stack values leaked as hex; a secret held on the stack, such as 0xdeadbeef, would be visible to the attacker)
printf(user_input) interprets user_input as a format string. When the attacker passes %x %x %x %x %x %x, printf looks for the corresponding arguments. There are none, so printf reads whatever values happen to be in the argument registers and on the stack at that moment. This leaks local variables, saved registers, addresses, and any sensitive data nearby. Note that secret is a global here, so it lives in the data segment rather than on the stack: it shows up as deadbeef only if a copy of it sits in a register or stack slot that printf stumbles across.
The %n escalation: By using %n instead of %x, the attacker can not only read memory but also write to addresses of their choosing, turning a read vulnerability into a write-what-where primitive. This can overwrite the return address, a function pointer, or a GOT entry to redirect execution.
Fix: Always use printf("%s", user_input). The format string must be a string literal controlled by the programmer, never user data.
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
/* Attacker-controlled: count comes from network packet */
void process_items(uint16_t count, char *data, size_t data_len) {
/* VULNERABILITY: count * 4 can overflow uint16_t
Example: count = 16385 (0x4001)
16385 * 4 = 65540 → overflows uint16_t to 4 (0x0004) */
uint16_t buf_size = count * 4; /* integer overflow here */
char *buf = malloc(buf_size); /* allocates only 4 bytes! */
if (!buf) return;
/* Now copies data_len bytes into a 4-byte buffer */
memcpy(buf, data, data_len); /* heap overflow → arbitrary write */
free(buf);
}
/* SAFE VERSION */
void process_items_safe(uint16_t count, char *data, size_t data_len) {
/* Use size_t (32 or 64 bits on mainstream platforms) so count * 4 cannot wrap */
size_t buf_size = (size_t)count * 4;
if (buf_size == 0 || data_len > buf_size) return; /* bounds check */
char *buf = malloc(buf_size);
if (!buf) return;
memcpy(buf, data, data_len);
free(buf);
}
int main(void) {
printf("uint16_t max: %u\n", (unsigned)UINT16_MAX);
printf("16385 * 4 as uint16_t: %u\n", (unsigned)(uint16_t)(16385 * 4));
/* Shows: 4 — the overflow */
return 0;
}
16385 * 4 as uint16_t: 4 (overflowed! attacker gets 4-byte heap allocation)
Suppose the attacker sends count = 16385. The multiplication 16385 * 4 = 65540 exceeds the maximum value of uint16_t (65535), so it wraps around to 65540 - 65536 = 4. The program allocates only 4 bytes on the heap, then faithfully copies the attacker's full data (say, 60000 bytes) into those 4 bytes: a massive heap buffer overflow. The SSH1 CRC exploit relied on the same kind of undersized-integer bug.
Fix: Use size_t for buffer sizes (on mainstream platforms it is as wide as a pointer, so it does not overflow in practice for reasonable inputs). Add a pre-multiplication overflow check: if (count > SIZE_MAX / 4) return;. And always check that the data length fits in the allocated buffer before copying.
Practice problems with solutions
For each code fragment below, name the vulnerability and explain in one sentence why it is dangerous.
/* Fragment A */
char username[32];
gets(username);
/* Fragment B */
char buf[64];
printf(buf);
/* Fragment C */
void copy_name(const char *src) {
char dst[16];
strcpy(dst, src);
}
/* Fragment D */
uint8_t n = user_provided_value;
char *p = malloc(n * 256);
memcpy(p, data, 256 * 256);
Fragment A: Stack buffer overflow via gets(). gets() reads until a newline with no size limit. Any input longer than 31 bytes overflows username into adjacent stack memory, potentially overwriting the return address.
Fragment B: Format string vulnerability. buf is user-controlled data being passed directly as the format string to printf. An attacker who types %x %x %x leaks stack values; %n allows writing to an arbitrary memory address.
Fragment C: Stack buffer overflow via strcpy(). strcpy copies until the null terminator with no bounds check on the destination. If src is longer than 15 bytes, it overflows the 16-byte dst buffer.
Fragment D: Integer range error leading to heap overflow. uint8_t n holds 0–255, so n * 256 is at most 255 * 256 = 65280, which is always smaller than the 256 * 256 = 65536 bytes that memcpy copies. Every value of n overflows the heap buffer, and the attacker can tune n to make it worse: with n = 1 the allocation is only 256 bytes, and with n = 0 malloc(0) may return NULL or a minimal allocation. The result of malloc is also never checked.
The function below has two security vulnerabilities. Identify them and rewrite the function so it is safe. The function should still work correctly for valid inputs.
#include <stdio.h>
#include <string.h>
void log_message(const char *user_name, const char *message) {
char log_buf[64];
/* Build log entry */
sprintf(log_buf, "[%s] %s", user_name, message);
/* Print to stderr */
fprintf(stderr, log_buf);
fprintf(stderr, "\n");
}
#include <stdio.h>
#include <string.h>
void log_message(const char *user_name, const char *message) {
char log_buf[64];
/* FIX 1: sprintf → snprintf with explicit buffer size */
/* If user_name + message together exceed 60 chars, output is truncated */
snprintf(log_buf, sizeof(log_buf), "[%s] %s", user_name, message);
/* FIX 2: fprintf(stderr, log_buf) → fprintf(stderr, "%s", log_buf) */
/* log_buf might contain % characters from user_name/message;
using it as format string would be a format string vulnerability */
fprintf(stderr, "%s\n", log_buf);
}
Vulnerability 1 (buffer overflow): sprintf has no size limit. The format adds 3 overhead characters ("[", "]", and a space), so if user_name and message together exceed 60 characters, the output plus its null terminator no longer fits in the 64-byte log_buf. Fix: use snprintf(log_buf, sizeof(log_buf), ...).
Vulnerability 2 (format string): fprintf(stderr, log_buf) uses a runtime string as the format string. If user_name or message contains %x, %n, etc., those specifiers are interpreted. Fix: use fprintf(stderr, "%s\n", log_buf), so that log_buf is just the data argument, not the format string.
In a format string passed to printf, what does the %n specifier do? Give a concrete example of its legitimate use and explain how an attacker exploits it in a format string vulnerability. What is the name of the corresponding read-only specifier that leaks memory?
/* Legitimate use of %n — counting printed characters */
int count;
printf("Hello, world%n\n", &count);
/* After this call, count == 12 (characters printed before %n) */
printf("Printed %d chars before the newline\n", count);
/* Attacker's use (format string vulnerability) */
/* If printf(user_input) is called and user_input contains:
"AAAA%10$n"
This writes the value 4 (for "AAAA") to the address
found at position 10 on the stack — wherever that points.
With careful crafting, the attacker points it at the
return address or a function pointer. */
%n takes a pointer to int as its corresponding argument and writes the number of characters that printf has printed so far (before reaching %n) into that integer. It is useful for precisely measuring output length in legitimate code.
Why it is dangerous: In a format string attack, the attacker controls the format string. By embedding values that look like pointers (e.g., an address they want to overwrite) in the string and then using %n to write to that address, the attacker gains an arbitrary memory write primitive. They can write any value by using width specifiers (%100x%n writes 100+previous) to control what number gets written.
The read-only companion: %x reads a value off the stack as unsigned hex. It is the primary specifier for reading arbitrary stack memory. %p reads an address. Together, %x and %n provide a read-write primitive through a format string bug.
An application is vulnerable to a stack-based buffer overflow. The developer wants to add runtime defenses without rewriting all the string-handling code. Name two distinct defenses, explain how each works, which layer (compiler, OS, hardware) provides it, and give one limitation of each.
Defense 1: Stack canaries (compiler layer, -fstack-protector)
How it works: The compiler inserts a random secret value (the "canary") between the local variables and the saved return address on the stack. Before every return, a check verifies the canary is unchanged. If a buffer overflow overwrote the canary, the check fails and the program calls __stack_chk_fail(), printing "stack smashing detected" and aborting.
Limitation: Only protects the stack. Does not stop heap overflows, off-by-one writes that skip the canary, or attacks that can first read the canary value via an information leak (then replicate it in the overflow payload).
Defense 2: ASLR, Address Space Layout Randomization (OS layer)
How it works: Every time the program starts, the OS loads the stack, heap, and shared libraries at randomized base addresses. An attacker who needs to jump to a specific address (e.g., system() in libc or injected shellcode on the stack) cannot predict the address, so the jump lands in garbage and the program crashes rather than executing the attacker's payload.
Limitation: If the program has a separate information-leak vulnerability (e.g., a format string bug that reveals a stack or library address), the attacker can de-randomize the layout and compute correct addresses. Also, 32-bit systems have limited entropy (only ~16 bits of randomness), making brute-force feasible on some platforms.
The code below contains a use-after-free bug. Identify the exact line where the free-then-use occurs, explain what can go wrong, and provide a corrected version.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct { char name[32]; int score; } Player;
Player *create_player(const char *name, int score) {
Player *p = malloc(sizeof(Player));
if (!p) return NULL;
strncpy(p->name, name, 31);
p->name[31] = '\0';
p->score = score;
return p;
}
int main(void) {
Player *p = create_player("Alice", 100);
printf("Player: %s, Score: %d\n", p->name, p->score);
free(p);
/* Later in the code, a different developer adds this: */
p->score += 50; /* bonus points */
printf("Updated score: %d\n", p->score);
return 0;
}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct { char name[32]; int score; } Player;
Player *create_player(const char *name, int score) {
Player *p = malloc(sizeof(Player));
if (!p) return NULL;
strncpy(p->name, name, 31);
p->name[31] = '\0';
p->score = score;
return p;
}
int main(void) {
Player *p = create_player("Alice", 100);
if (!p) return 1;   /* allocation failed: do not dereference NULL */
printf("Player: %s, Score: %d\n", p->name, p->score);
/* CORRECT APPROACH: apply bonus BEFORE freeing */
p->score += 50;
printf("Updated score: %d\n", p->score);
free(p);
p = NULL; /* KEY: set to NULL immediately after free */
/* Any later dereference of p will segfault predictably
rather than silently corrupting a recycled allocation */
return 0;
}
The line p->score += 50 after free(p) is the use-after-free. After free(p), the memory at p is returned to the allocator. It may be immediately reused for a completely different allocation. Writing to it corrupts unrelated data. Reading from it returns garbage.
Security implication: An attacker who can allocate an object between the free and the use controls what data the dangling pointer reads or writes. This is a classic heap exploitation technique used in browser and kernel exploits.
Best practices: (1) Apply all operations before freeing. (2) Set p = NULL immediately after free(p). A NULL dereference crashes the program immediately and visibly rather than silently corrupting heap state. (3) Use tools like valgrind or AddressSanitizer (-fsanitize=address) to detect use-after-free at runtime during development.
Key concepts to memorize
Test your understanding — LO11
What does the %n format specifier do when used in printf?
The safe replacement for gets(buf) that accepts a size limit is _______(buf, sizeof(buf), stdin). Type the function name only.
Identify the vulnerability in this code: char user_msg[64]; fgets(user_msg, sizeof(user_msg), stdin); printf(user_msg);
A developer writes if (ptr + len < ptr) { return; } to guard against pointer wraparound. After compiling with gcc -O2, this check disappears from the binary. Why?