Recursion & Aliasing
How functions call themselves to solve problems, how the call stack grows, when to use and avoid recursion, and the subtle dangers of pointer aliasing — two pointers pointing to the same memory — plus the restrict keyword that lets the compiler safely optimise around it.
Recursion and aliasing in plain English
Imagine a set of Russian nesting dolls (Matryoshka). To find the smallest doll inside, you open the current doll, and if there is another doll inside you open that one too — repeating the same action. You stop when you reach a doll that is solid (no more dolls inside). That stopping condition is the base case. Each "open this doll" step is the recursive case. If you never hit a solid doll, you open forever — a stack overflow in code terms.
A recursive function is a function that calls itself as part of its own definition. Every recursive function must have two distinct parts:
- Base case — the condition under which the function stops calling itself and returns directly. Without a base case, recursion runs forever and crashes the program with a stack overflow.
- Recursive case — the part where the function calls itself with a simpler version of the problem. Each recursive call must move the problem closer to the base case.
Recursion is most natural for problems that have a self-similar structure: tree traversal, directory listing, mathematical sequences, and divide-and-conquer algorithms. Use iteration (loops) when the algorithm is naturally sequential; use recursion when the data structure or problem is inherently hierarchical.
Every function call in C allocates a stack frame — a block of memory holding local variables, the return address, and function arguments. Recursive calls stack these frames one on top of another. If recursion goes too deep (typically thousands of frames), the stack exhausts available memory and the OS kills the program with a segfault or "stack overflow" error. Always ensure your base case will be reached with bounded input.
Tail recursion is a special form where the recursive call is the very last thing the function does — there is no work remaining after the call returns. Some compilers (with -O2 or -O3) can convert tail-recursive functions into loops automatically (tail-call optimisation, TCO), eliminating the stack growth entirely. GCC performs TCO on eligible C functions. To write a tail-recursive version you usually need an accumulator parameter to carry intermediate results.
Mutual recursion is where function A calls function B, and function B calls function A. For example, a parser for a grammar may have parse_expr call parse_term, which calls parse_expr for sub-expressions. Mutual recursion is perfectly valid in C as long as one of the functions is forward-declared before the other is defined.
Imagine "Bob Smith" and "Robert Smith" are the same person. If you write a note to Bob saying "get a haircut" and simultaneously write a note to Robert saying "grow your hair long", both notes apply to the same person — the instructions conflict and the result is unpredictable. Pointer aliasing works the same way: two pointers that hold the same address are two names for the same piece of memory. Writing through one affects what the other sees.
Pointer aliasing occurs when two or more pointers in a program refer to the same memory location. This is perfectly legal in C and often intentional (a linked-list node pointing back to itself, a function returning a pointer into its argument). However, aliasing causes serious problems in two scenarios:
- Undefined behaviour with overlapping buffers:
memcpyassumes source and destination do not overlap. If they do, the result is undefined. Usememmoveinstead — it handles overlapping regions correctly. - Missed compiler optimisations: The compiler cannot reorder memory reads and writes if it suspects aliasing. If it cannot prove two pointers are distinct, it must be conservative and reload from memory on every access, missing vectorisation and other optimisations.
The restrict keyword (C99 and later) is a promise you make to the compiler: "I guarantee that for the lifetime of this pointer, no other pointer in scope accesses the same memory." With this guarantee, the compiler can freely reorder accesses, keep values in registers, and apply SIMD optimisations. If you lie — if aliasing actually does occur with a restrict-qualified pointer — the behaviour is undefined. restrict appears frequently in the C standard library signatures, for example memcpy(void * restrict dest, const void * restrict src, size_t n).
Aliasing matters most in tight numeric loops that process arrays — the kind of code that benefits from auto-vectorisation (SSE/AVX). Without restrict, the compiler inserts memory fence instructions to be safe. With restrict, it can load once, compute in registers, and write once. For most application code the difference is invisible; for HPC and DSP kernels it can be a 4–8x speedup.
Annotated C code: recursion and aliasing constructs
Recursive function structure
/* General template for a recursive function */ int recurse(int n) { /* ^ ^ ^ | | └── parameter that changes on each call (gets closer to base case) | └─────────── function name — this same name appears in the body └──────────────── return type must match the value returned by base case AND recursive case */ if (n == 0) { return 1; /* BASE CASE — stop here, no recursive call */ } /* ^ └── must be FIRST (or at least checked before the recursive call) must be reachable for every valid input must not call recurse() again */ return n * recurse(n - 1); /* ^ ^ ^ | | └── argument is SMALLER than n — progress toward base case | └──────────── recursive call to itself └──────────────── combines result of sub-problem with current value */ }
Tail-recursive form with accumulator
/* Non-tail-recursive factorial — stack frame kept alive waiting for result */ int fact(int n) { if (n == 0) return 1; return n * fact(n-1); /* must wait for fact(n-1) before multiplying */ } /* Tail-recursive version — accumulator carries the running product */ int fact_tail(int n, int acc) { /* ^^^ accumulator holds partial result — starts at 1 */ if (n == 0) return acc; /* base case returns accumulated result */ return fact_tail(n-1, n*acc); /* TAIL CALL — last operation, nothing left to do */ } /* Call as: fact_tail(5, 1) — with GCC -O2, this compiles to a loop */
n * ... multiplication waiting. GCC/Clang can replace the call+return with a jump back to the function start, using constant stack space.
Pointer aliasing example
int x = 10; int *p = &x; /* p points to x */ int *q = &x; /* q ALSO points to x — this is aliasing */ /* ^ both p and q hold the same address writing through p changes what q reads */ *p = 42; /* Now *q is also 42 even though we never wrote to *q */ printf("%d\n", *q); /* prints 42 */ /* Aliasing with overlapping buffers — DANGEROUS with memcpy */ char buf[10] = "hello"; memcpy(buf+1, buf, 5); /* UNDEFINED — src and dst overlap! */ memmove(buf+1, buf, 5); /* SAFE — memmove handles overlap */
memmove instead of memcpy. memcpy has undefined behaviour when regions alias (the C standard says it requires non-overlapping regions).
The restrict keyword
/* WITHOUT restrict — compiler must assume p and q may alias */ void add_arrays(int *dst, const int *src, int n) { for (int i = 0; i < n; i++) dst[i] += src[i]; /* if dst == src+1, every iteration sees a modified src! */ } /* WITH restrict — compiler may vectorise freely */ void add_arrays_r(int * restrict dst, /* ^^^^^^^ promise: no other pointer in this scope aliases dst */ const int * restrict src, /* ^^^^^^^ promise: src does not overlap dst */ int n) { for (int i = 0; i < n; i++) dst[i] += src[i/* compiler can vectorise: load 8 ints at once with SSE */ } /* Standard library uses restrict: */ /* void *memcpy(void * restrict dest, const void * restrict src, size_t n); */ /* char *strcpy(char * restrict dst, const char * restrict src); */
int * restrict p — the keyword goes between the * and the variable name. It is a qualifier on the pointer, not on the pointed-to type. Only applies for the duration of the pointer's scope (usually a function parameter's lifetime).
Complete C programs with stack traces and output
#include <stdio.h>
/* Compute n! (n factorial) recursively.
Base case: fact(0) = 1
Recursive case: fact(n) = n * fact(n-1) for n > 0 */
int fact(int n) {
printf(" fact(%d) called\n", n); /* trace the call */
if (n == 0) { /* BASE CASE */
printf(" fact(0) = 1 [base case]\n");
return 1;
}
int sub = fact(n - 1); /* RECURSIVE CASE: call with n-1 */
int result = n * sub;
printf(" fact(%d) = %d * %d = %d [returning]\n", n, n, sub, result);
return result;
}
int main(void) {
printf("Computing fact(4):\n");
int answer = fact(4);
printf("Result: %d\n", answer); /* 24 */
return 0;
}
fact(4) called
fact(3) called
fact(2) called
fact(1) called
fact(0) called
fact(0) = 1 [base case]
fact(1) = 1 * 1 = 1 [returning]
fact(2) = 2 * 1 = 2 [returning]
fact(3) = 3 * 2 = 6 [returning]
fact(4) = 4 * 6 = 24 [returning]
Result: 24
The call stack at its deepest point (just before fact(0) returns) contains five live frames:
#include <stdio.h>
#include <string.h>
/* ──── Part A: Fibonacci — naive recursive (exponential calls!) ──── */
int fib_naive(int n) {
if (n <= 1) return n; /* base cases: fib(0)=0, fib(1)=1 */
return fib_naive(n-1) + fib_naive(n-2); /* TWO recursive calls */
/* Problem: fib_naive(5) calls fib_naive(4) AND fib_naive(3).
fib_naive(4) ALSO calls fib_naive(3). fib(3) is computed TWICE.
For fib(40) this recalculates the same values ~100 million times. */
}
/* ──── Part B: Fibonacci — tail-recursive with accumulator ──── */
/* fib_tail(n, a, b): a = fib(k), b = fib(k+1), working toward n */
int fib_tail(int n, int a, int b) {
if (n == 0) return a; /* base case: accumulated result */
return fib_tail(n-1, b, a+b); /* tail call — no work after return */
}
/* Call as: fib_tail(10, 0, 1) */
/* ──── Part C: aliasing bug with memcpy ──── */
void demonstrate_aliasing(void) {
char s[20] = "hello";
/* Bug: src (s) and dst (s+1) overlap — undefined behaviour */
/* memcpy(s+1, s, 5); */ /* DO NOT USE — overlapping regions */
/* Fix: use memmove — it copies to a temp buffer first */
memmove(s+1, s, 5); /* result: "hhello" (safely shifted) */
s[6] = '\0';
printf("After memmove shift: %s\n", s); /* hhello */
/* Another aliasing example: two pointers to the same int */
int val = 99;
int *p = &val;
int *q = &val; /* p and q ALIAS the same location */
*p = 10;
printf("*q = %d (we wrote to *p, but *q changed too)\n", *q); /* 10 */
}
int main(void) {
printf("fib_naive(10) = %d\n", fib_naive(10)); /* 55 */
printf("fib_tail(10) = %d\n", fib_tail(10, 0, 1)); /* 55 */
demonstrate_aliasing();
return 0;
}
fib_tail(10) = 55
After memmove shift: hhello
*q = 10 (we wrote to *p, but *q changed too)
#include <stdio.h>
/* Forward declaration needed because is_even calls is_odd
which is defined AFTER is_even */
int is_odd(int n);
int is_even(int n) {
if (n == 0) return 1; /* base case: 0 is even */
return is_odd(n - 1); /* delegates to is_odd — mutual recursion */
}
int is_odd(int n) {
if (n == 0) return 0; /* base case: 0 is not odd */
return is_even(n - 1); /* delegates back to is_even */
}
/* Demonstrate restrict — no-alias promise enables vectorisation */
void vec_add(float * restrict dst,
const float * restrict a,
const float * restrict b,
int n) {
/* Compiler can now generate SIMD (SSE/AVX) instructions here because
it knows dst, a, and b do not overlap.
Without restrict, it must reload a[i] and b[i] each iteration
in case dst == a or dst == b. */
for (int i = 0; i < n; i++) {
dst[i] = a[i] + b[i];
}
}
int main(void) {
printf("is_even(4) = %d\n", is_even(4)); /* 1 (true) */
printf("is_odd(7) = %d\n", is_odd(7)); /* 1 (true) */
printf("is_even(3) = %d\n", is_even(3)); /* 0 (false) */
float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
float y[4] = {10.0f, 20.0f, 30.0f, 40.0f};
float z[4];
vec_add(z, x, y, 4);
printf("z = [%.0f, %.0f, %.0f, %.0f]\n", z[0], z[1], z[2], z[3]);
return 0;
}
is_odd(7) = 1
is_even(3) = 0
z = [11, 22, 33, 44]
Practice problems with solutions
Without running the code, trace every call made by sum_digits(523) and determine the final return value. Show each call and its return value step by step.
int sum_digits(int n) {
if (n == 0) return 0;
return (n % 10) + sum_digits(n / 10);
}
sum_digits(523)
= (523 % 10) + sum_digits(523 / 10)
= 3 + sum_digits(52)
sum_digits(52)
= (52 % 10) + sum_digits(52 / 10)
= 2 + sum_digits(5)
sum_digits(5)
= (5 % 10) + sum_digits(5 / 10)
= 5 + sum_digits(0)
sum_digits(0)
= 0 [base case]
Unwinding:
sum_digits(5) = 5 + 0 = 5
sum_digits(52) = 2 + 5 = 7
sum_digits(523) = 3 + 7 = 10
Final answer: 10
n % 10 extracts the last digit, n / 10 removes it (integer division). The function sums all digits: 5 + 2 + 3 = 10. The base case is n == 0 — when no digits remain. Note: this function gives incorrect results for negative numbers.
The following function is supposed to shift every character in buf one position to the right (inserting an 'X' at the start). It has an aliasing bug that causes undefined behaviour. Identify the bug and fix it.
#include <string.h>
#include <stdio.h>
void prepend_X(char *buf, size_t len) {
memcpy(buf + 1, buf, len); /* shift contents right by 1 */
buf[0] = 'X';
}
int main(void) {
char s[20] = "hello";
prepend_X(s, 5);
printf("%s\n", s); /* intended: "Xhello" */
return 0;
}
#include <string.h>
#include <stdio.h>
void prepend_X(char *buf, size_t len) {
memmove(buf + 1, buf, len); /* FIX: memmove handles overlapping regions */
buf[0] = 'X';
}
int main(void) {
char s[20] = "hello";
prepend_X(s, 5);
printf("%s\n", s); /* "Xhello" */
s[6] = '\0'; /* null-terminate the longer string */
return 0;
}
memcpy(buf+1, buf, 5) — the source region [buf, buf+5) and destination region [buf+1, buf+6) overlap by 4 bytes. The C standard (C11 7.1.4p1 and the memcpy spec) explicitly states that memcpy's regions must not overlap. The behaviour is undefined — on some implementations it "works" by accident; on others it corrupts the data.The fix: Replace
memcpy with memmove. memmove detects overlap and either copies via a temporary buffer or reverses the copy direction to handle overlapping safely. Also need to null-terminate the result since we now have 6 characters.
The following iterative function computes the sum of all integers from 1 to n. Rewrite it as a recursive function sum_recursive(int n). Identify the base case and recursive case clearly in comments.
int sum_iterative(int n) {
int total = 0;
for (int i = 1; i <= n; i++) {
total += i;
}
return total;
}
int sum_recursive(int n) {
/* BASE CASE: sum of integers from 1 to 0 is 0 */
if (n <= 0) return 0;
/* RECURSIVE CASE: sum(1..n) = n + sum(1..n-1)
The argument n-1 is strictly smaller than n,
so we make progress toward the base case on every call. */
return n + sum_recursive(n - 1);
}
/* Alternatively, a tail-recursive version with accumulator:
int sum_tail(int n, int acc) {
if (n <= 0) return acc;
return sum_tail(n - 1, acc + n);
}
// Call as: sum_tail(100, 0)
*/
total += i for i from 1 to n. Recursively: summing 1 to n equals n plus summing 1 to n-1. The base case is when n is 0 (or negative), where the answer is 0. The function makes progress because each call reduces n by 1. For large n (e.g. n=100000), the non-tail version will overflow the stack — the tail-recursive version avoids that with GCC -O2.
The following function scales an array by a constant. Add restrict qualifiers where appropriate so the compiler can optimise the loop. Explain why each one is valid (or why you chose not to add one).
void scale(float *dst, const float *src, float factor, int n) {
for (int i = 0; i < n; i++) {
dst[i] = src[i] * factor;
}
}
void scale(float * restrict dst,
const float * restrict src,
float factor,
int n) {
for (int i = 0; i < n; i++) {
dst[i] = src[i] * factor;
}
}
/* Why restrict on dst?
We promise: no other pointer passed to this function aliases dst.
So the compiler knows that writing dst[i] does not invalidate src[j].
Without restrict, after writing dst[0], the compiler would have to
reload src[1] from memory in case dst == src+0 made them alias.
Why restrict on src?
Same reason from the src side. Together they allow full vectorisation.
Why NOT restrict on factor (a float, not a pointer)?
restrict only applies to pointers. Scalar values are passed by value
and cannot alias. */
restrict when you can guarantee the caller never passes overlapping regions. If a caller does pass overlapping arrays, the behaviour becomes undefined — the compiler may produce wrong results silently. Always document the precondition in a comment above the function.
Write a recursive function int bsearch_r(int *arr, int lo, int hi, int target) that performs binary search on a sorted array. It should return the index of target if found, or -1 if not found. The base case occurs when lo > hi (empty range).
#include <stdio.h>
/* Binary search: search arr[lo..hi] for target.
Precondition: arr is sorted in ascending order.
Returns: index of target, or -1 if not found. */
int bsearch_r(int *arr, int lo, int hi, int target) {
/* BASE CASE 1: empty range — target not in array */
if (lo > hi) return -1;
int mid = lo + (hi - lo) / 2; /* avoid integer overflow vs. (lo+hi)/2 */
/* BASE CASE 2: found the target */
if (arr[mid] == target) return mid;
/* RECURSIVE CASES: search the appropriate half */
if (target < arr[mid]) {
return bsearch_r(arr, lo, mid - 1, target); /* search left half */
} else {
return bsearch_r(arr, mid + 1, hi, target); /* search right half */
}
}
int main(void) {
int sorted[] = {2, 5, 8, 12, 16, 23, 38, 56, 72, 91};
int n = 10;
printf("Index of 23: %d\n", bsearch_r(sorted, 0, n-1, 23)); /* 5 */
printf("Index of 99: %d\n", bsearch_r(sorted, 0, n-1, 99)); /* -1 */
printf("Index of 2: %d\n", bsearch_r(sorted, 0, n-1, 2)); /* 0 */
return 0;
}
lo + (hi - lo) / 2 avoids integer overflow that (lo + hi) / 2 could cause with large indices.
The code below is intended to count how many times digit d appears in the non-negative integer n. It has a bug that causes infinite recursion for certain inputs. Find the bug and fix it.
int count_digit(int n, int d) {
if (n == 0) return 0; /* base case */
return (n % 10 == d) + count_digit(n / 10, d);
}
int main(void) {
printf("%d\n", count_digit(303, 3)); /* should print 2 */
printf("%d\n", count_digit(0, 0)); /* should print 1, but crashes! */
}
/* Bug: count_digit(0, 0) returns 0 immediately (hits base case n==0)
but the digit 0 IS present in the number 0 — the answer should be 1.
Worse: count_digit(-1, d) causes infinite recursion because -1/10 == 0
and -1 % 10 == -1 on most platforms, but n never reaches 0 for
large negatives.
Fix: use a sentinel to distinguish "called with 0 as a number" from
"exhausted all digits". One standard approach: */
int count_digit_fixed(int n, int d) {
/* Process at least one digit even when n == 0 */
int count = (n % 10 == d) ? 1 : 0;
if (n / 10 == 0) return count; /* base case: no more digits */
return count + count_digit_fixed(n / 10, d);
}
/* Or handle the special case of n==0 separately: */
int count_digit_v2(int n, int d) {
if (n < 10) return (n == d) ? 1 : 0; /* single-digit base case */
return (n % 10 == d) + count_digit_v2(n / 10, d);
}
n == 0 fires before checking whether the digit 0 is in the number. When the user asks "how many 0s are in 0?", the function returns 0 without examining the number. The fix is to check the last digit before deciding to recurse, and use n / 10 == 0 (no more digits to examine) as the stopping condition instead.
Key concepts to memorize
Test your understanding
memcpy is safe to use when the source and destination memory regions overlap.LO1restrict keyword tell the compiler?LO1int countdown(int n) {
printf("%d\n", n);
return countdown(n - 1);
}
p and q in the same function both point to the same int variable. You write *p = 5;. What is the value of *q immediately after?LO1