Why do we need threads? Moore's Law, the Power Wall, and Multicore

For decades, writing faster software was easy: wait 18 months and buy a new CPU. Gordon Moore observed in 1965 that the number of transistors on a chip was doubling at a steady rate, a pace he later revised to roughly every two years. Smaller transistors meant faster switching, which translated directly into higher clock frequencies, so your existing programs ran faster for free, without any code changes. This was the golden era of single-core computing (roughly 1970–2003).

Then around 2004, this stopped. The problem was physics: dynamic power is proportional to clock frequency times voltage squared (P ∝ V² · f). Running a chip at higher frequencies requires higher voltages, so power, and with it heat, grows much faster than linearly with clock speed (roughly with the cube of frequency if voltage has to rise in step). Beyond a certain point the heat cannot be removed fast enough to keep the chip within safe operating temperatures. Intel hit this wall so hard it cancelled its Tejas and Jayhawk processor projects outright in 2004; both were expected to exceed 100 W and were simply too hot to ship in consumer hardware.

The industry's solution was to put multiple slower cores on one chip. Double the cores, double the throughput — but only if your program is written to use them. A single-threaded program running on a 16-core CPU uses exactly one core; the other 15 sit idle. This is why parallel programming — and specifically threads — became a mandatory skill for systems programmers. COMP2017 Week 9 is where you learn to use those cores.

Timeline: from Moore's Law to Multicore

1965: Moore's Law stated.
1971–2003: Clock speed doubles every ~18 months. Programs get faster for free.
2004: Power Wall. Intel cancels Tejas & Jayhawk. Chips run too hot.
2005+: Multicore era. Dual-core, quad-core, many-core. Programs must use threads.

Key concepts

| Concept | What it means | Consequence |
| --- | --- | --- |
| Moore's Law | Transistor count on a chip doubles roughly every 2 years (observed 1965, held until ~2015) | More transistors → smaller, faster circuits → higher clock frequency → free speedup for any program |
| Power Wall | P ∝ V² · f: power grows with clock frequency and voltage. Heat cannot be dissipated fast enough above ~4 GHz | Clock speeds stopped scaling around 2004. Running faster = melting the chip |
| Tejas & Jayhawk | Intel processor projects cancelled in 2004 because their projected power consumption exceeded safe thermal limits | Industry-wide pivot: abandon clock-speed scaling, pursue multicore instead |
| Multicore | Multiple CPU cores on one chip. Each core runs independently at a manageable clock speed and power budget | Total throughput scales, but only if the program is designed to run tasks in parallel |
| Why threads? | A thread is the unit of parallel execution within a process. Multiple threads can run simultaneously on different cores | Without threads, a program uses 1 core regardless of how many are available. All COMP2017 parallelism is thread-based |

Analogy — The Speed Limit

Think of a single-core CPU as a single motorway lane. For decades, Moore's Law kept raising the speed limit — 100, 150, 200 km/h — and every car on the road automatically went faster. The Power Wall is the point where raising the speed limit any further would cause engines to catch fire. The solution: instead of one very fast lane, build 8 parallel lanes at a safe speed. Your car (program) must be redesigned to use multiple lanes (threads) to cross the country faster.

Why This Matters For Your Code Right Now

If you write single-threaded C code, you are only using 1 core. On a modern 8-core machine, you are leaving 87.5% of the CPU idle. On a 16-core server, you're using 6.25%. Threads let you exploit every core — but they require you to think about concurrent execution, shared data, and race conditions. That is exactly what the remainder of this topic (and Topic 27) covers.

Threads: workers sharing the same office

Real-World Analogy — The Open-Plan Office

Imagine a single office with one large shared desk in the middle. Multiple workers (threads) all sit at this desk at the same time. They all read from the same filing cabinet (shared heap), write on the same whiteboard (global variables), and look at the same project plans (code). Each worker has their own notepad on their lap for personal scratch work (the stack), and each keeps track of where they are in their personal task list (program counter + registers). Because they share the same desk and filing cabinet, communication is trivially fast — any worker can reach over and read a note another worker left. But this also means two workers can accidentally overwrite each other's work at exactly the same moment. That disaster is a race condition.

In C, a thread is the smallest unit of execution that the operating system can schedule. Every program starts with exactly one thread — the main thread running main(). You can spawn additional threads using the POSIX pthreads API so that multiple sequences of instructions run concurrently within the same process.

Why threads? Modern CPUs have multiple cores. A single-threaded program uses only one core no matter how many you have. By splitting work across multiple threads, you can exploit all available cores and finish the job faster. This is the fundamental motivation for parallel programming covered in Week 9 of COMP2017.

There are two flavours of parallelism you need to know:

  • Task parallelism — different tasks run simultaneously (e.g., computing the minimum and maximum of an array at the same time in separate threads).
  • Data parallelism — the same task runs on different portions of data (e.g., four threads each summing one quarter of a large array).

Thread vs Process — the key differences

Thread (lightweight)
Shares with siblings:
• Virtual address space (heap)
• Global / static variables
• Code (text segment)
• Open file descriptors

Private to each thread:
• Stack (local variables)
• Registers & program counter
• Thread ID (pthread_t)

Fast creation. Cheap communication.

Process (heavyweight)
Each process has its own:
• Entire virtual address space
• File descriptor table
• Heap, stack, globals
• PID (process ID)

Communication requires IPC:
• Pipes, sockets, shared memory
• Signals

Slow creation (fork()). Expensive comms.

| Feature | Thread | Process |
| --- | --- | --- |
| Memory space | Shared with all threads in the process | Own private address space |
| Creation cost | Low (allocate stack + metadata) | Higher (duplicate the address space; modern kernels defer the copying with copy-on-write) |
| Communication | Direct (shared variables) | IPC required (pipes, shmem) |
| Crash isolation | None: one crash kills all threads | Full: a child crash doesn't kill the parent |
| API (Unix) | pthread_create() | fork() |
| Wait for finish | pthread_join() | wait() / waitpid() |
| Header needed | <pthread.h> | <unistd.h> for fork(), <sys/wait.h> for wait() |
| Link flag | -lpthread | (none extra) |

When to use threads vs processes

Use threads when tasks need to communicate frequently or share large data structures. Example: a web server handling each HTTP request in a separate thread — all threads can access the same in-memory cache without copying data. Or: parallel computation on a shared array — four threads each working on one quarter of the array, combining results at the end.

Use processes when you need fault isolation (a crashed worker should not kill the main program), or when running entirely independent programs that communicate through pipes or sockets. Browsers such as Chrome isolate tabs in separate processes for exactly this reason.

Concurrency vs Parallelism

Concurrency is making progress on multiple tasks — not necessarily at the same time. The OS scheduler rapidly switches threads so each gets a turn, even on a single-core machine.
Parallelism is multiple threads executing simultaneously on separate cores.
Concurrency = dealing with many things. Parallelism = doing many things at once. Both are useful; pthreads gives you both.

Execution Order is Non-Deterministic

Once you create a thread, you cannot predict whether the main thread or the new thread will run first. The OS scheduler decides. Never write code that assumes a particular order of execution between threads — that assumption is a bug waiting to happen.

Mental Model Check

A process = a building. Threads = workers in that building. Workers share the building's resources (heap, globals, files). Each has their own desk notes (stack). Hiring a new worker (creating a thread) is cheap. Building a new building (forking a process) is expensive but gives you strong isolation.

Types of parallelism and task dependency graphs

Before writing a single line of pthread code, you need to answer two questions: what kind of parallelism does my problem have? and which tasks are allowed to run simultaneously? The lecture answers these with two frameworks: the task/data parallelism distinction and task dependency graphs (DAGs).

Task Parallelism vs Data Parallelism

Real-World Analogy — The Restaurant Kitchen

Task parallelism: one cook makes pasta, another grills meat, a third makes dessert — different tasks happen simultaneously on potentially different ingredients.

Data parallelism: five workers each process one-fifth of a pile of vegetables doing the exact same chopping operation — the same task applied to different chunks of data simultaneously.

/* Task parallelism — threads do DIFFERENT things */
/* Thread A sorts the left half of an array         */
/* Thread B finds the minimum in the right half     */
/* → The operation varies per thread                */

/* Data parallelism — threads do the SAME thing on different data */
/* 8 threads each sum 1/8 of an array (parallel reduction)       */
/* → Same function, different data slice                          */
Note: Real programs often mix both. A pipeline stage may split data across threads (data parallelism) while different pipeline stages run concurrently (task parallelism).

| Type | What varies | Example | Pattern |
| --- | --- | --- | --- |
| Task parallelism | The operation | One thread computes the minimum of an array while another computes the maximum: each does something different | Each thread has its own role / function |
| Data parallelism | The data subset | Parallel array sum: 4 threads each sum 1/4 of the array | All threads run the same function on their own slice |

Task Dependency Graphs (DAGs)

A Task Dependency Graph (DAG = Directed Acyclic Graph) shows which tasks must finish before others can start. This determines two critical quantities:

  • What can run in parallel — tasks with no dependency between them.
  • The critical path: the longest sequential chain of dependencies — this is the theoretical minimum time your program can take, regardless of how many threads you add.
/*  Example dependency graph:                                         */
/*                                                                    */
/*          [Parse file]                                              */
/*          /           \                                             */
/*   [Process A]    [Process B]    ← no dependency: run in parallel  */
/*          \           /                                             */
/*          [Merge results]                                           */
/*          |                                                         */
/*        [Write output]                                              */
/*                                                                    */
/* Critical path = Parse + max(A,B) + Merge + Write                  */
/* Max parallelism at step 2 = 2 threads (limited by independent tasks) */
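Key insight: Amdahl's Law. If a fraction s of the work is inherently serial, the speedup on N cores is at most 1 / (s + (1 − s) / N), which can never exceed 1 / s. Adding more threads beyond the available parallelism gives no further speedup, and the critical path sets a hard floor on execution time.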

In pthreads, tasks with no dependency can be launched with pthread_create and run simultaneously. Tasks that depend on a result must pthread_join the predecessor thread before starting.

/* Parallelizing two independent tasks from the graph above */
pthread_t t_a, t_b;

pthread_create(&t_a, NULL, process_a, data);  /* start A */
pthread_create(&t_b, NULL, process_b, data);  /* start B simultaneously */

pthread_join(t_a, NULL);  /* wait for A to finish */
pthread_join(t_b, NULL);  /* wait for B to finish */
/* Now both A and B are done — safe to call merge_results() */
merge_results();
DAG rule: Every edge in the DAG (A must complete before C) maps to a pthread_join call in your code. Nodes with no incoming edges from an unfinished task can be launched immediately.
How to Read a DAG Before You Code

1. Draw the graph — boxes are tasks, arrows mean "must finish before".
2. Find all nodes with no predecessors — these can start immediately (create threads for all of them at once).
3. Trace the longest path (critical path): its length is the minimum possible running time, so total work divided by critical-path length is your theoretical speedup ceiling.
4. Every join in your pthread code corresponds to an edge in the DAG.

pthread_create, pthread_join, and the thread function signature

The entire pthreads API lives in <pthread.h>. You must also link with -lpthread at compile time: gcc myfile.c -lpthread.

Thread function signature — the mandatory shape

/* Every thread function MUST have this exact signature */
void* my_thread_func(void* arg) {
/* ^        ^           ^
 * |        |           └── single argument: always void* (cast to your type inside)
 * |        └────────────── your function name (any valid identifier)
 * └─────────────────────── return type: always void* (cast your return value, or NULL)
 */
    /* ... your thread's work here ... */
    return NULL;  /* return NULL when done (or cast a value to void*) */
}
Why void*? The pthreads API is type-agnostic — it does not know what type of argument your thread needs. By using void* (a generic pointer), you can cast any pointer type in. Inside the function, cast back to your actual type: int *p = (int*)arg;

pthread_create — spawning a thread

/* Signature from <pthread.h> */
int pthread_create(
    pthread_t  *tid,                     /* arg 1 */
    const pthread_attr_t *attr,           /* arg 2 */
    void* (*start_routine)(void*),      /* arg 3 */
    void* arg                             /* arg 4 */
);
/* Returns 0 on success, or a non-zero error code on failure. */

/* Argument breakdown: */
/* arg 1 — tid: address of a pthread_t variable to store the new thread's ID */
/*   → You pass &mythread so pthreads can write the ID back to you          */
/* arg 2 — attr: thread attributes (stack size, detach state, etc.)          */
/*   → Pass NULL to use sensible defaults (joinable, default stack size)     */
/* arg 3 — start_routine: function pointer — the work the thread will do     */
/*   → Must match signature: void* fn(void*)                                 */
/* arg 4 — arg: a single void* argument passed to start_routine at runtime   */
/*   → Pass NULL if you have no argument; cast your pointer to (void*)       */

/* Example usage: */
pthread_t tid;
int rc = pthread_create(&tid, NULL, my_thread_func, NULL);
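/* pthread functions return the error code directly and do not set errno, */
/* so report it with strerror(rc) from <string.h> rather than perror().   */
if (rc != 0) { fprintf(stderr, "pthread_create: %s\n", strerror(rc)); exit(1); }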
Key fact: Note that pthread_create takes a pointer to a pthread_t (&tid), but pthread_join takes the value of the pthread_t (tid, not &tid). This asymmetry is a very common source of bugs.

pthread_join — waiting for a thread to finish

/* Signature from <pthread.h> */
int pthread_join(
    pthread_t   tid,     /* arg 1: thread ID to wait for (the VALUE, not a pointer) */
    void**      status   /* arg 2: where to store the thread's return value, or NULL */
);
/* Returns 0 on success, or a non-zero error code on failure. */

/* pthread_join BLOCKS until the named thread terminates.            */
/* Once joined, the thread no longer exists. Do not join twice.      */

/* Example: wait for tid and ignore return value */
pthread_join(tid, NULL);

/* Example: capture the return value from the thread */
void* retval;
pthread_join(tid, &retval);
/* retval now holds whatever the thread returned with return or pthread_exit */
Analogy: pthread_join is the thread equivalent of waitpid() for processes. It reclaims the resources used by the finished thread and gives you its exit status. If you never join a thread, you have a resource leak (similar to a zombie process).

pthread_exit — terminating a thread explicitly

/* Three ways a thread can terminate: */

/* 1. Return from the thread function (preferred) */
void* worker(void* arg) {
    /* ... work ... */
    return NULL;          /* thread ends cleanly */
}

/* 2. Call pthread_exit() explicitly (useful deep in a call chain) */
pthread_exit(NULL);   /* only THIS thread terminates, others continue */

/* 3. Be cancelled by another thread with pthread_cancel(tid) */

/* CRITICAL DISTINCTION: */
/* exit()           kills the ENTIRE process (all threads die!) */
/* pthread_exit()   kills ONLY the calling thread               */
/* return in main() also kills all threads (like exit())        */
/* → if main() returns before threads finish, threads are killed */
/* → fix: call pthread_join() before returning from main()      */
Important gotcha: Terminating a thread does not automatically free malloc-allocated memory or close file descriptors opened by that thread. The programmer is responsible for cleanup before termination.

Passing arguments to threads

/* SAFE: use a separate variable per thread (array of args) */
int args[4];
pthread_t tids[4];
for (int i = 0; i < 4; i++) {
    args[i] = i;                                        /* unique value per thread */
    pthread_create(&tids[i], NULL, worker, (void*)&args[i]);
}

/* UNSAFE (race condition!): passing &i directly */
/* pthread_create(&tids[i], NULL, worker, (void*)&i); // BUG! */
/* By the time the thread reads *arg, i may already have been incremented, */
/* or the loop may have ended so that i no longer exists at all.           */

/* Inside the thread function — cast back to original type: */
void* worker(void* arg) {
    int my_id = *(int*)arg;   /* cast void* back to int*, then dereference */
    printf("I am thread %d\n", my_id);
    return NULL;
}
Passing multiple arguments: There is only one void* slot. To pass multiple values, put them all in a struct and pass a pointer to that struct. Cast (void*)&my_struct when creating, and cast (struct my_args*)arg inside the thread.
Compile with -lpthread — not optional

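On Linux with glibc versions before 2.34, the pthreads functions live in a separate shared library (libpthread.so), and you must tell the linker to include it:
gcc myprogram.c -lpthread -o myprogram
Without the flag, the linker on those systems cannot find pthread_create and reports "undefined reference" errors at link time. The -l flag means "link against library"; pthread is the library name. Newer glibc versions fold pthreads into libc itself, so the flag is harmless there, but keep it for portability. GCC and Clang also accept -pthread, which sets both the compile-time and link-time options.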

Joinable vs Detached threads

By default, threads are created in the joinable state — another thread must call pthread_join() to reclaim their resources when they finish. If you never join a joinable thread, you have a resource leak.

A detached thread cleans itself up automatically on exit. You cannot join a detached thread. Use detached threads when you want "fire and forget" — create the thread, let it run, and you never need its return value. Detach with: pthread_detach(tid) or by setting attributes before creation.

Complete programs you can compile and run

Example 1 — Hello World with 3 threads Week 9 Lecture
#include <stdio.h>
#include <pthread.h>     /* POSIX threads API */

#define NUM_THREADS 3

/* Thread function — MUST match signature: void* fn(void*) */
void* say_hello(void* arg) {
    int id = *(int*)arg;             /* cast void* back to int*, dereference */
    printf("Hello from thread %d!\n", id);
    return NULL;                     /* threads return void* — NULL means "no return value" */
}

int main(void) {
    pthread_t tids[NUM_THREADS];    /* array of thread IDs */
    int args[NUM_THREADS];          /* one unique argument per thread — NEVER share &i directly */

    /* Create 3 threads */
    for (int i = 0; i < NUM_THREADS; i++) {
        args[i] = i;                 /* store unique value in its own slot */
        int rc = pthread_create(
            &tids[i],               /* &tid: pthreads writes the new thread's ID here */
            NULL,                   /* attributes: NULL = use defaults (joinable) */
            say_hello,              /* start_routine: function pointer */
            (void*)&args[i]        /* arg: pointer to this thread's unique argument */
        );
        if (rc != 0) {
            fprintf(stderr, "pthread_create failed: %d\n", rc);
            return 1;
        }
    }

    /* Join all threads — wait for each to finish before main() returns */
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(tids[i], NULL);  /* tid VALUE (not &tids[i]) + ignore return */
    }

    printf("All threads finished.\n");
    return 0;
}
/* Compile: gcc hello_threads.c -lpthread -o hello_threads */
Possible output (run 1)
Hello from thread 0!
Hello from thread 2!
Hello from thread 1!
All threads finished.
Possible output (run 2 — different order!)
Hello from thread 1!
Hello from thread 0!
Hello from thread 2!
All threads finished.
WARNING: The order of thread output is NON-DETERMINISTIC. The OS scheduler decides who runs first. Never rely on a specific ordering between threads.
Example 2 — Data parallelism: threads summing array segments Week 9 Lecture — Data Parallelism
#include <stdio.h>
#include <pthread.h>

#define ARRAY_SIZE  1000000
#define NUM_THREADS 4

int data[ARRAY_SIZE];   /* shared array — all threads can see this */
long partial_sums[NUM_THREADS];   /* each thread writes to its own slot */

/* Struct to pass multiple arguments to a thread */
struct thread_args {
    int start;   /* start index of this thread's segment */
    int end;     /* end index (exclusive) */
    int tid;     /* which thread am I? */
};

void* sum_segment(void* arg) {
    struct thread_args* a = (struct thread_args*)arg;   /* cast void* back */
    long sum = 0;
    for (int i = a->start; i < a->end; i++) {
        sum += data[i];
    }
    partial_sums[a->tid] = sum;   /* write result to our dedicated slot */
    printf("Thread %d summed indices [%d, %d) = %ld\n",
           a->tid, a->start, a->end, sum);
    return NULL;
}

int main(void) {
    /* Initialize array */
    for (int i = 0; i < ARRAY_SIZE; i++) data[i] = 1;  /* all 1s for easy checking */

    pthread_t tids[NUM_THREADS];
    struct thread_args args[NUM_THREADS];   /* one struct per thread */
    int segment = ARRAY_SIZE / NUM_THREADS;

    /* Create threads, each working on one quarter of the array */
    for (int i = 0; i < NUM_THREADS; i++) {
        args[i].start = i * segment;
        args[i].end   = (i == NUM_THREADS - 1) ? ARRAY_SIZE : (i + 1) * segment;
        args[i].tid   = i;
        pthread_create(&tids[i], NULL, sum_segment, (void*)&args[i]);
    }

    /* Wait for all threads to finish */
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(tids[i], NULL);
    }

    /* Combine partial results in main thread */
    long total = 0;
    for (int i = 0; i < NUM_THREADS; i++) total += partial_sums[i];
    printf("Total sum = %ld (expected %d)\n", total, ARRAY_SIZE);
    return 0;
}
/* Compile: gcc sum_threads.c -lpthread -o sum_threads */
Sample output
Thread 0 summed indices [0, 250000) = 250000
Thread 2 summed indices [500000, 750000) = 250000
Thread 1 summed indices [250000, 500000) = 250000
Thread 3 summed indices [750000, 1000000) = 250000
Total sum = 1000000 (expected 1000000)
Thread print order may vary — only the final total is guaranteed correct (because we join before reading).
Example 3 — Race condition demonstration (shared counter) Week 10 Lecture — Race Conditions
#include <stdio.h>
#include <pthread.h>

#define NUM_THREADS  2
#define ITERATIONS   1000000

long counter = 0;   /* shared global — both threads read and write this */

void* increment(void* arg) {
    for (long i = 0; i < ITERATIONS; i++) {
        counter = counter + 1;   /* CRITICAL SECTION — not atomic! */
        /* This compiles to roughly:
           1) LOAD:  register = counter      (read from memory)
           2) ADD:   register = register + 1
           3) STORE: counter = register      (write back)
           If two threads both load before either stores, one increment is LOST. */
    }
    return NULL;
}

int main(void) {
    pthread_t tids[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&tids[i], NULL, increment, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tids[i], NULL);

    printf("Expected: %d\n", NUM_THREADS * ITERATIONS);
    printf("Actual:   %ld\n", counter);
    printf("Lost increments due to race condition: %ld\n",
           (long)NUM_THREADS * ITERATIONS - counter);
    return 0;
}
/* Compile: gcc race.c -lpthread -o race */
Typical output (result is non-deterministic!)
Expected: 2000000
Actual: 1042718
Lost increments due to race condition: 957282
Another run — different (wrong) result
Expected: 2000000
Actual: 1186541
Lost increments due to race condition: 813459
RACE CONDITION: counter = counter + 1 is a read-modify-write sequence (load, add, store), not one atomic step. When two threads interleave their execution, one thread's store can overwrite the other's, losing increments. The fix (Topic 27) is a mutex to protect the critical section.

Practice problems with solutions

P1 — Write a thread function that prints its argument Week 9 Lecture

Write a complete C program that creates a thread passing the integer value 42 as an argument. The thread should print "Thread received: 42" and return NULL. Use the correct thread function signature and pass the argument safely (not by casting the integer directly to void*).

#include <stdio.h>
#include <pthread.h>

void* print_value(void* arg) {
    int val = *(int*)arg;          /* cast void* to int*, then dereference */
    printf("Thread received: %d\n", val);
    return NULL;                   /* thread function always returns void* */
}

int main(void) {
    pthread_t tid;
    int data = 42;                 /* store argument in its own variable */

    pthread_create(&tid, NULL, print_value, (void*)&data);
    pthread_join(tid, NULL);       /* wait for thread to finish */
    return 0;
}
/* gcc p1.c -lpthread -o p1 */
Key points: The thread function returns void* and takes void* — no exceptions. We store the argument in a local variable (data) and pass &data cast to (void*). Inside the thread we cast back: (int*)arg gives us the address, then * dereferences it to get the integer value. We call pthread_join before main() returns, ensuring the thread has a chance to run and print before the process exits.
P2 — Pass a struct to a thread function Week 9 Lecture — Multiple Arguments

You need to pass two values to a thread: a start index and an end index. Since pthread_create only allows one void* argument, you must use a struct. Write the struct definition, the thread function, and the main() that creates one thread passing start=10, end=20. The thread should print all integers from start (inclusive) to end (exclusive).

#include <stdio.h>
#include <pthread.h>

/* Step 1: define a struct to hold all arguments */
struct range_args {
    int start;
    int end;
};

/* Step 2: thread function casts void* back to the struct pointer */
void* print_range(void* arg) {
    struct range_args* r = (struct range_args*)arg;
    for (int i = r->start; i < r->end; i++) {
        printf("%d ", i);
    }
    printf("\n");
    return NULL;
}

int main(void) {
    pthread_t tid;
    struct range_args args = { .start = 10, .end = 20 };  /* C99 designated init */

    /* Step 3: pass pointer to struct, cast to (void*) */
    pthread_create(&tid, NULL, print_range, (void*)&args);
    pthread_join(tid, NULL);
    return 0;
}
The pattern: define a struct, fill it in main(), pass (void*)&args to pthread_create. Inside the thread, cast back with (struct range_args*)arg and use -> to access fields. This is the correct way to pass any number of arguments to a thread. Make sure the struct outlives the thread — if you declare it inside a loop body that exits before the thread reads it, you have undefined behavior.
P3 — Join multiple threads and collect return values Week 9 Lecture

Write a program that creates 4 threads. Each thread i should compute i * i and return that value (cast to void*). After all threads finish, main() should collect the return values using pthread_join and print "Thread i returned: X" for each.

Hint: casting a small integer to void* and back is acceptable when you need to return a simple value: return (void*)(intptr_t)(i*i); and long result = (long)(intptr_t)retval;.

#include <stdio.h>
#include <stdint.h>   /* for intptr_t */
#include <pthread.h>

#define N 4

int ids[N];

void* square(void* arg) {
    int i = *(int*)arg;
    long result = (long)i * i;
    /* Cast integer result to void* via intptr_t to avoid pointer-size warnings */
    return (void*)(intptr_t)result;
}

int main(void) {
    pthread_t tids[N];

    for (int i = 0; i < N; i++) {
        ids[i] = i;
        pthread_create(&tids[i], NULL, square, (void*)&ids[i]);
    }

    for (int i = 0; i < N; i++) {
        void* retval;
        pthread_join(tids[i], &retval);    /* pass &retval to capture return */
        long result = (long)(intptr_t)retval;
        printf("Thread %d returned: %ld\n", i, result);
    }
    return 0;
}
How return values work: pthread_join(tid, &retval) takes a void** as its second argument, and pthread_join stores the thread's return value in the variable that argument points to (here, retval itself). The thread returns a void*; to return a small integer we cast it through intptr_t (an integer type guaranteed large enough to hold a pointer). On the receiving side we cast back: (long)(intptr_t)retval. For returning large structures, allocate them on the heap inside the thread and free them after joining.
P4 — Identify the race condition Week 9-10 Lecture — Race Conditions

The code below is broken. It creates 3 threads all printing their index. Run mentally: what is actually likely to be printed? Identify the race condition and explain why it occurs. Then write the fixed version.

#include <stdio.h>
#include <pthread.h>
#define N 3
pthread_t tids[N];

void* worker(void* arg) {
    int i = *(int*)arg;          /* BUG IS HERE */
    printf("I am thread %d\n", i);
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) {
        pthread_create(&tids[i], NULL, worker, (void*)&i);  /* passing &i */
    }
    for (int i = 0; i < N; i++) pthread_join(tids[i], NULL);
    return 0;
}
/* THE BUG: all three threads receive &i, a pointer to the SAME variable.
   The loop may increment i before a thread has had a chance to read *arg,
   and once the loop ends i no longer exists at all, so the read is
   undefined behavior. Threads can print duplicates, skip values, or
   print "3" (the value of i when the loop exits).

   One possible broken output:
     I am thread 3
     I am thread 3
     I am thread 3

   THE FIX: give each thread its own copy of the index. */

#include <stdio.h>
#include <pthread.h>
#define N 3
pthread_t tids[N];
int args[N];           /* separate variable for each thread */

void* worker(void* arg) {
    int my_id = *(int*)arg;   /* safe: this slot belongs only to me */
    printf("I am thread %d\n", my_id);
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) {
        args[i] = i;           /* store i's value in a dedicated slot */
        pthread_create(&tids[i], NULL, worker, (void*)&args[i]);
    }
    for (int i = 0; i < N; i++) pthread_join(tids[i], NULL);
    return 0;
}
Root cause: Passing &i shares a single memory location between all threads. By the time thread 0 reads the value at that address, the main thread may already have incremented i to 1, 2, or even 3. This is the classic first race condition taught in COMP2017. The shared data item is the variable i; the final result depends on relative timing. Fix: store each value in a dedicated array slot before creating the thread.

Test your understanding

Topic 26 Quiz: Threads & Concurrency

1. What does pthread_create receive as its first argument? (LO10, multiple choice)
2. True or False: Calling exit() from a thread terminates only that thread, leaving all other threads running. (LO10, true / false)
3. Which of the following is shared between all threads in the same process? (LO10, multiple choice)
4. Fill in the blank: to compile a program that uses pthreads, you must add the linker flag _____ to your gcc command (answer: just the flag, e.g. -lm). (LO10, fill in the blank)
5. Spot the bug: what is wrong with this code? (LO10, spot the bug, multiple choice)
#define N 5
pthread_t tids[N];
void* worker(void* arg) {
    int i = *(int*)arg;
    printf("Thread %d\n", i);
    return NULL;
}
int main(void) {
    for (int i = 0; i < N; i++)
        pthread_create(&tids[i], NULL, worker, (void*)&i);
    for (int i = 0; i < N; i++)
        pthread_join(tids[i], NULL);
    return 0;
}
6. True or False: A race condition occurs when the final result of a multi-threaded program depends on the relative timing of thread execution. (LO10, true / false)
7. In a task dependency graph (DAG), what does the critical path represent? (LO10, multiple choice)