Threads & Concurrency
POSIX threads (pthreads) — creating and joining threads with pthread_create and pthread_join, the shared-memory model, and an introduction to race conditions.
Why do we need threads? Moore's Law, the Power Wall, and Multicore
For decades, writing faster software was easy: wait 18 months and buy a new CPU. Gordon Moore observed in 1965 that the number of transistors on a chip was doubling every year, and in 1975 he revised the rate to roughly every two years. Smaller transistors meant faster switching, which translated directly into higher clock frequencies, so your existing programs ran faster for free, without any code changes. This was the golden era of single-core computing (roughly 1970–2003).
Then around 2004, this stopped. The problem was physics: dynamic power is proportional to clock frequency times voltage squared (P ∝ V² · f). Running a chip at a higher frequency also requires a higher voltage, so power, and with it heat, grows much faster than linearly with clock speed (roughly cubically if voltage scales with frequency, not exponentially). Beyond a certain point the heat simply cannot be removed from the chip fast enough. Intel hit this wall so hard that it cancelled its Tejas and Jayhawk processor projects outright in 2004: both were expected to exceed 100 W and were too hot to ship in consumer hardware.
The industry's solution was to put multiple slower cores on one chip. Double the cores, double the throughput — but only if your program is written to use them. A single-threaded program running on a 16-core CPU uses exactly one core; the other 15 sit idle. This is why parallel programming — and specifically threads — became a mandatory skill for systems programmers. COMP2017 Week 9 is where you learn to use those cores.
Key concepts
| Concept | What it means | Consequence |
|---|---|---|
| Moore's Law | Transistor count on a chip doubles roughly every 2 years (observed 1965, held until ~2015) | More transistors → smaller, faster circuits → higher clock frequency → free speedup for any program |
| Power Wall | P ∝ V² · f — power grows with clock frequency and voltage. Heat cannot be dissipated fast enough above ~4 GHz | Clock speeds stopped scaling around 2004. Running faster = melting the chip |
| Tejas & Jayhawk | Intel processor projects cancelled in 2004 because their projected power consumption exceeded safe thermal limits | Industry-wide pivot: abandon clock-speed scaling, pursue multicore instead |
| Multicore | Multiple CPU cores on one chip. Each core runs independently at a manageable clock speed and power budget | Total throughput scales, but only if the program is designed to run tasks in parallel |
| Why threads? | A thread is the unit of parallel execution within a process. Multiple threads can run simultaneously on different cores | Without threads, a program uses 1 core regardless of how many are available. All COMP2017 parallelism is thread-based |
Think of a single-core CPU as a single motorway lane. For decades, Moore's Law kept raising the speed limit — 100, 150, 200 km/h — and every car on the road automatically went faster. The Power Wall is the point where raising the speed limit any further would cause engines to catch fire. The solution: instead of one very fast lane, build 8 parallel lanes at a safe speed. Your car (program) must be redesigned to use multiple lanes (threads) to cross the country faster.
If you write single-threaded C code, you are only using 1 core. On a modern 8-core machine, you are leaving 87.5% of the CPU idle. On a 16-core server, you're using 6.25%. Threads let you exploit every core — but they require you to think about concurrent execution, shared data, and race conditions. That is exactly what the remainder of this topic (and Topic 27) covers.
Threads: workers sharing the same office
Imagine a single office with one large shared desk in the middle. Multiple workers (threads) all sit at this desk at the same time. They all read from the same filing cabinet (shared heap), write on the same whiteboard (global variables), and look at the same project plans (code). Each worker has their own notepad on their lap for personal scratch work (the stack), and each keeps track of where they are in their personal task list (program counter + registers). Because they share the same desk and filing cabinet, communication is trivially fast — any worker can reach over and read a note another worker left. But this also means two workers can accidentally overwrite each other's work at exactly the same moment. That disaster is a race condition.
In C, a thread is the smallest unit of execution that the operating system can schedule. Every program starts with exactly one thread — the main thread running main(). You can spawn additional threads using the POSIX pthreads API so that multiple sequences of instructions run concurrently within the same process.
Why threads? Modern CPUs have multiple cores. A single-threaded program uses only one core no matter how many you have. By splitting work across multiple threads, you can exploit all available cores and finish the job faster. This is the fundamental motivation for parallel programming covered in Week 9 of COMP2017.
There are two flavours of parallelism you need to know:
- Task parallelism — different tasks run simultaneously (e.g., computing the minimum and maximum of an array at the same time in separate threads).
- Data parallelism — the same task runs on different portions of data (e.g., four threads each summing one quarter of a large array).
Thread vs Process — the key differences
Threads share (one copy per process):
• Virtual address space (heap)
• Global / static variables
• Code (text segment)
• Open file descriptors
Private to each thread:
• Stack (local variables)
• Registers & program counter
• Thread ID (pthread_t)
Fast creation. Cheap communication.
Each process has its own:
• Entire virtual address space
• File descriptor table
• Heap, stack, globals
• PID (process ID)
Communication requires IPC:
• Pipes, sockets, shared memory
• Signals
Slow creation (fork()). Expensive comms.
| Feature | Thread | Process |
|---|---|---|
| Memory space | Shared with all threads | Own private address space |
| Creation cost | Low (allocate stack + metadata) | Higher (duplicate the address space; modern kernels defer the actual copying with copy-on-write) |
| Communication | Direct (shared variables) | IPC required (pipes, shmem) |
| Crash isolation | None — one crash kills all threads | Full — child crash doesn't kill parent |
| API (Unix) | pthread_create() | fork() |
| Wait for finish | pthread_join() | wait() / waitpid() |
| Header needed | <pthread.h> | <unistd.h> |
| Link flag | -lpthread | (none extra) |
When to use threads vs processes
Use threads when tasks need to communicate frequently or share large data structures. Example: a web server handling each HTTP request in a separate thread — all threads can access the same in-memory cache without copying data. Or: parallel computation on a shared array — four threads each working on one quarter of the array, combining results at the end.
Use processes when you need fault isolation (a crashed worker should not kill the main program), or when running entirely independent programs that communicate through pipes or sockets. Browsers run each tab as a separate process for exactly this reason.
Concurrency is making progress on multiple tasks — not necessarily at the same time. The OS scheduler rapidly switches threads so each gets a turn, even on a single-core machine.
Parallelism is multiple threads executing simultaneously on separate cores.
Concurrency = dealing with many things. Parallelism = doing many things at once. Both are useful; pthreads gives you both.
Once you create a thread, you cannot predict whether the main thread or the new thread will run first. The OS scheduler decides. Never write code that assumes a particular order of execution between threads — that assumption is a bug waiting to happen.
A process = a building. Threads = workers in that building. Workers share the building's resources (heap, globals, files). Each has their own desk notes (stack). Hiring a new worker (creating a thread) is cheap. Building a new building (forking a process) is expensive but gives you strong isolation.
Types of parallelism and task dependency graphs
Before writing a single line of pthread code, you need to answer two questions: what kind of parallelism does my problem have? and which tasks are allowed to run simultaneously? The lecture answers these with two frameworks: the task/data parallelism distinction and task dependency graphs (DAGs).
Task Parallelism vs Data Parallelism
Task parallelism: one cook makes pasta, another grills meat, a third makes dessert — different tasks happen simultaneously on potentially different ingredients.
Data parallelism: five workers each process one-fifth of a pile of vegetables doing the exact same chopping operation — the same task applied to different chunks of data simultaneously.
/* Task parallelism: threads do DIFFERENT things */
/* Thread A sorts the left half of an array */
/* Thread B finds the minimum in the right half */
/* → The operation varies per thread */
/* Data parallelism: threads do the SAME thing on different data */
/* 8 threads each sum 1/8 of an array (parallel reduction) */
/* → Same function, different data slice */
| Type | What varies | Example | Pattern |
|---|---|---|---|
| Task parallelism | The operation | Fork: child runs exec, parent waits — each does something different | Each thread has its own role / function |
| Data parallelism | The data subset | Parallel array sum — 4 threads each sum 1/4 of the array | All threads run the same function on their own slice |
Task Dependency Graphs (DAGs)
A Task Dependency Graph (DAG = Directed Acyclic Graph) shows which tasks must finish before others can start. This determines two critical quantities:
- What can run in parallel — tasks with no dependency between them.
- The critical path: the longest sequential chain of dependencies — this is the theoretical minimum time your program can take, regardless of how many threads you add.
/* Example dependency graph:
 *
 *             [Parse file]
 *              /        \
 *     [Process A]    [Process B]    <- no dependency: run in parallel
 *              \        /
 *           [Merge results]
 *                  |
 *           [Write output]
 *
 * Critical path = Parse + max(A,B) + Merge + Write
 * Max parallelism at step 2 = 2 threads (limited by independent tasks)
 */
In pthreads, tasks with no dependency can be launched with pthread_create and run simultaneously. Tasks that depend on a result must pthread_join the predecessor thread before starting.
/* Parallelizing two independent tasks from the graph above */
pthread_t t_a, t_b;
pthread_create(&t_a, NULL, process_a, data); /* start A */
pthread_create(&t_b, NULL, process_b, data); /* start B simultaneously */
pthread_join(t_a, NULL); /* wait for A to finish */
pthread_join(t_b, NULL); /* wait for B to finish */
/* Now both A and B are done, so it is safe to call merge_results() */
merge_results();
Each edge in the DAG corresponds to a pthread_join call in your code. Nodes with no incoming edges from an unfinished task can be launched immediately.
1. Draw the graph — boxes are tasks, arrows mean "must finish before".
2. Find all nodes with no predecessors — these can start immediately (create threads for all of them at once).
3. Trace the longest path (the critical path): its length is the minimum possible run time, so total work divided by that length is your theoretical speedup ceiling.
4. Every join in your pthread code corresponds to an edge in the DAG.
pthread_create, pthread_join, and the thread function signature
The entire pthreads API lives in <pthread.h>. You must also link with -lpthread at compile time: gcc myfile.c -lpthread.
Thread function signature — the mandatory shape
/* Every thread function MUST have this exact signature */
void* my_thread_func(void* arg) {
/* ^        ^              ^
   |        |              └── single argument: always void* (cast to your type inside)
   |        └── your function name (any valid identifier)
   └── return type: always void* (cast your return value, or NULL) */
    /* ... your thread's work here ... */
    return NULL; /* return NULL when done (or cast a value to void*) */
}
Because the parameter is a void* (a generic pointer), you can pass in a pointer to any type. Inside the function, cast back to your actual type: int *p = (int*)arg;
pthread_create — spawning a thread
/* Signature from <pthread.h> */
int pthread_create(
    pthread_t *tid,                 /* arg 1 */
    const pthread_attr_t *attr,     /* arg 2 */
    void* (*start_routine)(void*),  /* arg 3 */
    void* arg                       /* arg 4 */
);
/* Returns 0 on success, a non-zero error code on failure (errno is NOT set) */
/* Argument breakdown: */
/* arg 1, tid: address of a pthread_t variable to store the new thread's ID */
/* → You pass &mythread so pthreads can write the ID back to you */
/* arg 2, attr: thread attributes (stack size, detach state, etc.) */
/* → Pass NULL to use sensible defaults (joinable, default stack size) */
/* arg 3, start_routine: function pointer, the work the thread will do */
/* → Must match signature: void* fn(void*) */
/* arg 4, arg: a single void* argument passed to start_routine at runtime */
/* → Pass NULL if you have no argument; cast your pointer to (void*) */
/* Example usage (needs <stdio.h>, <stdlib.h>, <string.h>): */
pthread_t tid;
int rc = pthread_create(&tid, NULL, my_thread_func, NULL);
if (rc != 0) {
    /* perror() would print the wrong message: the error is in rc, not errno */
    fprintf(stderr, "pthread_create: %s\n", strerror(rc));
    exit(1);
}
pthread_create takes a pointer to a pthread_t (&tid), but pthread_join takes the value of the pthread_t (tid, not &tid). This asymmetry is a very common source of bugs.
pthread_join — waiting for a thread to finish
/* Signature from <pthread.h> */
int pthread_join(
    pthread_t tid,  /* arg 1: thread ID to wait for (the VALUE, not a pointer) */
    void** status   /* arg 2: where to store the thread's return value, or NULL */
);
/* Returns 0 on success */
/* pthread_join BLOCKS until the named thread terminates. */
/* Once joined, the thread no longer exists. Do not join twice. */
/* Example: wait for tid and ignore return value */
pthread_join(tid, NULL);
/* Example: capture the return value from the thread */
void* retval;
pthread_join(tid, &retval);
/* retval now holds whatever the thread returned with return or pthread_exit */
pthread_join is the thread equivalent of waitpid() for processes. It reclaims the resources used by the finished thread and gives you its return value. If you never join a joinable thread, its resources are leaked (similar to a zombie process).
pthread_exit — terminating a thread explicitly
/* Three ways a thread can terminate: */
/* 1. Return from the thread function (preferred) */
void* worker(void* arg) {
    /* ... work ... */
    return NULL; /* thread ends cleanly */
}
/* 2. Call pthread_exit() explicitly (useful deep in a call chain) */
pthread_exit(NULL); /* only THIS thread terminates, others continue */
/* 3. Be cancelled by another thread with pthread_cancel(tid) */
/* CRITICAL DISTINCTION: */
/* exit() kills the ENTIRE process (all threads die!) */
/* pthread_exit() kills ONLY the calling thread */
/* return in main() also kills all threads (like exit()) */
/* → if main() returns before threads finish, threads are killed */
/* → fix: call pthread_join() before returning from main() */
Terminating a thread does not free malloc-allocated memory or close file descriptors opened by that thread. The programmer is responsible for cleanup before termination.
Passing arguments to threads
/* SAFE: use a separate variable per thread (array of args) */
int args[4];
pthread_t tids[4];
for (int i = 0; i < 4; i++) {
    args[i] = i; /* unique value per thread */
    pthread_create(&tids[i], NULL, worker, (void*)&args[i]);
}
/* UNSAFE (race condition!): passing &i directly */
/* pthread_create(&tids[i], NULL, worker, (void*)&i); // BUG! */
/* By the time the thread reads *arg, i may already have been incremented! */
/* Inside the thread function, cast back to the original type: */
void* worker(void* arg) {
    int my_id = *(int*)arg; /* cast void* back to int*, then dereference */
    printf("I am thread %d\n", my_id);
    return NULL;
}
pthread_create gives each thread a single void* slot. To pass multiple values, put them all in a struct and pass a pointer to that struct. Cast (void*)&my_struct when creating, and cast (struct my_args*)arg inside the thread.
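The pthreads library has traditionally been a separate shared library on Linux. You must tell the linker to include it:
gcc myprogram.c -lpthread -o myprogram
Without -lpthread, the linker on such systems cannot find pthread_create and reports "undefined reference" errors at link time. The -l flag means "link against library"; pthread is the library name (file: libpthread.so). Since glibc 2.34 the pthread functions are part of libc itself, so on newer systems the flag is harmless but no longer strictly required. GCC and Clang also accept -pthread, which sets both the compile-time and link-time options.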
Joinable vs Detached threads
By default, threads are created in the joinable state — another thread must call pthread_join() to reclaim their resources when they finish. If you never join a joinable thread, you have a resource leak.
A detached thread cleans itself up automatically on exit. You cannot join a detached thread. Use detached threads when you want "fire and forget" — create the thread, let it run, and you never need its return value. Detach with: pthread_detach(tid) or by setting attributes before creation.
Complete programs you can compile and run
#include <stdio.h>
#include <pthread.h> /* POSIX threads API */
#define NUM_THREADS 3
/* Thread function — MUST match signature: void* fn(void*) */
void* say_hello(void* arg) {
int id = *(int*)arg; /* cast void* back to int*, dereference */
printf("Hello from thread %d!\n", id);
return NULL; /* threads return void* — NULL means "no return value" */
}
int main(void) {
pthread_t tids[NUM_THREADS]; /* array of thread IDs */
int args[NUM_THREADS]; /* one unique argument per thread — NEVER share &i directly */
/* Create 3 threads */
for (int i = 0; i < NUM_THREADS; i++) {
args[i] = i; /* store unique value in its own slot */
int rc = pthread_create(
&tids[i], /* &tid: pthreads writes the new thread's ID here */
NULL, /* attributes: NULL = use defaults (joinable) */
say_hello, /* start_routine: function pointer */
(void*)&args[i] /* arg: pointer to this thread's unique argument */
);
if (rc != 0) {
fprintf(stderr, "pthread_create failed: %d\n", rc);
return 1;
}
}
/* Join all threads — wait for each to finish before main() returns */
for (int i = 0; i < NUM_THREADS; i++) {
pthread_join(tids[i], NULL); /* tid VALUE (not &tids[i]) + ignore return */
}
printf("All threads finished.\n");
return 0;
}
/* Compile: gcc hello_threads.c -lpthread -o hello_threads */
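Sample output (run 1):
Hello from thread 0!
Hello from thread 2!
Hello from thread 1!
All threads finished.
Sample output (run 2; the order varies because the scheduler decides which thread runs first):
Hello from thread 1!
Hello from thread 0!
Hello from thread 2!
All threads finished.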
#include <stdio.h>
#include <pthread.h>
#define ARRAY_SIZE 1000000
#define NUM_THREADS 4
int data[ARRAY_SIZE]; /* shared array — all threads can see this */
long partial_sums[NUM_THREADS]; /* each thread writes to its own slot */
/* Struct to pass multiple arguments to a thread */
struct thread_args {
int start; /* start index of this thread's segment */
int end; /* end index (exclusive) */
int tid; /* which thread am I? */
};
void* sum_segment(void* arg) {
struct thread_args* a = (struct thread_args*)arg; /* cast void* back */
long sum = 0;
for (int i = a->start; i < a->end; i++) {
sum += data[i];
}
partial_sums[a->tid] = sum; /* write result to our dedicated slot */
printf("Thread %d summed indices [%d, %d) = %ld\n",
a->tid, a->start, a->end, sum);
return NULL;
}
int main(void) {
/* Initialize array */
for (int i = 0; i < ARRAY_SIZE; i++) data[i] = 1; /* all 1s for easy checking */
pthread_t tids[NUM_THREADS];
struct thread_args args[NUM_THREADS]; /* one struct per thread */
int segment = ARRAY_SIZE / NUM_THREADS;
/* Create threads, each working on one quarter of the array */
for (int i = 0; i < NUM_THREADS; i++) {
args[i].start = i * segment;
args[i].end = (i == NUM_THREADS - 1) ? ARRAY_SIZE : (i + 1) * segment;
args[i].tid = i;
pthread_create(&tids[i], NULL, sum_segment, (void*)&args[i]);
}
/* Wait for all threads to finish */
for (int i = 0; i < NUM_THREADS; i++) {
pthread_join(tids[i], NULL);
}
/* Combine partial results in main thread */
long total = 0;
for (int i = 0; i < NUM_THREADS; i++) total += partial_sums[i];
printf("Total sum = %ld (expected %d)\n", total, ARRAY_SIZE);
return 0;
}
/* Compile: gcc sum_threads.c -lpthread -o sum_threads */
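Sample output (the order of the thread lines varies between runs):
Thread 0 summed indices [0, 250000) = 250000
Thread 2 summed indices [500000, 750000) = 250000
Thread 1 summed indices [250000, 500000) = 250000
Thread 3 summed indices [750000, 1000000) = 250000
Total sum = 1000000 (expected 1000000)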
#include <stdio.h>
#include <pthread.h>
#define NUM_THREADS 2
#define ITERATIONS 1000000
long counter = 0; /* shared global — both threads read and write this */
void* increment(void* arg) {
for (long i = 0; i < ITERATIONS; i++) {
counter = counter + 1; /* CRITICAL SECTION — not atomic! */
/* This compiles to roughly:
1) LOAD: register = counter (read from memory)
2) ADD: register = register + 1
3) STORE: counter = register (write back)
If two threads both load before either stores, one increment is LOST. */
}
return NULL;
}
int main(void) {
pthread_t tids[NUM_THREADS];
for (int i = 0; i < NUM_THREADS; i++)
pthread_create(&tids[i], NULL, increment, NULL);
for (int i = 0; i < NUM_THREADS; i++)
pthread_join(tids[i], NULL);
printf("Expected: %d\n", NUM_THREADS * ITERATIONS);
printf("Actual: %ld\n", counter);
printf("Lost increments due to race condition: %ld\n",
(long)NUM_THREADS * ITERATIONS - counter);
return 0;
}
/* Compile: gcc race.c -lpthread -o race */
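Sample output (run 1):
Expected: 2000000
Actual: 1042718
Lost increments due to race condition: 957282
Sample output (run 2; the actual value differs on every run):
Expected: 2000000
Actual: 1186541
Lost increments due to race condition: 813459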
Practice problems with solutions
Write a complete C program that creates a thread passing the integer value 42 as an argument. The thread should print "Thread received: 42" and return NULL. Use the correct thread function signature and pass the argument safely (not by casting the integer directly to void*).
#include <stdio.h>
#include <pthread.h>
void* print_value(void* arg) {
int val = *(int*)arg; /* cast void* to int*, then dereference */
printf("Thread received: %d\n", val);
return NULL; /* thread function always returns void* */
}
int main(void) {
pthread_t tid;
int data = 42; /* store argument in its own variable */
pthread_create(&tid, NULL, print_value, (void*)&data);
pthread_join(tid, NULL); /* wait for thread to finish */
return 0;
}
/* gcc p1.c -lpthread -o p1 */
The thread function returns void* and takes void*, with no exceptions. We store the argument in a local variable (data) and pass &data cast to (void*). Inside the thread we cast back: (int*)arg gives us the address, then * dereferences it to get the integer value. We call pthread_join before main() returns, ensuring the thread has a chance to run and print before the process exits.
You need to pass two values to a thread: a start index and an end index. Since pthread_create only allows one void* argument, you must use a struct. Write the struct definition, the thread function, and the main() that creates one thread passing start=10, end=20. The thread should print all integers from start (inclusive) to end (exclusive).
#include <stdio.h>
#include <pthread.h>
/* Step 1: define a struct to hold all arguments */
struct range_args {
int start;
int end;
};
/* Step 2: thread function casts void* back to the struct pointer */
void* print_range(void* arg) {
struct range_args* r = (struct range_args*)arg;
for (int i = r->start; i < r->end; i++) {
printf("%d ", i);
}
printf("\n");
return NULL;
}
int main(void) {
pthread_t tid;
struct range_args args = { .start = 10, .end = 20 }; /* C99 designated init */
/* Step 3: pass pointer to struct, cast to (void*) */
pthread_create(&tid, NULL, print_range, (void*)&args);
pthread_join(tid, NULL);
return 0;
}
Define a struct that holds both values, fill it in main(), and pass (void*)&args to pthread_create. Inside the thread, cast back with (struct range_args*)arg and use -> to access fields. This is the correct way to pass any number of arguments to a thread. Make sure the struct outlives the thread: if you declare it inside a loop body that exits before the thread reads it, you have undefined behavior.
Write a program that creates 4 threads. Each thread i should compute i * i and return that value (cast to void*). After all threads finish, main() should collect the return values using pthread_join and print "Thread i returned: X" for each.
Hint: casting a small integer to void* and back is acceptable when you need to return a simple value: return (void*)(intptr_t)(i*i); and long result = (long)(intptr_t)retval;.
#include <stdio.h>
#include <stdint.h> /* for intptr_t */
#include <pthread.h>
#define N 4
int ids[N];
void* square(void* arg) {
int i = *(int*)arg;
long result = (long)i * i;
/* Cast integer result to void* via intptr_t to avoid pointer-size warnings */
return (void*)(intptr_t)result;
}
int main(void) {
pthread_t tids[N];
for (int i = 0; i < N; i++) {
ids[i] = i;
pthread_create(&tids[i], NULL, square, (void*)&ids[i]);
}
for (int i = 0; i < N; i++) {
void* retval;
pthread_join(tids[i], &retval); /* pass &retval to capture return */
long result = (long)(intptr_t)retval;
printf("Thread %d returned: %ld\n", i, result);
}
return 0;
}
Call pthread_join(tid, &retval): the second argument is a void**, so you pass the address of a void* variable. pthread_join writes the thread's return value into retval. The thread returns a void*; to return a small integer we cast it through intptr_t (an integer type large enough to hold a pointer). On the receiving side we cast back: (long)(intptr_t)retval. For returning large structures, allocate them on the heap and free after joining.
The code below is broken. It creates 3 threads that each print their index. Trace it mentally: what is likely to be printed? Identify the race condition and explain why it occurs. Then write the fixed version.
#include <stdio.h>
#include <pthread.h>
#define N 3
pthread_t tids[N];
void* worker(void* arg) {
int i = *(int*)arg; /* BUG IS HERE */
printf("I am thread %d\n", i);
return NULL;
}
int main(void) {
for (int i = 0; i < N; i++) {
pthread_create(&tids[i], NULL, worker, (void*)&i); /* passing &i */
}
for (int i = 0; i < N; i++) pthread_join(tids[i], NULL);
return 0;
}
/* THE BUG: all three threads receive &i, a pointer to the SAME variable.
The loop may increment i before a thread has had a chance to read *arg,
so threads can print repeated or skipped indices, or even "3"
(the final value of i after the loop). Once the loop ends, i no longer
exists, so a late read through the pointer is undefined behavior.
Typical broken output:
I am thread 3
I am thread 3
I am thread 3
THE FIX: give each thread its own copy of the index. */
#include <stdio.h>
#include <pthread.h>
#define N 3
pthread_t tids[N];
int args[N]; /* separate variable for each thread */
void* worker(void* arg) {
int my_id = *(int*)arg; /* safe: this slot belongs only to me */
printf("I am thread %d\n", my_id);
return NULL;
}
int main(void) {
for (int i = 0; i < N; i++) {
args[i] = i; /* store i's value in a dedicated slot */
pthread_create(&tids[i], NULL, worker, (void*)&args[i]);
}
for (int i = 0; i < N; i++) pthread_join(tids[i], NULL);
return 0;
}
Passing &i shares a single memory location between all threads. By the time thread 0 reads the value at that address, the main thread may already have incremented i to 1, 2, or even 3. This is the classic first race condition taught in COMP2017. The shared data item is variable i; the final result depends on relative timing. Fix: store each value in a dedicated array slot before creating the thread.
Test your understanding
What does pthread_create receive as its first argument?
True or false: calling exit() from a thread terminates only that thread, leaving all other threads running.
#define N 5
pthread_t tids[N];
void* worker(void* arg) {
int i = *(int*)arg;
printf("Thread %d\n", i);
return NULL;
}
int main(void) {
for (int i = 0; i < N; i++)
pthread_create(&tids[i], NULL, worker, (void*)&i);
for (int i = 0; i < N; i++)
pthread_join(tids[i], NULL);
return 0;
}