Binary Search Trees
A recursive data structure where every node's left subtree holds smaller values and right subtree holds larger values — enabling O(log n) search, insert, and delete on sorted data.
What is a Binary Search Tree?
Imagine a library card catalogue where every drawer has a label. If you're looking for a book starting with "M", you open the middle drawer. Everything left of it is A–L, everything right is N–Z. You've halved your search space instantly. A BST works exactly the same way — each node is a decision point that eliminates half the remaining possibilities. This is why search is O(log n) on a balanced tree instead of O(n) for a linked list.
A Binary Search Tree is a linked data structure where each node holds a value and two pointers: left (to a smaller subtree) and right (to a larger subtree). The defining rule — the BST property — is:
For every node N:
• All values in N's left subtree are strictly less than N's value.
• All values in N's right subtree are strictly greater than N's value.
This property must hold for every node in the tree, not just the root. It's what enables efficient search: at each node you make one comparison and discard half the tree.
BST vs Linked List: A linked list requires O(n) search — you must scan every element. A balanced BST requires O(log n) — you halve the search space at each step. For 1,000,000 elements, a linked list might need 1,000,000 comparisons; a balanced BST needs at most 20.
Why "unbalanced" matters: If you insert already-sorted data (e.g., 1, 2, 3, 4, 5) into a naive BST, it degenerates into a linked list — every node goes to the right child, giving O(n) worst case. Self-balancing trees (AVL, Red-Black) fix this, but COMP2017 focuses on the unbalanced version.
BST vs Other Structures
| Operation | Array (sorted) | Linked List | BST (balanced) | BST (degenerate) |
|---|---|---|---|---|
| Search | O(log n) binary search | O(n) | O(log n) | O(n) |
| Insert | O(n) shift elements | O(n) find position | O(log n) | O(n) |
| Delete | O(n) shift elements | O(n) | O(log n) | O(n) |
| In-order traversal | O(n) | O(n) unsorted | O(n) gives sorted output | O(n) gives sorted output |
In-order traversal of a BST always produces values in sorted ascending order. This is a direct consequence of the BST property and is one of the most useful properties to remember for exams.
The standard BST definition uses strictly less/greater — duplicates are typically disallowed, or handled by a consistent rule (e.g., duplicates always go left). COMP2017 exercises usually assume all values are unique.
Node struct, insert, search, traversals
The BST Node Struct
/* BST node — holds a value and two child pointers */ struct bst_node { int value; /* the data stored at this node */ struct bst_node *left; /* pointer to left child (smaller values) */ struct bst_node *right; /* pointer to right child (larger values) */ }; /* Common shorthand with typedef */ typedef struct bst_node Node; /* Create a new node on the heap */ Node *create_node(int val) { Node *n = malloc(sizeof(Node)); n->value = val; n->left = NULL; n->right = NULL; return n; }
left and right are initialised to NULL — NULL is how you represent "no child here" (a leaf or empty slot).
Insert — Recursive Approach
/* Insert val into BST rooted at root. Returns new root. */ Node *bst_insert(Node *root, int val) { /* Base case: empty spot found — create a node here */ if (root == NULL) return create_node(val); /* Recursive case: decide left or right */ if (val < root->value) root->left = bst_insert(root->left, val); else if (val > root->value) root->right = bst_insert(root->right, val); /* val == root->value: duplicate, ignore */ return root; /* return (possibly unchanged) root */ }
root->left = bst_insert(...).
The Three Traversals
/* IN-ORDER: left → root → right → gives SORTED output */ void inorder(Node *root) { if (root == NULL) return; inorder(root->left); printf("%d ", root->value); inorder(root->right); } /* PRE-ORDER: root → left → right → useful for copying the tree */ void preorder(Node *root) { if (root == NULL) return; printf("%d ", root->value); preorder(root->left); preorder(root->right); } /* POST-ORDER: left → right → root → useful for deleting the tree */ void postorder(Node *root) { if (root == NULL) return; postorder(root->left); postorder(root->right); printf("%d ", root->value); }
printf relative to the recursive calls tells you the traversal type. Print first = pre-order, print in-between = in-order, print last = post-order.
Search and find_min
/* Search: return node with val, or NULL if not found */ Node *bst_search(Node *root, int val) { if (root == NULL || root->value == val) return root; if (val < root->value) return bst_search(root->left, val); return bst_search(root->right, val); } /* find_min: keep going left until no left child */ Node *find_min(Node *root) { if (root == NULL || root->left == NULL) return root; return find_min(root->left); }
Complete programs you can compile and run
#include <stdio.h>
#include <stdlib.h>
typedef struct bst_node {
int value;
struct bst_node *left;
struct bst_node *right;
} Node;
/* Create a new leaf node */
Node *create_node(int val) {
Node *n = malloc(sizeof(Node));
if (!n) { perror("malloc"); exit(1); }
n->value = val;
n->left = NULL;
n->right = NULL;
return n;
}
/* Insert val into BST rooted at root, return (possibly new) root */
Node *bst_insert(Node *root, int val) {
if (root == NULL) return create_node(val);
if (val < root->value) root->left = bst_insert(root->left, val);
else if (val > root->value) root->right = bst_insert(root->right, val);
/* duplicate: ignore */
return root;
}
/* Search — return node or NULL */
Node *bst_search(Node *root, int val) {
if (root == NULL || root->value == val) return root;
if (val < root->value) return bst_search(root->left, val);
return bst_search(root->right, val);
}
/* In-order traversal: prints values in ascending sorted order */
void inorder(Node *root) {
if (root == NULL) return;
inorder(root->left);
printf("%d ", root->value);
inorder(root->right);
}
/* Post-order free: free children before parent */
void bst_free(Node *root) {
if (root == NULL) return;
bst_free(root->left);
bst_free(root->right);
free(root);
}
int main(void) {
Node *root = NULL;
/* Insert values — BST will arrange them automatically */
int values[] = {50, 30, 70, 20, 40, 60, 80};
int n = sizeof(values) / sizeof(values[0]);
for (int i = 0; i < n; i++)
root = bst_insert(root, values[i]);
/* The tree looks like:
50
/ \
30 70
/ \ / \
20 40 60 80 */
printf("In-order (sorted): ");
inorder(root);
printf("\n");
/* Search */
Node *found = bst_search(root, 40);
printf("Search 40: %s\n", found ? "found" : "not found");
Node *miss = bst_search(root, 99);
printf("Search 99: %s\n", miss ? "found" : "not found");
bst_free(root);
return 0;
}
Search 40: found
Search 99: not found
#include <stdio.h>
#include <stdlib.h>
typedef struct bst_node { int v; struct bst_node *l, *r; } Node;
Node *new_node(int v) {
Node *n = malloc(sizeof(Node));
n->v = v; n->l = n->r = NULL;
return n;
}
Node *insert(Node *root, int v) {
if (!root) return new_node(v);
if (v < root->v) root->l = insert(root->l, v);
else if (v > root->v) root->r = insert(root->r, v);
return root;
}
void inorder (Node *r) { if(!r) return; inorder(r->l); printf("%d ", r->v); inorder(r->r); }
void preorder (Node *r) { if(!r) return; printf("%d ", r->v); preorder(r->l); preorder(r->r); }
void postorder(Node *r) { if(!r) return; postorder(r->l); postorder(r->r); printf("%d ", r->v); }
/* find_min: go left until no left child */
Node *find_min(Node *r) {
if (!r || !r->l) return r;
return find_min(r->l);
}
/* count_nodes: 1 + count left + count right */
int count_nodes(Node *r) {
if (!r) return 0;
return 1 + count_nodes(r->l) + count_nodes(r->r);
}
/* tree_height: 1 + max(height_left, height_right) */
int tree_height(Node *r) {
if (!r) return 0;
int lh = tree_height(r->l);
int rh = tree_height(r->r);
return 1 + (lh > rh ? lh : rh);
}
void free_tree(Node *r) { if(!r) return; free_tree(r->l); free_tree(r->r); free(r); }
int main(void) {
/* Insert: 5, 3, 7, 1, 4, 6, 8 */
Node *root = NULL;
int vals[] = {5,3,7,1,4,6,8};
for (int i = 0; i < 7; i++) root = insert(root, vals[i]);
/* Tree structure:
5
/ \
3 7
/ \ / \
1 4 6 8 */
printf("In-order (L→R→Rt): "); inorder(root); printf("\n");
printf("Pre-order (Rt→L→R): "); preorder(root); printf("\n");
printf("Post-order (L→R→Rt): "); postorder(root); printf("\n");
Node *mn = find_min(root);
printf("Min value: %d\n", mn ? mn->v : -1);
printf("Node count: %d\n", count_nodes(root));
printf("Tree height: %d\n", tree_height(root));
free_tree(root);
return 0;
}
Pre-order (Rt→L→R): 5 3 1 4 7 6 8
Post-order (L→R→Rt): 1 4 3 6 8 7 5
Min value: 1
Node count: 7
Tree height: 3
#include <stdio.h>
#include <stdlib.h>
typedef struct Node { int v; struct Node *l, *r; } Node;
Node *new_node(int v) { Node *n=malloc(sizeof(Node)); n->v=v; n->l=n->r=NULL; return n; }
Node *insert(Node *r, int v) {
if(!r) return new_node(v);
if(v < r->v) r->l=insert(r->l,v); else if(v > r->v) r->r=insert(r->r,v);
return r;
}
void inorder(Node *r) { if(!r) return; inorder(r->l); printf("%d ",r->v); inorder(r->r); }
/* find_min returns the leftmost node (no malloc — just a pointer) */
Node *find_min(Node *r) { return (!r||!r->l) ? r : find_min(r->l); }
/* Delete val from BST. Returns new root. */
Node *bst_delete(Node *root, int val) {
if (root == NULL) return NULL; /* val not found */
if (val < root->v) {
root->l = bst_delete(root->l, val); /* go left */
} else if (val > root->v) {
root->r = bst_delete(root->r, val); /* go right */
} else {
/* FOUND the node to delete — three cases: */
/* Case 1: leaf node (no children) */
if (root->l == NULL && root->r == NULL) {
free(root);
return NULL;
}
/* Case 2: one child — replace node with that child */
if (root->l == NULL) {
Node *tmp = root->r;
free(root);
return tmp;
}
if (root->r == NULL) {
Node *tmp = root->l;
free(root);
return tmp;
}
/* Case 3: two children — replace value with in-order successor
(smallest value in right subtree), then delete the successor */
Node *successor = find_min(root->r);
root->v = successor->v; /* copy successor's value up */
root->r = bst_delete(root->r, successor->v); /* delete the successor */
}
return root;
}
void free_tree(Node *r) { if(!r) return; free_tree(r->l); free_tree(r->r); free(r); }
int main(void) {
Node *root = NULL;
int vals[] = {50,30,70,20,40,60,80};
for (int i=0; i<7; i++) root = insert(root, vals[i]);
printf("Before delete: "); inorder(root); printf("\n");
root = bst_delete(root, 20); /* Case 1: leaf */
printf("Delete 20 (leaf): "); inorder(root); printf("\n");
root = bst_delete(root, 30); /* Case 2: one child (only right=40) */
printf("Delete 30 (1 child): "); inorder(root); printf("\n");
root = bst_delete(root, 50); /* Case 3: two children (root!) */
printf("Delete 50 (2 children): "); inorder(root); printf("\n");
free_tree(root);
return 0;
}
Delete 20 (leaf): 30 40 50 60 70 80
Delete 30 (1 child): 40 50 60 70 80
Delete 50 (2 children): 40 60 70 80
Practice problems with solutions
Starting from an empty BST, insert the following values in order: 40, 20, 60, 10, 30, 50, 70. Draw the final tree structure. What is the output of in-order, pre-order, and post-order traversals?
Tree after all insertions:
40
/ \
20 60
/ \ / \
10 30 50 70
In-order (sorted): 10 20 30 40 50 60 70
Pre-order (root first): 40 20 10 30 60 50 70
Post-order (root last): 10 30 20 50 70 60 40
• Insert 40 → root
• Insert 20 → 20 < 40, goes left of 40
• Insert 60 → 60 > 40, goes right of 40
• Insert 10 → 10 < 40, go left → 10 < 20, goes left of 20
• Insert 30 → 30 < 40, go left → 30 > 20, goes right of 20
• Insert 50 → 50 > 40, go right → 50 < 60, goes left of 60
• Insert 70 → 70 > 40, go right → 70 > 60, goes right of 60
In-order always gives sorted output — this is the most important property to remember for exams.
Given the BST node struct below, implement two functions:
(a) Node *find_max(Node *root) — returns the node with the maximum value, or NULL if the tree is empty.
(b) int count_leaves(Node *root) — returns the number of leaf nodes (nodes with no children).
typedef struct bst_node { int v; struct bst_node *l, *r; } Node;
/* find_max: keep going right until no right child */
Node *find_max(Node *root) {
if (root == NULL || root->r == NULL)
return root;
return find_max(root->r);
}
/* count_leaves: a leaf has no children; sum 1 for each leaf */
int count_leaves(Node *root) {
if (root == NULL) return 0;
if (root->l == NULL && root->r == NULL) return 1; /* leaf! */
return count_leaves(root->l) + count_leaves(root->r);
}
root->r == NULL. This is the mirror of find_min which follows left pointers.
count_leaves: Base case 1: NULL tree → 0 leaves. Base case 2: no left and no right → this IS a leaf, return 1. Recursive case: sum leaves in both subtrees. Note the difference between returning NULL (empty) and returning 1 (leaf) — both are base cases but with different conditions.
The code below attempts to insert into a BST. It compiles and seems to work, but has a serious bug. Identify and fix it.
typedef struct Node { int v; struct Node *l, *r; } Node;
void bst_insert(Node *root, int val) {
if (root == NULL) {
root = malloc(sizeof(Node));
root->v = val;
root->l = root->r = NULL;
return;
}
if (val < root->v) bst_insert(root->l, val);
else bst_insert(root->r, val);
}
/* FIXED: return Node* and use assignment at call site */
Node *bst_insert(Node *root, int val) {
if (root == NULL) {
Node *n = malloc(sizeof(Node));
n->v = val;
n->l = n->r = NULL;
return n; /* return the new node */
}
if (val < root->v) root->l = bst_insert(root->l, val);
else if (val > root->v) root->r = bst_insert(root->r, val);
return root; /* always return root */
}
/* Usage: */
Node *root = NULL;
root = bst_insert(root, 5); /* must reassign root */
void and modifies a local copy of root. When you write root = malloc(...), you're only changing the local parameter — the caller's pointer is unchanged. The malloc'd memory is leaked immediately.
Bug 2: Recursive calls don't link new nodes:
bst_insert(root->l, val) discards the returned node — nothing gets linked. The fix is root->l = bst_insert(root->l, val).
Fix: Return
Node* and use the "return and reassign" pattern at every level.
Insert the values 1, 2, 3, 4, 5 in order into an empty BST. Draw the resulting tree structure. What is the height? What is the time complexity of search in this tree? How does this compare to a balanced BST with 5 nodes?
Inserting 1, 2, 3, 4, 5 in sorted order:
1
\
2
\
3
\
4
\
5
Height = 5 (degenerate — a linked list!)
Search worst case: O(n) = O(5) — must traverse all nodes
Compare to a balanced BST:
3
/ \
2 4
/ \
1 5
Height = 3
Search worst case: O(log n) = O(3)
Implication: For sorted (or nearly sorted) input, an unbalanced BST provides no advantage over a linked list. This is why real-world BST implementations use rebalancing (AVL trees, Red-Black trees) or randomized approaches. For COMP2017 exams, always note that O(log n) is the average case for random data — worst case is O(n).
Implement void bst_free(Node *root) that frees all nodes in the BST. Explain why post-order traversal is the correct traversal order to use, and why in-order or pre-order would cause problems.
void bst_free(Node *root) {
if (root == NULL) return;
bst_free(root->l); /* free left subtree first */
bst_free(root->r); /* free right subtree next */
free(root); /* THEN free this node */
/* post-order: children before parent */
}
/* WHY NOT pre-order (free parent first)?
void bst_free_wrong(Node *root) {
if (root == NULL) return;
free(root); // parent freed!
bst_free_wrong(root->l); // root->l is now a dangling pointer read — UB!
bst_free_wrong(root->r); // UB!
}
*/
l and r fields are no longer valid to read — you'd be accessing freed memory (use-after-free, undefined behavior).
Why not in-order: In-order frees the left subtree, then the current node (leaving the right subtree orphaned — memory leak), then the right subtree. Wait — actually you access
root->r after free(root) in an in-order implementation, which is UB.
Rule of thumb: Post-order is the natural traversal for freeing trees and evaluating expression trees — always process children before the parent.
Key concepts to memorize
Test your understanding
void insert(Node *root, int val) {
if (!root) {
root = malloc(sizeof(Node));
root->v = val;
root->l = root->r = NULL;
} else if (val < root->v) insert(root->l, val);
else insert(root->r, val);
}