What is a Binary Search Tree?

Real-World Analogy — The Sorted Card Catalogue

Imagine a library card catalogue where every drawer has a label. If you're looking for a book starting with "M", you open the middle drawer. Everything left of it is A–L, everything right is N–Z. You've halved your search space instantly. A BST works exactly the same way — each node is a decision point that eliminates half the remaining possibilities. This is why search is O(log n) on a balanced tree instead of O(n) for a linked list.

A Binary Search Tree is a linked data structure where each node holds a value and two pointers: left (to a smaller subtree) and right (to a larger subtree). The defining rule — the BST property — is:

The BST Property

For every node N:
• All values in N's left subtree are strictly less than N's value.
• All values in N's right subtree are strictly greater than N's value.

This property must hold for every node in the tree, not just the root. It's what enables efficient search: at each node you make one comparison and discard half the tree.

BST vs Linked List: A linked list requires O(n) search — you must scan every element. A balanced BST requires O(log n) — you halve the search space at each step. For 1,000,000 elements, a linked list might need 1,000,000 comparisons; a balanced BST needs at most 20.

Why "unbalanced" matters: If you insert already-sorted data (e.g., 1, 2, 3, 4, 5) into a naive BST, it degenerates into a linked list — every node goes to the right child, giving O(n) worst case. Self-balancing trees (AVL, Red-Black) fix this, but COMP2017 focuses on the unbalanced version.

BST vs Other Structures

OperationArray (sorted)Linked ListBST (balanced)BST (degenerate)
Search O(log n) binary search O(n) O(log n) O(n)
Insert O(n) shift elements O(n) find position O(log n) O(n)
Delete O(n) shift elements O(n) O(log n) O(n)
In-order traversal O(n) O(n) unsorted O(n) gives sorted output O(n) gives sorted output
The Key Insight

In-order traversal of a BST always produces values in sorted ascending order. This is a direct consequence of the BST property and is one of the most useful properties to remember for exams.

Duplicates

The standard BST definition uses strictly less/greater — duplicates are typically disallowed, or handled by a consistent rule (e.g., duplicates always go left). COMP2017 exercises usually assume all values are unique.

Node struct, insert, search, traversals

The BST Node Struct

/* BST node — holds a value and two child pointers */
struct bst_node {
    int               value;    /* the data stored at this node */
    struct bst_node *left;     /* pointer to left child (smaller values) */
    struct bst_node *right;    /* pointer to right child (larger values) */
};

/* Common shorthand with typedef */
typedef struct bst_node Node;

/* Create a new node on the heap */
Node *create_node(int val) {
    Node *n = malloc(sizeof(Node));
    n->value = val;
    n->left  = NULL;
    n->right = NULL;
    return n;
}
Why malloc? Each node lives on the heap so it persists beyond the function call. left and right are initialised to NULL — NULL is how you represent "no child here" (a leaf or empty slot).

Insert — Recursive Approach

/* Insert val into BST rooted at root. Returns new root. */
Node *bst_insert(Node *root, int val) {
    /* Base case: empty spot found — create a node here */
    if (root == NULL)
        return create_node(val);

    /* Recursive case: decide left or right */
    if (val < root->value)
        root->left  = bst_insert(root->left,  val);
    else if (val > root->value)
        root->right = bst_insert(root->right, val);
    /* val == root->value: duplicate, ignore */

    return root;  /* return (possibly unchanged) root */
}
Pattern: The "return root and reassign" pattern means every recursive call can either return an existing node (no change) or a newly created node (the new leaf). The caller links it in via root->left = bst_insert(...).

The Three Traversals

/* IN-ORDER: left → root → right  → gives SORTED output */
void inorder(Node *root) {
    if (root == NULL) return;
    inorder(root->left);
    printf("%d ", root->value);
    inorder(root->right);
}

/* PRE-ORDER: root → left → right  → useful for copying the tree */
void preorder(Node *root) {
    if (root == NULL) return;
    printf("%d ", root->value);
    preorder(root->left);
    preorder(root->right);
}

/* POST-ORDER: left → right → root  → useful for deleting the tree */
void postorder(Node *root) {
    if (root == NULL) return;
    postorder(root->left);
    postorder(root->right);
    printf("%d ", root->value);
}
Memory aid: The order of printf relative to the recursive calls tells you the traversal type. Print first = pre-order, print in-between = in-order, print last = post-order.

Search and find_min

/* Search: return node with val, or NULL if not found */
Node *bst_search(Node *root, int val) {
    if (root == NULL || root->value == val)
        return root;
    if (val < root->value)
        return bst_search(root->left,  val);
    return bst_search(root->right, val);
}

/* find_min: keep going left until no left child */
Node *find_min(Node *root) {
    if (root == NULL || root->left == NULL)
        return root;
    return find_min(root->left);
}
find_min logic: The minimum value is always the leftmost node (keep following left pointers until NULL). Symmetrically, find_max keeps following right pointers. Both are O(h) where h = tree height.

Complete programs you can compile and run

Example 1 — Build a BST and print sorted output via in-order traversal COMP2017 Lessons — BST task
#include <stdio.h>
#include <stdlib.h>

typedef struct bst_node {
    int value;
    struct bst_node *left;
    struct bst_node *right;
} Node;

/* Create a new leaf node */
Node *create_node(int val) {
    Node *n = malloc(sizeof(Node));
    if (!n) { perror("malloc"); exit(1); }
    n->value = val;
    n->left  = NULL;
    n->right = NULL;
    return n;
}

/* Insert val into BST rooted at root, return (possibly new) root */
Node *bst_insert(Node *root, int val) {
    if (root == NULL)         return create_node(val);
    if (val < root->value)   root->left  = bst_insert(root->left,  val);
    else if (val > root->value) root->right = bst_insert(root->right, val);
    /* duplicate: ignore */
    return root;
}

/* Search — return node or NULL */
Node *bst_search(Node *root, int val) {
    if (root == NULL || root->value == val) return root;
    if (val < root->value) return bst_search(root->left,  val);
    return bst_search(root->right, val);
}

/* In-order traversal: prints values in ascending sorted order */
void inorder(Node *root) {
    if (root == NULL) return;
    inorder(root->left);
    printf("%d ", root->value);
    inorder(root->right);
}

/* Post-order free: free children before parent */
void bst_free(Node *root) {
    if (root == NULL) return;
    bst_free(root->left);
    bst_free(root->right);
    free(root);
}

int main(void) {
    Node *root = NULL;

    /* Insert values — BST will arrange them automatically */
    int values[] = {50, 30, 70, 20, 40, 60, 80};
    int n = sizeof(values) / sizeof(values[0]);
    for (int i = 0; i < n; i++)
        root = bst_insert(root, values[i]);

    /* The tree looks like:
              50
            /    \
          30      70
         /  \    /  \
       20   40  60   80          */

    printf("In-order (sorted):  ");
    inorder(root);
    printf("\n");

    /* Search */
    Node *found = bst_search(root, 40);
    printf("Search 40: %s\n", found ? "found" : "not found");
    Node *miss  = bst_search(root, 99);
    printf("Search 99: %s\n", miss  ? "found" : "not found");

    bst_free(root);
    return 0;
}
Output
In-order (sorted): 20 30 40 50 60 70 80
Search 40: found
Search 99: not found
Example 2 — All three traversals + find_min + count_nodes BST operations reference
#include <stdio.h>
#include <stdlib.h>

typedef struct bst_node { int v; struct bst_node *l, *r; } Node;

Node *new_node(int v) {
    Node *n = malloc(sizeof(Node));
    n->v = v; n->l = n->r = NULL;
    return n;
}
Node *insert(Node *root, int v) {
    if (!root) return new_node(v);
    if (v < root->v) root->l = insert(root->l, v);
    else if (v > root->v) root->r = insert(root->r, v);
    return root;
}

void inorder  (Node *r) { if(!r) return; inorder(r->l);   printf("%d ", r->v); inorder(r->r);   }
void preorder (Node *r) { if(!r) return; printf("%d ", r->v); preorder(r->l);  preorder(r->r);  }
void postorder(Node *r) { if(!r) return; postorder(r->l); postorder(r->r); printf("%d ", r->v); }

/* find_min: go left until no left child */
Node *find_min(Node *r) {
    if (!r || !r->l) return r;
    return find_min(r->l);
}

/* count_nodes: 1 + count left + count right */
int count_nodes(Node *r) {
    if (!r) return 0;
    return 1 + count_nodes(r->l) + count_nodes(r->r);
}

/* tree_height: 1 + max(height_left, height_right) */
int tree_height(Node *r) {
    if (!r) return 0;
    int lh = tree_height(r->l);
    int rh = tree_height(r->r);
    return 1 + (lh > rh ? lh : rh);
}

void free_tree(Node *r) { if(!r) return; free_tree(r->l); free_tree(r->r); free(r); }

int main(void) {
    /* Insert: 5, 3, 7, 1, 4, 6, 8 */
    Node *root = NULL;
    int vals[] = {5,3,7,1,4,6,8};
    for (int i = 0; i < 7; i++) root = insert(root, vals[i]);

    /* Tree structure:
           5
          / \
         3   7
        / \ / \
       1  4 6  8     */

    printf("In-order   (L→R→Rt): "); inorder(root);   printf("\n");
    printf("Pre-order  (Rt→L→R): "); preorder(root);  printf("\n");
    printf("Post-order (L→R→Rt): "); postorder(root); printf("\n");

    Node *mn = find_min(root);
    printf("Min value: %d\n", mn ? mn->v : -1);
    printf("Node count: %d\n", count_nodes(root));
    printf("Tree height: %d\n", tree_height(root));

    free_tree(root);
    return 0;
}
Output
In-order (L→R→Rt): 1 3 4 5 6 7 8
Pre-order (Rt→L→R): 5 3 1 4 7 6 8
Post-order (L→R→Rt): 1 4 3 6 8 7 5
Min value: 1
Node count: 7
Tree height: 3
Example 3 — BST delete (all three cases) Advanced — COMP2017 BST 2
#include <stdio.h>
#include <stdlib.h>

typedef struct Node { int v; struct Node *l, *r; } Node;
Node *new_node(int v) { Node *n=malloc(sizeof(Node)); n->v=v; n->l=n->r=NULL; return n; }
Node *insert(Node *r, int v) {
    if(!r) return new_node(v);
    if(v < r->v) r->l=insert(r->l,v); else if(v > r->v) r->r=insert(r->r,v);
    return r;
}
void inorder(Node *r) { if(!r) return; inorder(r->l); printf("%d ",r->v); inorder(r->r); }

/* find_min returns the leftmost node (no malloc — just a pointer) */
Node *find_min(Node *r) { return (!r||!r->l) ? r : find_min(r->l); }

/* Delete val from BST. Returns new root. */
Node *bst_delete(Node *root, int val) {
    if (root == NULL) return NULL;  /* val not found */

    if (val < root->v) {
        root->l = bst_delete(root->l, val);  /* go left */
    } else if (val > root->v) {
        root->r = bst_delete(root->r, val);  /* go right */
    } else {
        /* FOUND the node to delete — three cases: */

        /* Case 1: leaf node (no children) */
        if (root->l == NULL && root->r == NULL) {
            free(root);
            return NULL;
        }

        /* Case 2: one child — replace node with that child */
        if (root->l == NULL) {
            Node *tmp = root->r;
            free(root);
            return tmp;
        }
        if (root->r == NULL) {
            Node *tmp = root->l;
            free(root);
            return tmp;
        }

        /* Case 3: two children — replace value with in-order successor
           (smallest value in right subtree), then delete the successor */
        Node *successor = find_min(root->r);
        root->v = successor->v;             /* copy successor's value up */
        root->r = bst_delete(root->r, successor->v);  /* delete the successor */
    }
    return root;
}

void free_tree(Node *r) { if(!r) return; free_tree(r->l); free_tree(r->r); free(r); }

int main(void) {
    Node *root = NULL;
    int vals[] = {50,30,70,20,40,60,80};
    for (int i=0; i<7; i++) root = insert(root, vals[i]);

    printf("Before delete: "); inorder(root); printf("\n");

    root = bst_delete(root, 20);  /* Case 1: leaf */
    printf("Delete 20 (leaf): "); inorder(root); printf("\n");

    root = bst_delete(root, 30);  /* Case 2: one child (only right=40) */
    printf("Delete 30 (1 child): "); inorder(root); printf("\n");

    root = bst_delete(root, 50);  /* Case 3: two children (root!) */
    printf("Delete 50 (2 children): "); inorder(root); printf("\n");

    free_tree(root);
    return 0;
}
Output
Before delete: 20 30 40 50 60 70 80
Delete 20 (leaf): 30 40 50 60 70 80
Delete 30 (1 child): 40 50 60 70 80
Delete 50 (2 children): 40 60 70 80

Practice problems with solutions

P1 — Trace insertion sequence: what does the BST look like? COMP2017 Lessons — BST task

Starting from an empty BST, insert the following values in order: 40, 20, 60, 10, 30, 50, 70. Draw the final tree structure. What is the output of in-order, pre-order, and post-order traversals?

Tree after all insertions:
        40
       /  \
      20   60
     / \  / \
    10  30 50 70

In-order   (sorted): 10 20 30 40 50 60 70
Pre-order  (root first): 40 20 10 30 60 50 70
Post-order (root last): 10 30 20 50 70 60 40
Insertion trace:
• Insert 40 → root
• Insert 20 → 20 < 40, goes left of 40
• Insert 60 → 60 > 40, goes right of 40
• Insert 10 → 10 < 40, go left → 10 < 20, goes left of 20
• Insert 30 → 30 < 40, go left → 30 > 20, goes right of 20
• Insert 50 → 50 > 40, go right → 50 < 60, goes left of 60
• Insert 70 → 70 > 40, go right → 70 > 60, goes right of 60

In-order always gives sorted output — this is the most important property to remember for exams.
P2 — Implement find_max and count_leaves COMP2017 Lessons + Tutorial

Given the BST node struct below, implement two functions:
(a) Node *find_max(Node *root) — returns the node with the maximum value, or NULL if the tree is empty.
(b) int count_leaves(Node *root) — returns the number of leaf nodes (nodes with no children).

typedef struct bst_node { int v; struct bst_node *l, *r; } Node;
/* find_max: keep going right until no right child */
Node *find_max(Node *root) {
    if (root == NULL || root->r == NULL)
        return root;
    return find_max(root->r);
}

/* count_leaves: a leaf has no children; sum 1 for each leaf */
int count_leaves(Node *root) {
    if (root == NULL) return 0;
    if (root->l == NULL && root->r == NULL) return 1;  /* leaf! */
    return count_leaves(root->l) + count_leaves(root->r);
}
find_max: The maximum value is always the rightmost node — follow right pointers until root->r == NULL. This is the mirror of find_min which follows left pointers.

count_leaves: Base case 1: NULL tree → 0 leaves. Base case 2: no left and no right → this IS a leaf, return 1. Recursive case: sum leaves in both subtrees. Note the difference between returning NULL (empty) and returning 1 (leaf) — both are base cases but with different conditions.
P3 — Spot the bug: BST insert that causes a memory leak Common exam pitfall

The code below attempts to insert into a BST. It compiles and seems to work, but has a serious bug. Identify and fix it.

typedef struct Node { int v; struct Node *l, *r; } Node;

void bst_insert(Node *root, int val) {
    if (root == NULL) {
        root = malloc(sizeof(Node));
        root->v = val;
        root->l = root->r = NULL;
        return;
    }
    if (val < root->v)  bst_insert(root->l,  val);
    else                bst_insert(root->r,  val);
}
/* FIXED: return Node* and use assignment at call site */
Node *bst_insert(Node *root, int val) {
    if (root == NULL) {
        Node *n = malloc(sizeof(Node));
        n->v = val;
        n->l = n->r = NULL;
        return n;             /* return the new node */
    }
    if (val < root->v)  root->l = bst_insert(root->l, val);
    else if (val > root->v) root->r = bst_insert(root->r, val);
    return root;              /* always return root */
}

/* Usage: */
Node *root = NULL;
root = bst_insert(root, 5);  /* must reassign root */
Bug 1: The function is void and modifies a local copy of root. When you write root = malloc(...), you're only changing the local parameter — the caller's pointer is unchanged. The malloc'd memory is leaked immediately.

Bug 2: Recursive calls don't link new nodes: bst_insert(root->l, val) discards the returned node — nothing gets linked. The fix is root->l = bst_insert(root->l, val).

Fix: Return Node* and use the "return and reassign" pattern at every level.
P4 — Why does inserting sorted data make a BST degenerate? Tutorial discussion question

Insert the values 1, 2, 3, 4, 5 in order into an empty BST. Draw the resulting tree structure. What is the height? What is the time complexity of search in this tree? How does this compare to a balanced BST with 5 nodes?

Inserting 1, 2, 3, 4, 5 in sorted order:

1
 \
  2
   \
    3
     \
      4
       \
        5

Height = 5  (degenerate — a linked list!)
Search worst case: O(n) = O(5) — must traverse all nodes

Compare to a balanced BST:
    3
   / \
  2   4
 /     \
1       5

Height = 3
Search worst case: O(log n) = O(3)
Why it degenerates: Each new value is greater than the previous, so every insert goes to the right child. The BST property is maintained — left subtrees are empty everywhere — but the "tree" is really just a chain.

Implication: For sorted (or nearly sorted) input, an unbalanced BST provides no advantage over a linked list. This is why real-world BST implementations use rebalancing (AVL trees, Red-Black trees) or randomized approaches. For COMP2017 exams, always note that O(log n) is the average case for random data — worst case is O(n).
P5 — Write bst_free using post-order traversal COMP2017 Lessons — memory management

Implement void bst_free(Node *root) that frees all nodes in the BST. Explain why post-order traversal is the correct traversal order to use, and why in-order or pre-order would cause problems.

void bst_free(Node *root) {
    if (root == NULL) return;
    bst_free(root->l);   /* free left subtree first */
    bst_free(root->r);   /* free right subtree next */
    free(root);           /* THEN free this node */
    /* post-order: children before parent */
}

/* WHY NOT pre-order (free parent first)?
   void bst_free_wrong(Node *root) {
       if (root == NULL) return;
       free(root);                       // parent freed!
       bst_free_wrong(root->l);          // root->l is now a dangling pointer read — UB!
       bst_free_wrong(root->r);          // UB!
   }
*/
Why post-order: You must free the children before freeing the parent. If you free the parent first (pre-order), the parent's l and r fields are no longer valid to read — you'd be accessing freed memory (use-after-free, undefined behavior).

Why not in-order: In-order frees the left subtree, then the current node (leaving the right subtree orphaned — memory leak), then the right subtree. Wait — actually you access root->r after free(root) in an in-order implementation, which is UB.

Rule of thumb: Post-order is the natural traversal for freeing trees and evaluating expression trees — always process children before the parent.

Key concepts to memorize

Card 1 of 10
Question — click to flip
Answer
Click card to flip • Use buttons to navigate

Test your understanding

Topic 14 Quiz — Binary Search Trees Score: 0 / 6
1
Which traversal of a BST always produces values in sorted ascending order?LO5
multiple choice
2
True or False: Inserting the sequence 1, 2, 3, 4, 5 into an empty BST produces a tree with height 3.LO5
true / false
3
What is the correct way to find the minimum value in a BST?LO5
multiple choice
4
Fill in the blank: To free all nodes in a BST without accessing freed memory, use ___ traversal (free children before parent).LO5
fill in the blank
5
Spot the bug: what is wrong with this BST insert?LO5
void insert(Node *root, int val) {
    if (!root) {
        root = malloc(sizeof(Node));
        root->v = val;
        root->l = root->r = NULL;
    } else if (val < root->v) insert(root->l, val);
    else insert(root->r, val);
}
spot the bug — multiple choice
6
When deleting a node with two children from a BST, what value replaces it?LO5
multiple choice
0/6
Quiz complete!