Topic 32 — Number Systems & Data Representation

📖 Concept Explanation

Everything is just bits

The Big Idea

A computer only understands two states: power on (1) and power off (0). That's it. Everything — your photos, this text, every number, every instruction — is just a pattern of 1s and 0s stored in transistors. The entire discipline of number systems in COMP2017 is about: how do we encode all the information we care about as patterns of those two symbols, and how do we efficiently read and write those patterns in C?

Binary (Base 2) — the native language of hardware

In everyday life we use decimal (base 10) — ten digits (0–9), where each position is a power of 10. Binary works the same way but with only two digits (0 and 1) and powers of 2. The key insight: position = power of the base.

Position	2⁷	2⁶	2⁵	2⁴	2³	2²	2¹	2⁰
Value	128	64	32	16	8	4	2	1
42 =	0	0	1	0	1	0	1	0

So 42 = 32 + 8 + 2 = 101010₂. To convert decimal to binary by hand, use repeated division by 2, collecting remainders from bottom to top: 42÷2=21 r0, 21÷2=10 r1, 10÷2=5 r0, 5÷2=2 r1, 2÷2=1 r0, 1÷2=0 r1 → read remainders bottom-up: 101010.

In C you cannot write int x = 0b101010; portably until C23, but you can always use hex or decimal. In Python, bin(42) gives '0b101010' instantly — in C you must understand the bits directly, because C gives you direct access to memory.

Hexadecimal (Base 16) — the compact shorthand

Binary is precise but verbose. Hexadecimal (base 16, digits 0–9 then A–F) is the programmer's shorthand because one hex digit maps exactly to 4 binary bits (a nibble). This means any byte (8 bits) is always exactly 2 hex digits.

Decimal	Binary	Hex	Decimal	Binary	Hex
0	0000	0	8	1000	8
1	0001	1	9	1001	9
4	0100	4	10	1010	A
7	0111	7	15	1111	F

In C, hex literals start with 0x: int mask = 0xFF; is 255. Octal literals start with a leading zero: int perm = 0777; — this is why writing int x = 08; is a compile error (8 is not a valid octal digit!). The printf format specifiers %x (lowercase) and %X (uppercase) print an integer as hex. %o prints octal.

Common Pitfall — Octal Literals

A leading zero makes an integer literal octal. int x = 010; is 8, not 10. This silently compiles with no warning but gives wrong results. Always use 0x (hex) or no prefix (decimal) for clarity.

Two's Complement — storing negative integers

How does C store -1 as bits? It uses two's complement encoding. The rule is elegant: for an n-bit type, the most-significant bit (MSB) has weight −2^n-1 instead of +2^n-1. All other bits still have positive weights.

Bit 7 (weight)	Bit 6	Bit 5	Bit 4	Bit 3	Bit 2	Bit 1	Bit 0	Value
−128	64	32	16	8	4	2	1
1	1	1	1	1	1	1	1	−1
1	0	0	0	0	0	0	0	−128
0	1	1	1	1	1	1	1	+127

Why is two's complement clever? Because addition and subtraction work identically for signed and unsigned integers — the hardware only needs one adder circuit. The MSB is called the sign bit: 0 means non-negative, 1 means negative.

To negate a two's complement number manually: flip all bits, then add 1. For example, negate +5 (00000101): flip → 11111010, add 1 → 11111011 = −5. Verify: −128+64+32+16+8+0+2+1 = −5. ✓

Wrap-around: For an unsigned char (0–255), adding 1 to 255 wraps back to 0. For signed char (−128 to +127), adding 1 to 127 wraps to −128. In C, signed integer overflow is undefined behaviour — the compiler may assume it never happens and generate incorrect code. Unsigned overflow is well-defined and wraps modulo 2ⁿ.

Python vs C comparison

In Python, integers have arbitrary precision — you never overflow. In C, int is typically 32 bits. Printing INT_MAX + 1 is undefined behaviour in C, but Python handles it seamlessly. This difference is a major source of security vulnerabilities in C programs.

IEEE 754 — floating-point numbers

How does C store 3.14? It uses the IEEE 754 standard. A float uses 32 bits and a double uses 64 bits, split into three fields:

Sign (S)	Exponent (E)	Mantissa / Fraction (M)	Total
float: 1 bit	float: 8 bits	float: 23 bits	32 bits
double: 1 bit	double: 11 bits	double: 52 bits	64 bits

The value is decoded as: (-1)^S × 1.M × 2^(E-127) for float (E-1023 for double). The mantissa implicitly starts with "1." (the hidden bit), which is why there are 24 effective bits of precision in a float despite only 23 being stored.

Why 0.1 + 0.2 != 0.3? Because 0.1 in binary is a repeating fraction: 0.0001100110011... (like 1/3 in decimal). IEEE 754 stores only finitely many bits, so 0.1 is stored as the closest representable value, which is not exactly 0.1. When you add two slightly-wrong approximations, you get a slightly-wrong result — not 0.3. This is not a bug in C or your compiler; it is an inherent property of binary floating point. You will see this in every language that uses IEEE 754 (Python, JavaScript, Java...).

Special values in IEEE 754: +Infinity (e.g. 1.0/0.0), -Infinity, and NaN (Not a Number, e.g. 0.0/0.0 or sqrt(-1)). These are represented by special exponent bit patterns.

Catastrophic Cancellation

Subtracting two nearly-equal floating-point numbers can lose most significant digits. Example: if x = 1.0000001 and y = 1.0000000, then x−y should be 1e-7, but in float precision the leading digits cancel and the result has only 1 correct bit. The lecture mentions this as catastrophic cancellation. The fix: algebraically rearrange expressions before coding them. For example, replace (x*x - y*y) with (x-y)*(x+y).

Endianness — byte order in multi-byte types

An int is 4 bytes. When stored in memory at address 0x100, which byte goes first? This depends on the endianness of the processor:

System	Addr 0x100	Addr 0x101	Addr 0x102	Addr 0x103
Big-Endian	0x12 (MSB)	0x34	0x56	0x78 (LSB)
Little-Endian	0x78 (LSB)	0x56	0x34	0x12 (MSB)

Value stored: 0x12345678

Big-endian: most-significant byte first (like writing a number left-to-right). Little-endian: least-significant byte first. Your x86 or x86-64 PC (and most ARM phones) is little-endian. TCP/IP network byte order is big-endian — this is why functions like htons() ("host to network short") exist: they swap bytes when moving data from your little-endian CPU to the network.

You can detect endianness in C using a union trick: store an integer and read back individual bytes. The lecture shows hexdump -X (big-endian display) vs hexdump -x (little-endian display) producing different byte orderings for the same data on an x86 machine.

Endianness only matters when: (1) writing binary data to files or the network that will be read by another machine; (2) casting pointers between integer types of different sizes; (3) using union tricks to inspect raw bit patterns.

🔧 Syntax Breakdown

The C syntax you need to know

Format specifiers for numeric bases

printf("%d", val); → Signed decimal integer — the default for int

printf("%u", val); → Unsigned decimal integer — no sign bit interpretation

printf("%o", val); → Unsigned octal — groups of 3 bits

printf("%x", val); → Unsigned hex, lowercase (a-f) — most common in C

printf("%X", val); → Unsigned hex, uppercase (A-F)

printf("%f", val); → Floating point, decimal notation — default 6 decimal places

printf("%e", val); → Floating point, scientific notation — e.g. 1.234567e+02

printf("%08x\n", val); → Zero-padded 8-digit hex — the 0 means pad with zeros, 8 is field width

Lecture (Week 2): format string table — %d/%u for decimal, %o for octal, %x/%X for hex. %b is NOT in the C standard yet (%b arrives in C23).

Integer literal prefixes

42 → Decimal (base 10) — no prefix

052 → Octal (base 8) — leading zero. This is 42 in decimal. Warning: easy to confuse with a decimal literal.

0x2A → Hexadecimal (base 16) — 0x prefix. This is also 42.

0b101010 → Binary (base 2) — 0b prefix. C23 only. GCC supports as extension with -std=gnu11.

Bit-manipulation operators (quick reference)

a & b → Bitwise AND — keeps bits where both operands have 1. Use to mask bits.

a | b → Bitwise OR — sets bits where either operand has 1. Use to set bits.

a ^ b → Bitwise XOR — sets bits where operands differ. Use to toggle bits.

~a → Bitwise NOT — flips every bit. ~0 gives all-ones (−1 in signed, UINT_MAX in unsigned).

a << n → Left shift — multiply by 2ⁿ. Bits shifted out are lost. Fills with 0 on right.

a >> n → Right shift — divide by 2ⁿ (truncating). Unsigned: fills with 0. Signed: implementation-defined (usually arithmetic, filling with sign bit).

Union trick to inspect float bits

The union in C lets multiple members share the same memory. This is the standard way to reinterpret a float's raw bit pattern as an integer without invoking undefined behaviour (unlike direct pointer casts).

Inspecting a float's IEEE 754 bit pattern C — union trick

#include <stdio.h>

/* Union: both members share the same 4 bytes */
union FloatBits {
    float    f;        /* interpret as float */
    unsigned int u;    /* interpret as raw 32-bit pattern */
};

int main(void) {
    union FloatBits fb;
    fb.f = 1.0f;

    /* Print the raw bits of 1.0 in IEEE 754 */
    printf("float 1.0 bits: 0x%08X\n", fb.u);
    /* Expected: 0x3F800000
       Sign=0, Exponent=0x7F (127, biased), Mantissa=0 (hidden 1. prefix)
       Value = 1.0 * 2^(127-127) = 1.0 * 1 = 1.0  */

    fb.f = -0.5f;
    printf("float -0.5 bits: 0x%08X\n", fb.u);
    /* Sign=1, Exponent=0x7E (126), Mantissa=0
       Value = -1.0 * 2^(126-127) = -1.0 * 0.5 = -0.5 */

    return 0;
}

Output

float 1.0 bits: 0x3F800000
float -0.5 bits: 0xBF000000

📝 Worked Examples

Step-by-step walkthroughs

Example 1 — Convert 42 to binary (divide-by-2), then to hex (nibble grouping) Wk3A Tutorial

/*
 * STEP 1: Decimal to Binary via repeated division by 2
 *
 *   42 / 2 = 21  remainder 0     (LSB — written last, read first)
 *   21 / 2 = 10  remainder 1
 *   10 / 2 =  5  remainder 0
 *    5 / 2 =  2  remainder 1
 *    2 / 2 =  1  remainder 0
 *    1 / 2 =  0  remainder 1     (MSB — written first, read last)
 *
 *   Reading remainders bottom to top: 1 0 1 0 1 0
 *   42 (decimal) = 101010 (binary) = 00101010 (as 8-bit byte)
 *
 * STEP 2: Binary to Hex — group bits into nibbles (sets of 4) from right
 *
 *   Binary:    0010   1010
 *   Hex digit:   2      A
 *
 *   42 (decimal) = 0x2A (hex)
 *
 * STEP 3: Verify with C
 */
#include <stdio.h>

int main(void) {
    int n = 42;
    printf("Decimal : %d\n",  n);
    printf("Hex     : 0x%X\n", n);   /* 0x2A */
    printf("Octal   : 0%o\n", n);    /* 052 */
    return 0;
}

Output

Decimal : 42
Hex : 0x2A
Octal : 052

Example 2 — Convert 0xF6 to decimal and binary Wk3A Tutorial

/*
 * 0xF6: two hex digits — F and 6
 *
 * F = 1111 (binary),  6 = 0110 (binary)
 * Concatenate nibbles: 1111 0110 (binary)
 *
 * Convert 11110110 to decimal:
 *   1*128 + 1*64 + 1*32 + 1*16 + 0*8 + 1*4 + 1*2 + 0*1
 *   = 128 + 64 + 32 + 16 + 0 + 4 + 2 + 0
 *   = 246
 *
 * As signed int8_t: sign bit is 1, so it's negative.
 *   Two's complement value = -128 + 64 + 32 + 16 + 4 + 2 = -10
 */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    unsigned char u = 0xF6;
    int8_t        s = (int8_t)0xF6;

    printf("0xF6 as unsigned char: %u\n",  u);   /* 246 */
    printf("0xF6 as int8_t:        %d\n",  s);   /* -10 */

    /* Verify binary representation manually: */
    /* bit 7: %d  */
    for (int i = 7; i >= 0; i--) {
        printf("%d", (u >> i) & 1);
    }
    printf("\n");   /* 11110110 */
    return 0;
}

Output

0xF6 as unsigned char: 246
0xF6 as int8_t: -10
11110110

Example 3 — Why 0.1 + 0.2 != 0.3 in C (IEEE 754 imprecision) Week 2 Lecture — IEEE 754

#include <stdio.h>

int main(void) {
    float  a = 0.1f, b = 0.2f, c = 0.3f;
    double da = 0.1,  db = 0.2,  dc = 0.3;

    /* Standard print — looks fine */
    printf("float:  0.1 + 0.2 = %.1f\n", a + b);    /* 0.3 */
    printf("double: 0.1 + 0.2 = %.1f\n", da + db);   /* 0.3 */

    /* High precision — the truth is revealed */
    printf("\nfloat  0.1 stored as: %.20f\n", a);
    printf("float  0.2 stored as: %.20f\n", b);
    printf("float  0.3 stored as: %.20f\n", c);
    printf("float  0.1+0.2 =      %.20f\n", a + b);

    printf("\ndouble 0.1 stored as: %.20f\n", da);
    printf("double 0.2 stored as: %.20f\n", db);
    printf("double 0.3 stored as: %.20f\n", dc);
    printf("double 0.1+0.2 =      %.20f\n", da + db);

    /* Equality tests */
    printf("\n(float) 0.1+0.2 == 0.3: %s\n", (a+b == c) ? "true" : "false");
    printf("(double) 0.1+0.2 == 0.3: %s\n", (da+db == dc) ? "true" : "false");
    return 0;
}

Output (representative — exact values are platform-dependent)

float 0.1 stored as: 0.10000000149011611938
float 0.2 stored as: 0.20000000298023223877
float 0.3 stored as: 0.30000001192092895508
float 0.1+0.2 = 0.30000001192092895508

double 0.1 stored as: 0.10000000000000000555
double 0.2 stored as: 0.20000000000000001110
(float) 0.1+0.2 == 0.3: true (lucky: same rounding)
(double) 0.1+0.2 == 0.3: false

Example 4 — Detect endianness of the current machine using a union Week 2 Lecture — Endianness

#include <stdio.h>

int main(void) {
    /* Store a known 32-bit value */
    union {
        unsigned int  val;
        unsigned char bytes[4];
    } u;

    u.val = 0x12345678;

    /* On a little-endian machine, the LOWEST address holds the LSB (0x78).
       On a big-endian machine, the LOWEST address holds the MSB (0x12). */
    printf("bytes[0] (lowest address) = 0x%02X\n", u.bytes[0]);
    printf("bytes[1]                  = 0x%02X\n", u.bytes[1]);
    printf("bytes[2]                  = 0x%02X\n", u.bytes[2]);
    printf("bytes[3] (highest address)= 0x%02X\n", u.bytes[3]);

    if (u.bytes[0] == 0x78) {
        printf("\nThis machine is LITTLE-ENDIAN (x86/x64/ARM typical).\n");
    } else if (u.bytes[0] == 0x12) {
        printf("\nThis machine is BIG-ENDIAN.\n");
    } else {
        printf("\nUnexpected byte order — mixed endianness?\n");
    }
    return 0;
}

Output on x86-64 (little-endian)

bytes[0] (lowest address) = 0x78
bytes[1] = 0x56
bytes[2] = 0x34
bytes[3] (highest address)= 0x12

This machine is LITTLE-ENDIAN (x86/x64/ARM typical).

🔬 Problems

Practice problems

P1 — Convert 73 to binary and hex Wk3A Tutorial

Without using a calculator or C, convert the decimal value 73 to binary (show the divide-by-2 working) and then to hexadecimal (show the nibble grouping). Verify the hex by computing F×16+? .

Divide-by-2 method:
  73 / 2 = 36 r 1   ← LSB
  36 / 2 = 18 r 0
  18 / 2 =  9 r 0
   9 / 2 =  4 r 1
   4 / 2 =  2 r 0
   2 / 2 =  1 r 0
   1 / 2 =  0 r 1   ← MSB
Read remainders bottom-up: 1 0 0 1 0 0 1
Padded to 8 bits:  0100 1001

Nibble grouping (right to left):
  1001 = 9
  0100 = 4
Hex: 0x49

Verify: 4 * 16 + 9 = 64 + 9 = 73 ✓

73 in binary is 01001001₂ and in hex is 0x49. The nibble trick works because 16 = 2⁴, so each hex digit captures exactly 4 bits. Always group from the right (LSB side) to avoid leading-zero errors.

P2 — What is 0x1A + 0b00110011 in decimal? Wk3A Tutorial

Evaluate 0x1A + 0b00110011 in decimal. Show intermediate conversions. Note: 0b binary literals are a GCC extension / C23 feature.

0x1A:
  1 * 16 + A * 1 = 16 + 10 = 26

0b00110011 (binary):
  Position:  7   6   5   4   3   2   1   0
  Bit:       0   0   1   1   0   0   1   1
  Value:             32 +16         +2  +1 = 51

Sum: 26 + 51 = 77

0x1A = 26. 0b00110011 = 32 + 16 + 2 + 1 = 51. Sum = 77. In C: printf("%d\n", 0x1A + 0b00110011); would print 77 on a C23-compatible compiler.

P3 — What does int x = -1; printf("%u\n", x); print and why? Week 2 Lecture — Integer representations

On a 32-bit or 64-bit system, what does the following print? Explain using two's complement and the difference between %d and %u.

#include <stdio.h>
int main(void) {
    int x = -1;
    printf("%u\n", x);
    return 0;
}

Output: 4294967295

Explanation:
-1 in two's complement 32-bit is all 1 bits:
  1111 1111 1111 1111 1111 1111 1111 1111

%u tells printf to INTERPRET those same bits as an unsigned int.
All bits set in a 32-bit unsigned int = 2^32 - 1 = 4294967295 = UINT_MAX.

The BITS in memory are unchanged — only how printf interprets them changes.
%d: treats MSB as sign bit (-1)
%u: treats all bits as magnitude (4294967295)

4294967295 (2³²−1). This demonstrates the key insight from the lecture's "meaning of numbers" slide: the byte pattern 0xFF...FF means −1 as a signed integer and UINT_MAX as an unsigned integer — same bits, different interpretation. Using the wrong format specifier is technically undefined behaviour in C, but on every real platform it produces this result.

P4 — Why does (float)0.1 + (float)0.2 != (float)0.3? What is this called? Week 2 Lecture — IEEE 754 / Catastrophic Cancellation

Explain in precise technical terms why the expression (float)0.1 + (float)0.2 != (float)0.3 may evaluate to true in C. What is the name for related precision-loss problems with floating-point subtraction? How should you compare floating-point numbers for equality in C?

/* Why it happens:
   0.1 in binary = 0.0001100110011...  (repeating, like 1/3 in decimal)
   IEEE 754 float has only 23 mantissa bits — it must ROUND to the
   nearest representable value.

   Stored 0.1f ≈ 0.10000000149...
   Stored 0.2f ≈ 0.20000000298...
   Their sum  ≈ 0.30000000447...
   Stored 0.3f ≈ 0.30000001192...

   These are different bit patterns → == returns 0 (false).

   The related problem is called CATASTROPHIC CANCELLATION:
   when two nearly-equal floats are subtracted, leading digits cancel
   and the result has very few correct significant bits.

   The correct way to compare floats: use an epsilon tolerance */

#include <stdio.h>
#include <math.h>
#include <float.h>

int float_equal(float a, float b) {
    return fabsf(a - b) <= FLT_EPSILON * fmaxf(fabsf(a), fabsf(b));
}

int main(void) {
    float a = 0.1f + 0.2f;
    float b = 0.3f;

    printf("Direct ==:     %s\n", (a == b) ? "equal" : "not equal");
    printf("Epsilon test:  %s\n", float_equal(a, b) ? "equal" : "not equal");
    return 0;
}

The phenomenon is representational error: most decimal fractions are not exactly representable in binary floating point. The related precision-loss problem is called catastrophic cancellation (lecture page 77–79). The lecture recommends rearranging expressions algebraically — e.g., replace (x*x - y*y) with (x-y)*(x+y) to reduce cancellation. Never use == to compare floating-point values; use an epsilon-based comparison instead.

P5 — Write a C program to determine if your system is big-endian or little-endian Week 2 Lecture — Endianness

Write a complete C program that programmatically determines the endianness of the machine it runs on. Do not use any platform-specific macros — use only standard C. Explain how the technique works.

#include <stdio.h>

/* Method 1: pointer cast to unsigned char* (C99+, technically
   implementation-defined but works universally) */
int is_little_endian_ptr(void) {
    unsigned int val = 1;
    unsigned char *p = (unsigned char *)&val;
    return *p == 1;  /* LSB at lowest address = little-endian */
}

/* Method 2: union (well-defined in C99+, preferred) */
int is_little_endian_union(void) {
    union { unsigned int u; unsigned char c[4]; } x;
    x.u = 1;
    return x.c[0] == 1;
}

int main(void) {
    if (is_little_endian_union()) {
        printf("Little-endian: LSB stored at lowest address.\n");
        printf("Example: 0x12345678 is stored as 78 56 34 12\n");
    } else {
        printf("Big-endian: MSB stored at lowest address.\n");
        printf("Example: 0x12345678 is stored as 12 34 56 78\n");
    }
    return 0;
}

The trick: store the value 1 as a 32-bit unsigned int. On a little-endian machine, the byte at the lowest address is the least significant byte — which is 0x01 for the value 1. On a big-endian machine the first byte is 0x00 (only the last byte is 0x01). The union method is preferred over the pointer cast because accessing a union member different from the one last stored is explicitly allowed in C99 and later (it performs type-punning via the common initial sequence rule).

P6 — What is the output? unsigned char c = 255; c++; printf("%d\n", c); Week 2 Lecture — Wrap-around / unsigned overflow

Trace through the following code and state the output. Explain the mechanism using binary arithmetic and the C standard's rules for unsigned overflow.

#include <stdio.h>
int main(void) {
    unsigned char c = 255;
    c++;
    printf("%d\n", c);
    return 0;
}

Output: 0

Explanation:
  unsigned char range: 0 to 255 (2^8 - 1)
  c = 255 in binary:   1111 1111
  c++:  1111 1111 + 0000 0001
                 = 1 0000 0000  (9 bits — carry out)
  Truncated to 8 bits: 0000 0000 = 0

The C standard guarantees this behaviour for UNSIGNED types:
"A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting
unsigned integer type is reduced modulo the number that is one
greater than the largest value that can be represented."
(C11 §6.2.5¶9)

So 255 + 1 ≡ 0 (mod 256). The output is 0.

Note: this would be UNDEFINED BEHAVIOUR if c were signed char,
because signed integer overflow is UB in C.

The output is 0. For unsigned types, overflow is well-defined wrap-around (modular arithmetic). unsigned char wraps modulo 256. This is the lecture's "meaning of numbers" concept: the bit pattern wraps around. The %d format specifier is fine here because unsigned char is promoted to int before being passed to printf, and the value 0 fits in int.

🃏 Flashcards

Key concepts to memorize

Card 1 of 10

Question — click to flip

Answer

Click card to flip • Use buttons to navigate

✅ Quiz

Test your understanding

Topic 32 Quiz — Number Systems & Data Representation Score: 0 / 4

1

What is sizeof(char) guaranteed to return in C?LO1

multiple choice

2

Fill in the blank: the hex value 0xFF equals ___ in decimal.LO8

fill in the blank

3

True or False: All floating-point numbers can be represented exactly in IEEE 754.LO8

true / false

4

On a little-endian system, the 32-bit value 0x12345678 is stored starting at address 0x100. What byte value is at address 0x100?LO8

multiple choice

0/4

Quiz complete!