Number Systems & Data Representation
How computers store every number, character, and colour as patterns of 1s and 0s — binary, hexadecimal, two's complement, IEEE 754, and endianness explained from the ground up.
Everything is just bits
A computer only understands two states: power on (1) and power off (0). That's it. Everything — your photos, this text, every number, every instruction — is just a pattern of 1s and 0s stored in transistors. The entire discipline of number systems in COMP2017 is about: how do we encode all the information we care about as patterns of those two symbols, and how do we efficiently read and write those patterns in C?
Binary (Base 2) — the native language of hardware
In everyday life we use decimal (base 10) — ten digits (0–9), where each position is a power of 10. Binary works the same way but with only two digits (0 and 1) and powers of 2. The key insight: position = power of the base.
| Position | 2⁷ | 2⁶ | 2⁵ | 2⁴ | 2³ | 2² | 2¹ | 2⁰ |
|---|---|---|---|---|---|---|---|---|
| Value | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
| 42 = | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
So 42 = 32 + 8 + 2 = 101010₂. To convert decimal to binary by hand, use repeated division by 2, collecting remainders from bottom to top: 42÷2=21 r0, 21÷2=10 r1, 10÷2=5 r0, 5÷2=2 r1, 2÷2=1 r0, 1÷2=0 r1 → read remainders bottom-up: 101010.
In C you cannot write int x = 0b101010; portably until C23, but you can always use hex or decimal. In Python, bin(42) gives '0b101010' instantly — in C you must understand the bits directly, because C gives you direct access to memory.
Hexadecimal (Base 16) — the compact shorthand
Binary is precise but verbose. Hexadecimal (base 16, digits 0–9 then A–F) is the programmer's shorthand because one hex digit maps exactly to 4 binary bits (a nibble). This means any byte (8 bits) is always exactly 2 hex digits.
| Decimal | Binary | Hex | Decimal | Binary | Hex |
|---|---|---|---|---|---|
| 0 | 0000 | 0 | 8 | 1000 | 8 |
| 1 | 0001 | 1 | 9 | 1001 | 9 |
| 4 | 0100 | 4 | 10 | 1010 | A |
| 7 | 0111 | 7 | 15 | 1111 | F |
In C, hex literals start with 0x: int mask = 0xFF; is 255. Octal literals start with a leading zero: int perm = 0777; — this is why writing int x = 08; is a compile error (8 is not a valid octal digit!). The printf format specifiers %x (lowercase) and %X (uppercase) print an integer as hex. %o prints octal.
A leading zero makes an integer literal octal. int x = 010; is 8, not 10. This silently compiles with no warning but gives wrong results. Always use 0x (hex) or no prefix (decimal) for clarity.
Two's Complement — storing negative integers
How does C store -1 as bits? It uses two's complement encoding. The rule is elegant: for an n-bit type, the most-significant bit (MSB) has weight −2n-1 instead of +2n-1. All other bits still have positive weights.
| Bit 7 (weight) | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 | Value |
|---|---|---|---|---|---|---|---|---|
| −128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | −1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | −128 |
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | +127 |
Why is two's complement clever? Because addition and subtraction work identically for signed and unsigned integers — the hardware only needs one adder circuit. The MSB is called the sign bit: 0 means non-negative, 1 means negative.
To negate a two's complement number manually: flip all bits, then add 1. For example, negate +5 (00000101): flip → 11111010, add 1 → 11111011 = −5. Verify: −128+64+32+16+8+0+2+1 = −5. ✓
Wrap-around: For an unsigned char (0–255), adding 1 to 255 wraps back to 0. For signed char (−128 to +127), adding 1 to 127 wraps to −128. In C, signed integer overflow is undefined behaviour — the compiler may assume it never happens and generate incorrect code. Unsigned overflow is well-defined and wraps modulo 2n.
In Python, integers have arbitrary precision — you never overflow. In C, int is typically 32 bits. Printing INT_MAX + 1 is undefined behaviour in C, but Python handles it seamlessly. This difference is a major source of security vulnerabilities in C programs.
IEEE 754 — floating-point numbers
How does C store 3.14? It uses the IEEE 754 standard. A float uses 32 bits and a double uses 64 bits, split into three fields:
| Sign (S) | Exponent (E) | Mantissa / Fraction (M) | Total |
|---|---|---|---|
| float: 1 bit | float: 8 bits | float: 23 bits | 32 bits |
| double: 1 bit | double: 11 bits | double: 52 bits | 64 bits |
The value is decoded as: (-1)^S × 1.M × 2^(E-127) for float (E-1023 for double). The mantissa implicitly starts with "1." (the hidden bit), which is why there are 24 effective bits of precision in a float despite only 23 being stored.
Why 0.1 + 0.2 != 0.3? Because 0.1 in binary is a repeating fraction: 0.0001100110011... (like 1/3 in decimal). IEEE 754 stores only finitely many bits, so 0.1 is stored as the closest representable value, which is not exactly 0.1. When you add two slightly-wrong approximations, you get a slightly-wrong result — not 0.3. This is not a bug in C or your compiler; it is an inherent property of binary floating point. You will see this in every language that uses IEEE 754 (Python, JavaScript, Java...).
Special values in IEEE 754: +Infinity (e.g. 1.0/0.0), -Infinity, and NaN (Not a Number, e.g. 0.0/0.0 or sqrt(-1)). These are represented by special exponent bit patterns.
Subtracting two nearly-equal floating-point numbers can lose most significant digits. Example: if x = 1.0000001 and y = 1.0000000, then x−y should be 1e-7, but in float precision the leading digits cancel and the result has only 1 correct bit. The lecture mentions this as catastrophic cancellation. The fix: algebraically rearrange expressions before coding them. For example, replace (x*x - y*y) with (x-y)*(x+y).
Endianness — byte order in multi-byte types
An int is 4 bytes. When stored in memory at address 0x100, which byte goes first? This depends on the endianness of the processor:
| System | Addr 0x100 | Addr 0x101 | Addr 0x102 | Addr 0x103 |
|---|---|---|---|---|
| Big-Endian | 0x12 (MSB) | 0x34 | 0x56 | 0x78 (LSB) |
| Little-Endian | 0x78 (LSB) | 0x56 | 0x34 | 0x12 (MSB) |
Big-endian: most-significant byte first (like writing a number left-to-right). Little-endian: least-significant byte first. Your x86 or x86-64 PC (and most ARM phones) is little-endian. TCP/IP network byte order is big-endian — this is why functions like htons() ("host to network short") exist: they swap bytes when moving data from your little-endian CPU to the network.
You can detect endianness in C using a union trick: store an integer and read back individual bytes. The lecture shows hexdump -X (big-endian display) vs hexdump -x (little-endian display) producing different byte orderings for the same data on an x86 machine.
Endianness only matters when: (1) writing binary data to files or the network that will be read by another machine; (2) casting pointers between integer types of different sizes; (3) using union tricks to inspect raw bit patterns.
The C syntax you need to know
Format specifiers for numeric bases
Integer literal prefixes
Bit-manipulation operators (quick reference)
~0 gives all-ones (−1 in signed, UINT_MAX in unsigned).
Union trick to inspect float bits
The union in C lets multiple members share the same memory. This is the standard way to reinterpret a float's raw bit pattern as an integer without invoking undefined behaviour (unlike direct pointer casts).
#include <stdio.h>
/* Union: both members share the same 4 bytes */
union FloatBits {
float f; /* interpret as float */
unsigned int u; /* interpret as raw 32-bit pattern */
};
int main(void) {
union FloatBits fb;
fb.f = 1.0f;
/* Print the raw bits of 1.0 in IEEE 754 */
printf("float 1.0 bits: 0x%08X\n", fb.u);
/* Expected: 0x3F800000
Sign=0, Exponent=0x7F (127, biased), Mantissa=0 (hidden 1. prefix)
Value = 1.0 * 2^(127-127) = 1.0 * 1 = 1.0 */
fb.f = -0.5f;
printf("float -0.5 bits: 0x%08X\n", fb.u);
/* Sign=1, Exponent=0x7E (126), Mantissa=0
Value = -1.0 * 2^(126-127) = -1.0 * 0.5 = -0.5 */
return 0;
}
float -0.5 bits: 0xBF000000
Step-by-step walkthroughs
/*
* STEP 1: Decimal to Binary via repeated division by 2
*
* 42 / 2 = 21 remainder 0 (LSB — written last, read first)
* 21 / 2 = 10 remainder 1
* 10 / 2 = 5 remainder 0
* 5 / 2 = 2 remainder 1
* 2 / 2 = 1 remainder 0
* 1 / 2 = 0 remainder 1 (MSB — written first, read last)
*
* Reading remainders bottom to top: 1 0 1 0 1 0
* 42 (decimal) = 101010 (binary) = 00101010 (as 8-bit byte)
*
* STEP 2: Binary to Hex — group bits into nibbles (sets of 4) from right
*
* Binary: 0010 1010
* Hex digit: 2 A
*
* 42 (decimal) = 0x2A (hex)
*
* STEP 3: Verify with C
*/
#include <stdio.h>
int main(void) {
int n = 42;
printf("Decimal : %d\n", n);
printf("Hex : 0x%X\n", n); /* 0x2A */
printf("Octal : 0%o\n", n); /* 052 */
return 0;
}
Hex : 0x2A
Octal : 052
/*
* 0xF6: two hex digits — F and 6
*
* F = 1111 (binary), 6 = 0110 (binary)
* Concatenate nibbles: 1111 0110 (binary)
*
* Convert 11110110 to decimal:
* 1*128 + 1*64 + 1*32 + 1*16 + 0*8 + 1*4 + 1*2 + 0*1
* = 128 + 64 + 32 + 16 + 0 + 4 + 2 + 0
* = 246
*
* As signed int8_t: sign bit is 1, so it's negative.
* Two's complement value = -128 + 64 + 32 + 16 + 4 + 2 = -10
*/
#include <stdio.h>
#include <stdint.h>
int main(void) {
unsigned char u = 0xF6;
int8_t s = (int8_t)0xF6;
printf("0xF6 as unsigned char: %u\n", u); /* 246 */
printf("0xF6 as int8_t: %d\n", s); /* -10 */
/* Verify binary representation manually: */
/* bit 7: %d */
for (int i = 7; i >= 0; i--) {
printf("%d", (u >> i) & 1);
}
printf("\n"); /* 11110110 */
return 0;
}
0xF6 as int8_t: -10
11110110
#include <stdio.h>
int main(void) {
float a = 0.1f, b = 0.2f, c = 0.3f;
double da = 0.1, db = 0.2, dc = 0.3;
/* Standard print — looks fine */
printf("float: 0.1 + 0.2 = %.1f\n", a + b); /* 0.3 */
printf("double: 0.1 + 0.2 = %.1f\n", da + db); /* 0.3 */
/* High precision — the truth is revealed */
printf("\nfloat 0.1 stored as: %.20f\n", a);
printf("float 0.2 stored as: %.20f\n", b);
printf("float 0.3 stored as: %.20f\n", c);
printf("float 0.1+0.2 = %.20f\n", a + b);
printf("\ndouble 0.1 stored as: %.20f\n", da);
printf("double 0.2 stored as: %.20f\n", db);
printf("double 0.3 stored as: %.20f\n", dc);
printf("double 0.1+0.2 = %.20f\n", da + db);
/* Equality tests */
printf("\n(float) 0.1+0.2 == 0.3: %s\n", (a+b == c) ? "true" : "false");
printf("(double) 0.1+0.2 == 0.3: %s\n", (da+db == dc) ? "true" : "false");
return 0;
}
float 0.2 stored as: 0.20000000298023223877
float 0.3 stored as: 0.30000001192092895508
float 0.1+0.2 = 0.30000001192092895508
double 0.1 stored as: 0.10000000000000000555
double 0.2 stored as: 0.20000000000000001110
(float) 0.1+0.2 == 0.3: true (lucky: same rounding)
(double) 0.1+0.2 == 0.3: false
#include <stdio.h>
int main(void) {
/* Store a known 32-bit value */
union {
unsigned int val;
unsigned char bytes[4];
} u;
u.val = 0x12345678;
/* On a little-endian machine, the LOWEST address holds the LSB (0x78).
On a big-endian machine, the LOWEST address holds the MSB (0x12). */
printf("bytes[0] (lowest address) = 0x%02X\n", u.bytes[0]);
printf("bytes[1] = 0x%02X\n", u.bytes[1]);
printf("bytes[2] = 0x%02X\n", u.bytes[2]);
printf("bytes[3] (highest address)= 0x%02X\n", u.bytes[3]);
if (u.bytes[0] == 0x78) {
printf("\nThis machine is LITTLE-ENDIAN (x86/x64/ARM typical).\n");
} else if (u.bytes[0] == 0x12) {
printf("\nThis machine is BIG-ENDIAN.\n");
} else {
printf("\nUnexpected byte order — mixed endianness?\n");
}
return 0;
}
bytes[1] = 0x56
bytes[2] = 0x34
bytes[3] (highest address)= 0x12
This machine is LITTLE-ENDIAN (x86/x64/ARM typical).
Practice problems
Without using a calculator or C, convert the decimal value 73 to binary (show the divide-by-2 working) and then to hexadecimal (show the nibble grouping). Verify the hex by computing F×16+? .
Divide-by-2 method:
73 / 2 = 36 r 1 ← LSB
36 / 2 = 18 r 0
18 / 2 = 9 r 0
9 / 2 = 4 r 1
4 / 2 = 2 r 0
2 / 2 = 1 r 0
1 / 2 = 0 r 1 ← MSB
Read remainders bottom-up: 1 0 0 1 0 0 1
Padded to 8 bits: 0100 1001
Nibble grouping (right to left):
1001 = 9
0100 = 4
Hex: 0x49
Verify: 4 * 16 + 9 = 64 + 9 = 73 ✓
01001001₂ and in hex is 0x49. The nibble trick works because 16 = 2⁴, so each hex digit captures exactly 4 bits. Always group from the right (LSB side) to avoid leading-zero errors.
0x1A + 0b00110011 in decimal?
Wk3A Tutorial
Evaluate 0x1A + 0b00110011 in decimal. Show intermediate conversions. Note: 0b binary literals are a GCC extension / C23 feature.
0x1A:
1 * 16 + A * 1 = 16 + 10 = 26
0b00110011 (binary):
Position: 7 6 5 4 3 2 1 0
Bit: 0 0 1 1 0 0 1 1
Value: 32 +16 +2 +1 = 51
Sum: 26 + 51 = 77
printf("%d\n", 0x1A + 0b00110011); would print 77 on a C23-compatible compiler.
int x = -1; printf("%u\n", x); print and why?
Week 2 Lecture — Integer representations
On a 32-bit or 64-bit system, what does the following print? Explain using two's complement and the difference between %d and %u.
#include <stdio.h>
int main(void) {
int x = -1;
printf("%u\n", x);
return 0;
}
Output: 4294967295
Explanation:
-1 in two's complement 32-bit is all 1 bits:
1111 1111 1111 1111 1111 1111 1111 1111
%u tells printf to INTERPRET those same bits as an unsigned int.
All bits set in a 32-bit unsigned int = 2^32 - 1 = 4294967295 = UINT_MAX.
The BITS in memory are unchanged — only how printf interprets them changes.
%d: treats MSB as sign bit (-1)
%u: treats all bits as magnitude (4294967295)
0xFF...FF means −1 as a signed integer and UINT_MAX as an unsigned integer — same bits, different interpretation. Using the wrong format specifier is technically undefined behaviour in C, but on every real platform it produces this result.
(float)0.1 + (float)0.2 != (float)0.3? What is this called?
Week 2 Lecture — IEEE 754 / Catastrophic Cancellation
Explain in precise technical terms why the expression (float)0.1 + (float)0.2 != (float)0.3 may evaluate to true in C. What is the name for related precision-loss problems with floating-point subtraction? How should you compare floating-point numbers for equality in C?
/* Why it happens:
0.1 in binary = 0.0001100110011... (repeating, like 1/3 in decimal)
IEEE 754 float has only 23 mantissa bits — it must ROUND to the
nearest representable value.
Stored 0.1f ≈ 0.10000000149...
Stored 0.2f ≈ 0.20000000298...
Their sum ≈ 0.30000000447...
Stored 0.3f ≈ 0.30000001192...
These are different bit patterns → == returns 0 (false).
The related problem is called CATASTROPHIC CANCELLATION:
when two nearly-equal floats are subtracted, leading digits cancel
and the result has very few correct significant bits.
The correct way to compare floats: use an epsilon tolerance */
#include <stdio.h>
#include <math.h>
#include <float.h>
int float_equal(float a, float b) {
return fabsf(a - b) <= FLT_EPSILON * fmaxf(fabsf(a), fabsf(b));
}
int main(void) {
float a = 0.1f + 0.2f;
float b = 0.3f;
printf("Direct ==: %s\n", (a == b) ? "equal" : "not equal");
printf("Epsilon test: %s\n", float_equal(a, b) ? "equal" : "not equal");
return 0;
}
(x*x - y*y) with (x-y)*(x+y) to reduce cancellation. Never use == to compare floating-point values; use an epsilon-based comparison instead.
Write a complete C program that programmatically determines the endianness of the machine it runs on. Do not use any platform-specific macros — use only standard C. Explain how the technique works.
#include <stdio.h>
/* Method 1: pointer cast to unsigned char* (C99+, technically
implementation-defined but works universally) */
int is_little_endian_ptr(void) {
unsigned int val = 1;
unsigned char *p = (unsigned char *)&val;
return *p == 1; /* LSB at lowest address = little-endian */
}
/* Method 2: union (well-defined in C99+, preferred) */
int is_little_endian_union(void) {
union { unsigned int u; unsigned char c[4]; } x;
x.u = 1;
return x.c[0] == 1;
}
int main(void) {
if (is_little_endian_union()) {
printf("Little-endian: LSB stored at lowest address.\n");
printf("Example: 0x12345678 is stored as 78 56 34 12\n");
} else {
printf("Big-endian: MSB stored at lowest address.\n");
printf("Example: 0x12345678 is stored as 12 34 56 78\n");
}
return 0;
}
1 as a 32-bit unsigned int. On a little-endian machine, the byte at the lowest address is the least significant byte — which is 0x01 for the value 1. On a big-endian machine the first byte is 0x00 (only the last byte is 0x01). The union method is preferred over the pointer cast because accessing a union member different from the one last stored is explicitly allowed in C99 and later (it performs type-punning via the common initial sequence rule).
unsigned char c = 255; c++; printf("%d\n", c);
Week 2 Lecture — Wrap-around / unsigned overflow
Trace through the following code and state the output. Explain the mechanism using binary arithmetic and the C standard's rules for unsigned overflow.
#include <stdio.h>
int main(void) {
unsigned char c = 255;
c++;
printf("%d\n", c);
return 0;
}
Output: 0
Explanation:
unsigned char range: 0 to 255 (2^8 - 1)
c = 255 in binary: 1111 1111
c++: 1111 1111 + 0000 0001
= 1 0000 0000 (9 bits — carry out)
Truncated to 8 bits: 0000 0000 = 0
The C standard guarantees this behaviour for UNSIGNED types:
"A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting
unsigned integer type is reduced modulo the number that is one
greater than the largest value that can be represented."
(C11 §6.2.5¶9)
So 255 + 1 ≡ 0 (mod 256). The output is 0.
Note: this would be UNDEFINED BEHAVIOUR if c were signed char,
because signed integer overflow is UB in C.
unsigned char wraps modulo 256. This is the lecture's "meaning of numbers" concept: the bit pattern wraps around. The %d format specifier is fine here because unsigned char is promoted to int before being passed to printf, and the value 0 fits in int.
Key concepts to memorize
Test your understanding
sizeof(char) guaranteed to return in C?LO10xFF equals ___ in decimal.LO80x12345678 is stored starting at address 0x100. What byte value is at address 0x100?LO8