File: src/main/java/com/minipostgres/storage/Tuple.java
A tuple is a single row in a table. When you do INSERT INTO students VALUES (1, 'Alice', 3.9), internally that becomes a Tuple object with values [1, "Alice", 3.9f].
But disk pages store raw bytes, not Java objects. We need a way to:
- Serialize a Tuple → byte[] (to write to a page)
- Deserialize byte[] → Tuple (to read from a page)
- Calculate how many bytes a tuple will occupy (for page space checks)
private final Object[] values;

Java generics don't work well with primitives. We could have separate int[], float[], String[] arrays, but that makes the code fragile: every new type means a new array. Using Object[] with autoboxing is simpler and extensible. The Schema tells us the type of each position, so we always know how to cast.
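As a rough illustration of the Object[] approach, the sketch below pairs boxed values with a per-position type array and casts on access. The names `ColumnType`, `TupleSketch`, and the getter methods are hypothetical stand-ins, not the project's actual Schema or Tuple API.

```java
import java.util.Arrays;

// Hypothetical stand-in for the project's column types.
enum ColumnType { INT, FLOAT, VARCHAR }

// A stripped-down tuple: boxed values plus the per-position types a Schema would supply.
final class TupleSketch {
    private final ColumnType[] types;
    private final Object[] values;

    TupleSketch(ColumnType[] types, Object... values) {
        // Mirror the value-count check described for the real constructor.
        if (types.length != values.length) {
            throw new IllegalArgumentException(
                "Expected " + types.length + " values, got " + values.length);
        }
        this.types = types.clone();
        this.values = values.clone();
    }

    // Each getter checks the declared column type before casting, so calling
    // the wrong accessor fails with a descriptive message.
    int getInt(int i) {
        requireType(i, ColumnType.INT);
        return (Integer) values[i];
    }

    float getFloat(int i) {
        requireType(i, ColumnType.FLOAT);
        return (Float) values[i];
    }

    String getString(int i) {
        requireType(i, ColumnType.VARCHAR);
        return (String) values[i];
    }

    private void requireType(int i, ColumnType expected) {
        if (types[i] != expected) {
            throw new IllegalStateException(
                "Column " + i + " is " + types[i] + ", not " + expected);
        }
    }

    @Override
    public String toString() {
        return Arrays.toString(values);
    }
}
```

With this shape, supporting a new column type means adding one enum constant and one getter rather than another parallel array.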
Each field is serialized according to its type:
INT: [4 bytes — big-endian integer]
FLOAT: [4 bytes — big-endian IEEE 754]
VARCHAR: [4 bytes length prefix][N bytes UTF-8 data]
Why length-prefixed VARCHAR? Fixed-width strings waste space (storing "Al" in a VARCHAR(50) would waste 48 bytes). Length-prefixing uses only the bytes needed. The 4-byte prefix tells us exactly how many bytes to read.
Why big-endian? Java's ByteBuffer defaults to big-endian, and it makes debugging easier since bytes read left-to-right correspond to the numeric value. PostgreSQL also uses network byte order (big-endian) for wire protocols.
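The format above can be sketched with java.nio.ByteBuffer, which is big-endian unless told otherwise. This is a minimal sketch, not the project's actual serializer: `TupleSerializerSketch`, its nested `ColumnType` enum, and the `serialize` signature are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class TupleSerializerSketch {
    // Hypothetical column types, standing in for what the Schema reports.
    enum ColumnType { INT, FLOAT, VARCHAR }

    static byte[] serialize(ColumnType[] types, Object[] values) {
        // Encode strings once up front so the buffer can be sized exactly.
        byte[][] utf8 = new byte[values.length][];
        int size = 0;
        for (int i = 0; i < values.length; i++) {
            if (types[i] == ColumnType.VARCHAR) {
                utf8[i] = ((String) values[i]).getBytes(StandardCharsets.UTF_8);
                size += 4 + utf8[i].length; // length prefix + data bytes
            } else {
                size += 4; // INT and FLOAT are both 4 bytes
            }
        }

        ByteBuffer buf = ByteBuffer.allocate(size); // big-endian by default
        for (int i = 0; i < values.length; i++) {
            switch (types[i]) {
                case INT:
                    buf.putInt((Integer) values[i]);
                    break;
                case FLOAT:
                    buf.putFloat((Float) values[i]);
                    break;
                case VARCHAR:
                    buf.putInt(utf8[i].length); // 4-byte length prefix
                    buf.put(utf8[i]);           // N bytes of UTF-8 data
                    break;
            }
        }
        return buf.array();
    }
}
```

The length prefix counts UTF-8 bytes, not Java chars, which matters as soon as a string contains non-ASCII characters.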
For a tuple (42, "Alice", 3.9f) with schema (id INT, name VARCHAR(50), gpa FLOAT):
Bytes: [00 00 00 2A] [00 00 00 05] [41 6C 69 63 65] [40 79 99 9A]
       ↑ INT 42      ↑ len=5       ↑ "Alice" UTF-8  ↑ FLOAT 3.9
Total: 4 + 4 + 5 + 4 = 17 bytes.
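Reading the same 17 bytes back is the mirror image of writing them. The sketch below decodes the example bytes under the schema (INT, VARCHAR, FLOAT). `TupleDeserializerSketch`, `ColumnType`, and `EXAMPLE` are hypothetical names, and the bounds check on the length prefix is an added safeguard rather than something the original text specifies.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class TupleDeserializerSketch {
    // Hypothetical column types, standing in for what the Schema reports.
    enum ColumnType { INT, FLOAT, VARCHAR }

    // The worked example: (42, "Alice", 3.9f)
    static final byte[] EXAMPLE = {
        0x00, 0x00, 0x00, 0x2A,                 // INT 42
        0x00, 0x00, 0x00, 0x05,                 // VARCHAR length = 5
        0x41, 0x6C, 0x69, 0x63, 0x65,           // "Alice" in UTF-8
        0x40, 0x79, (byte) 0x99, (byte) 0x9A    // FLOAT 3.9
    };

    static Object[] deserialize(ColumnType[] types, byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default
        Object[] values = new Object[types.length];
        for (int i = 0; i < types.length; i++) {
            switch (types[i]) {
                case INT:
                    values[i] = buf.getInt();
                    break;
                case FLOAT:
                    values[i] = buf.getFloat();
                    break;
                case VARCHAR:
                    int len = buf.getInt();
                    // Reject corrupt prefixes instead of over-reading the buffer.
                    if (len < 0 || len > buf.remaining()) {
                        throw new IllegalArgumentException("Bad VARCHAR length: " + len);
                    }
                    byte[] raw = new byte[len];
                    buf.get(raw);
                    values[i] = new String(raw, StandardCharsets.UTF_8);
                    break;
            }
        }
        return values;
    }
}
```

Because the buffer position advances with every read, the schema order alone is enough to locate each field; no per-field offsets are stored.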
Before inserting a tuple into a page, we need to check whether there is enough free space. The size calculation returns the exact byte count without actually serializing. For simplicity it examines the values directly: 4 bytes for each INT or FLOAT, and 4 bytes plus the UTF-8 byte length for each VARCHAR. The Page calls it before deciding whether to accept the tuple.
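A minimal sketch of such a size calculation follows. The names `TupleSizeSketch`, `serializedSize`, and `fits` are hypothetical, and the `fits` check ignores any per-slot bookkeeping a real page layout might add.

```java
import java.nio.charset.StandardCharsets;

final class TupleSizeSketch {
    // Hypothetical column types, standing in for what the Schema reports.
    enum ColumnType { INT, FLOAT, VARCHAR }

    // Sums the on-disk size of each field without building the output buffer.
    static int serializedSize(ColumnType[] types, Object[] values) {
        int size = 0;
        for (int i = 0; i < types.length; i++) {
            switch (types[i]) {
                case INT:
                case FLOAT:
                    size += Integer.BYTES; // both are fixed at 4 bytes
                    break;
                case VARCHAR:
                    // 4-byte length prefix plus the UTF-8 byte count
                    // (not String.length(), which counts UTF-16 chars).
                    size += Integer.BYTES
                        + ((String) values[i]).getBytes(StandardCharsets.UTF_8).length;
                    break;
            }
        }
        return size;
    }

    // Assumed page-side check: accept the tuple only if it fits in the free space.
    static boolean fits(int freeBytes, ColumnType[] types, Object[] values) {
        return serializedSize(types, values) <= freeBytes;
    }
}
```

Encoding the string just to measure it is not free, but it keeps the size calculation guaranteed to agree with what serialization will later write.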
Tuple ◀──schema──── Schema (knows its column types)
Tuple ──serialize──▶ Page (stored as bytes in page slots)
Tuple ◀──deserialize── Page (reconstructed when reading)
Tuple ──carried by──▶ HeapFile (insert/get operations)
Tuple ──filtered by──▶ Predicate (WHERE clause evaluation)
- Empty strings: VARCHAR "" serializes as [00 00 00 00] (length 0, no data bytes)
- Unicode: UTF-8 encoding handles multi-byte characters (emoji, accented chars)
- Value count mismatch: Constructor throws IllegalArgumentException if values don't match schema column count
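The string-related edge cases above can be checked with a small VARCHAR-only sketch. `VarcharEdgeCases`, `encodeVarchar`, and `decodeVarchar` are hypothetical helpers written for this illustration, not methods from the project.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class VarcharEdgeCases {
    // Encodes one VARCHAR field: 4-byte big-endian length, then UTF-8 data.
    static byte[] encodeVarchar(String s) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + data.length)
            .putInt(data.length)
            .put(data)
            .array();
    }

    // Decodes a field produced by encodeVarchar.
    static String decodeVarchar(byte[] field) {
        ByteBuffer buf = ByteBuffer.wrap(field);
        byte[] data = new byte[buf.getInt()];
        buf.get(data);
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String accented = "\u00e9";       // e with acute accent: 2 bytes in UTF-8
        String emoji = "\uD83D\uDE00";    // one emoji: 2 Java chars, 4 bytes in UTF-8

        System.out.println(encodeVarchar("").length);        // prefix only
        System.out.println(encodeVarchar(accented).length);  // prefix + 2 data bytes
        System.out.println(encodeVarchar(emoji).length);     // prefix + 4 data bytes
        System.out.println(decodeVarchar(encodeVarchar(emoji)).equals(emoji));
    }
}
```

The emoji case shows why the prefix must hold the byte count: `String.length()` reports 2 for that string, while the encoded data occupies 4 bytes.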