Tuple.java — Row Serialization & Deserialization

File: src/main/java/com/minipostgres/storage/Tuple.java

What Problem Does This Solve?

A tuple is a single row in a table. When you do INSERT INTO students VALUES (1, 'Alice', 3.9), internally that becomes a Tuple object with values [1, "Alice", 3.9f].

But disk pages store raw bytes, not Java objects. We need a way to:

Serialize a Tuple → byte[] (to write to a page)
Deserialize byte[] → Tuple (to read from a page)
Calculate how many bytes a tuple will occupy (for page space checks)

Thought Process

Why `Object[]` for Values?

private final Object[] values;

Java generics cannot be parameterized with primitive types, so there is no single typed container that holds int, float, and String columns side by side. We could keep separate int[], float[], and String[] arrays, but that makes the code fragile: every new type means a new array. Using Object[] with autoboxing is simpler and extensible. The Schema tells us the type of each position, so we always know how to cast.
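
The schema-driven cast described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `ColumnType` enum and the `describe` helper are hypothetical names, and the real Schema class may expose column types differently.

```java
public class TupleValuesSketch {
    // Hypothetical column-type enum; the real Schema may model types differently.
    public enum ColumnType { INT, FLOAT, VARCHAR }

    // Casts one value according to the column type recorded for its position.
    public static String describe(ColumnType type, Object value) {
        switch (type) {
            case INT:
                return "INT " + (Integer) value;
            case FLOAT:
                return "FLOAT " + (Float) value;
            case VARCHAR:
                return "VARCHAR '" + (String) value + "'";
            default:
                throw new IllegalArgumentException("Unknown type: " + type);
        }
    }

    public static void main(String[] args) {
        ColumnType[] schema = { ColumnType.INT, ColumnType.VARCHAR, ColumnType.FLOAT };
        Object[] values = { 1, "Alice", 3.9f }; // autoboxed to Integer, String, Float
        for (int i = 0; i < values.length; i++) {
            System.out.println(describe(schema[i], values[i]));
        }
    }
}
```

A value stored under the wrong type would fail with a ClassCastException at the cast, which is the trade-off of relying on the schema rather than the compiler for type safety.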

The Binary Format

Each field is serialized according to its type:

INT:     [4 bytes — big-endian integer]
FLOAT:   [4 bytes — big-endian IEEE 754]
VARCHAR: [4 bytes length prefix][N bytes UTF-8 data]

Why length-prefixed VARCHAR? Fixed-width strings waste space (storing "Al" in a fixed 50-byte VARCHAR(50) field would waste 48 bytes). Length-prefixing uses only the bytes needed, plus 4 bytes of overhead. The 4-byte prefix tells the reader exactly how many bytes of string data follow.
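
A minimal sketch of the length-prefixed VARCHAR encoding, assuming the layout listed above. The class and method names (`VarcharCodecSketch`, `encode`, `decode`) are hypothetical and only illustrate the read and write order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class VarcharCodecSketch {
    // Writes a 4-byte length prefix followed by the UTF-8 bytes of the string.
    public static byte[] encode(String s) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + data.length);
        buf.putInt(data.length);
        buf.put(data);
        return buf.array();
    }

    // Reads the prefix first, then exactly that many bytes of string data.
    public static String decode(ByteBuffer buf) {
        int length = buf.getInt();
        byte[] data = new byte[length];
        buf.get(data);
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("Al");
        System.out.println(bytes.length);                    // 6: 4-byte prefix + 2 data bytes
        System.out.println(decode(ByteBuffer.wrap(bytes)));  // Al
    }
}
```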

Why big-endian? Java's ByteBuffer defaults to big-endian, and it makes debugging easier: the most significant byte comes first, so a hex dump reads in the same order as the number is written. PostgreSQL also uses network byte order (big-endian) in its wire protocol.
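
The byte-order point can be checked directly with the standard ByteBuffer API. The sketch below is illustrative only; `ByteOrderSketch`, `hex`, and `intBytes` are hypothetical helper names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderSketch {
    // Formats bytes as space-separated uppercase hex pairs.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // Encodes an int into 4 bytes using the given byte order.
    public static byte[] intBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        System.out.println(ByteBuffer.allocate(4).order());             // BIG_ENDIAN
        System.out.println(hex(intBytes(42, ByteOrder.BIG_ENDIAN)));    // 00 00 00 2A
        System.out.println(hex(intBytes(42, ByteOrder.LITTLE_ENDIAN))); // 2A 00 00 00
    }
}
```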

Serialization Example

For a tuple (42, "Alice", 3.9f) with schema (id INT, name VARCHAR(50), gpa FLOAT):

Bytes:    [00 00 00 2A] [00 00 00 05] [41 6C 69 63 65] [40 79 99 9A]
           ↑ INT 42      ↑ len=5       ↑ "Alice" UTF-8   ↑ FLOAT 3.9

Total: 4 + 4 + 5 + 4 = 17 bytes.
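
The byte layout above can be reproduced with a short sketch. It dispatches on each value's runtime type for brevity, which is an assumption made for illustration; the actual Tuple presumably consults its Schema instead. The `SerializeExampleSketch` class and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SerializeExampleSketch {
    // Serializes INT, FLOAT, and VARCHAR values in the format described above.
    public static byte[] serialize(Object[] values) {
        int size = 0;
        for (Object v : values) {
            size += (v instanceof String)
                    ? 4 + ((String) v).getBytes(StandardCharsets.UTF_8).length
                    : 4;
        }
        ByteBuffer buf = ByteBuffer.allocate(size); // big-endian by default
        for (Object v : values) {
            if (v instanceof Integer) {
                buf.putInt((Integer) v);
            } else if (v instanceof Float) {
                buf.putFloat((Float) v);
            } else if (v instanceof String) {
                byte[] data = ((String) v).getBytes(StandardCharsets.UTF_8);
                buf.putInt(data.length);
                buf.put(data);
            } else {
                throw new IllegalArgumentException("Unsupported value: " + v);
            }
        }
        return buf.array();
    }

    // Formats bytes as space-separated uppercase hex pairs.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new Object[] { 42, "Alice", 3.9f });
        System.out.println(bytes.length); // 17
        System.out.println(hex(bytes));   // 00 00 00 2A 00 00 00 05 41 6C 69 63 65 40 79 99 9A
    }
}
```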

Why `getSerializedSize()`?

Before inserting a tuple into a page, we need to check if there's enough free space. This method returns the exact byte count without actually serializing the tuple; it examines the values and sums each field's size (4 bytes for INT and FLOAT, 4 plus the UTF-8 byte length for VARCHAR). The Page calls it before deciding whether to accept the tuple.
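
A minimal sketch of such a size calculation, assuming the binary format described earlier. The `ColumnType` enum, `serializedSize`, and the `fits` free-space check are hypothetical stand-ins; the real Page may track free space differently and may also account for slot overhead.

```java
import java.nio.charset.StandardCharsets;

public class SerializedSizeSketch {
    // Hypothetical column-type enum; the real Schema may model types differently.
    public enum ColumnType { INT, FLOAT, VARCHAR }

    // Sums the per-field sizes without writing any bytes to a buffer.
    public static int serializedSize(ColumnType[] types, Object[] values) {
        int size = 0;
        for (int i = 0; i < types.length; i++) {
            switch (types[i]) {
                case INT:
                case FLOAT:
                    size += 4;
                    break;
                case VARCHAR:
                    // 4-byte length prefix plus the encoded byte count.
                    size += 4 + ((String) values[i]).getBytes(StandardCharsets.UTF_8).length;
                    break;
                default:
                    throw new IllegalArgumentException("Unknown type: " + types[i]);
            }
        }
        return size;
    }

    // Hypothetical page-side check: accept the tuple only if it fits.
    public static boolean fits(int freeBytes, ColumnType[] types, Object[] values) {
        return serializedSize(types, values) <= freeBytes;
    }

    public static void main(String[] args) {
        ColumnType[] schema = { ColumnType.INT, ColumnType.VARCHAR, ColumnType.FLOAT };
        Object[] values = { 42, "Alice", 3.9f };
        System.out.println(serializedSize(schema, values)); // 17
        System.out.println(fits(16, schema, values));       // false
    }
}
```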

How It Connects to Other Components

Tuple ◀──schema──── Schema       (knows its column types)
Tuple ──serialize──▶ Page        (stored as bytes in page slots)
Tuple ◀──deserialize── Page      (reconstructed when reading)
Tuple ──carried by──▶ HeapFile   (insert/get operations)
Tuple ──filtered by──▶ Predicate (WHERE clause evaluation)

Edge Cases Handled

Empty strings: VARCHAR "" serializes as [00 00 00 00] (length 0, no data bytes)
Unicode: UTF-8 encoding handles multi-byte characters (emoji, accented chars); the length prefix counts encoded bytes, not characters
Value count mismatch: Constructor throws IllegalArgumentException if values don't match schema column count
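
The three cases above can be exercised with a small sketch. The `checked` helper is a hypothetical stand-in for the constructor check, not the actual Tuple constructor, and `utf8Length` is likewise an illustrative name.

```java
import java.nio.charset.StandardCharsets;

public class EdgeCaseSketch {
    // Hypothetical stand-in for the constructor's value-count validation.
    public static Object[] checked(int columnCount, Object... values) {
        if (values.length != columnCount) {
            throw new IllegalArgumentException(
                    "Expected " + columnCount + " values, got " + values.length);
        }
        return values;
    }

    // Number of bytes the string occupies after UTF-8 encoding.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length(""));             // 0: only the 4-byte prefix is written
        System.out.println(utf8Length("\u00E9"));       // 2: accented e
        System.out.println(utf8Length("\uD83D\uDE00")); // 4: emoji stored as a surrogate pair in Java
        try {
            checked(3, 1, "Alice"); // two values for a three-column schema
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Expected 3 values, got 2
        }
    }
}
```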
