Tuple.java — Row Serialization & Deserialization

File: src/main/java/com/minipostgres/storage/Tuple.java


What Problem Does This Solve?

A tuple is a single row in a table. When you run INSERT INTO students VALUES (1, 'Alice', 3.9), that row becomes a Tuple object holding the values [1, "Alice", 3.9f].

But disk pages store raw bytes, not Java objects. We need a way to:

  1. Serialize a Tuple → byte[] (to write to a page)
  2. Deserialize byte[] → Tuple (to read from a page)
  3. Calculate how many bytes a tuple will occupy (for page space checks)

Thought Process

Why Object[] for Values?

private final Object[] values;

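Java generics cannot be parameterized with primitive types, so there is no single typed container that holds int, float, and String columns side by side. We could keep separate int[], float[], and String[] arrays, but that makes the code fragile: every new type means a new array. Using Object[] with autoboxing is simpler and extensible, at the cost of boxing each INT and FLOAT value. The Schema records the type at each position, so we always know which cast is safe.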
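As a rough illustration of schema-driven casting, the sketch below walks an Object[] and casts each slot according to a parallel array of column types. The ColumnType enum and the describe method are hypothetical stand-ins, not the project's actual Schema API.

```java
import java.util.ArrayList;
import java.util.List;

final class SchemaCastSketch {
    // Hypothetical column types; the real Schema class may model this differently.
    enum ColumnType { INT, FLOAT, VARCHAR }

    // Casts each boxed value according to the type declared for its position.
    static List<String> describe(ColumnType[] types, Object[] values) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            switch (types[i]) {
                case INT:
                    int n = (Integer) values[i];   // unboxes to a primitive int
                    out.add("INT " + n);
                    break;
                case FLOAT:
                    float f = (Float) values[i];   // unboxes to a primitive float
                    out.add("FLOAT " + f);
                    break;
                case VARCHAR:
                    String s = (String) values[i];
                    out.add("VARCHAR " + s);
                    break;
            }
        }
        return out;
    }
}
```

A wrong cast fails fast with a ClassCastException, which is why the position-to-type mapping has to come from one place (the Schema).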

The Binary Format

Each field is serialized according to its type:

INT:     [4 bytes — big-endian integer]
FLOAT:   [4 bytes — big-endian IEEE 754]
VARCHAR: [4 bytes length prefix][N bytes UTF-8 data]

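Why length-prefixed VARCHAR? Fixed-width strings waste space: storing "Al" in a fixed 50-byte VARCHAR(50) slot would waste 48 bytes. A length prefix costs a constant 4 bytes and then stores only the bytes the string needs. The prefix holds the UTF-8 byte count (not the character count), so the reader knows exactly how many bytes to consume.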
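The format above could be written with java.nio.ByteBuffer roughly as follows. This is a minimal sketch under assumptions: the ColumnType enum and the static serialize method are illustrative names, and the real Tuple class may structure this differently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class FieldWriterSketch {
    // Hypothetical column types standing in for whatever the Schema exposes.
    enum ColumnType { INT, FLOAT, VARCHAR }

    static byte[] serialize(ColumnType[] types, Object[] values) {
        // First pass: encode each string once and add up the total size.
        byte[][] encoded = new byte[values.length][];
        int size = 0;
        for (int i = 0; i < values.length; i++) {
            if (types[i] == ColumnType.VARCHAR) {
                encoded[i] = ((String) values[i]).getBytes(StandardCharsets.UTF_8);
                size += 4 + encoded[i].length;   // length prefix + data
            } else {
                size += 4;                       // INT and FLOAT are 4 bytes each
            }
        }
        // Second pass: write the fields in schema order (big-endian by default).
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (int i = 0; i < values.length; i++) {
            switch (types[i]) {
                case INT:
                    buf.putInt((Integer) values[i]);
                    break;
                case FLOAT:
                    buf.putFloat((Float) values[i]);
                    break;
                case VARCHAR:
                    buf.putInt(encoded[i].length);
                    buf.put(encoded[i]);
                    break;
            }
        }
        return buf.array();
    }
}
```

Encoding each string once in the first pass avoids calling getBytes twice and guarantees that the prefix and the data agree on the length.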

Why big-endian? Java's ByteBuffer defaults to big-endian, so no extra configuration is needed. It also makes hex dumps easier to read: the most significant byte comes first, so the bytes read left to right in the same order as the digits of the number. PostgreSQL likewise uses network byte order (big-endian) in its wire protocol.
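The byte-order point is easy to check directly. The sketch below encodes the same int under both orders and formats the result as hex; the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderSketch {
    // Formats bytes as space-separated two-digit hex, e.g. "00 00 00 2A".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    // Encodes a 4-byte int using the given byte order.
    static String encode(int value, ByteOrder order) {
        return hex(ByteBuffer.allocate(4).order(order).putInt(value).array());
    }

    public static void main(String[] args) {
        System.out.println(ByteBuffer.allocate(4).order());        // BIG_ENDIAN
        System.out.println(encode(42, ByteOrder.BIG_ENDIAN));      // 00 00 00 2A
        System.out.println(encode(42, ByteOrder.LITTLE_ENDIAN));   // 2A 00 00 00
    }
}
```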

Serialization Example

For a tuple (42, "Alice", 3.9f) with schema (id INT, name VARCHAR(50), gpa FLOAT):

Bytes:    [00 00 00 2A] [00 00 00 05] [41 6C 69 63 65] [40 79 99 9A]
           ↑ INT 42      ↑ len=5       ↑ "Alice" UTF-8   ↑ FLOAT 3.9

Total: 4 + 4 + 5 + 4 = 17 bytes.
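Reading those 17 bytes back is the mirror image of writing them. The sketch below hard-codes the (INT, VARCHAR, FLOAT) order of this example for brevity; a real deserializer would presumably loop over the Schema's columns instead. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class FieldReaderSketch {
    // Decodes one row laid out as: INT id, VARCHAR name, FLOAT gpa.
    static Object[] readStudentRow(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);   // big-endian by default
        int id = buf.getInt();                     // 4 bytes
        int len = buf.getInt();                    // 4-byte length prefix
        byte[] raw = new byte[len];
        buf.get(raw);                              // exactly len bytes of UTF-8 data
        String name = new String(raw, StandardCharsets.UTF_8);
        float gpa = buf.getFloat();                // 4 bytes, IEEE 754
        return new Object[] { id, name, gpa };
    }

    public static void main(String[] args) {
        byte[] row = {
            0, 0, 0, 0x2A, 0, 0, 0, 5, 0x41, 0x6C, 0x69, 0x63, 0x65,
            0x40, 0x79, (byte) 0x99, (byte) 0x9A };
        Object[] values = readStudentRow(row);
        System.out.println(values[0] + ", " + values[1] + ", " + values[2]); // 42, Alice, 3.9
    }
}
```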

Why getSerializedSize()?

Before inserting a tuple into a page, we need to check whether the page has enough free space. This method returns the exact byte count without building the serialized byte[]: it examines each value and adds up the size for its type (4 bytes for INT and FLOAT, the 4-byte prefix plus the UTF-8 byte length for VARCHAR). The Page calls it before deciding whether to accept the tuple.
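One way such a size calculation might look is shown below. As a simplifying assumption, this sketch decides the type from each value's runtime class rather than consulting the Schema, and the method name serializedSize is illustrative only.

```java
import java.nio.charset.StandardCharsets;

final class SizeSketch {
    // Sums per-field sizes; must agree byte for byte with the serializer.
    static int serializedSize(Object[] values) {
        int size = 0;
        for (Object v : values) {
            if (v instanceof String) {
                // 4-byte length prefix plus the UTF-8 encoded length
                size += 4 + ((String) v).getBytes(StandardCharsets.UTF_8).length;
            } else {
                size += 4;   // INT and FLOAT are both fixed at 4 bytes
            }
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(serializedSize(new Object[] { 42, "Alice", 3.9f })); // 17
    }
}
```

Note that the string is still encoded to count its bytes, so the saving is the output buffer, not the UTF-8 encoding work.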


How It Connects to Other Components

Tuple ◀──schema──── Schema       (knows its column types)
Tuple ──serialize──▶ Page        (stored as bytes in page slots)
Tuple ◀──deserialize── Page      (reconstructed when reading)
Tuple ──carried by──▶ HeapFile   (insert/get operations)
Tuple ──filtered by──▶ Predicate (WHERE clause evaluation)

Edge Cases Handled

  • Empty strings: VARCHAR "" serializes as [00 00 00 00] (length 0, no data bytes)
  • Unicode: UTF-8 encoding handles multi-byte characters (emoji, accented chars); the length prefix counts encoded bytes, not characters
  • Value count mismatch: Constructor throws IllegalArgumentException if values don't match schema column count
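The three cases above can be exercised with a small sketch. Both helpers here are hypothetical: checkedValues stands in for the constructor's validation, and encodeVarchar for the VARCHAR branch of the serializer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class EdgeCaseSketch {
    // Stand-in for the constructor check: rejects a value count that differs from the schema.
    static Object[] checkedValues(int columnCount, Object... values) {
        if (values.length != columnCount) {
            throw new IllegalArgumentException(
                "Expected " + columnCount + " values but got " + values.length);
        }
        return values.clone();
    }

    // Encodes one VARCHAR field: 4-byte big-endian byte count, then the UTF-8 data.
    static byte[] encodeVarchar(String s) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + data.length).putInt(data.length).put(data).array();
    }

    public static void main(String[] args) {
        System.out.println(encodeVarchar("").length);              // 4: prefix only
        System.out.println(encodeVarchar("\u00e9").length);        // 6: one char, two bytes
        System.out.println(encodeVarchar("\uD83D\uDE00").length);  // 8: one emoji, four bytes
    }
}
```

The emoji case is the one that bites: String.length() reports 2 (UTF-16 code units) while the encoded data is 4 bytes, so the prefix has to come from the encoded array.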