The Many Types of Vector Bundles

Paul Rogers edited this page Apr 7, 2017 · 3 revisions

Drill is a columnar query engine. The column, in the form of a value vector, is the unit of storage. Value vectors are generated from templates.

While value vectors are the core, queries work with rows: SQL is a query language for relations, defined as tables of rows. To represent rows, Drill must "bundle" vectors together. This requires a bit of explanation. In a row-based system, one can easily point out a single row:

...|field n][field 1|field 2|field 3|...|field n][field 1|...

The middle bit is a whole row (AKA record), surrounded on either side by the previous and next rows. In a columnar system, it is impossible to point to one thing and say, "AHA! There is a row!" Instead, rows are an emergent concept: a row is the set of values found at the same position in every column.
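To make the contrast concrete, here is a minimal sketch in Python. It uses plain lists rather than Drill's actual value vector classes, and the column names (`id`, `name`, `amount`) and the helper `row_from_columns` are invented for illustration only.

```python
# Illustrative only: plain Python containers, not Drill's value vectors.

# Row-based layout: each record is one contiguous unit we can point at.
row_store = [
    (1, "a", 10.0),
    (2, "b", 20.0),
    (3, "c", 30.0),
]

# Columnar layout: one list per column; no single object holds a record.
column_store = {
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "amount": [10.0, 20.0, 30.0],
}


def row_from_columns(columns, index):
    """Assemble the logical row at `index` by reading that position in every column."""
    return tuple(values[index] for values in columns.values())


# In the row store the second row already exists as an object.
second_row_direct = row_store[1]

# In the column store the second row has to be assembled on demand.
second_row_assembled = row_from_columns(column_store, 1)
```

Both lookups yield the same logical record, but only the row store has an object that *is* the row. In the columnar case the row exists only as a shared index across the columns.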

Let's start with a single vector. To keep things simple, let's take a required integer vector, something that is represented with a single buffer:

|  10 |
|  20 | <-- Second row
|  30 |
...
| 100 |

If our data set consisted of a single column, then the above represents a set of 10 rows. The arrow above points to the second row. We can also see that this is the second position in the array of integers that make up the vector.
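The single-column case can be sketched the same way. The snippet below is a rough stand-in, assuming a required integer vector behaves like one contiguous buffer of ints with no null flags; it uses Python's `array` module, not Drill's API, and the name `int_vector` is hypothetical.

```python
from array import array

# Hypothetical stand-in for a required (non-nullable) integer vector:
# a single contiguous buffer holding the values 10, 20, ..., 100.
int_vector = array("i", range(10, 101, 10))

# With one column, the number of rows is simply the number of values.
row_count = len(int_vector)

# Python indexes from zero, so the second row lives at index 1.
second_row_value = int_vector[1]
```

Here the row number and the position in the buffer are the same thing, which is all a "row" amounts to when there is only one column.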
