title |
---|
Fragment |
A fragment folder is called `<timestamped_name>` and is located here:
my_array # array folder
| ...
|_ __fragments # array fragments folder
|_ <timestamped_name> # fragment folder
| |_ __fragment_metadata.tdb # fragment metadata
| |_ a0.tdb # fixed-sized attribute
| |_ a1.tdb # var-sized attribute (offsets)
| |_ a1_var.tdb # var-sized attribute (values)
| |_ a2.tdb # fixed-sized nullable attribute
| |_ a2_validity.tdb # fixed-sized nullable attribute (validities)
| |_ ...
| |_ d0.tdb # fixed-sized dimension
| |_ d1.tdb # var-sized dimension (offsets)
| |_ d1_var.tdb # var-sized dimension (values)
| |_ ...
| |_ t.tdb # timestamp attribute
| |_ ...
| |_ dt.tdb # delete timestamp attribute
| |_ ...
| |_ dci.tdb # delete condition index attribute
| |_ ...
|_ ...
There can be any number of fragments in an array. The fragment folder contains:
- A single fragment metadata file named `__fragment_metadata.tdb`.
- Any number of data files. For each fixed-sized attribute `foo1` (or dimension `bar1`), there is a single data file `a0.tdb` (`d0.tdb`) containing the values along this attribute (dimension). For every var-sized attribute `foo2` (or dimension `bar2`), there are two data files: `a1_var.tdb` (`d1_var.tdb`) containing the var-sized values of the attribute (dimension), and `a1.tdb` (`d1.tdb`) containing the starting offsets of each value in `a1_var.tdb` (`d1_var.tdb`). Both fixed-sized and var-sized attributes can be nullable. A nullable attribute, `foo3`, will have an additional file `a2_validity.tdb` that contains its validity vector.
- The names of the data files do not depend on the names of the attributes/dimensions. The file names are determined by the order of the attributes and dimensions in the array schema.
- The timestamp fixed attribute (`t.tdb`) is, for fragments consolidated with timestamps, the time at which a cell was added.
- The delete timestamp fixed attribute (`dt.tdb`) is, for fragments consolidated with delete conditions, the time at which a cell was deleted.
- The delete condition index fixed attribute (`dci.tdb`) is, for fragments consolidated with delete conditions, the index of the delete condition (inside of Tile Processed Conditions) that deleted the cell.
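The naming rule above can be illustrated with a short sketch. It assumes that the numeric suffix is simply the zero-based position of the attribute (or dimension) in the array schema, which matches the `a0`/`a1`/`a2` and `d0`/`d1` names in the folder listing; the `Field` class and `data_file_names` helper are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """Hypothetical stand-in for a schema attribute or dimension."""
    name: str               # user-facing name; never appears in a file name
    var_sized: bool = False
    nullable: bool = False


def data_file_names(prefix, fields):
    """Map each field to its data files ('a' = attributes, 'd' = dimensions)."""
    files = {}
    for index, field in enumerate(fields):
        base = f"{prefix}{index}"
        names = [f"{base}.tdb"]                 # values, or offsets if var-sized
        if field.var_sized:
            names.append(f"{base}_var.tdb")     # var-sized values
        if field.nullable:
            names.append(f"{base}_validity.tdb")  # validity vector
        files[field.name] = names
    return files


attributes = [Field("foo1"), Field("foo2", var_sized=True), Field("foo3", nullable=True)]
dimensions = [Field("bar1"), Field("bar2", var_sized=True)]
attribute_files = data_file_names("a", attributes)
dimension_files = data_file_names("d", dimensions)
```

With these inputs, `foo2` maps to `a1.tdb` and `a1_var.tdb`, and `foo3` maps to `a2.tdb` and `a2_validity.tdb`, regardless of what the fields are called.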
The fragment metadata file has the following on-disk format:
Field | Type | Description |
---|---|---|
R-Tree | R-Tree | The serialized R-Tree |
Tile offsets for attribute/dimension 1 | Tile Offsets | The serialized on-disk tile offsets for attribute/dimension 1 |
… | … | … |
Tile offsets for attribute/dimension N | Tile Offsets | The serialized on-disk tile offsets for attribute/dimension N |
Variable tile offsets for attribute/dimension 1 | Tile Offsets | The serialized on-disk variable tile offsets for attribute/dimension 1 |
… | … | … |
Variable tile offsets for attribute/dimension N | Tile Offsets | The serialized on-disk variable tile offsets for attribute/dimension N |
Variable tile sizes for attribute/dimension 1 | Tile Sizes | The serialized in-memory variable tile sizes for attribute/dimension 1 |
… | … | … |
Variable tile sizes for attribute/dimension N | Tile Sizes | The serialized in-memory variable tile sizes for attribute/dimension N |
Validity tile offsets for attribute/dimension 1 | Tile Offsets | The serialized on-disk validity tile offsets for attribute/dimension 1 |
… | … | … |
Validity tile offsets for attribute/dimension N | Tile Offsets | The serialized on-disk validity tile offsets for attribute/dimension N |
Tile mins for attribute/dimension 1 | Tile Mins/Maxes | The serialized mins for attribute/dimension 1 |
… | … | … |
Tile mins for attribute/dimension N | Tile Mins/Maxes | The serialized mins for attribute/dimension N |
Tile maxes for attribute/dimension 1 | Tile Mins/Maxes | The serialized maxes for attribute/dimension 1 |
… | … | … |
Tile maxes for attribute/dimension N | Tile Mins/Maxes | The serialized maxes for attribute/dimension N |
Tile sums for attribute/dimension 1 | Tile Sums | The serialized sums for attribute/dimension 1 |
… | … | … |
Tile sums for attribute/dimension N | Tile Sums | The serialized sums for attribute/dimension N |
Tile null counts for attribute/dimension 1 | Tile Null Count | The serialized null counts for attribute/dimension 1 |
… | … | … |
Tile null counts for attribute/dimension N | Tile Null Count | The serialized null counts for attribute/dimension N |
Fragment min, max, sum, null count | Tile Fragment Min Max Sum Null Count | The serialized fragment min max sum null count |
Processed conditions | Tile Processed Conditions | The serialized processed conditions |
Metadata footer | Footer | Basic metadata gathered in the footer |
The R-Tree is a generic tile with the following internal format:
Field | Type | Description |
---|---|---|
Fanout | uint32_t | The tree fanout |
Num levels | uint32_t | The number of levels in the tree |
Num MBRs at level 1 | uint64_t | The number of MBRs at level 1 |
MBR 1 at level 1 | MBR | First MBR at level 1 |
… | … | … |
MBR N at level 1 | MBR | N-th MBR at level 1 |
… | … | … |
Num MBRs at level L | uint64_t | The number of MBRs at level L |
MBR 1 at level L | MBR | First MBR at level L |
… | … | … |
MBR N at level L | MBR | N-th MBR at level L |
Each MBR entry has format:
Field | Type | Description |
---|---|---|
1D range for dimension 1 | 1DRange | The 1-dimensional range for dimension 1 |
… | … | … |
1D range for dimension D | 1DRange | The 1-dimensional range for dimension D |
For fixed-sized dimensions, the `1DRange` format is:
Field | Type | Description |
---|---|---|
Range minimum | uint8_t | The minimum value with the same datatype as the dimension |
Range maximum | uint8_t | The maximum value with the same datatype as the dimension |
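As a rough illustration of how the R-Tree, MBR, and fixed-sized `1DRange` layouts fit together, the sketch below decodes an R-Tree payload whose dimensions are all fixed-sized. It assumes little-endian byte order, assumes the input is the already-decoded contents of the generic tile (not the raw bytes on disk), and describes each dimension's datatype with a Python `struct` format code; `parse_rtree` is a hypothetical helper, not part of any library.

```python
import struct


def parse_rtree(buf, dim_formats):
    """Decode fanout and every level's MBRs (fixed-sized dimensions only)."""
    fanout, num_levels = struct.unpack_from("<II", buf, 0)
    offset = 8
    # One MBR is a (min, max) pair per dimension, stored back to back.
    mbr_fmt = "<" + "".join(code * 2 for code in dim_formats)
    mbr_size = struct.calcsize(mbr_fmt)
    levels = []
    for _ in range(num_levels):
        (num_mbrs,) = struct.unpack_from("<Q", buf, offset)
        offset += 8
        level = []
        for _ in range(num_mbrs):
            flat = struct.unpack_from(mbr_fmt, buf, offset)
            offset += mbr_size
            level.append(list(zip(flat[0::2], flat[1::2])))
        levels.append(level)
    return fanout, levels


# Synthetic payload: one int32 dimension, one MBR at level 1, two at level 2.
payload = (
    struct.pack("<II", 10, 2)
    + struct.pack("<Q", 1) + struct.pack("<ii", 1, 100)
    + struct.pack("<Q", 2) + struct.pack("<ii", 1, 50) + struct.pack("<ii", 51, 100)
)
fanout, levels = parse_rtree(payload, ["i"])
```

Here `levels[0]` holds the single MBR `[(1, 100)]` and `levels[1]` holds `[(1, 50)]` and `[(51, 100)]`.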
For var-sized dimensions, the `1DRange` format is:
Field | Type | Description |
---|---|---|
Range length | uint64_t | The number of bytes of the 1D range |
Minimum value length | uint64_t | The number of bytes of the minimum value |
Range minimum | uint8_t | The minimum (var-sized) value with the same datatype as the dimension |
Range maximum | uint8_t | The maximum (var-sized) value with the same datatype as the dimension |
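A var-sized range can be split into its two values using the two length fields. The sketch below assumes little-endian byte order and reads "range length" as the combined byte count of the minimum and maximum values, so the maximum's length is the range length minus the minimum value length; that reading is an interpretation of the table, and `parse_var_range` is a hypothetical helper.

```python
import struct


def parse_var_range(buf, offset=0):
    """Return ((min_bytes, max_bytes), next_offset) for one var-sized 1DRange."""
    range_len, min_len = struct.unpack_from("<QQ", buf, offset)
    start = offset + 16
    lo = bytes(buf[start:start + min_len])
    # Assumption: the maximum occupies the rest of the range bytes.
    hi = bytes(buf[start + min_len:start + range_len])
    return (lo, hi), start + range_len


# Synthetic range: minimum "aa" (2 bytes), maximum "zebra" (5 bytes).
payload = struct.pack("<QQ", 7, 2) + b"aa" + b"zebra"
(range_min, range_max), end = parse_var_range(payload)
```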
Tile offsets refer to each on-disk data tile's starting byte offset.
Tile offsets is a generic tile with the following internal format:
Field | Type | Description |
---|---|---|
Num tile offsets | uint64_t | Number of tile offsets |
Tile offset 1 | uint64_t | Offset 1 |
… | … | … |
Tile offset N | uint64_t | Offset N |
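The counted `uint64_t` layout above is simple to decode. The sketch below reads such a list and then derives each tile's on-disk byte extent, under two assumptions that go beyond the table: values are little-endian, and tiles are stored contiguously, so a tile ends where the next one starts and the last tile ends at the file size. Both helper names are hypothetical.

```python
import struct


def parse_u64_list(buf, offset=0):
    """Read a uint64 count followed by that many uint64 values."""
    (count,) = struct.unpack_from("<Q", buf, offset)
    values = struct.unpack_from(f"<{count}Q", buf, offset + 8)
    return list(values), offset + 8 + 8 * count


def tile_extents(tile_offsets, file_size):
    """Pair each starting offset with its on-disk length (contiguity assumed)."""
    ends = tile_offsets[1:] + [file_size]
    return [(start, end - start) for start, end in zip(tile_offsets, ends)]


# Synthetic tile-offsets payload for a 450-byte data file with three tiles.
payload = struct.pack("<Q3Q", 3, 0, 120, 300)
tile_offsets, _ = parse_u64_list(payload)
extents = tile_extents(tile_offsets, file_size=450)
```

The same `parse_u64_list` reader would also fit the other tiles in this document that are specified as a count followed by `uint64_t` values, such as tile sizes and tile null counts.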
Tile sizes refer to each data tile's in-memory size.
Tile sizes is a generic tile with the following internal format:
Field | Type | Description |
---|---|---|
Num tile sizes | uint64_t | Number of tile sizes |
Tile size 1 | uint64_t | Size 1 |
… | … | … |
Tile size N | uint64_t | Size N |
The tile mins maxes is a generic tile with the following internal format:
Field | Type | Description |
---|---|---|
Num values | uint64_t | Number of values |
Value 1 | type | Value 1 or Offset 1 |
… | … | … |
Value N | type | Value N or Offset N |
Var buffer size | uint64_t | Var buffer size |
Var buffer | uint8_t | Var buffer |
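The "Value or Offset" wording suggests two cases: fixed-sized fields store the min/max values directly, while var-sized fields store offsets into the var buffer. The sketch below covers both under stated assumptions: little-endian byte order, `uint64_t` offsets for var-sized fields, and each var-sized value running from its offset to the next offset (or to the end of the var buffer). `parse_tile_mins_maxes` is a hypothetical helper.

```python
import struct


def parse_tile_mins_maxes(buf, value_format=None):
    """Decode per-tile mins or maxes.

    value_format is a struct code (e.g. "i") for fixed-sized fields,
    or None for var-sized fields, whose entries are uint64 offsets.
    """
    (count,) = struct.unpack_from("<Q", buf, 0)
    fmt = f"<{count}{value_format or 'Q'}"
    entries = list(struct.unpack_from(fmt, buf, 8))
    offset = 8 + struct.calcsize(fmt)
    (var_size,) = struct.unpack_from("<Q", buf, offset)
    var_buffer = bytes(buf[offset + 8:offset + 8 + var_size])
    if value_format is not None:
        return entries
    # Assumption: value i spans from offset i to offset i+1 in the var buffer.
    bounds = entries + [var_size]
    return [var_buffer[a:b] for a, b in zip(bounds, bounds[1:])]


fixed_payload = struct.pack("<Q2iQ", 2, -3, 7, 0)
fixed_mins = parse_tile_mins_maxes(fixed_payload, "i")

var_payload = struct.pack("<Q2QQ", 2, 0, 3, 8) + b"abchello"
var_mins = parse_tile_mins_maxes(var_payload)
```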
The tile sums is a generic tile with the following internal format:
Field | Type | Description |
---|---|---|
Num values | uint64_t | Number of values |
Value 1 | uint64_t | Sum 1 |
… | … | … |
Value N | uint64_t | Sum N |
The tile null count is a generic tile with the following internal format:
Field | Type | Description |
---|---|---|
Num values | uint64_t | Number of values |
Value 1 | uint64_t | Count 1 |
… | … | … |
Value N | uint64_t | Count N |
The fragment min max sum null count is a generic tile with the following internal format:
Field | Type | Description |
---|---|---|
Min size | uint64_t | Size of the min value for attribute/dimension 1 |
Min value | uint8_t | Buffer for min value for attribute/dimension 1 |
Max size | uint64_t | Size of the max value for attribute/dimension 1 |
Max value | uint8_t | Buffer for max value for attribute/dimension 1 |
Sum | uint64_t | Sum value for attribute/dimension 1 |
Null count | uint64_t | Null count value for attribute/dimension 1 |
… | … | … |
Min size | uint64_t | Size of the min value for attribute/dimension N |
Min value | uint8_t | Buffer for min value for attribute/dimension N |
Max size | uint64_t | Size of the max value for attribute/dimension N |
Max value | uint8_t | Buffer for max value for attribute/dimension N |
Sum | uint64_t | Sum value for attribute/dimension N |
Null count | uint64_t | Null count value for attribute/dimension N |
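Because each min and max is length-prefixed, the records in this tile have to be walked sequentially. The sketch below does that for a known number of attributes/dimensions (taken from the array schema), assuming little-endian byte order. It returns the min and max as raw bytes and reads the sum as `uint64_t` exactly as the table lists it; reinterpreting those bytes for a particular datatype is left out. The helper names are hypothetical.

```python
import struct


def read_sized(buf, offset):
    """Read a uint64 size followed by that many bytes."""
    (size,) = struct.unpack_from("<Q", buf, offset)
    start = offset + 8
    return bytes(buf[start:start + size]), start + size


def parse_fragment_stats(buf, num_fields):
    """Decode one (min, max, sum, null count) record per attribute/dimension."""
    stats, offset = [], 0
    for _ in range(num_fields):
        min_raw, offset = read_sized(buf, offset)
        max_raw, offset = read_sized(buf, offset)
        total, null_count = struct.unpack_from("<QQ", buf, offset)
        offset += 16
        stats.append(
            {"min": min_raw, "max": max_raw, "sum": total, "null_count": null_count}
        )
    return stats


# Synthetic record for a single int32 field with min -1, max 9, sum 40.
payload = (
    struct.pack("<Q", 4) + struct.pack("<i", -1)
    + struct.pack("<Q", 4) + struct.pack("<i", 9)
    + struct.pack("<QQ", 40, 0)
)
stats = parse_fragment_stats(payload, num_fields=1)
```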
The processed conditions is a generic tile that lists the delete/update conditions that have already been applied to this fragment and do not need to be applied again. The list is sorted by filename and has the following internal format:
Field | Type | Description |
---|---|---|
Num | uint64_t | Number of processed conditions |
Condition size | uint64_t | Condition size 1 |
Condition | uint8_t | Condition marker filename 1 |
… | … | … |
Condition size | uint64_t | Condition size N |
Condition | uint8_t | Condition marker filename N |
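The processed conditions layout is a count followed by length-prefixed filenames. The sketch below encodes and decodes such a list, assuming little-endian byte order and UTF-8 filenames; the filenames used are made up for the example, and both functions are hypothetical helpers.

```python
import struct


def encode_processed_conditions(names):
    """Serialize condition marker filenames, sorted as the format requires."""
    out = struct.pack("<Q", len(names))
    for name in sorted(names):
        raw = name.encode("utf-8")
        out += struct.pack("<Q", len(raw)) + raw
    return out


def parse_processed_conditions(buf):
    """Return the list of condition marker filenames stored in the tile."""
    (count,) = struct.unpack_from("<Q", buf, 0)
    offset, names = 8, []
    for _ in range(count):
        (size,) = struct.unpack_from("<Q", buf, offset)
        offset += 8
        names.append(bytes(buf[offset:offset + size]).decode("utf-8"))
        offset += size
    return names


# Made-up marker filenames, deliberately passed in unsorted order.
payload = encode_processed_conditions(["cond_b.del", "cond_a.del"])
names = parse_processed_conditions(payload)
```

Keeping the list sorted means a reader could check whether a given condition has already been applied with a binary search rather than a linear scan.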
The footer is a simple blob (i.e., not a generic tile) with the following internal format:
Field | Type | Description |
---|---|---|
Version number | uint32_t | Format version number of the fragment |
Array schema name size | uint64_t | Size of the array schema name |
Array schema name | string | Array schema name |
Dense | uint8_t | Whether the array is dense (1) or not (0) |
Null non-empty domain | uint8_t | Indicates whether the non-empty domain is null (1) or not (0) |
Non-empty domain | MBR | An MBR denoting the non-empty domain |
Number of sparse tiles | uint64_t | Number of sparse tiles |
Last tile cell num | uint64_t | For sparse arrays, the number of cells in the last tile in the fragment |
Includes timestamps | uint8_t | Whether the fragment includes timestamps (1) or not (0) |
Includes delete metadata | uint8_t | Whether the fragment includes delete metadata (1) or not (0) |
File sizes | uint64_t[] | The size in bytes of each attribute/dimension file in the fragment. For var-length attributes/dimensions, this is the size of the offsets file. |
File var sizes | uint64_t[] | The size in bytes of each var-length attribute/dimension file in the fragment. |
File validity sizes | uint64_t[] | The size in bytes of each attribute/dimension validity vector file in the fragment. |
R-Tree offset | uint64_t | The offset to the generic tile storing the R-Tree in the metadata file. |
Tile offset for attribute/dimension 1 | uint64_t | The offset to the generic tile storing the tile offsets for attribute/dimension 1. |
… | … | … |
Tile offset for attribute/dimension N | uint64_t | The offset to the generic tile storing the tile offsets for attribute/dimension N |
Tile var offset for attribute/dimension 1 | uint64_t | The offset to the generic tile storing the variable tile offsets for attribute/dimension 1. |
… | … | … |
Tile var offset for attribute/dimension N | uint64_t | The offset to the generic tile storing the variable tile offsets for attribute/dimension N. |
Tile var sizes offset for attribute/dimension 1 | uint64_t | The offset to the generic tile storing the variable tile sizes for attribute/dimension 1. |
… | … | … |
Tile var sizes offset for attribute/dimension N | uint64_t | The offset to the generic tile storing the variable tile sizes for attribute/dimension N. |
Tile validity offset for attribute/dimension 1 | uint64_t | The offset to the generic tile storing the tile validity offsets for attribute/dimension 1. |
… | … | … |
Tile validity offset for attribute/dimension N | uint64_t | The offset to the generic tile storing the tile validity offsets for attribute/dimension N |
Tile mins offset for attribute/dimension 1 | uint64_t | The offset to the generic tile storing the tile mins for attribute/dimension 1. |
… | … | … |
Tile mins offset for attribute/dimension N | uint64_t | The offset to the generic tile storing the tile mins for attribute/dimension N |
Tile maxes offset for attribute/dimension 1 | uint64_t | The offset to the generic tile storing the tile maxes for attribute/dimension 1. |
… | … | … |
Tile maxes offset for attribute/dimension N | uint64_t | The offset to the generic tile storing the tile maxes for attribute/dimension N |
Tile sums offset for attribute/dimension 1 | uint64_t | The offset to the generic tile storing the tile sums for attribute/dimension 1. |
… | … | … |
Tile sums offset for attribute/dimension N | uint64_t | The offset to the generic tile storing the tile sums for attribute/dimension N |
Tile null counts offset for attribute/dimension 1 | uint64_t | The offset to the generic tile storing the tile null counts for attribute/dimension 1. |
… | … | … |
Tile null counts offset for attribute/dimension N | uint64_t | The offset to the generic tile storing the tile null counts for attribute/dimension N |
Fragment min max sum null count offset | uint64_t | The offset to the generic tile storing the fragment min max sum null count data. |
Processed conditions offset | uint64_t | The offset to the generic tile storing the processed conditions. |
Array schema name size | uint64_t | The total number of characters of the array schema name. |
Array schema name | uint8_t[] | The array schema name. |
Footer length | uint64_t | Sum of bytes of the above fields. Only present when there is at least one var-sized dimension. |
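Since the footer is a plain blob, its leading fields can be read directly in the order the table lists them. The sketch below decodes only the first five fields, assuming little-endian byte order and a UTF-8 schema name; the version number and schema name in the sample payload are made up, and `parse_footer_head` is a hypothetical helper. The remaining fields depend on the array schema (number and types of attributes and dimensions) and are not covered.

```python
import struct


def parse_footer_head(buf):
    """Decode the first five footer fields; return them and the next offset."""
    (version,) = struct.unpack_from("<I", buf, 0)
    (name_size,) = struct.unpack_from("<Q", buf, 4)
    name_end = 12 + name_size
    schema_name = bytes(buf[12:name_end]).decode("utf-8")
    dense, null_domain = struct.unpack_from("<BB", buf, name_end)
    head = {
        "version": version,
        "schema_name": schema_name,
        "dense": bool(dense),
        "null_non_empty_domain": bool(null_domain),
    }
    return head, name_end + 2


# Synthetic footer start: made-up version 18 and schema name "schema1".
payload = struct.pack("<IQ", 18, 7) + b"schema1" + struct.pack("<BB", 1, 0)
head, next_offset = parse_footer_head(payload)
```

If the non-empty domain is not null, the MBR describing it would start at `next_offset`.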
The on-disk format of each data file is:
Field | Type | Description |
---|---|---|
Tile 1 | Tile | The data of tile 1 |
… | … | … |
Tile N | Tile | The data of tile N |