
TileDB queries

SarantopoulosKon edited this page Jul 30, 2021 · 4 revisions

TileDB uses capnproto serialization. To enable TileDB queries from the browser, we need to serialize data to capnproto with the help of capnp-ts, a plugin that compiles capnproto schema files to TypeScript.

After compiling the Query schema, we can create our own functions to serialize and deserialize the query object.

REST call to get the Query

To query an array, the user should provide the ranges and the layout. We can create a query object from the user data and serialize it to capnproto in order to make the request.

After serializing a Query object and making a call to /v2/arrays/{namespace}/{array}/query/submit we get a response buffer back which contains:

  1. The first 8 bytes: a uint64 number equal to the size of the response.
  2. The Query object (serialized in capnproto), which gives us information about the attributes.
  3. The raw buffers with all the attribute results.

Fig.1 - Representation of the response from the server
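As a minimal sketch of the layout above, the size prefix can be read and stripped before the rest of the buffer is handed to the capnproto deserializer. The names `readUint64LE` and `splitResponse` are hypothetical helpers, not part of any TileDB client, and little-endian byte order for the prefix is an assumption.

```typescript
// Hypothetical helpers for illustration; not part of any TileDB client API.
interface SplitResponse {
  declaredSize: number; // value stored in the 8-byte prefix
  body: ArrayBuffer;    // serialized Query followed by the raw attribute buffers
}

// Reads a uint64 as two 32-bit halves so that no BigInt support is needed.
// Assumption: the value is little-endian and fits in a JavaScript number.
function readUint64LE(view: DataView, byteOffset: number): number {
  const low = view.getUint32(byteOffset, true);
  const high = view.getUint32(byteOffset + 4, true);
  return high * 0x100000000 + low;
}

function splitResponse(response: ArrayBuffer): SplitResponse {
  if (response.byteLength < 8) {
    throw new Error("Response is too short to contain the 8-byte size prefix");
  }
  const declaredSize = readUint64LE(new DataView(response), 0);
  // slice() copies everything after the prefix into a fresh ArrayBuffer.
  return { declaredSize, body: response.slice(8) };
}
```

Reading the prefix as two 32-bit halves keeps the sketch usable on targets without `BigInt`, at the cost of losing precision for sizes above `Number.MAX_SAFE_INTEGER`.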

Skipping the first 8 bytes, we can deserialize the rest of the buffer and get back the Query object. The Query object provides information about the sizes of the attributes at the end of the buffer. Inspecting the attributeBufferHeaders, we can see there are 2 fixed-length attributes: a5 is 8 bytes long and a4 is 12 bytes long.

// attributeBufferHeaders from the deserialized Query object
[
    {
      name: "a5",
      fixedLenBufferSizeInBytes: 8,
      varLenBufferSizeInBytes: 0,
      validityLenBufferSizeInBytes: 0,
      originalFixedLenBufferSizeInBytes: 16,
      originalVarLenBufferSizeInBytes: 0,
      originalValidityLenBufferSizeInBytes: 4,
    },
    {
      name: "a4",
      fixedLenBufferSizeInBytes: 12,
      varLenBufferSizeInBytes: 0,
      validityLenBufferSizeInBytes: 0,
      originalFixedLenBufferSizeInBytes: 96,
      originalVarLenBufferSizeInBytes: 128,
      originalValidityLenBufferSizeInBytes: 0,
    },
]

The headers give us the sizes but not the types, so we make another call to /v1/arrays/{namespace}/{array} to get the attribute types from the ArraySchema.

// attributes in arraySchema
[{
    cellValNum: 4294967295,
    name: "a4",
    type: "INT32",
    filterPipeline: {},
    fillValue: [0, 0, 0, 128],
    nullable: false,
    fillValueValidity: false,
  },
  {
    cellValNum: 1,
    name: "a5",
    type: "INT32",
    filterPipeline: {},
    fillValue: [0, 0, 0, 128],
    nullable: false,
    fillValueValidity: false,
  }]

Having both the sizes and the types of the attributes in the buffer, we can start from the end of the buffer and iterate over the attributes from last to first to get their values.
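The backward walk can be sketched as follows. This is only an illustration: `sliceAttributes` and the two interfaces are hypothetical names, and the per-attribute layout (fixed-length or offsets bytes first, then var-length bytes, then validity bytes) is an assumption drawn from the sections below.

```typescript
// Subset of the attributeBufferHeaders fields needed to locate each attribute.
interface AttributeBufferHeader {
  name: string;
  fixedLenBufferSizeInBytes: number;
  varLenBufferSizeInBytes: number;
  validityLenBufferSizeInBytes: number;
}

// Hypothetical result shape: one byte slice per part of the attribute.
interface AttributeSlices {
  name: string;
  fixed: ArrayBuffer;
  varLen: ArrayBuffer;
  validity: ArrayBuffer;
}

function sliceAttributes(
  buffer: ArrayBuffer,
  headers: AttributeBufferHeader[]
): AttributeSlices[] {
  const result: AttributeSlices[] = [];
  let end = buffer.byteLength; // start at the end of the response body
  for (let i = headers.length - 1; i >= 0; i--) {
    const header = headers[i];
    // Assumed layout per attribute: [fixed | var-length | validity]
    const validityStart = end - header.validityLenBufferSizeInBytes;
    const varStart = validityStart - header.varLenBufferSizeInBytes;
    const fixedStart = varStart - header.fixedLenBufferSizeInBytes;
    if (fixedStart < 0) {
      throw new Error(`Buffer is too small for attribute ${header.name}`);
    }
    // unshift keeps the result in the same order as the headers.
    result.unshift({
      name: header.name,
      fixed: buffer.slice(fixedStart, varStart),
      varLen: buffer.slice(varStart, validityStart),
      validity: buffer.slice(validityStart, end),
    });
    end = fixedStart; // the previous attribute ends where this one starts
  }
  return result;
}
```

With the a5 and a4 headers shown earlier, the last 12 bytes of the buffer would be assigned to a4 and the 8 bytes before them to a5. Because `slice` copies each region into its own ArrayBuffer, the slices can later be wrapped in typed arrays without alignment concerns.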

Fixed-length Attributes

For fixed-length attributes, we can slice the buffer (fixedLenBufferSizeInBytes in the attributeBufferHeaders is the number of bytes of the attribute) and read the values according to the attribute's type (e.g. if the attribute has type INT32, we can use an Int32Array typed array to get the attribute's values).
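A minimal sketch of the type-dependent decoding might look like this. `decodeValues` is a hypothetical helper that covers only a subset of type names, and it assumes the bytes are in the platform's native (in practice little-endian) byte order, since that is what typed arrays read.

```typescript
type NumericTypedArray =
  | Int8Array | Uint8Array
  | Int16Array | Uint16Array
  | Int32Array | Uint32Array
  | Float32Array | Float64Array;

// Wraps an attribute's bytes in the typed array that matches its schema type.
// Only a subset of types is handled here; 64-bit integers are left out
// because they would need BigInt64Array / BigUint64Array.
function decodeValues(bytes: ArrayBuffer, type: string): NumericTypedArray {
  switch (type) {
    case "INT8": return new Int8Array(bytes);
    case "UINT8": return new Uint8Array(bytes);
    case "INT16": return new Int16Array(bytes);
    case "UINT16": return new Uint16Array(bytes);
    case "INT32": return new Int32Array(bytes);
    case "UINT32": return new Uint32Array(bytes);
    case "FLOAT32": return new Float32Array(bytes);
    case "FLOAT64": return new Float64Array(bytes);
    default:
      throw new Error(`Unsupported attribute type: ${type}`);
  }
}
```

For an 8-byte INT32 attribute such as a5, `decodeValues(bytes, "INT32")` would yield an Int32Array with two elements. The typed array constructors throw a RangeError if the byte length is not a multiple of the element size, which is a useful early signal that the wrong slice or type was used.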

Var-length Attributes

For var-length attributes, the first N bytes (32 in the case of a1 below, matching its fixedLenBufferSizeInBytes) contain a Uint64 array with the byte offsets of the attribute's cells. For a1, the first 32 bytes contain 4 uint64 byte offsets. A byte offset is the element offset multiplied by BYTES_PER_ELEMENT: for example, for an attribute of type INT32 (4 bytes per element) with element offsets [0, 3, 5], the byte offsets will be [0, 12, 20]. The next N bytes (where N is equal to varLenBufferSizeInBytes from the attributeBufferHeaders) contain the cell values of the attribute. If a1 has type INT32, we can use an Int32Array to get the 3 values contained inside the buffer.

// Example of a var-length attribute
{
    name: "a1",
    fixedLenBufferSizeInBytes: 32,
    varLenBufferSizeInBytes: 12,
    validityLenBufferSizeInBytes: 0,
    originalFixedLenBufferSizeInBytes: 96,
    originalVarLenBufferSizeInBytes: 9,
    originalValidityLenBufferSizeInBytes: 0,
},

Fig.2 - The first N bytes of a var-length attribute will be a Uint64 array with the offsets
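The offset arithmetic can be sketched as below for an INT32 attribute. `readOffsets` and `splitInt32Cells` are hypothetical helpers; the sketch assumes little-endian uint64 offsets and that the last cell runs to the end of the values buffer.

```typescript
// Reads a buffer of little-endian uint64 byte offsets into plain numbers.
// Each uint64 is read as two 32-bit halves so that BigInt is not required.
function readOffsets(offsetBytes: ArrayBuffer): number[] {
  const view = new DataView(offsetBytes);
  const offsets: number[] = [];
  for (let i = 0; i + 8 <= offsetBytes.byteLength; i += 8) {
    const low = view.getUint32(i, true);
    const high = view.getUint32(i + 4, true);
    offsets.push(high * 0x100000000 + low);
  }
  return offsets;
}

// Splits the values of an INT32 var-length attribute into one array per cell.
function splitInt32Cells(
  offsetBytes: ArrayBuffer,
  valueBytes: ArrayBuffer
): number[][] {
  const values = new Int32Array(valueBytes);
  // Convert byte offsets to element offsets, e.g. [0, 12, 20] -> [0, 3, 5].
  const starts = readOffsets(offsetBytes).map(
    (byteOffset) => byteOffset / Int32Array.BYTES_PER_ELEMENT
  );
  return starts.map((start, i) => {
    // Assumption: the last cell extends to the end of the values buffer.
    const end = i + 1 < starts.length ? starts[i + 1] : values.length;
    return Array.from(values.subarray(start, end));
  });
}
```

With byte offsets [0, 12, 20] and six INT32 values, this groups the values into cells of 3, 2, and 1 elements.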

Nullable Attributes

For nullable attributes, the last N bytes (where N is equal to validityLenBufferSizeInBytes from the attributeBufferHeaders) are a Uint8 array of zeros and ones, where a zero means that the value at that index is null. For example, if the cell values of an attribute are [12, 22, 33, 44] and the validity buffer is [0, 1, 1, 0], we need to map the values to [NULL, 22, 33, NULL].

// Example of a var-length nullable attribute
{
    name: "a6",
    fixedLenBufferSizeInBytes: 32,
    varLenBufferSizeInBytes: 32,
    validityLenBufferSizeInBytes: 8,
    originalFixedLenBufferSizeInBytes: 32,
    originalVarLenBufferSizeInBytes: 32,
    originalValidityLenBufferSizeInBytes: 8,
}

Fig.3 - The last N bytes (N equal to validityLenBufferSizeInBytes from attributeBufferHeader) of a nullable attribute is a Uint8 array with zeros & ones (zeros representing nulls)
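Applying the validity buffer can be sketched with a small generic helper. `applyValidity` is a hypothetical name; it treats a zero as null and any non-zero byte as valid, and works on either plain values or already-split var-length cells.

```typescript
// Replaces every entry whose validity byte is 0 with null.
function applyValidity<T>(
  values: ArrayLike<T>,
  validity: Uint8Array
): Array<T | null> {
  if (values.length !== validity.length) {
    throw new Error("Validity buffer length must match the number of cells");
  }
  const result: Array<T | null> = [];
  for (let i = 0; i < values.length; i++) {
    result.push(validity[i] === 0 ? null : values[i]);
  }
  return result;
}
```

Calling `applyValidity([12, 22, 33, 44], new Uint8Array([0, 1, 1, 0]))` reproduces the mapping described above, with `null` standing in for NULL.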
