Skip to content

Latest commit

 

History

History
193 lines (137 loc) · 7.52 KB

db_gen.md

File metadata and controls

193 lines (137 loc) · 7.52 KB
title description date author tags
Generating and extending a database
Introduction to the PyKX database creation and management functionality
July 2024
KX Systems, Inc.,
PyKX, q, database, maintenance, management, generation

Generate and extend a database

This page explains how to create and expand databases using PyKX.

!!! tip "Tip: For the best experience, we recommend reading Databases in PyKX first. If you already have access to a database and only need to load it, you can skip this page and jump right to load database."

Before leveraging the performance of PyKX when querying on-disk data, you need to create a persisted database. In the following sections we complete the following:

  1. Create a new database containing a single table #!python trade and multiple days of data.
  2. Add a new day worth of data for #!python today to the database for the #!python trade table.
  3. On-board a new table (#!python quote) which contains data from #!python today.
  4. Ensure that the new table is queryable.

!!! note "Bring your own data"

The below example makes use of randomly-generated data using PyKX, where we use `#!python trade` or `#!python quote` tables generated in that manner. You can replace them with an equivalent Pandas/PyArrow table which will be converted to a PyKX table before being persisted.

1. Create database

For more information on database structures, see the linked section on what is a database. With PyKX, use the #!python pykx.DB class for all database interactions in Python. This class lets you create, expand, and maintain on-disk splayed/partitioned databases. First, we need to create a database.

In the next cell, we create a #!python trade table with data from multiple days in the chat.

>>> import pykx as kx
>>> N = 10000000
>>> trade = kx.Table(data={
...     'date': kx.random.random(N, kx.DateAtom('today') - [1, 2, 3, 4]),
...     'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
...     'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
...     'price': kx.random.random(N, 10.0)
...     })

Now that we have generated our trade table, we can persist it to disk at the location #!python /tmp/db.

=== "Partitioned Database"

```python
>>> db = kx.DB(path='/tmp/db')
>>> db.create(trade, 'trade', 'date')
```

=== "Splayed Database"

```python
>>> db = kx.DB(path='/tmp/splay')
>>> db.create(trade, 'trade', format='splayed')
```

That's it, you now have a persisted database. To verify the availability of the database and its tables, we can examine the database object:

=== "Partitioned Database"

```python
>>> db.tables
['trade']
>>> type(db.trade)
<class 'pykx.wrappers.PartitionedTable'>
```

=== "Splayed Database"

```python
>>> db.tables
['trade']
>>> type(db.trade)
<class 'pykx.wrappers.SplayedTable'>
```

The above database persistence uses the default parameters within the #!python create function. If you need to compress/encrypt the tables persisted to the database or need to define a #!python by or specify the symbol enumeration name when persisting a Partitioned Database, you can follow the API documentation here.

2. Add new database partition

The following section outlines functionality only applicable to Partitioned Databases.

Now that you have generated a database, you can add extra partitions using the same database class and the #!python create function. In this example we will add new data for the current day created in the below cell and convert it to a Pandas DataFrame prior to persistence:

>>> N = 2000000
>>> trade = kx.Table(data={
...     'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
...     'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
...     'price': kx.random.random(N, 10.0)
...     }).pd()
>>> type(trade)
pandas.core.frame.DataFrame

Note that in comparison to the original database creation logic, we do not have a #!python date column. Instead, we add a date at partition creation. Below we provide a variety of examples of adding new partitions under various conditions:

=== "Generate default partition"

```python
>>> db.create(trade, 'trade', kx.DateAtom('today'))
```

=== "Compress data in a partition"

In the below example, we compress data within the persisted partition using [`gzip`](https://en.wikipedia.org/wiki/Gzip). For further details on supported compression formats see [here](../compress-encrypt.md) or look at the API reference [here](../../../api/compress.md).

```python
>>> gzip = kx.Compress(kx.CompressionAlgorithm.gzip, level=2)
>>> db.create(trade, 'trade', kx.DateAtom('today'), compress=gzip)
```

=== "Encrypt persisted data"

In the below example, we encrypt the data persisted for the added partition. For further details on how encryption works within PyKX see [here](../compress-encrypt.md) or look at the API reference [here](../../../api/compress.md).

```python
>>> encrypt = kx.Encrypt('/path/to/mykey.key', 'mySuperSecretPassword')
>>> db.create(trade, 'trade', kx.DateAtom('today'), encrypt=encrypt)
```

3. Add new table to database

After onboarding your first table to a database, a common question is “How can I add a new table of related data?”. You can use the #!python DB class and the #!python create function to do this. For instance, let’s add a #!python quote table for the current day:

>>> N = 1000000
>>> quote = kx.Table(data={
...     'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
...     'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
...     'ask': kx.random.random(N, 100),
...     'bid': kx.random.random(N, 100)
... })

We can now add this data to your database

=== "Partitioned Database"

For the current day we can add the `#!python quote` table and see that the table is defined:

```python
>>> db.create(quote, 'quote', kx.DateAtom('today'))
>>> db.tables
['quote', 'trade']
>>> type(db.quote)
<class 'pykx.wrappers.PartitionedTable'>
```

=== "Splayed Database"

Add the table `#!python quote` to the database

```python
>>> db.create(quote, 'quote', format='splayed')
>>> db.tables
['trade', 'quote']
>>> type(db.quote)
<class 'pykx.wrappers.SplayedTable'>
```

4. Ensure new table is queryable

The following section outlines a restriction only applicable to Partitioned Databases, Splayed Databases should be queryable immediately.

You have now persisted another table to your database, however, you will notice if you access the #!python quote table that the return is surprising:

>>> db.quote
pykx.PartitionedTable(pykx.q('+`time`sym`ask`bid!`quote'))

The reason for this is that you currently do not have data in each partition of your database for the #!python quote table. To rectify this, run the #!python fill_database method off the #!python database class which adds relevant empty quote data to tables to the partitions from which it's missing:

>>> db.fill_database()

Now you should be able to access the #!python quote data for query:

>>> db.quote

Next Steps