title | description | date | author | tags |
---|---|---|---|---|
Generating and extending a database |
Introduction to the PyKX database creation and management functionality |
July 2024 |
KX Systems, Inc., |
PyKX, q, database, maintenance, management, generation |
This page explains how to create and expand databases using PyKX.
!!! tip "Tip: For the best experience, we recommend reading Databases in PyKX first. If you already have access to a database and only need to load it, you can skip this page and jump right to load database."
Before leveraging the performance of PyKX when querying on-disk data, you need to create a persisted database. In the following sections we complete the following:
- Create a new database containing a single table
#!python trade
and multiple days of data. - Add a new day worth of data for
#!python today
to the database for the#!python trade
table. - On-board a new table (
#!python quote
) which contains data from#!python today
. - Ensure that the new table is queryable.
!!! note "Bring your own data"
The below example makes use of randomly-generated data using PyKX, where we use `#!python trade` or `#!python quote` tables generated in that manner. You can replace them with an equivalent Pandas/PyArrow table which will be converted to a PyKX table before being persisted.
For more information on database structures, see the linked section on what is a database. With PyKX, use the #!python pykx.DB
class for all database interactions in Python. This class lets you create, expand, and maintain on-disk splayed/partitioned databases. First, we need to create a database.
In the next cell, we create a #!python trade
table with data from multiple days in the chat.
>>> import pykx as kx
>>> N = 10000000
>>> trade = kx.Table(data={
... 'date': kx.random.random(N, kx.DateAtom('today') - [1, 2, 3, 4]),
... 'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
... 'price': kx.random.random(N, 10.0)
... })
Now that we have generated our trade table, we can persist it to disk at the location #!python /tmp/db
.
=== "Partitioned Database"
```python
>>> db = kx.DB(path='/tmp/db')
>>> db.create(trade, 'trade', 'date')
```
=== "Splayed Database"
```python
>>> db = kx.DB(path='/tmp/splay')
>>> db.create(trade, 'trade', format='splayed')
```
That's it, you now have a persisted database. To verify the availability of the database and its tables, we can examine the database object:
=== "Partitioned Database"
```python
>>> db.tables
['trade']
>>> type(db.trade)
<class 'pykx.wrappers.PartitionedTable'>
```
=== "Splayed Database"
```python
>>> db.tables
['trade']
>>> type(db.trade)
<class 'pykx.wrappers.SplayedTable'>
```
The above database persistence uses the default parameters within the #!python create
function. If you need to compress/encrypt the tables persisted to the database or need to define a #!python by
or specify the symbol enumeration name when persisting a Partitioned Database, you can follow the API documentation here.
The following section outlines functionality only applicable to Partitioned Databases.
Now that you have generated a database, you can add extra partitions using the same database class and the #!python create
function. In this example we will add new data for the current day created in the below cell and convert it to a Pandas DataFrame prior to persistence:
>>> N = 2000000
>>> trade = kx.Table(data={
... 'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
... 'price': kx.random.random(N, 10.0)
... }).pd()
>>> type(trade)
pandas.core.frame.DataFrame
Note that in comparison to the original database creation logic, we do not have a #!python date
column. Instead, we add a date at partition creation. Below we provide a variety of examples of adding new partitions under various conditions:
=== "Generate default partition"
```python
>>> db.create(trade, 'trade', kx.DateAtom('today'))
```
=== "Compress data in a partition"
In the below example, we compress data within the persisted partition using [`gzip`](https://en.wikipedia.org/wiki/Gzip). For further details on supported compression formats see [here](../compress-encrypt.md) or look at the API reference [here](../../../api/compress.md).
```python
>>> gzip = kx.Compress(kx.CompressionAlgorithm.gzip, level=2)
>>> db.create(trade, 'trade', kx.DateAtom('today'), compress=gzip)
```
=== "Encrypt persisted data"
In the below example, we encrypt the data persisted for the added partition. For further details on how encryption works within PyKX see [here](../compress-encrypt.md) or look at the API reference [here](../../../api/compress.md).
```python
>>> encrypt = kx.Encrypt('/path/to/mykey.key', 'mySuperSecretPassword')
>>> db.create(trade, 'trade', kx.DateAtom('today'), encrypt=encrypt)
```
After onboarding your first table to a database, a common question is “How can I add a new table of related data?”. You can use the #!python DB
class and the #!python create
function to do this. For instance, let’s add a #!python quote
table for the current day:
>>> N = 1000000
>>> quote = kx.Table(data={
... 'time': kx.q.asc(kx.random.random(N, kx.q('1D'))),
... 'sym': kx.random.random(N, ['AAPL', 'GOOG', 'MSFT']),
... 'ask': kx.random.random(N, 100),
... 'bid': kx.random.random(N, 100)
... })
We can now add this data to your database
=== "Partitioned Database"
For the current day we can add the `#!python quote` table and see that the table is defined:
```python
>>> db.create(quote, 'quote', kx.DateAtom('today'))
>>> db.tables
['quote', 'trade']
>>> type(db.quote)
<class 'pykx.wrappers.PartitionedTable'>
```
=== "Splayed Database"
Add the table `#!python quote` to the database
```python
>>> db.create(quote, 'quote', format='splayed')
>>> db.tables
['trade', 'quote']
>>> type(db.quote)
<class 'pykx.wrappers.SplayedTable'>
```
The following section outlines a restriction only applicable to Partitioned Databases, Splayed Databases should be queryable immediately.
You have now persisted another table to your database, however, you will notice if you access the #!python quote
table that the return is surprising:
>>> db.quote
pykx.PartitionedTable(pykx.q('+`time`sym`ask`bid!`quote'))
The reason for this is that you currently do not have data in each partition of your database for the #!python quote
table. To rectify this, run the #!python fill_database
method off the #!python database
class which adds relevant empty quote data to tables to the partitions from which it's missing:
>>> db.fill_database()
Now you should be able to access the #!python quote
data for query:
>>> db.quote
- Load an existing database.
- Modify the contents of your database
- Query your database with Python
- Compress/encrypt data for persisting database partitions.