title | description | date | author | tags |
---|---|---|---|---|
Databases in PyKX | PyKX database creation and management | July 2024 | KX Systems, Inc. | PyKX, q, database, maintenance, management, generation |
This page explains the concept of databases in PyKX, including how to create and manage them.
In PyKX, the term database refers to a kdb+ database that can hold a set of splayed and partitioned tables.
A splayed kdb+ database consists of a single table stored on disk, with each column stored as a separate file rather than a single file holding the whole table. Medium-sized tables with fewer than 100 million rows and many columns are good candidates for splayed storage, particularly when only a small subset of the columns is accessed often.
quotes
├── .d
├── price
├── sym
└── time
!!! note "More information on splayed databases"
    The splayed database format used by PyKX has been used in production environments for decades. As such there is a significant amount of information available on the creation and use of these databases. Below are some articles.

    - [q knowledge base splayed databases](https://code.kx.com/q/kb/splayed-tables/)
    - [Q for Mortals splayed tables](https://code.kx.com/q4m3/14_Introduction_to_Kdb%2B/#142-splayed-tables)
    - [Basics of splayed tables](https://thinqkdb.wordpress.com/splayed-tables/)
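The column-per-file idea behind a splayed table can be illustrated with a minimal standard-library sketch. This is not how PyKX or kdb+ writes data: real column files use a binary kdb+ format, whereas the helpers `write_mock_splayed` and `read_columns` below are hypothetical and store made-up sample values as plain text. The sketch only shows the directory shape and why reading a small subset of columns touches only those files.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def write_mock_splayed(root: Path, name: str, data: Dict[str, List[str]]) -> Path:
    """Write a plain-text imitation of a splayed table (NOT real kdb+ files)."""
    table_dir = root / name
    table_dir.mkdir(parents=True)
    # In kdb+ the .d file records the column order; here it is a text stand-in.
    (table_dir / ".d").write_text("\n".join(data))
    for column, values in data.items():
        # One file per column, mirroring the layout shown above.
        (table_dir / column).write_text("\n".join(values))
    return table_dir


def read_columns(table_dir: Path, wanted: List[str]) -> Dict[str, List[str]]:
    """Open only the files of the requested columns."""
    return {c: (table_dir / c).read_text().splitlines() for c in wanted}


with tempfile.TemporaryDirectory() as tmp:
    # Illustrative sample values only.
    quotes = write_mock_splayed(
        Path(tmp),
        "quotes",
        {
            "time": ["09:30:00", "09:30:01"],
            "sym": ["AAPL", "MSFT"],
            "price": ["189.5", "410.2"],
        },
    )
    on_disk = sorted(p.name for p in quotes.iterdir())
    prices = read_columns(quotes, ["price"])  # 'time' and 'sym' are never opened

print(on_disk)
print(prices)
```

Because each column lives in its own file, a query that needs only `price` does not have to read `time` or `sym`, which is the access pattern the splayed format favors.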
A partitioned kdb+ database consists of one or more tables saved on-disk, where they are split into separate folders called partitions. These partitions are most often based on a temporal field within the dataset, such as date or month. Each table within the database must follow the same partition structure.
A visual representation of a database containing two tables (trades and quotes) partitioned by date would be as follows, where `#!python price`, `#!python sym`, and `#!python time` in the quotes folder are columns within the table:
db
├── 2020.10.04
│   ├── quotes
│   │   ├── .d
│   │   ├── price
│   │   ├── sym
│   │   └── time
│   └── trades
│       ├── .d
│       ├── price
│       ├── sym
│       ├── time
│       └── vol
├── 2020.10.06
│   ├── quotes
..
└── sym
!!! note "More information on partitioned databases"
    The partitioned database format used by PyKX has been used in production environments for decades in many of the world's best-performing tier-1 investment banks. Today, there is a significant amount of information available on the creation and maintenance of these databases. Below are some articles related to their creation and querying.

    - [Blog: Partitioning data with kdb+](https://kx.com/blog/partitioning-data-in-kdb/)
    - [Q for Mortals Partitioned Tables](https://code.kx.com/q4m3/14_Introduction_to_Kdb%2B/#143-partitioned-tables)
    - [Partitioned Tables](https://thinqkdb.wordpress.com/partitioned-tables/)
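The date-partitioned layout shown in the tree above follows a simple naming rule: one folder per date, written as `YYYY.MM.DD`, with one sub-folder per table. The sketch below uses only the Python standard library to build such a path and to tell partition folders apart from other entries such as the `sym` file. The helper names `partition_dir` and `parse_partition` are hypothetical and are not part of PyKX, which handles this layout internally.

```python
from datetime import date, datetime
from pathlib import PurePosixPath
from typing import Optional

# Date partitions are named like 2020.10.04 in the tree above.
PARTITION_FORMAT = "%Y.%m.%d"


def partition_dir(db_root: str, partition: date, table: str) -> PurePosixPath:
    """Return the folder where `table` lives for a given date partition."""
    return PurePosixPath(db_root) / partition.strftime(PARTITION_FORMAT) / table


def parse_partition(name: str) -> Optional[date]:
    """Parse a top-level entry name as a date partition, or return None."""
    try:
        return datetime.strptime(name, PARTITION_FORMAT).date()
    except ValueError:
        # Entries such as the top-level 'sym' file are not partitions.
        return None


# Top-level entries taken from the example tree.
entries = ["2020.10.04", "2020.10.06", "sym"]
partitions = [d for d in map(parse_partition, entries) if d is not None]
trades_path = partition_dir("db", date(2020, 10, 4), "trades")

print(partitions)
print(trades_path)
```

Since every table must follow the same partition structure, the same `partition_dir` rule applies to both `trades` and `quotes`; only the final path component changes.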
Creating and managing databases is crucial for handling large amounts of data. The `#!python pykx.DB` module helps make these tasks easier, more Pythonic, and more user-friendly.
The PyKX Database API supports the following operations:
Operation | Description |
---|---|
Generate | Learn how to generate a new historical database using data from Python/q and expand it over time. |
Load | Learn how to load existing databases and fix some common issues with databases. |
Manage | Copy, change datatypes or names of columns, apply functions to columns, delete columns from a table, rename tables and backfill data. |
Check out a full breakdown of the database API.