From 4dcec5abd6c179e45617ae392b43372c29c98b8b Mon Sep 17 00:00:00 2001
From: Bob den Os
Date: Tue, 25 Feb 2025 14:10:25 +0100
Subject: [PATCH] Initial additions to the node databases

---
 node.js/databases.md | 173 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 164 insertions(+), 9 deletions(-)

diff --git a/node.js/databases.md b/node.js/databases.md
index 0e7bbb952..529cba66f 100644
--- a/node.js/databases.md
+++ b/node.js/databases.md
@@ -12,18 +12,56 @@ status: released

[[toc]]

-## cds.**DatabaseService** class { #cds-db-service}
## cds.**DatabaseService** class {#cds-db-service}

### class cds.**DatabaseService** extends cds.Service

### srv.begin () → this {#db-begin}

-In case of database services this actually starts the transaction by acquiring a physical connection from the connection pool, and optionally sends a command to the database like `BEGIN TRANSACTION`.

Creates a transaction for the DatabaseService. Called automatically by `cds.tx`.

### srv.commit () → this {#db-commit}

Commits all write operations executed on the transaction. Called automatically by `cds.tx`.

### srv.rollback () → this {#db-rollback}

Reverts all write operations executed on the transaction. Called automatically by `cds.tx`.

### srv.factory () → this {#db-factory}

The `factory` property provides an object implementing the pool API, which allows the current DatabaseService to reuse physical database connections and thereby improve overall performance.

### srv.acquire () → this {#db-acquire}

Acquires a connection for the current DatabaseService. Called automatically by [begin](#db-begin).

### srv.release () → this {#db-release}

Releases the connection of the current DatabaseService, allowing it to be reused by future [acquire](#db-acquire) calls. Called automatically by [commit](#db-commit) and [rollback](#db-rollback).

### srv.destroy () → this {#db-destroy}

Destroys the connection of the current DatabaseService, removing it completely from the service. It has to be called when the connection's transaction is in an `unknown` state.

### srv.disconnect () → this {#db-disconnect}

Disconnects all connections of the current DatabaseService. This is required for databases that are reached over network connections: connections kept alive inside the `pool` prevent the process from reaching the `idle` state it needs to shut down gracefully. Called automatically by `cds.on('shutdown')`.

### srv.infer () → this {#db-infer}

Uses the current DatabaseService model to `infer` a provided `cds.ql.Query`. Prefer `cds.infer` instead. (deprecated?)

### srv.set () → this {#db-set}

Sets the provided key-value pairs as variables of the current connection. The values can be accessed through the `session_context('<name>')` function inside any query. Called automatically by [begin](#db-begin).

### srv.run () → this {#db-run}

Runs the provided `cds.ql.Query` on the current DatabaseService.

-This method is called automatically by the framework on the first query, so **you never have to call it** in application coding. There are only very rare cases where you'd want to do so, for example to reuse a `tx` object to start subsequent physical transactions after a former `commit` or `rollback`. But this is not considered good practice.
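Together these methods make up the transaction lifecycle that `cds.tx` drives for you. The following snippet is a minimal, illustrative sketch of that lifecycle; the entity name `db.Books` and the data are just example values, not part of this API:

```js
const cds = require('@sap/cds')

// cds.tx() calls begin() for us, which acquires a pooled connection
// and sets the session variables, then runs our queries on it.
await cds.tx(async tx => {
  const books = await tx.run(SELECT.from('db.Books'))
  await tx.run(UPDATE('db.Books').set({ stock: { '-=': 1 } }).where({ ID: 1 }))
})
// commit() is called when the callback resolves, rollback() when it throws;
// both end up releasing the connection back into the pool.
```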
## cds.DatabaseService — Consumption {#databaseservice-consumption}

@@ -129,10 +167,6 @@ Even though we provide a default pool configuration, we expect that each applica

The main use case of upsert is data replication. [Upsert](../cds/cqn.md#upsert) updates existing entity records from the given data or inserts new ones if they don't exist in the database.

-::: warning
-Even if an entity doesn't exist in the database:
-→ Upsert is **not** equivalent to Insert.
-:::

`UPSERT` statements can be created with the [UPSERT](cds-ql#upsert) query API:

```js
@@ -142,7 +176,6 @@ UPSERT.into('db.Books')
```

`UPSERT` queries are translated into DB native upsert statements, more specifically they unfold to an [UPSERT SQL statement](https://help.sap.com/docs/HANA_CLOUD_DATABASE/c1d3f60099654ecfb3fe36ac93c121bb/ea8b6773be584203bcd99da76844c5ed.html) on SAP HANA and to an [INSERT ON CONFLICT SQL statement](https://www.sqlite.org/lang_upsert.html) on SQLite.

-- The rows to be upserted need to have the same structure, that is, all rows needs to specify the same named values.
- The upsert data must contain all key elements of the entity.
- If upsert data is incomplete only the given values are updated or inserted, which means the `UPSERT` statement has "PATCH semantics".
- `UPSERT` statements don't have a where clause. The key values of the entity that is upserted are extracted from the data.

@@ -155,6 +188,128 @@ The following actions are *not* performed on upsert:

In contrast to the Java runtime, deep upserts and delta payloads are not yet supported.
:::

## `@cap-js/db-service`

The Node.js DatabaseService core class is implemented by the `@cap-js/db-service` module, with database-specific implementations in `@cap-js/hana`, `@cap-js/sqlite`, and `@cap-js/postgres`, which can be used interchangeably depending on the underlying database.

### Architecture

The core principle of the `@cap-js` database services is "don't look at the data". As the database services are the foundation of every CAP application, their performance is especially important, and the heaviest work a database service has to do is handling the `data`.

#### JSON

In CAP applications all `data` uses the `JSON` format. Ideally the databases would understand the same format, as this would spare CAP applications from transforming the `data` between different formats. While `SQL` doesn't specify how `data` should be stored by the database implementing the specification, it does provide certain paradigms that require computationally heavy operations, which leads most implementations to pick heavily optimized internal `data` formats. Over time the `JSON` format has gained wide popularity, and many modern databases now implement it, which allows CAP applications to convey their intentions to the database through these `JSON` APIs and removes the need to transform the `data` when reading or writing.

##### Transform {#databaseservice-architecture-transform}

It is important to understand the special challenges that come with using JavaScript. As most database protocols use their internal `data` format to communicate with clients, the JavaScript client has to convert JavaScript values into the database's native binary format.

Probably the simplest data type for all programming languages and databases is the `integer`. In JavaScript this type doesn't actually exist; it is a subset of the `Number` type. Therefore, when a database driver has to convert a `Number` into an `integer`, it has to do more work than you might expect.
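To get a feel for what such a conversion involves, here is a small, hypothetical sketch (not the `hdb` implementation) that turns a JavaScript `Number` into the 8-byte big-endian representation an `int64` column expects:

```js
// Hypothetical sketch of a Number → INT64 conversion, not the hdb code.
function int64ToBuffer (value) {
  const buf = Buffer.alloc(8)
  // Number has no 64-bit integer representation, so detour through BigInt
  buf.writeBigInt64BE(BigInt(Math.trunc(value)))
  return buf
}

int64ToBuffer(4711) // <Buffer 00 00 00 00 00 00 12 67>
```

Even this shortcut allocates an intermediate `BigInt` and a `Buffer` for every single value; a driver that emulates the 64-bit math with plain `Number`s ends up with operation counts like the ones below.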
As a concrete reference, the publicly available `hdb` driver provides implementations for the [`int32`](https://github.com/SAP/node-hdb/blob/6f38a473278730c5edce969a87891420ce4baecb/lib/protocol/data/Int32.js#L35), [`int64`](https://github.com/SAP/node-hdb/blob/6f38a473278730c5edce969a87891420ce4baecb/lib/util/bignum.js#L379) and [`int128`](https://github.com/SAP/node-hdb/blob/6f38a473278730c5edce969a87891420ce4baecb/lib/util/bignum.js#L600) types. Here is a breakdown of the number of objects and operations each type requires before it can be sent to the database.

`int32`
- Objects: 1 (Buffer)
- Operations: 1 (Function)

`int64`
- Objects: 14 (dynamic Numbers), 40 (static Numbers), 1 (String)
- Operations: 90 (operators), 11 (Functions)

`int128`
- Objects: 22 (dynamic Numbers), 80 (static Numbers), 1 (String)
- Operations: 192 (operators), 18 (Functions)

For comparison: when these types are used in a compiled language, no operations are required at all, as the value is simply passed as a pointer. The only exception is when the database and client use a different endianness, in which case one of them has to swap the bytes around.

##### Read

When reading `data` from the database, the new implementations rely on the database responding in the `JSON` format, which frees the CAP application from any postprocessing of the response: the ODataService can take the result of the DatabaseService as is. This is achieved with output converters that are baked into the `SQL` statement, allowing the database to convert its internal format into the OData-specified format of the respective type. An output converter is a function attached to an element of an entity or query, which enables protocol adapters to generate queries with protocol-specific converters. Output converters are database specific, so depending on the database's internal `data` structure a converter might be more or less computationally intensive, or be omitted completely when the database is already OData compliant.

Another big benefit of reading `JSON` directly from the database is the ability to read deeply nested data structures. While `SQL` only allows a single scalar value to be selected by a subquery, converting a multi-column / multi-row result into a single `JSON` result makes it possible to create database-native `expand` queries, which the database can optimize much better. In the past the application had to send multiple requests for the different levels of an `expand` query, or convert flattened to-one `expand` results. See the sketch at the end of this section for an example.

When a CAP application knows that no postprocessing is required (e.g. no `after` handlers), it can additionally skip the `JSON.parse` and `JSON.stringify` calls that are otherwise needed to grant JavaScript access to the result for manipulation. The ultimate goal is to not load the whole result into memory at all, but to stream the database result chunks directly into the HTTP response connection, allowing CAP applications to handle much larger `data` sets while using significantly less memory and CPU.
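As an illustration of such a database-native `expand`, here is a minimal `cds.ql` sketch; `Authors` and its `books` association are made-up example names, not entities defined in this guide:

```js
// One query, one nested JSON result: the to-many expand is resolved
// by the database instead of by follow-up requests per author.
const authors = await SELECT.from('Authors', a => {
  a.ID, a.name, a.books(b => { b.ID, b.title })
})
```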
##### Write

Using the `JSON` format for write operations comes with many cascading improvements.

With the standard way of executing `INSERT` and `UPSERT` queries, the generated `SQL` has to match the structure of the data, which goes against the main principle of the new database services. A simple example of the impact is as follows:

```javascript
await INSERT.into('db.Books').entries([
  { ID: 1 },
  { ID: 2, name: '' },
  { ID: 3, descr: '' }
])
```

This actually has to `prepare` and `execute` the following queries:

```SQL
INSERT INTO (ID) VALUES (?)
INSERT INTO (ID,name) VALUES (?,?)
INSERT INTO (ID,descr) VALUES (?,?)
```

Using the `JSON` format instead allows the database services to `prepare` and `execute` only the following query:

```SQL
INSERT INTO (ID,name,descr,...) AS SELECT ID, name, descr,... FROM JSON_EACH(?)
```

The placeholder is provided with the `JSON.stringify` of the given `entries`, so the query is executed only once. This reduces the number of network round trips required to do the same amount of work and greatly improves throughput, as multiple rows fit into a single network packet and the transformation to a `JSON` string is much cheaper than the native transformation steps mentioned in the [transform](#databaseservice-architecture-transform) section.

Having a single `JSON` placeholder also makes it possible to stream a dataset through the CAP application, by taking the `req` object, which natively is a `Readable` stream, and providing it as the `entries` of the `INSERT` statement:

```javascript
app.post('/upload', async (req, res) => {
  try {
    await INSERT(req).into('entity')
    res.status(201)
    res.end()
  } catch (err) {
    res.status(400)
    res.end(err.message)
  }
})
```

As the `JSON` is converted into an intermediate table, it is also possible to improve the way `UPSERT` statements are handled. They have all the benefits of `INSERT` queries, and a bit more (shown here as pseudo SQL):

```SQL
UPSERT INTO (ID,name,createdAt,modifiedAt,...) AS
SELECT
  ID,
  new.name ? new.name : old.name, -- only update name when provided
  old.ID ? old.createdAt : $now,  -- only apply @cds.on.insert when no OLD entry exists
  old.ID ? $now : null            -- only apply @cds.on.update when an OLD entry exists
FROM JSON_EACH(?) AS NEW
JOIN OLD ON OLD.ID = NEW.ID
```

##### Match

For `@cap-js/hana` specifically, there is a `JSON` optimization that assists HANA in reusing execution plans. As HANA has a very advanced execution plan optimizer, being able to reuse already existing plans is very valuable. One kind of query always prevented HANA from doing so, because the generated `SQL` changed depending on the data provided:

```javascript
const IDs = [{ val: 1 }, { val: 2 }, ...]
cds.ql`SELECT * FROM ${entity} WHERE ID in ${IDs}`
```

This creates a slightly different variant of the following query depending on the number of `val`s provided:

```SQL
SELECT * FROM entity WHERE ID IN (?,?,...)
```

Now the same `cds.ql` query always produces one and the same `SQL` statement, allowing HANA to reuse the existing execution plan:

```SQL
SELECT * FROM entity WHERE ID IN (SELECT VAL FROM JSON_TABLE(?,'$' COLUMNS(VAL DOUBLE PATH '$.val')))
```

## More to Come

This documentation is not complete yet, or the APIs are not released for general availability.