Node: Document Database Service #1667

[[toc]]


## cds.**DatabaseService** <i> class </i> {#cds-db-service }

### class cds.**DatabaseService** <i> extends cds.Service </i>




### <span style="color:grey">srv</span>.begin <i> () → this </i> {#db-begin }

Creates a transaction for the current DatabaseService; called by `cds.tx` automatically. In the case of database services this actually starts the transaction by acquiring a physical connection from the connection pool, and optionally sends a command like `BEGIN TRANSACTION` to the database.

This method is called automatically by the framework on the first query, so **you never have to call it** in application coding. There are only very rare cases where you'd want to do so, for example to reuse a `tx` object to start subsequent physical transactions after a former `commit` or `rollback`. But this is not considered good practice.

### <span style="color:grey">srv</span>.commit <i> () → this </i> {#db-commit }

Commits all write operations executed in the transaction. Called by `cds.tx` automatically.

### <span style="color:grey">srv</span>.rollback <i> () → this </i> {#db-rollback }

Reverts all write operations executed in the transaction. Called by `cds.tx` automatically.
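
In application code these three methods are rarely called directly; `cds.tx` drives them. For illustration (the entity name and data below are made up):

```js
// cds.tx() calls begin() before the callback runs,
// commit() if it resolves, and rollback() if it throws:
await cds.tx(async tx => {
  await tx.run(INSERT.into('Books').entries({ ID: 4711, title: 'Example' }))
})
```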

### <span style="color:grey">srv</span>.factory <i> () → this </i> {#db-factory }

The `factory` property provides an object that implements the pool API, allowing the current DatabaseService to reuse physical database connections and improve overall performance.
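
For illustration only, a sketch of the shape such a factory object has, following the common generic-pool convention (the method names and the mock connection below are made up, not the actual `@cap-js/db-service` implementation):

```js
const factory = {
  // open a new physical connection (mocked here)
  create: async () => ({ open: true, exec: async sql => [] }),
  // health check before a pooled connection is reused
  validate: async conn => conn.open,
  // dispose of a broken or expired connection for good
  destroy: async conn => { conn.open = false }
}
```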

### <span style="color:grey">srv</span>.acquire <i> () → this </i> {#db-acquire }

Creates a connection for the current DatabaseService. Called by [begin](#db-begin) automatically.

### <span style="color:grey">srv</span>.release <i> () → this </i> {#db-release }

Releases the connection of the current DatabaseService back to the pool, allowing it to be reused by future [acquire](#db-acquire) calls. Called by [commit](#db-commit) and [rollback](#db-rollback) automatically.

### <span style="color:grey">srv</span>.destroy <i> () → this </i> {#db-destroy }

Destroys the connection of the current DatabaseService, completely removing it from the DatabaseService. Must be called when the connection's transaction is in an `unknown` state.

### <span style="color:grey">srv</span>.disconnect <i> () → this </i> {#db-disconnect }

Disconnects all connections of the current DatabaseService. This is required for databases accessed over network connections, as connections kept alive inside the `pool` prevent the process from reaching the `idle` state it needs to shut down gracefully. Called on `cds.on('shutdown')` automatically.
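
If needed, it can also be called manually, for example in a test teardown, to let the process exit cleanly:

```js
const db = await cds.connect.to('db')
await db.disconnect() // closes all pooled connections so the process can exit
```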

### <span style="color:grey">srv</span>.infer <i> () → this </i> {#db-infer }

Uses the current DatabaseService's model to `infer` a provided `cds.ql.Query`. Prefer `cds.infer` instead. (deprecated?)

### <span style="color:grey">srv</span>.set <i> () → this </i> {#db-set }

Sets the provided key-value pairs as variables on the current connection. The values can be accessed through the `session_context('<key>')` function inside any query. Called by [begin](#db-begin) automatically.
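
A minimal sketch of reading such a variable back, assuming SAP HANA (where scalar selects read `FROM DUMMY`) and the `$user.id` variable set by the framework:

```js
const db = await cds.connect.to('db')
await db.tx(async tx => {
  // begin() has already applied the session variables to the acquired connection
  const [row] = await tx.run(`SELECT session_context('$user.id') AS "user" FROM DUMMY`)
})
```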

### <span style="color:grey">srv</span>.run <i> () → this </i> {#db-run }

Runs the provided `cds.ql.Query` using the current DatabaseService.
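
For example, assuming a `Books` entity in the connected model:

```js
const db = await cds.connect.to('db')
const books = await db.run(SELECT.from('Books').where({ ID: 4711 }))
```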



## cds.DatabaseService — Consumption {#databaseservice-consumption }

The main use case of upsert is data replication. [Upsert](../cds/cqn.md#upsert) updates existing entity records from the given data or inserts new ones if they don't exist in the database.

::: warning
Even if an entity doesn't exist in the database:<br> &rarr; Upsert is **not** equivalent to Insert.
:::

`UPSERT` statements can be created with the [UPSERT](cds-ql#upsert) query API:

```js
UPSERT.into('db.Books')
  .entries(...)
```

`UPSERT` queries are translated into DB native upsert statements, more specifically they unfold to an [UPSERT SQL statement](https://help.sap.com/docs/HANA_CLOUD_DATABASE/c1d3f60099654ecfb3fe36ac93c121bb/ea8b6773be584203bcd99da76844c5ed.html) on SAP HANA and to an [INSERT ON CONFLICT SQL statement](https://www.sqlite.org/lang_upsert.html) on SQLite.

- The rows to be upserted need to have the same structure, that is, all rows need to specify the same named values.
- The upsert data must contain all key elements of the entity.
- If upsert data is incomplete only the given values are updated or inserted, which means the `UPSERT` statement has "PATCH semantics" (see the sketch after this list).
- `UPSERT` statements don't have a where clause. The key values of the entity that is upserted are extracted from the data.
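
For illustration, with made-up data: a row that omits elements leaves the stored values of those elements untouched.

```js
// assuming a record { ID: 4711, title: 'Old Title', stock: 10 } already exists,
// this updates only the title and keeps stock at 10:
await UPSERT.into('db.Books').entries({ ID: 4711, title: 'New Title' })
```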
::: warning
In contrast to the Java runtime, deep upserts and delta payloads are not yet supported.
:::

## `@cap-js/db-service`

The Node.js DatabaseService core class is implemented by the `@cap-js/db-service` module, with database-specific implementations in `@cap-js/hana`, `@cap-js/sqlite`, and `@cap-js/postgres`, which can be used interchangeably depending on the underlying database.

### Architecture

The core principle of the `@cap-js` database services is "don't look at the data". As the database services are the foundation of all CAP applications, their performance is especially important, and the heaviest work a database service has to do is handling the `data`.

#### JSON

In CAP applications all `data` uses the `JSON` format. Ideally, the databases would understand the same format, as this would spare CAP applications from transforming the `data` between different formats. While `SQL` doesn't specify how an implementing database should store its `data`, it does impose certain paradigms that require computationally heavy operations, which has led most implementations to pick heavily optimized internal `data` formats for better performance. Over time the `JSON` format has gained wide popularity, and many modern databases now implement it. This allows CAP applications to convey their intentions to the database through these `JSON` APIs, removing the need to transform the `data` when reading or writing.

##### Transform {#databaseservice-architecture-transform }

It is important to understand the special challenges that come with using JavaScript. Most database protocols use their internal `data` format to communicate with clients, so a JavaScript client has to convert JavaScript types into the database's native binary format.

Probably the most basic data type for all programming languages and databases is the `integer`. In JavaScript this type doesn't actually exist; it is a subset of the `Number` type. Therefore, when a database driver has to convert a `Number` into an `integer`, it has to do more work than you might expect. As a real reference, the publicly available `hdb` driver is used here, which has implementations for the [`int32`](https://github.com/SAP/node-hdb/blob/6f38a473278730c5edce969a87891420ce4baecb/lib/protocol/data/Int32.js#L35), [`int64`](https://github.com/SAP/node-hdb/blob/6f38a473278730c5edce969a87891420ce4baecb/lib/util/bignum.js#L379) and [`int128`](https://github.com/SAP/node-hdb/blob/6f38a473278730c5edce969a87891420ce4baecb/lib/util/bignum.js#L600) types. Here is a breakdown of the number of objects and operations each type requires before it can be sent to the database.

| Type | Objects | Operations |
|------|---------|------------|
| `int32` | 1 (Buffer) | 1 (function) |
| `int64` | 14 (dynamic Numbers), 40 (static Numbers), 1 (String) | 90 (operators), 11 (functions) |
| `int128` | 22 (dynamic Numbers), 80 (static Numbers), 1 (String) | 192 (operators), 18 (functions) |

For comparison, when these types are used in a compiled language no conversion operations are required, as the value can be passed as a pointer. The only exception is when the database and client use a different endianness, in which case one of them has to swap the bytes around.
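
A rough, simplified illustration of the difference (this is not the `hdb` driver's actual code): writing one 64-bit integer into a binary wire buffer versus appending it to a `JSON` string.

```javascript
const value = 4711

// binary protocol: allocate a buffer and encode the Number explicitly,
// including an extra Number -> BigInt conversion object
const buf = Buffer.alloc(8)
buf.writeBigInt64LE(BigInt(value))

// JSON protocol: the Number is simply serialized as part of the payload string
const json = JSON.stringify([{ ID: value }])
```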

##### Read

When reading `data` from the database, the new implementations rely on the database responding in the `JSON` format, so the CAP application doesn't have to do any post-processing on the response: the ODataService can take the result of the DatabaseService as is. This is achieved with output converters that are baked into the `SQL` statement, allowing the database to convert its internal format into the OData-specified format of each type. An output converter is a function attached to the element of an entity or query, which enables protocol adapters to generate queries with protocol-specific converters. Output converters are database specific, so depending on the database's internal `data` structure a converter might be more or less computationally intensive, or be omitted completely when the database is already OData compliant.
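
As an illustration (not the actual implementation), an output converter for a datetime element on SQLite could look roughly like this:

```javascript
// wraps the column expression so the database itself returns the
// OData-compliant ISO string, with no JavaScript post-processing needed
const toISODateTime = expr => `strftime('%Y-%m-%dT%H:%M:%SZ', ${expr})`

toISODateTime('createdAt')
// -> "strftime('%Y-%m-%dT%H:%M:%SZ', createdAt)" used in the SELECT column list
```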

Another big benefit of reading `JSON` directly from the database is the ability to read deeply nested data structures. `SQL` only allows a single scalar value to be selected by a subquery, but by converting a multi-column/multi-row result into a single `JSON` value it becomes possible to create database-native `expand` queries, which the database can optimize much better. In the past the application had to send multiple requests for the different levels of an `expand` query, or convert flattened to-one `expand` results.
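
For example, a nested read like the following can be executed as a single statement, with the to-many results folded into a `JSON` column by the database (entity and element names are illustrative):

```javascript
await SELECT.from('Authors', a => {
  a.ID, a.name,
  a.books(b => { b.ID, b.title }) // nested expand, resolved natively by the database
})
```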

When a CAP application knows that no post-processing is required (e.g. no `after` handlers are registered), it can skip the `JSON.parse` and `JSON.stringify` calls that would otherwise be required to grant JavaScript access to the result for manipulation. The ultimate goal is to not load the whole result into memory at all, but instead stream the database result chunks directly into the HTTP response connection, allowing CAP applications to handle much larger `data` sets while using significantly less memory and CPU.

##### Write

Using the `JSON` format for writing operations comes with many cascading improvements.

With the standard way of executing `INSERT` and `UPSERT` queries, the `SQL` has to match the data structure, which goes against the main principle of the new database services.

A simple example of the impact is as follows:

```javascript
await INSERT([
  { ID: 1 },
  { ID: 2, name: '' },
  { ID: 3, descr: '' }
])
```

This will actually `prepare` and `execute` the following queries:

```SQL
INSERT INTO (ID) VALUES (?)
INSERT INTO (ID,name) VALUES (?,?)
INSERT INTO (ID,descr) VALUES (?,?)
```

The `JSON` format, in contrast, allows the database services to `prepare` and `execute` only the following query:

```SQL
INSERT INTO (ID,name,descr,...) AS SELECT ID, name, descr,... FROM JSON_EACH(?)
```

The placeholder is provided with the `JSON.stringify` of the given `entries`, which means the query is executed only once, reducing the number of network round trips required to do the same amount of work. This greatly improves throughput, as multiple rows fit within a single network packet and the transformation to a `JSON` string is much cheaper than the native transformation steps mentioned in the [transform](#databaseservice-architecture-transform) section.
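
Conceptually, the three heterogeneous rows from the earlier example travel as one bind parameter:

```javascript
const param = JSON.stringify([
  { ID: 1 },
  { ID: 2, name: '' },
  { ID: 3, descr: '' }
])
// one prepared statement, executed once, with [param] as its only bind value
```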

Having a single `JSON` placeholder also makes it possible to stream a dataset through the CAP application, by taking the `req` object (which natively is a `Readable` stream) and providing it as the `entries` of the `INSERT` statement.

```javascript
app.post('/upload', async (req, res) => {
  try {
    // req is a Readable stream of JSON entries; it is passed straight through to the database
    await INSERT(req).into('entity')
    res.status(201)
    res.end()
  } catch (err) {
    res.status(400)
    res.end(err.message)
  }
})
```

Because the `JSON` is converted into an intermediate table, the way `UPSERT` statements are handled can also be improved. They get all the same benefits as `INSERT` queries, and a bit more.

```SQL
UPSERT INTO (ID,name,createdAt,modifiedAt,...) AS
SELECT
  NEW.ID,
  CASE WHEN NEW.name IS NOT NULL THEN NEW.name ELSE OLD.name END, -- only update name when provided
  CASE WHEN OLD.ID IS NULL THEN $now ELSE OLD.createdAt END,      -- only apply @cds.on.insert when no OLD entry exists
  CASE WHEN OLD.ID IS NOT NULL THEN $now ELSE NULL END            -- only apply @cds.on.update when an OLD entry exists
FROM JSON_EACH(?) AS NEW
LEFT JOIN OLD ON OLD.ID = NEW.ID
```

##### Match

For `@cap-js/hana` specifically, there is a `JSON` optimization that helps HANA reuse execution plans. HANA has a very advanced execution plan optimizer, so being able to reuse already existing plans is very valuable. One kind of query always prevented HANA from reusing existing plans, because the `SQL` statement changed depending on the data provided:

```javascript
const IDs = [{val:1},{val:2},...]
cds.ql`SELECT * FROM ${entity} WHERE ID in ${IDs}`
```

This would create a slightly different variant of the following query, depending on the number of `val`s provided:

```SQL
SELECT * FROM entity WHERE ID IN (?,?,...)
```

Now the query always produces the same `SQL` statement, allowing HANA to reuse the existing execution plan:

```SQL
SELECT * FROM entity WHERE ID IN (SELECT VAL FROM JSON_TABLE(?,'$' COLUMNS(VAL DOUBLE PATH '$.val')))
```


## <i> More to Come </i>

This documentation is not complete yet, or the APIs are not released for general availability.