docs/configuration/node-config.md (+2)
@@ -176,13 +176,15 @@ indexer:
 | --- | --- | --- |
 | `max_queue_memory_usage` | Maximum size in bytes of the in-memory Ingest queue. | `2GiB` |
 | `max_queue_disk_usage` | Maximum disk space in bytes taken by the Ingest queue. It must be at least `256M` and at least `max_queue_memory_usage`. | `4GiB` |
+| `content_length_limit` | Maximum uncompressed payload size. Increasing it is discouraged; use a [file source](../ingest-data/sqs-files.md) instead. | `10MiB` |
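The three `ingest_api` limits in this table interact: the disk budget has a floor and must also cover the in-memory queue. The Rust sketch below checks those two constraints and the payload cap. It is illustrative only. `IngestApiLimits`, `validate` and `accepts_payload` are hypothetical names, sizes are plain byte counts rather than strings like `4GiB`, and reading `256M` as 256 × 10^6 bytes is an assumption.

```rust
/// Hypothetical mirror of the `ingest_api` limits from the node config table.
/// Sizes are plain byte counts here; the real config accepts strings such as `4GiB`.
#[derive(Debug, Clone, Copy)]
pub struct IngestApiLimits {
    pub max_queue_memory_usage: u64,
    pub max_queue_disk_usage: u64,
    pub content_length_limit: u64,
}

pub const MIB: u64 = 1024 * 1024;
pub const GIB: u64 = 1024 * MIB;
/// Assumption: the documented `256M` floor is read as 256 * 10^6 bytes.
pub const MIN_QUEUE_DISK_USAGE: u64 = 256_000_000;

impl Default for IngestApiLimits {
    /// Defaults from the table: 2GiB, 4GiB and 10MiB.
    fn default() -> Self {
        Self {
            max_queue_memory_usage: 2 * GIB,
            max_queue_disk_usage: 4 * GIB,
            content_length_limit: 10 * MIB,
        }
    }
}

impl IngestApiLimits {
    /// Checks the two documented constraints on `max_queue_disk_usage`.
    pub fn validate(&self) -> Result<(), String> {
        if self.max_queue_disk_usage < MIN_QUEUE_DISK_USAGE {
            return Err(format!(
                "max_queue_disk_usage ({} bytes) is below the 256M minimum",
                self.max_queue_disk_usage
            ));
        }
        if self.max_queue_disk_usage < self.max_queue_memory_usage {
            return Err(format!(
                "max_queue_disk_usage ({} bytes) is smaller than max_queue_memory_usage ({} bytes)",
                self.max_queue_disk_usage, self.max_queue_memory_usage
            ));
        }
        Ok(())
    }

    /// True if an uncompressed payload of `len` bytes fits under `content_length_limit`.
    pub fn accepts_payload(&self, len: u64) -> bool {
        len <= self.content_length_limit
    }
}
```

With the defaults, `validate()` succeeds, and a 10MiB payload is accepted while anything one byte larger is rejected.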
 This concludes the tutorial. You can now move on to the [next tutorial](/docs/ingest-data/kafka.md) to learn how to ingest data from Kafka.
+
+## Ingest API versions
+
+In 0.9, Quickwit introduced a new version of the ingest API that distributes indexing across the cluster regardless of which node received the ingest request. This new ingestion service is often referred to as "Ingest V2", as opposed to the legacy ingestion (V1). In upcoming versions, the new ingest API will also be able to replicate the write-ahead log to achieve higher durability.
+
+By default, both ingestion services are enabled and ingest V2 is used. You can toggle this behavior with the following environment variables:
+|`QW_ENABLE_INGEST_V2`| Start the V2 ingest service and use it by default. | true |
+|`QW_DISABLE_INGEST_V1`| V1 ingest is used by the APIs only if V2 is disabled. Running V1 alongside V2 is necessary to migrate to V2 without losing existing unindexed V1 logs. | false |
+
+:::note
+
+These variables determine the ingest service used by both the `api/v1/<index-id>/ingest` endpoint and the [bulk API](../reference/es_compatible_api.md#_bulk--batch-ingestion-endpoint).
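As a rough model of how the two variables combine, the sketch below picks the ingest version served by the ingest and `_bulk` APIs. It assumes the variables accept `true`/`false` or `1`/`0`; `IngestVersion`, `parse_flag` and `api_ingest_version` are hypothetical names, and returning `None` when both services are off is a guess, since the table does not say what happens in that case.

```rust
/// Hypothetical label for the ingest service the APIs route to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IngestVersion {
    V1,
    V2,
}

/// Parses a boolean environment value. Accepting `true`/`1` and `false`/`0`
/// (case-insensitive) is an assumption; anything else falls back to `default`.
pub fn parse_flag(raw: Option<&str>, default: bool) -> bool {
    match raw.map(|value| value.trim().to_ascii_lowercase()) {
        Some(value) if value == "true" || value == "1" => true,
        Some(value) if value == "false" || value == "0" => false,
        _ => default,
    }
}

/// Returns the ingest version used by the ingest and `_bulk` APIs, or `None`
/// if both services are turned off (that last case is a guess).
pub fn api_ingest_version(enable_v2: Option<&str>, disable_v1: Option<&str>) -> Option<IngestVersion> {
    let v2_enabled = parse_flag(enable_v2, true); // QW_ENABLE_INGEST_V2, default true
    let v1_enabled = !parse_flag(disable_v1, false); // QW_DISABLE_INGEST_V1, default false
    if v2_enabled {
        Some(IngestVersion::V2) // V2 takes precedence whenever it runs
    } else if v1_enabled {
        Some(IngestVersion::V1)
    } else {
        None
    }
}

/// Same decision, reading the process environment.
pub fn api_ingest_version_from_env() -> Option<IngestVersion> {
    let enable_v2 = std::env::var("QW_ENABLE_INGEST_V2").ok();
    let disable_v1 = std::env::var("QW_DISABLE_INGEST_V1").ok();
    api_ingest_version(enable_v2.as_deref(), disable_v1.as_deref())
}
```

Keeping the decision in a pure function that takes `Option<&str>` makes it testable without touching the real environment.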
-Ingest V2 is a new ingestion API that is designed to be more efficient and scalable for thousands of indexes than the previous version. It is currently in beta and is not yet enabled by default.
+Ingest V2 is the latest ingestion API. It is designed to scale to thousands of indexes more efficiently than the previous version. It has been the default since 0.9.
 
-## Enabling Ingest V2
+## Architecture
 
-To enable Ingest V2, you need to set the `QW_ENABLE_INGEST_V2` environment variable to `1`on the indexer, control-plane, and metastore services.
+Just like ingest V1, the new ingest uses [`mrecordlog`](https://github.com/quickwit-oss/mrecordlog) to persist ingested documents that are waiting to be indexed. But unlike V1, which always persists the documents locally on the node that receives them, ingest V2 can dynamically distribute them into WAL units called _shards_. The assigned shard can be local or on another indexer. The control plane is in charge of distributing the shards to balance the indexing work as evenly as possible across all indexer nodes. The progress within each shard is no longer tracked as an index metadata checkpoint but in a dedicated `shards` table in the metastore.
 
-You also have to activate the `enable_cooperative_indexing` option in the indexer configuration. The indexer configuration is in the node configuration:
+In the future, the shard-based ingest will also be able to write a replica of each shard, ensuring high durability of the documents waiting to be indexed (durability of indexed documents is guaranteed by the object store).
+
+## Toggling between ingest V1 and V2
+
+Variables driving the ingest configuration are documented [here](../ingest-data/ingest-api.md#ingest-api-versions).
+
+With ingest V2, you can also activate the `enable_cooperative_indexing` option in the indexer configuration. This setting is useful for deployments with very large numbers (dozens) of actively written indexers, to limit the indexing workbench memory consumption. The indexer configuration is in the node configuration:
 
 ```yaml
 version: 0.8
@@ -17,4 +23,12 @@ indexer:
 
 See [full configuration example](https://github.com/quickwit-oss/quickwit/blob/main/config/quickwit.yaml).
 
-The only way to use the ingest API V2 is to use the [bulk endpoint](../reference/es_compatible_api.md#_bulk--batch-ingestion-endpoint) of the Elasticsearch-compatible API. The native Quickwit API is not yet compatible with the ingest V2 API.
+## Differences between ingest V1 and V2
+
+- V1 uses the `queues/` directory whereas V2 uses the `wal/` directory
+- both V1 and V2 are configured with:
+  - `ingest_api.max_queue_memory_usage`
+  - `ingest_api.max_queue_disk_usage`
+- but ingest V2 can also be configured with:
+  - `ingest_api.replication_factor` (not functional yet)
+- ingest V1 always writes to the WAL of the node receiving the request, whereas V2 may forward it to another node, dynamically assigned by the control plane to distribute the indexing work more evenly.
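To illustrate the last point, here is a toy placement policy in Rust in which the control plane opens each new shard on the indexer that currently hosts the fewest shards. This is not Quickwit's actual balancing algorithm; `ToyControlPlane`, `open_shard` and `num_shards` are hypothetical. With V1 there is no such step: the receiving node always writes to its own WAL.

```rust
use std::collections::BTreeMap;

/// Toy shard placement, NOT Quickwit's balancing algorithm: every new shard
/// goes to the indexer that currently hosts the fewest open shards.
#[derive(Debug, Default)]
pub struct ToyControlPlane {
    /// indexer id -> number of open shards it hosts (BTreeMap keeps ids sorted)
    shards_per_indexer: BTreeMap<String, usize>,
}

impl ToyControlPlane {
    pub fn new(indexers: &[&str]) -> Self {
        let shards_per_indexer = indexers.iter().map(|id| (id.to_string(), 0)).collect();
        Self { shards_per_indexer }
    }

    /// Opens a shard on the least-loaded indexer (ties go to the smallest id)
    /// and returns that indexer's id, or `None` if there are no indexers.
    pub fn open_shard(&mut self) -> Option<String> {
        let (indexer_id, num_shards) = self
            .shards_per_indexer
            .iter_mut()
            .min_by_key(|(_, num_shards)| **num_shards)?;
        *num_shards += 1;
        Some(indexer_id.clone())
    }

    pub fn num_shards(&self, indexer_id: &str) -> usize {
        self.shards_per_indexer.get(indexer_id).copied().unwrap_or(0)
    }
}
```

Opening six shards on three indexers with this policy yields two shards per indexer, placed in round-robin order.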
-This directory is created only if the ingest API service is running on your node. It contains write ahead log files of the ingest API to guard against data lost.
-The queue is truncated when Quickwit commits a split (piece of index), which means that the split is stored on the storage and its metadata are in the metastore.
+These directories are created only if the ingest API service is running on your node. They contain write-ahead log files of the ingest API to guard against data loss. The `/queues` directory is used by the legacy version of the ingest (sometimes referred to as ingest V1). It is meant to be phased out in upcoming versions of Quickwit. Learn more about ingest API versions [here](../ingest-data/ingest-api.md#ingest-api-versions).
+
+The log file is truncated when Quickwit commits a split (piece of index), which means that the split is written to storage and its metadata is in the metastore.
 
 You can configure `max_queue_memory_usage` and `max_queue_disk_usage` in the [node config file](../configuration/node-config.md#ingest-api-configuration) to limit the max disk usage.
 
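The truncation and size-limit behavior described above can be modeled with a toy in-memory log. This is a sketch, not `mrecordlog`: `ToyWal`, `append`, `truncate` and `used_bytes` are hypothetical, records live in memory instead of on disk, and rejecting appends once the budget is exhausted is only assumed here as one way a `max_queue_disk_usage`-style cap could be enforced.

```rust
use std::collections::VecDeque;

/// Toy write-ahead log (not mrecordlog): records get increasing positions
/// and are dropped once a split covering them has been committed.
pub struct ToyWal {
    records: VecDeque<(u64, Vec<u8>)>,
    next_position: u64,
    used_bytes: u64,
    max_disk_usage: u64,
}

impl ToyWal {
    pub fn new(max_disk_usage: u64) -> Self {
        Self { records: VecDeque::new(), next_position: 0, used_bytes: 0, max_disk_usage }
    }

    /// Appends a document and returns its position, or refuses it if the
    /// log would exceed its size budget.
    pub fn append(&mut self, doc: &[u8]) -> Result<u64, String> {
        let len = doc.len() as u64;
        if self.used_bytes + len > self.max_disk_usage {
            return Err("WAL full: wait for the next split commit".to_string());
        }
        let position = self.next_position;
        self.records.push_back((position, doc.to_vec()));
        self.next_position += 1;
        self.used_bytes += len;
        Ok(position)
    }

    /// Called once a split containing every record up to `up_to_position`
    /// (inclusive) is in storage and its metadata is in the metastore.
    pub fn truncate(&mut self, up_to_position: u64) {
        while let Some((position, _)) = self.records.front() {
            if *position > up_to_position {
                break;
            }
            if let Some((_, doc)) = self.records.pop_front() {
                self.used_bytes -= doc.len() as u64;
            }
        }
    }

    pub fn used_bytes(&self) -> u64 {
        self.used_bytes
    }
}
```

In this model, a full log only frees space when a commit triggers `truncate`, which is why a long commit timeout and a small disk budget can combine into rejected ingest requests.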
@@ -110,4 +114,3 @@ With these assumptions, you have to set `split_store_max_num_splits` to at least
 
 When starting, Quickwit scans all the splits in the cache directory to determine which splits are present locally. This can take a few minutes if you have tens of thousands of splits. On Kubernetes, since your pod can be restarted if it takes too long to start, you may want to clean up the data directory or increase the liveness probe timeout.
 Also, please report such behavior on [GitHub](https://github.com/quickwit-oss/quickwit), as we can certainly optimize this startup phase.
docs/operating/upgrades.md (+6)
@@ -18,3 +18,9 @@ Quickwit 0.7.1 will create the new index `otel-logs-v0_7` which is now used by d
 
 In the traces index `otel-traces-v0_7`, the `service_name` field is now `fast`.
 No migration is done if `otel-traces-v0_7` already exists. If you want the `service_name` field to be `fast`, you first have to delete the existing `otel-traces-v0_7` index or create your own index.
+
+## Migration from 0.8 to 0.9
+
+Quickwit 0.9 introduces a new ingestion service to power the ingest and bulk APIs (v2). The new ingest is enabled and used by default, although the legacy one (v1) remains enabled to finish indexing residual data in the legacy write-ahead logs. Note that `ingest_api.max_queue_disk_usage` is enforced on each ingest version separately, which means that the cumulative disk usage might be up to twice this limit.
+
+The control plane should be upgraded first in order to enable the new ingest source (v2) on all existing indexes. Data ingested into previously existing indexes on upgraded indexer nodes will not be picked up by the indexing pipelines until the control plane is upgraded.
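Because the disk limit applies per ingest version, a migrating node needs room for both write-ahead logs. The hypothetical helper below (`worst_case_wal_disk_usage` is an illustrative name, not a Quickwit function) makes the arithmetic explicit: with the default `4GiB`, the worst case while V1 and V2 run side by side is `8GiB`.

```rust
pub const GIB: u64 = 1024 * 1024 * 1024;

/// Hypothetical sizing helper: `max_queue_disk_usage` is enforced per ingest
/// version, so the worst case is the limit times the number of running versions.
pub fn worst_case_wal_disk_usage(max_queue_disk_usage: u64, v1_running: bool, v2_running: bool) -> u64 {
    let running_versions = u64::from(v1_running) + u64::from(v2_running);
    max_queue_disk_usage * running_versions
}
```

Once the residual V1 data has been indexed and V1 is turned off, the bound in this model drops back to a single `max_queue_disk_usage`.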
docs/reference/cli.md (+2 -2)
@@ -353,8 +353,8 @@ quickwit index ingest
 |`--index`| ID of the target index |
 |`--input-path`| Location of the input file. |
 |`--batch-size-limit`| Size limit of each submitted document batch. |
-|`--wait`| Wait for all documents to be committed and available for search before exiting |
-|`--force`| Force a commit after the last document is sent, and wait for all documents to be committed and available for search before exiting |
+|`--wait`| Wait for all documents to be committed and available for search before exiting. Applies only to the last batch, see [#5417](https://github.com/quickwit-oss/quickwit/issues/5417).|
+|`--force`| Force a commit after the last document is sent, and wait for all documents to be committed and available for search before exiting. Applies only to the last batch, see [#5417](https://github.com/quickwit-oss/quickwit/issues/5417).|
 |`--commit-timeout`| Timeout for ingest operations that require waiting for the final commit (`--wait` or `--force`). This is different from the `commit_timeout_secs` indexing setting, which sets the maximum time before committing splits after their creation. |
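The caveat on `--wait` and `--force` is easier to see with a model of how the input is split. The sketch below assumes newline-delimited documents grouped into batches of at most `--batch-size-limit` bytes, and attaches the requested commit behavior to the final batch only, as the table describes; earlier batches carry no commit wait. `CommitMode` and `plan_batches` are hypothetical names, not the CLI's internals.

```rust
/// Hypothetical commit modes mirroring the CLI flags: none, `--wait`, `--force`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommitMode {
    Auto,
    WaitFor,
    Force,
}

/// Groups documents into batches of at most `batch_size_limit` bytes
/// (counting one trailing newline per document) and pairs each batch with the
/// commit mode it is sent with. Only the last batch gets `requested`.
/// A single document larger than the limit ends up alone in its own batch.
pub fn plan_batches(docs: &[&str], batch_size_limit: usize, requested: CommitMode) -> Vec<(Vec<String>, CommitMode)> {
    let mut batches: Vec<Vec<String>> = Vec::new();
    let mut current: Vec<String> = Vec::new();
    let mut current_size = 0;
    for doc in docs {
        let doc_size = doc.len() + 1; // +1 for the newline separator
        if !current.is_empty() && current_size + doc_size > batch_size_limit {
            batches.push(std::mem::take(&mut current));
            current_size = 0;
        }
        current.push(doc.to_string());
        current_size += doc_size;
    }
    if !current.is_empty() {
        batches.push(current);
    }
    let last = batches.len().saturating_sub(1);
    batches
        .into_iter()
        .enumerate()
        .map(|(i, batch)| {
            // Earlier batches are sent without waiting for a commit.
            let mode = if i == last { requested } else { CommitMode::Auto };
            (batch, mode)
        })
        .collect()
}
```

Under this model, documents from earlier batches may still be uncommitted when the command returns, which is the situation tracked in #5417.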
-.help("Wait for all documents to be committed and available for search before exiting")
+.help("Wait for all documents to be committed and available for search before exiting. Applies only to the last batch, see [#5417](https://github.com/quickwit-oss/quickwit/issues/5417).")
 .action(ArgAction::SetTrue),
 Arg::new("force")
 .long("force")
 .short('f')
-.help("Force a commit after the last document is sent, and wait for all documents to be committed and available for search before exiting")
+.help("Force a commit after the last document is sent, and wait for all documents to be committed and available for search before exiting. Applies only to the last batch, see [#5417](https://github.com/quickwit-oss/quickwit/issues/5417).")