
Commit f9a7e68

Update docs to prepare for 0.9 release (#5605)
* Update docs to prepare for 0.9 release
* Fix typos and minor clarifications
1 parent 385c5b5 commit f9a7e68

12 files changed: +131 -59 lines changed

CHANGELOG.md

+71
@@ -22,6 +22,77 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

--->

+# [0.9.0]
+
+### Added
+- Add Ingest V2 (#5600, #5566, #5463, #5375, #5350, #5252, #5202)
+- Add SQS source (#5374, #5335, #5148)
+- Disable control plane check for searcher (#5599, #5360)
+- Partially implement `_elastic/_cluster/health` (#5595)
+- Make Jaeger span attribute-to-tag conversion exhaustive (#5574)
+- Use `content_length_limit` for ES bulk limit (#5573)
+- Limit and monitor warmup memory usage (#5568)
+- Add eviction metrics to caches (#5523)
+- Record object storage request latencies (#5521)
+- Add throttling on the janitor to prevent it from overloading (#5510)
+- Prevent single split searches from different `leaf_search` from interleaving (#5509)
+- Retry on S3 internal error (#5504)
+- Allow specifying OTEL index ID in header (#5503)
+- Add a metric to count storage errors and their error code (#5497)
+- Add support for concatenated fields (#4773, #5369, #5331)
+- Add number of splits per root/leaf search histograms (#5472)
+- Introduce a searcher config option to timeout get requests (#5467)
+- Add fingerprint to task in cluster state (#5464)
+- Enrich root/leaf search spans with number of docs and splits (#5450)
+- Add some additional search metrics (#5447)
+- Improve GC resilience and add metrics (#5420)
+- Enable force shutdown with 2nd Ctrl+C (#5414)
+- Add `request_timeout_secs` config to searcher config (#5402)
+- Memoize S3 client (#5377)
+- Add more env var config for Postgres (#5365)
+- Enable str fast field range queries (#5324)
+- Allow querying non-existing fields (#5308)
+- Support updating doc mapper through API (#5253)
+- Add optional special handling for hex in code tokenizer (#5200)
+- Add a circuit breaker layer (#5134)
+- Various performance optimizations in Tantivy (https://github.com/quickwit-oss/tantivy/blob/main/CHANGELOG.md)
+
+### Changed
+- Parse datetimes and timestamps with leading and/or trailing whitespace (#5544)
+- Restrict maturity period to retention (#5543)
+- Wait for merge at end of local ingest (#5542)
+- Log PostgreSQL metastore errors (#5530)
+- Update Azure multipart policy (#5553)
+- Stop relying on our own version of pulsar-rs (#5487)
+- Handle nested OTLP values in attributes and log bodies (#5485)
+- Improve merge pipeline finalization (#5475)
+- Allow failed splits in root search (#5440)
+- Batch delete from GC (#5404, #5380)
+- Make some S3 errors retryable (#5384)
+- Change default timestamps in OTEL logs (#5366)
+- Only return root spans for Jaeger HTTP API (#5358)
+- Share aggregation limit on node (#5357)
+
+### Fixed
+- Fix existence queries for nested fields (#5581)
+- Fix lenient option with wildcard queries (#5575)
+- Fix incompatible ES Java date format (#5462)
+- Fix bulk API response order (#5434)
+- Fix Pulsar finalize (#5471)
+- Fix Pulsar URI scheme (#5470)
+- Fix Grafana searchers dashboard (#5455)
+- Fix Jaeger HTTP endpoint (#5378)
+- Fix file re-ingestion after EOF (#5330)
+- Fix source path in Lambda distribution (#5327)
+- Fix configuration interpolation (#5403)
+- Fix Jaeger duration parse error (#5518)
+- Fix unit conversion in Jaeger HTTP search endpoint (#5519)
+
+### Removed
+- Remove support for 2-digit years in Java datetime parser (#5596)
+- Remove `DocMapper` trait (#5508)
+
+
# [0.8.1]

### Fixed

config/quickwit.yaml

+1
@@ -130,6 +130,7 @@ indexer:
# ingest_api:
#   max_queue_memory_usage: 2GiB
#   max_queue_disk_usage: 4GiB
+#   content_length_limit: 10MiB
#
# -------------------------------- Searcher settings --------------------------------
# https://quickwit.io/docs/configuration/node-config#searcher-configuration

docs/configuration/node-config.md

+2
@@ -176,13 +176,15 @@ indexer:
| --- | --- | --- |
| `max_queue_memory_usage` | Maximum size in bytes of the in-memory Ingest queue. | `2GiB` |
| `max_queue_disk_usage` | Maximum disk space in bytes taken by the Ingest queue. This must be at least `256M` and at least `max_queue_memory_usage`. | `4GiB` |
+| `content_length_limit` | Maximum uncompressed payload size. Increasing this is discouraged; use a [file source](../ingest-data/sqs-files.md) instead. | `10MiB` |

Example:

```yaml
ingest_api:
  max_queue_memory_usage: 2GiB
  max_queue_disk_usage: 4GiB
+  content_length_limit: 10MiB
```

## Searcher configuration

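The `content_length_limit` above caps the uncompressed payload of a single ingest request. As a minimal sketch of staying under the limit rather than raising it (the index ID `my-index`, the file `docs.ndjson`, and the 8M chunk size are illustrative placeholders, not part of this commit), a large NDJSON file can be split on line boundaries before being posted to the ingest endpoint:

```bash
# Split into chunks of at most ~8 MiB each, keeping every JSON line intact (GNU split).
split -C 8M docs.ndjson chunk_

# Post each chunk separately so no single request exceeds content_length_limit.
for chunk in chunk_*; do
  curl -XPOST "http://localhost:7280/api/v1/my-index/ingest" --data-binary "@${chunk}"
done
```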
docs/deployment/cluster-sizing.md

+2 -3
@@ -41,9 +41,8 @@ To utilize all CPUs on Indexer nodes that have more than 4 cores, your indexing
workload needs to be broken down into multiple indexing pipelines. This can be
achieved by creating multiple indexes or by using a [partitioned data
source](../configuration/source-config.md#number-of-pipelines) such as
-[Kafka](../configuration/source-config.md#kafka-source).
-
-<!-- TODO: change this note when releasing ingest v2 -->
+[Kafka](../configuration/source-config.md#kafka-source) or the [ingest API
+(v2)](../ingest-data/ingest-api.md#ingest-api-versions).

:::

docs/ingest-data/ingest-api.md

+17
@@ -69,3 +69,20 @@ curl -XDELETE 'http://localhost:7280/api/v1/indexes/stackoverflow-schemaless'
```

This concludes the tutorial. You can now move on to the [next tutorial](/docs/ingest-data/kafka.md) to learn how to ingest data from Kafka.
+
+## Ingest API versions
+
+In 0.9, Quickwit introduced a new version of the ingest API that distributes the indexing work across the cluster regardless of which node received the ingest request. This new ingestion service is often referred to as "ingest V2", as opposed to the legacy ingestion (V1). In upcoming versions, the new ingest API will also be able to replicate the write ahead log to achieve higher durability.
+
+By default, both ingestion services are enabled and ingest V2 is used. You can toggle this behavior with the following environment variables:
+
+| Variable | Description | Default value |
+| --- | --- | --- |
+| `QW_ENABLE_INGEST_V2` | Start the V2 ingest service and use it by default. | `true` |
+| `QW_DISABLE_INGEST_V1` | Disable the V1 ingest service. The APIs use V1 only if V2 is disabled; running V1 alongside V2 is necessary to migrate to V2 without losing unindexed records still in the V1 queues. | `false` |
+
+:::note
+
+These variables determine the ingest service used by both the `api/v1/<index-id>/ingest` endpoint and the [bulk API](../reference/es_compatible_api.md#_bulk--batch-ingestion-endpoint).
+
+:::

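For illustration, here is one way these variables could be set when starting a node. This is only a sketch, assuming a local `quickwit` binary and boolean values spelled as in the table above:

```bash
# Default behavior made explicit: V2 handles new ingest requests,
# while V1 stays up to drain records still sitting in its legacy queues.
QW_ENABLE_INGEST_V2=true QW_DISABLE_INGEST_V1=false ./quickwit run

# Once the V1 queues are fully indexed, V1 can be switched off entirely.
QW_DISABLE_INGEST_V1=true ./quickwit run
```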
docs/internals/ingest-v2.md

+19 -5
@@ -1,12 +1,18 @@
# Ingest V2

-Ingest V2 is a new ingestion API that is designed to be more efficient and scalable for thousands of indexes than the previous version. It is currently in beta and is not yet enabled by default.
+Ingest V2 is the latest ingestion API. It is designed to be more efficient than the previous version and to scale to thousands of indexes. It is the default since 0.9.

-## Enabling Ingest V2
+## Architecture

-To enable Ingest V2, you need to set the `QW_ENABLE_INGEST_V2` environment variable to `1` on the indexer, control-plane, and metastore services.
+Just like ingest V1, the new ingest uses [`mrecordlog`](https://github.com/quickwit-oss/mrecordlog) to persist ingested documents that are waiting to be indexed. But unlike V1, which always persists the documents locally on the node that receives them, ingest V2 can dynamically distribute them into WAL units called _shards_. The assigned shard can be local or on another indexer. The control plane is in charge of distributing the shards to balance the indexing work as evenly as possible across all indexer nodes. The progress within each shard is no longer tracked in the index metadata checkpoint but in a dedicated metastore `shards` table.

-You also have to activate the `enable_cooperative_indexing` option in the indexer configuration. The indexer configuration is in the node configuration:
+In the future, the shard-based ingest will also be able to write a replica for each shard, ensuring high durability for the documents that are waiting to be indexed (durability of the indexed documents is guaranteed by the object store).
+
+## Toggling between ingest V1 and V2
+
+The variables driving the ingest configuration are documented [here](../ingest-data/ingest-api.md#ingest-api-versions).
+
+With ingest V2, you can also activate the `enable_cooperative_indexing` option in the indexer configuration. This setting is useful for deployments with a very large number (dozens) of actively written indexes, as it limits the memory consumption of the indexing workbench. The indexer configuration is in the node configuration:

```yaml
version: 0.8
@@ -17,4 +23,12 @@ indexer:

See [full configuration example](https://github.com/quickwit-oss/quickwit/blob/main/config/quickwit.yaml).

-The only way to use the ingest API V2 is to use the [bulk endpoint](../reference/es_compatible_api.md#_bulk--batch-ingestion-endpoint) of the Elasticsearch-compatible API. The native Quickwit API is not yet compatible with the ingest V2 API.
+## Differences between ingest V1 and V2
+
+- V1 uses the `queues/` directory whereas V2 uses the `wal/` directory.
+- Both V1 and V2 are configured with:
+  - `ingest_api.max_queue_memory_usage`
+  - `ingest_api.max_queue_disk_usage`
+- Ingest V2 can also be configured with:
+  - `ingest_api.replication_factor` (not working yet)
+- Ingest V1 always writes to the WAL of the node receiving the request; V2 may forward documents to another node, dynamically assigned by the control plane to distribute the indexing work more evenly.

docs/internals/template-index.md

+2 -4
@@ -19,14 +19,12 @@ curl -XPUT -H 'Content-Type: application/yaml' 'http://localhost:7280/api/v1/tem
curl -O https://quickwit-datasets-public.s3.amazonaws.com/stackoverflow.posts.transformed-10000.json

# Ingest 10k docs into `stackoverflow-foo` index.
-curl -XPOST "http://127.0.0.1:7280/api/v1/stackoverflow-foo/ingest-v2" --data-binary @stackoverflow.posts.transformed-10000.json
+curl -XPOST "http://127.0.0.1:7280/api/v1/stackoverflow-foo/ingest" --data-binary @stackoverflow.posts.transformed-10000.json

# Ingest 10k docs into `stackoverflow-bar` index.
-curl -XPOST "http://127.0.0.1:7280/api/v1/stackoverflow-bar/ingest-v2" --data-binary @stackoverflow.posts.transformed-10000.json
+curl -XPOST "http://127.0.0.1:7280/api/v1/stackoverflow-bar/ingest" --data-binary @stackoverflow.posts.transformed-10000.json

# Delete Stackoverflow template.
curl -XDELETE 'http://localhost:7280/api/v1/templates/stackoverflow'

```bash
-
-

docs/operating/data-directory.md

+7 -4
@@ -22,16 +22,20 @@ qwdata
├── indexing
│   ├── wikipedia%01H13SVKDS03P%_ingest-api-source%RbaOAI
│   └── wikipedia%01H13SVKDS03P%kafka-source%cNqQtI
+├── wal
+│   ├── wal-00000000000000000056
+│   └── wal-00000000000000000057
└── queues
    ├── partition_id
    ├── wal-00000000000000000028
    └── wal-00000000000000000029
```

-### `/queues` directory
+### `/queues` and `/wal` directories

-This directory is created only if the ingest API service is running on your node. It contains write ahead log files of the ingest API to guard against data lost.
-The queue is truncated when Quickwit commits a split (piece of index), which means that the split is stored on the storage and its metadata are in the metastore.
+These directories are created only if the ingest API service is running on your node. They contain write ahead log files of the ingest API to guard against data loss. The `/queues` directory is used by the legacy version of the ingest (sometimes referred to as ingest V1). It is meant to be phased out in upcoming versions of Quickwit. Learn more about ingest API versions [here](../ingest-data/ingest-api.md#ingest-api-versions).
+
+The log file is truncated when Quickwit commits a split (piece of index), which means that the split is stored on the storage and its metadata is in the metastore.

You can configure `max_queue_memory_usage` and `max_queue_disk_usage` in the [node config file](../configuration/node-config.md#ingest-api-configuration) to limit the max disk usage.

@@ -110,4 +114,3 @@ With these assumptions, you have to set `split_store_max_num_splits` to at least

When starting, Quickwit scans all the splits in the cache directory to find out which splits are present locally; this can take a few minutes if you have tens of thousands of splits. On Kubernetes, as your pod can be restarted if it takes too long to start, you may want to clean up the data directory or increase the liveness probe timeout.
Also, please report such behavior on [GitHub](https://github.com/quickwit-oss/quickwit) as we can certainly optimize this start phase.
-

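As a small illustration tied to the `/queues` and `/wal` directories described above (assuming the default `qwdata` data directory), you can check how much disk each write ahead log currently uses; keep in mind that `max_queue_disk_usage` applies to each ingest version separately:

```bash
# Disk usage of the legacy (V1) queues and the new (V2) WAL.
# Either directory may be absent if the corresponding ingest version never ran on this node.
du -sh qwdata/queues qwdata/wal
```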
docs/operating/upgrades.md

+6
@@ -18,3 +18,9 @@ Quickwit 0.7.1 will create the new index `otel-logs-v0_7` which is now used by d

In the traces index `otel-traces-v0_7`, the `service_name` field is now `fast`.
No migration is done if `otel-traces-v0_7` already exists. If you want the `service_name` field to be `fast`, you have to first delete the existing `otel-traces-v0_7` index or create your own index.
+
+## Migration from 0.8 to 0.9
+
+Quickwit 0.9 introduces a new ingestion service to power the ingest and bulk APIs (v2). The new ingest is enabled and used by default, even though the legacy one (v1) remains enabled to finish indexing residual data in the legacy write ahead logs. Note that `ingest_api.max_queue_disk_usage` is enforced on each ingest version separately, which means that the cumulative disk usage might be up to twice this limit.
+
+The control plane should be upgraded first in order to enable the new ingest source (v2) on all existing indexes. Data ingested into previously existing indexes on upgraded indexer nodes will not be picked up by the indexing pipelines until the control plane is upgraded.

docs/reference/cli.md

+2 -2
@@ -353,8 +353,8 @@ quickwit index ingest
| `--index` | ID of the target index |
| `--input-path` | Location of the input file. |
| `--batch-size-limit` | Size limit of each submitted document batch. |
-| `--wait` | Wait for all documents to be committed and available for search before exiting |
-| `--force` | Force a commit after the last document is sent, and wait for all documents to be committed and available for search before exiting |
+| `--wait` | Wait for all documents to be committed and available for search before exiting. Applies only to the last batch, see [#5417](https://github.com/quickwit-oss/quickwit/issues/5417). |
+| `--force` | Force a commit after the last document is sent, and wait for all documents to be committed and available for search before exiting. Applies only to the last batch, see [#5417](https://github.com/quickwit-oss/quickwit/issues/5417). |
| `--commit-timeout` | Timeout for ingest operations that require waiting for the final commit (`--wait` or `--force`). This is different from the `commit_timeout_secs` indexing setting, which sets the maximum time before committing splits after their creation. |

*Examples*

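For illustration only (the index ID `my-index` and the file `docs.ndjson` are placeholders, and this example is not part of the diff), a typical invocation combining these flags looks like:

```bash
# Ingest a local NDJSON file and block until the last batch is committed and searchable.
quickwit index ingest --index my-index --input-path docs.ndjson --wait
```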
quickwit/quickwit-cli/src/index.rs

+2 -2
@@ -141,12 +141,12 @@ pub fn build_index_command() -> Command {
        Arg::new("wait")
            .long("wait")
            .short('w')
-            .help("Wait for all documents to be committed and available for search before exiting")
+            .help("Wait for all documents to be committed and available for search before exiting. Applies only to the last batch, see [#5417](https://github.com/quickwit-oss/quickwit/issues/5417).")
            .action(ArgAction::SetTrue),
        Arg::new("force")
            .long("force")
            .short('f')
-            .help("Force a commit after the last document is sent, and wait for all documents to be committed and available for search before exiting")
+            .help("Force a commit after the last document is sent, and wait for all documents to be committed and available for search before exiting. Applies only to the last batch, see [#5417](https://github.com/quickwit-oss/quickwit/issues/5417).")
            .action(ArgAction::SetTrue)
            .conflicts_with("wait"),
        Arg::new("commit-timeout")

quickwit/quickwit-ingest/DESIGN.md

-39
This file was deleted.
