# Conduit Connector MongoDB

The [MongoDB](https://www.mongodb.com/) connector is one of the Conduit
plugins. It provides both a source and a destination MongoDB connector.

<!-- readmegen:description -->
## Source

The MongoDB Source Connector connects to a MongoDB instance with the provided
`uri`, `db`, and `collection` and starts creating records for each change
detected in the collection.

Upon starting, the Source takes a snapshot of a given collection in the
database, then switches into CDC mode. In CDC mode, the plugin reads events
from a [Change Stream](https://www.mongodb.com/docs/manual/changeStreams/). In
order for this to work correctly, your MongoDB instance must meet
[the criteria](https://www.mongodb.com/docs/manual/changeStreams/#availability)
specified on the official website.
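
As a sketch, a minimal Conduit pipeline wiring up this source might look like
the following. The `uri`, `db`, and `collection` values, the ids, and the
plugin name are placeholders; see the configuration parameters below for the
authoritative list:

```yaml
version: 2.2
pipelines:
  - id: mongo-to-file
    status: running
    connectors:
      - id: mongo-source
        type: source
        plugin: mongodb
        settings:
          # Change Streams require a replica set or sharded cluster.
          uri: "mongodb://localhost:27017/?replicaSet=rs0"
          db: "mydb"
          collection: "orders"
```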

### Snapshot Capture
When the connector first starts, snapshot mode is enabled. The connector reads
all documents of a collection in batches using
[cursor-based](https://www.mongodb.com/docs/drivers/go/current/fundamentals/crud/read-operations/cursor/)
pagination, limiting each batch to `batchSize` documents. The connector stores
the last processed element value of the `orderingColumn` in its position, so
the snapshot process can be paused and resumed without losing data. Once all
documents in the initial snapshot are read, the connector switches into CDC
mode.
This behavior is enabled by default, but can be turned off by adding
`"snapshot": false` to the Source configuration.
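
The snapshot logic amounts to keyset pagination: each batch requests documents
whose ordering value is greater than the last one processed, so a restart from
the stored position neither re-reads nor skips data. The following is a
self-contained sketch of that idea over an in-memory slice; `doc`,
`fetchBatch`, and the integer ordering value are illustrative stand-ins, not
the connector's actual types:

```go
package main

import "fmt"

// doc stands in for a MongoDB document; id plays the role of the
// configured orderingColumn.
type doc struct{ id int }

// fetchBatch returns up to batchSize documents whose id is greater than
// after, mimicking a sorted Find with an {id: {$gt: after}} filter and a
// limit. The collection slice is assumed to be sorted by id.
func fetchBatch(coll []doc, after, batchSize int) []doc {
	var out []doc
	for _, d := range coll {
		if d.id > after {
			out = append(out, d)
			if len(out) == batchSize {
				break
			}
		}
	}
	return out
}

func main() {
	coll := []doc{{1}, {2}, {3}, {4}, {5}}
	position := 0 // last processed ordering value; the connector persists this
	for {
		batch := fetchBatch(coll, position, 2)
		if len(batch) == 0 {
			break // snapshot finished; the connector would switch to CDC mode
		}
		for _, d := range batch {
			fmt.Println("record for document", d.id)
			position = d.id // resuming from here re-reads nothing, skips nothing
		}
	}
}
```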
### Change Data Capture

The connector implements CDC features for MongoDB by using a Change Stream
that listens to changes in the configured collection. Every detected change is
converted into a record and returned in the call to `Read`. If there is no
available record when `Read` is called, the connector returns an
`sdk.ErrBackoffRetry` error.

The connector stores the `resumeToken` of every Change Stream event in its
position, so the CDC process is resumable.

> **Warning**
>
> [Azure CosmosDB for MongoDB](https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/change-streams)
> has very limited support for Change Streams, so they cannot be used for CDC.
> If CDC is not possible, as in the case of CosmosDB, the connector only
> supports detecting insert operations by polling for new documents.

### Key handling

The connector always uses the `_id` field as the key.

If the `_id` field is a `bson.ObjectID`, the connector converts it to a string
when transferring a record to a destination; otherwise, it leaves it unchanged.

## Destination

The MongoDB Destination takes an `opencdc.Record` and parses it into a valid
MongoDB query. The Destination is designed to handle different payloads and
keys. Because of this, each record is individually parsed and written.

### Collection name

If a record contains an `opencdc.collection` property in its metadata, it will
be written to that collection; otherwise, the connector falls back to the
`collection` configured in the connector. Thus, a Destination can support
multiple collections in the same connector, as long as the user has proper
access to those collections.
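
As an illustration (the connection values and collection names are
placeholders), a Destination configured like this writes to `orders` by
default, while a record carrying `opencdc.collection: archive` in its metadata
is routed to the `archive` collection instead:

```yaml
      - id: mongo-destination
        type: destination
        plugin: mongodb
        settings:
          uri: "mongodb://localhost:27017"
          db: "mydb"
          collection: "orders" # fallback when opencdc.collection is absent
```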
### Key handling

The connector uses all keys from an `opencdc.Record` when updating and
deleting documents.

If the `_id` field can be converted to a `bson.ObjectID`, the connector
converts it; otherwise, it uses it as is.<!-- /readmegen:description -->
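
A `bson.ObjectID` is 12 bytes, conventionally rendered as 24 hexadecimal
characters, so the convertibility check boils down to validating that shape.
The sketch below is a standalone approximation of that check, not the driver's
actual conversion function:

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// looksLikeObjectID reports whether s could decode into a 12-byte
// ObjectID, i.e. it is exactly 24 valid hex characters.
func looksLikeObjectID(s string) bool {
	if len(s) != 24 {
		return false
	}
	_, err := hex.DecodeString(s)
	return err == nil
}

func main() {
	fmt.Println(looksLikeObjectID("507f1f77bcf86cd799439011")) // convertible
	fmt.Println(looksLikeObjectID("not-an-object-id"))         // used as is
}
```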

### Source Configuration Parameters

```
<!-- /readmegen:destination.parameters.yaml -->

### How to build it

Run `make build`.

### Development

Run `make install-tools` to install all the required tools.

Run `make test` to run all the unit tests and `make test-integration` to run
all the integration tests, which require Docker to be installed and running.
The command will handle starting and stopping the Docker containers for you.

![scarf pixel](https://static.scarf.sh/a.png?x-pxid=0201c9a9-67a0-4bd9-bbf0-7d1fcd9e2b55)