Commit 27a4440 ("readme")

1 parent 65545d9

2 files changed: +86 −24 lines

README.md

Lines changed: 80 additions & 18 deletions
@@ -1,32 +1,82 @@
 # Conduit Connector MongoDB
 
-## General
-
 The [MongoDB](https://www.mongodb.com/) connector is one of the Conduit plugins.
 It provides both a source and a destination MongoDB connector.
 
-### Prerequisites
+<!-- readmegen:description -->
+## Source
 
-- [Go](https://go.dev/) 1.23+
-- [MongoDB](https://www.mongodb.com/) [replica set](https://www.mongodb.com/docs/manual/replication/)
-  (at least single-node)
-  or [sharded cluster](https://www.mongodb.com/docs/manual/sharding/)
-  with the [WiredTiger](https://www.mongodb.com/docs/manual/core/wiredtiger/)
-  storage engine
-- [Docker](https://www.docker.com/)
-- (optional) [golangci-lint](https://github.com/golangci/golangci-lint) v1.55.2
+The MongoDB Source Connector connects to a MongoDB instance with the provided
+`uri`, `db` and `collection` and starts creating records for each change
+detected in the collection.
 
-### How to build it
+Upon starting, the Source takes a snapshot of the given collection in the
+database, then switches into CDC mode. In CDC mode, the plugin reads events
+from a [Change Stream](https://www.mongodb.com/docs/manual/changeStreams/).
+For this to work correctly, your MongoDB instance must meet
+[the criteria](https://www.mongodb.com/docs/manual/changeStreams/#availability)
+specified in the official documentation.
 
-Run `make build`.
+### Snapshot Capture
 
-### Development
+When the connector first starts, snapshot mode is enabled. The connector reads
+all rows of a collection in batches using
+[cursor-based](https://www.mongodb.com/docs/drivers/go/current/fundamentals/crud/read-operations/cursor/)
+pagination, limiting the rows by `batchSize`. The connector stores the last
+processed element value of the `orderingColumn` in a position, so the snapshot
+process can be paused and resumed without losing data. Once all rows in the
+initial snapshot are read, the connector switches into CDC mode.
 
-Run `make install-tools` to install all the required tools.
+This behavior is enabled by default, but can be turned off by adding
+`"snapshot": false` to the Source configuration.
 
-Run `make test` to run all the units and `make test-integration` to run all the
-integration tests, which require Docker to be installed and running. The command
-will handle starting and stopping docker container for you.
+### Change Data Capture
+
+The connector implements CDC for MongoDB using a Change Stream that listens
+for changes in the configured collection. Every detected change is converted
+into a record and returned in the call to `Read`. If no record is available
+when `Read` is called, the connector returns an `sdk.ErrBackoffRetry` error.
+
+The connector stores the `resumeToken` of every Change Stream event in a
+position, so the CDC process is resumable.
+
+> **Warning**
+>
+> [Azure CosmosDB for MongoDB](https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/change-streams)
+> has very limited support for Change Streams, so they cannot be used for CDC.
+> If CDC is not possible, as in the case of CosmosDB, the connector only
+> supports detecting insert operations by polling for new documents.
+
+### Key handling
+
+The connector always uses the `_id` field as the key.
+
+If the `_id` field is a `bson.ObjectID`, the connector converts it to a string
+when transferring a record to a destination; otherwise, it leaves it unchanged.
+
+## Destination
+
+The MongoDB Destination takes an `opencdc.Record` and parses it into a valid
+MongoDB query. The Destination is designed to handle different payloads and
+keys, so each record is parsed and written individually.
+
+### Collection name
+
+If a record contains an `opencdc.collection` property in its metadata, it is
+written to that collection; otherwise the Destination falls back to the
+`collection` configured in the connector. Thus, a single Destination can write
+to multiple collections with the same connector, as long as the user has
+proper access to those collections.
+
+### Key handling
+
+The connector uses all keys from an `opencdc.Record` when updating and
+deleting documents.
+
+If the `_id` field can be converted to a `bson.ObjectID`, the connector
+converts it; otherwise, it uses it as is.
+<!-- /readmegen:description -->
 
 ### Source Configuration Parameters

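The snapshot pagination described above (reading in `batchSize` batches and storing the last processed `orderingColumn` value in a position) can be sketched in Go. This is a stdlib-only illustration of the mechanism; the `position` struct, `readBatch` helper, and integer rows are hypothetical stand-ins, not the connector's actual types or driver calls:

```go
package main

import "fmt"

// position mimics the connector's snapshot position: the last processed
// value of the ordering column, so a restart can resume mid-snapshot.
type position struct {
	LastProcessed int // illustrative; the real type depends on the column
}

// readBatch simulates one cursor page: rows whose ordering value is greater
// than pos.LastProcessed, limited to batchSize.
func readBatch(rows []int, pos position, batchSize int) []int {
	var batch []int
	for _, r := range rows {
		if r > pos.LastProcessed {
			batch = append(batch, r)
			if len(batch) == batchSize {
				break
			}
		}
	}
	return batch
}

func main() {
	rows := []int{1, 2, 3, 4, 5} // rows sorted by the ordering column
	pos := position{}
	for {
		batch := readBatch(rows, pos, 2)
		if len(batch) == 0 {
			break // snapshot done: switch to CDC mode
		}
		pos.LastProcessed = batch[len(batch)-1] // persisted as the position
		fmt.Println(batch)
	}
}
```

Because the position only records the last processed value, pausing after any batch and restarting from the stored position continues the snapshot without re-reading or skipping rows.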
@@ -248,4 +298,16 @@ pipelines:
 ```
 <!-- /readmegen:destination.parameters.yaml -->
 
+### How to build it
+
+Run `make build`.
+
+### Development
+
+Run `make install-tools` to install all the required tools.
+
+Run `make test` to run all the unit tests and `make test-integration` to run
+all the integration tests, which require Docker to be installed and running.
+The command will handle starting and stopping Docker containers for you.
 
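The README states that a string `_id` is converted to a `bson.ObjectID` when possible. A MongoDB ObjectID serializes as a 24-character hex string, so the convertibility check can be illustrated with a stdlib-only sketch; the `looksLikeObjectID` helper is hypothetical, and the real connector uses the bson package rather than this heuristic:

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// looksLikeObjectID reports whether s could be a hex-encoded 12-byte
// MongoDB ObjectID (exactly 24 hex characters). This mirrors the kind of
// check a destination might do before converting a string key back to an
// ObjectID; it is an illustration, not the connector's implementation.
func looksLikeObjectID(s string) bool {
	if len(s) != 24 {
		return false
	}
	_, err := hex.DecodeString(s)
	return err == nil
}

func main() {
	fmt.Println(looksLikeObjectID("652f1a2b3c4d5e6f70818293")) // a valid 24-char hex string
	fmt.Println(looksLikeObjectID("not-an-object-id"))         // not convertible, used as is
}
```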
![scarf pixel](https://static.scarf.sh/a.png?x-pxid=528a9760-d573-4524-8f65-74a5e4d402e8)
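The Destination's collection-routing rule from the README (prefer the record's `opencdc.collection` metadata, else fall back to the configured `collection`) can be sketched as a small pure function. The `targetCollection` name and the plain string map are illustrative, not the connector's actual API:

```go
package main

import "fmt"

// targetCollection mirrors the routing rule: if the record carries an
// `opencdc.collection` metadata entry, write to that collection; otherwise
// use the collection configured on the connector.
func targetCollection(metadata map[string]string, configured string) string {
	if c, ok := metadata["opencdc.collection"]; ok && c != "" {
		return c
	}
	return configured
}

func main() {
	fmt.Println(targetCollection(map[string]string{"opencdc.collection": "orders"}, "users"))
	fmt.Println(targetCollection(map[string]string{}, "users"))
}
```

This is what lets one Destination connector serve multiple collections: the routing decision is made per record, given the user has access to every target collection.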

source/iterator/cdc.go

Lines changed: 6 additions & 6 deletions
@@ -39,12 +39,12 @@ const (
 var changeStreamMatchPipeline = bson.D{
 	{
 		Key: "$match", Value: bson.M{
-		"operationType": bson.M{"$in": []string{
-			operationTypeInsert,
-			operationTypeUpdate,
-			operationTypeDelete,
-		}},
-	},
+			"operationType": bson.M{"$in": []string{
+				operationTypeInsert,
+				operationTypeUpdate,
+				operationTypeDelete,
+			}},
+		},
 	},
 }

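The pipeline above builds a `$match` stage so the Change Stream only yields insert, update, and delete events (the actual filtering happens server-side in MongoDB). The effect of that stage can be illustrated with a stdlib-only sketch; the `event` struct and `matchOperationType` helper are hypothetical stand-ins for the server-side filter:

```go
package main

import "fmt"

const (
	operationTypeInsert = "insert"
	operationTypeUpdate = "update"
	operationTypeDelete = "delete"
)

// event stands in for a Change Stream event document.
type event struct {
	OperationType string
}

// matchOperationType mirrors the $match/$in stage: keep only the operation
// types the iterator handles, dropping e.g. "drop" or "invalidate" events.
func matchOperationType(events []event) []event {
	allowed := map[string]bool{
		operationTypeInsert: true,
		operationTypeUpdate: true,
		operationTypeDelete: true,
	}
	var out []event
	for _, e := range events {
		if allowed[e.OperationType] {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	in := []event{{"insert"}, {"drop"}, {"update"}, {"invalidate"}, {"delete"}}
	fmt.Println(len(matchOperationType(in))) // 3
}
```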
0 commit comments