This project is a data extractor for ELT pipelines and any other process that requires a heavy data dump from MongoDB databases.

farovictor/MongoDbExtractor

Usage

This tool can be used as a package or as a CLI tool. It serves as a data extractor for ELT pipelines and any other process that requires a heavy data dump.

CLI

The CLI lets the user dump data into one or more JSON files.

Ping database

The ping command pings the database and reports whether the connection succeeded.

mongoextract ping --conn-uri "$MONGO_CONN_URI"
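
In a pipeline it can be useful to wait for the database before starting an extraction. The sketch below shows one way to do that in a shell script. `retry` is a hypothetical helper, not part of mongoextract, and the commented call assumes that `mongoextract ping` exits with a non-zero status when the connection check fails, which this README does not confirm.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Stop as soon as the command reports success.
    "$@" && return 0
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}

# Assumption: ping signals a failed connection through its exit status.
# retry 5 2 mongoextract ping --conn-uri "$MONGO_CONN_URI"
```

If the exit status turns out not to reflect the connection check, the same loop could inspect the command's output instead.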

Check if a collection exists

The collxst command checks whether the given collection exists in the database.

mongoextract collxst \
	--conn-uri "$MONGO_CONN_URI" \
	--db-name "$MONGO_DBNAME" \
	--collection "$MONGO_COLLECTION" \
	--app-name "$APPNAME"

Extract in batches - streaming dump (async)

The extract-batch command iterates over a MongoDB cursor and dumps the data in chunks into JSON files.

mongoextract extract-batch \
	--conn-uri "$MONGO_CONN_URI" \
	--db-name "$MONGO_DBNAME" \
	--collection "$MONGO_COLLECTION" \
	--app-name "$APPNAME" \
	--mapping "$ID_NAME" \
	--query '{"latitude":{"$gte":30}}' \
	--output-path "./data" \
	--output-prefix "$ID_NAME" \
	--chunk-size 100 \
	--num-concurrent-files 10
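
The `--query` value is a JSON document, and MongoDB comparison operators such as `$gte` begin with a dollar sign that the shell would try to expand inside double quotes. The sketch below is one way to build such a query from script variables while keeping the operator literal. `build_gte_query` is a hypothetical helper, not part of mongoextract, and it inserts the field name and value verbatim, so it only suits simple field names and numeric thresholds.

```shell
# Hypothetical helper: print a {"<field>": {"$gte": <value>}} query.
# The single-quoted printf format keeps $gte from being expanded by the shell.
build_gte_query() {
  printf '{"%s":{"$gte":%s}}' "$1" "$2"
}

query=$(build_gte_query latitude 30)
printf '%s\n' "$query"
# prints {"latitude":{"$gte":30}}
```

The result can then be passed as `--query "$query"`, since the variable's contents are not expanded a second time.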

Extract data

The extract command fetches a MongoDB cursor and dumps all of the data into a single JSON file.

mongoextract extract \
	--conn-uri "$MONGO_CONN_URI" \
	--db-name "$MONGO_DBNAME" \
	--collection "$MONGO_COLLECTION" \
	--app-name "$APPNAME" \
	--mapping some_mapping_name \
	--query '{"latitude":{"$gte":30}}'

Releases

All releases are documented in the CHANGELOG; check there for more details.

Docker

All releases are containerized and available on Docker Hub. To pull the latest version, execute the following command:

docker pull victorfaro/mongoextract:latest

All images use the CLI as the entrypoint; see the examples above for how to use it.
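
Because the entrypoint is the CLI, arguments placed after the image name should be forwarded to it. The wrapper below is a minimal sketch of that idea. `mongoextract_docker` is a hypothetical name, and the sketch assumes the image needs no extra options, such as network settings or mounted volumes, to reach the database.

```shell
# Hypothetical wrapper: call the containerized CLI like the local one.
# Everything passed to the function is forwarded to the image's entrypoint.
mongoextract_docker() {
  docker run --rm victorfaro/mongoextract:latest "$@"
}

# Example call, mirroring the ping command above:
# mongoextract_docker ping --conn-uri "$MONGO_CONN_URI"
```

For commands that write files, the output lands inside the container unless a host directory is mounted, so a bind mount (`docker run -v`) would likely be needed to keep the dump.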
