batch ETL approach for the solution.
- Install Docker
- Clone the project
- Open a terminal
- Run the following command in the cloned project directory:

docker-compose up
The docker-compose command lays out the entire Docker-based infrastructure required to run the program. For this approach, docker-compose creates the following components:

- A Python-based Docker image
- A Postgres DB
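A docker-compose.yml along these lines would produce the two components listed above. This is a sketch only: the service names, image tag, ports, and credentials are assumptions, not the project's actual file.

```yaml
# Sketch only -- service names, build context, and credentials are assumed.
version: "3.8"
services:
  etl:
    build: .            # Python-based image built from the project's Dockerfile
    depends_on:
      - db              # wait for the database service before starting the ETL
  db:
    image: postgres:14
    environment:
      POSTGRES_USER: username
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: database
    ports:
      - "5432:5432"     # expose Postgres on the default port
```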
- First, Docker spawns the containers for the Python image and the Postgres DB (also creating the data tables required for the ETL).
- Then the Python script executes: it invokes the Twitter API and extracts the data from the GET request.
- The batch is configured to pull a weekly extract of tweets starting from the scheduled time.
- After some transformations are applied, the extracted data is ingested into the relevant tables of the Postgres DB.
- This completes the ETL.
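The extract-transform-load steps above can be sketched roughly as follows. The endpoint URL, search query, table name, and column layout are assumptions for illustration; they are not taken from the project.

```python
# Minimal sketch of the weekly extract / transform / load flow.
# Endpoint, query, connection string, and table name are all ASSUMED.
import datetime
import json
import urllib.request

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"  # assumed endpoint


def extract(bearer_token: str) -> list:
    """GET the last week's tweets from the Twitter API (sketch)."""
    start = (
        datetime.datetime.utcnow() - datetime.timedelta(days=7)
    ).strftime("%Y-%m-%dT%H:%M:%SZ")
    url = f"{SEARCH_URL}?query=python&start_time={start}"  # assumed query
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {bearer_token}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("data", [])


def transform(tweets: list) -> list:
    """Keep only (id, text) pairs and drop tweets whose text is empty."""
    return [
        (t["id"], t["text"].strip())
        for t in tweets
        if t.get("text", "").strip()
    ]


def load(rows: list) -> None:
    """Insert the transformed rows into an assumed 'tweets' table."""
    import psycopg2  # imported here so the pure functions above need no DB

    conn = psycopg2.connect("postgres://username:secret@localhost:5432/database")
    with conn, conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO tweets (tweet_id, tweet_text) VALUES (%s, %s)", rows
        )
    conn.close()
```

In a real run these three functions would be chained on the weekly schedule; keeping `transform` a pure function makes that step easy to unit-test without network or database access.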
To inspect the ingested data, execute the following commands in the terminal:

docker container ps

Find the CONTAINER ID of the Postgres DB container and copy it.

docker exec -it <copied CONTAINER ID> bash

This opens a shell inside the Postgres DB container.

psql postgres://username:secret@localhost:5432/database

This command connects to your database.
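Once connected, a quick sanity check such as the following confirms that the batch landed; the table name `tweets` is an assumption and should be replaced with the ETL's actual target table.

```sql
-- table name is assumed; substitute the actual ETL target table
SELECT count(*) FROM tweets;
```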
Please find attached the design diagram for the solution.