This project loads data from JSON files into MongoDB collections and queries it, analyzing how the document model (normalized vs. embedded) and indexing affect query performance. The system consists of four main scripts:
- task1_build.py: Builds a normalized document store with separate collections for messages and senders.
- task2_build.py: Builds an embedded document store with sender information embedded within each message document.
- task1_query.py: Executes and analyzes queries on the normalized document store.
- task2_query.py: Executes and analyzes queries on the embedded document store.
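To make the two document models concrete, here is a minimal sketch in plain Python of turning normalized records into embedded ones. The field names (sender in a message, sender_id in a sender, and sender_info for the embedded sub-document) are hypothetical defaults chosen for illustration, not taken from the scripts; adjust them to the actual JSON schema.

```python
from typing import Any, Dict, List


def embed_senders(
    messages: List[Dict[str, Any]],
    senders: List[Dict[str, Any]],
    ref_field: str = "sender",      # hypothetical: message field referencing a sender
    key_field: str = "sender_id",   # hypothetical: unique key of a sender document
    embed_as: str = "sender_info",  # hypothetical: name of the embedded sub-document
) -> List[Dict[str, Any]]:
    """Return new message documents with the matching sender document embedded.

    Messages whose reference has no matching sender are copied unchanged.
    """
    # Index senders by key once so each message lookup is a dict access.
    senders_by_key = {s[key_field]: s for s in senders}
    embedded = []
    for message in messages:
        doc = dict(message)  # shallow copy: the normalized input is left untouched
        sender = senders_by_key.get(message.get(ref_field))
        if sender is not None:
            doc[embed_as] = dict(sender)
        embedded.append(doc)
    return embedded


if __name__ == "__main__":
    senders = [{"sender_id": "s1", "credit": 50}]
    messages = [{"sender": "s1", "text": "hello"}]
    print(embed_senders(messages, senders))
```

The embedded result repeats a sender's fields in every one of that sender's messages; that duplication is what the embedded model trades for reads that need no join.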
Project structure:
- /src/:
  - task1_build.py
  - task2_build.py
  - task1_query.py
  - task2_query.py
- /resources/:
  - messages_senders_JSON_zip.7z
- README.md
Requirements:
- Python 3.x
- MongoDB Server
- MongoDB Client (mongosh)
- messages.json and senders.json data files (extracted from /resources/messages_senders_JSON_zip.7z)
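The build scripts read these two data files. A minimal sketch of loading one of them and splitting it into batches for bulk insertion, assuming each file holds a single top-level JSON array of documents (the file layout is an assumption, and load_json_array and batched are hypothetical helper names, not code from the repository):

```python
import json
from itertools import islice
from typing import Any, Iterable, Iterator, List


def load_json_array(path: str) -> List[Any]:
    """Load a file whose top-level value is a JSON array (assumed file layout)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a top-level JSON array")
    return data


def batched(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `size` items (the last may be shorter)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    # The := operator requires Python 3.8 or later.
    while batch := list(islice(it, size)):
        yield batch
```

With pymongo, each batch could be passed to collection.insert_many(batch). Note that json.load reads the whole file into memory; if messages.json is too large for that, a streaming JSON parser would be needed instead.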
Setup and usage:
- Download Files:
  - Clone the repository and navigate to the /src/ and /resources/ directories for the required files:
    - /src/ contains the Python scripts: task1_build.py, task2_build.py, task1_query.py, task2_query.py.
    - /resources/ contains the compressed JSON files: messages_senders_JSON_zip.7z (which contains messages.json and senders.json).
  - Extract the contents of messages_senders_JSON_zip.7z to obtain messages.json and senders.json.
- Initiate MongoDB Server:
  - Create a data directory:
    mkdir ~/mongodb_data_folder
  - Start the MongoDB server (the trailing & runs it in the background; use the same <portNumber> in every command below):
    mongod --port <portNumber> --dbpath ~/mongodb_data_folder &
- Build the Normalized Store:
  Run the following command in the /src/ folder:
    python3 task1_build.py <portNumber>
- Query the Normalized Store:
  After building the normalized store, execute the query script:
    python3 task1_query.py <portNumber>
- Build the Embedded Store:
  Run the following command in the /src/ folder:
    python3 task2_build.py <portNumber>
- Query the Embedded Store:
  After building the embedded store, execute the query script:
    python3 task2_query.py <portNumber>
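Each script is invoked with <portNumber> as its argument, and a mongod started with & may still be starting up when the first script runs. A hedged sketch of start-up checks a script could perform: parse_port, mongo_uri, and wait_for_port are hypothetical helpers, not functions from the repository, and an open TCP port only shows that something is listening, not that MongoDB is fully ready.

```python
import socket
import time
from typing import Sequence


def parse_port(argv: Sequence[str]) -> int:
    """Read the port from a command line of the form: script.py <portNumber>."""
    if len(argv) != 2:
        script = argv[0] if argv else "script.py"
        raise SystemExit(f"usage: python3 {script} <portNumber>")
    try:
        port = int(argv[1])
    except ValueError:
        raise SystemExit(f"invalid port: {argv[1]!r}") from None
    if not 0 < port < 65536:
        raise SystemExit(f"port out of range: {port}")
    return port


def mongo_uri(port: int, host: str = "localhost") -> str:
    """Build a standard MongoDB connection string for a local server."""
    return f"mongodb://{host}:{port}/"


def wait_for_port(port: int, host: str = "localhost",
                  timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll until a TCP connection to host:port succeeds; False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # server not listening yet; retry shortly
    return False
```

A build script could then start with port = parse_port(sys.argv), call wait_for_port(port), and connect with pymongo's MongoClient(mongo_uri(port)).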
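In the normalized store, a query that filters messages by a sender attribute has to join the two collections at query time, typically with a $lookup aggregation stage, while the embedded store can filter on the sender's fields directly with dot notation. A sketch of the two query shapes as plain Python data, assuming collections named messages and senders and hypothetical field names (sender, sender_id, credit, and an embedded sender_info); the repository's actual queries may differ.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical names; adjust to the actual collections and JSON fields.
SENDERS = "senders"       # collection holding sender documents (normalized store)
REF_FIELD = "sender"      # field in a message that references a sender
KEY_FIELD = "sender_id"   # unique key of a sender document
EMBEDDED = "sender_info"  # embedded sender sub-document (embedded store)


def normalized_count_by_credit(credit: int) -> List[Dict[str, Any]]:
    """Pipeline for the normalized messages collection: join each message to its
    sender, keep those whose sender has the given credit, and count them."""
    return [
        # The join output reuses the embedded field name so the $match below
        # has the same shape as the embedded-store filter.
        {"$lookup": {"from": SENDERS, "localField": REF_FIELD,
                     "foreignField": KEY_FIELD, "as": EMBEDDED}},
        {"$unwind": f"${EMBEDDED}"},
        {"$match": {f"{EMBEDDED}.credit": credit}},
        {"$count": "num_messages"},
    ]


def embedded_credit_filter(credit: int) -> Dict[str, Any]:
    """Equivalent filter for the embedded store: no join is needed."""
    return {f"{EMBEDDED}.credit": credit}


def embedded_credit_index() -> List[Tuple[str, int]]:
    """Ascending (direction 1) index key on the embedded credit field, in the
    (field, direction) list form accepted by pymongo's create_index."""
    return [(f"{EMBEDDED}.credit", 1)]
```

With pymongo, these would run as list(messages.aggregate(normalized_count_by_credit(0))) on the normalized database and messages.count_documents(embedded_credit_filter(0)) on the embedded one, and messages.create_index(embedded_credit_index()) adds an index whose effect on timing can then be measured. In the normalized store, an index on sender_id in senders can likewise help the $lookup.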
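The query scripts analyze query performance. A minimal timing helper, assuming client-side wall-clock time is an acceptable proxy (it includes network and driver overhead, not just server work); time_query is a hypothetical name, not taken from the repository.

```python
import statistics
import time
from typing import Any, Callable, Tuple


def time_query(run: Callable[[], Any], repeats: int = 5) -> Tuple[Any, float]:
    """Call `run` `repeats` times; return the last result and the median time in ms.

    The median is less sensitive than the mean to a slow first run (cold cache).
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings_ms = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = run()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return result, statistics.median(timings_ms)
```

pymongo's find() returns a lazy cursor that does not contact the server until it is iterated, so the callable should force execution, for example time_query(lambda: list(messages.find(query))). For a server-side view, cursor.explain() reports the chosen query plan, including whether an index was used.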
You can explore the MongoDB database using the MongoDB client:
- Start the MongoDB client:
  mongosh --port <portNumber>
- Use the following commands to navigate and interact with the database:
  - Switch to a database (it is created on first write if it does not exist yet):
    use DATABASE_NAME
  - List all databases:
    show dbs
  - Drop the current database:
    db.dropDatabase()
  - Create a collection:
    db.createCollection("COLLECTION_NAME")
  - Drop a collection:
    db.COLLECTION_NAME.drop()
  - Query a collection (with no filter, this matches all documents):
    db.COLLECTION_NAME.find()
This project compares normalized and embedded MongoDB document stores, highlighting how the document model and indexing affect query performance. By comparing the query results and timings of both approaches, it illustrates trade-offs to weigh when designing MongoDB schemas for real-world applications.