For the French version of this document, see README-fr.md.
etl-toolbox is a Python toolkit designed to simplify Extract, Transform, and Load (ETL) data processes. This modular toolkit offers several specialized components for different aspects of ETL workflows.
Specialized logging module for ETL processes, allowing simple configuration and efficient log analysis.
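The internals of the logging module are not described in this overview. As a rough point of comparison, the sketch below uses only Python's standard `logging` module to show the kind of setup such a module typically wraps: a named logger with a level, a verbose message format, and an optional log file. The helper name `build_etl_logger` and the format string are illustrative assumptions, not part of nrcan_etl_toolbox.

```python
import logging
from typing import Optional


def build_etl_logger(name: str, level: str = "INFO", file_name: Optional[str] = None) -> logging.Logger:
    """Hypothetical helper: configure a named logger with a verbose format."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.handlers.clear()   # avoid stacking handlers on repeated calls
    logger.propagate = False  # keep messages out of the root logger

    formatter = logging.Formatter("%(asctime)s | %(name)s | %(levelname)s | %(message)s")
    # Write to a file when a file name is given, otherwise to the console (stderr).
    handler: logging.Handler = logging.FileHandler(file_name) if file_name else logging.StreamHandler()
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    return logger


sketch_logger = build_etl_logger("etl-sketch", level="INFO")
sketch_logger.info("Starting ETL process")
```

Clearing the handlers before adding a new one keeps repeated calls with the same name from emitting each message several times.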
Collection of tools for reading data from various sources. It includes readers for different file formats and databases, facilitating data integration in ETL processes:
- Data Readers: CSV, Excel, GeoPackage, JSON, PostGIS, Shapefile
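How the toolkit selects a reader for a given source is not documented here. One common approach, shown below as a minimal sketch, is to dispatch on the file extension through a registry. The names `READERS` and `read_source` are hypothetical, and only the CSV and JSON cases are covered because they need nothing beyond the standard library.

```python
import csv
import json
from pathlib import Path
from typing import Any, Callable, Dict, List


def read_csv(path: Path) -> List[Dict[str, str]]:
    """Read a CSV file into a list of row dictionaries."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def read_json(path: Path) -> Any:
    """Read a JSON file into the corresponding Python object."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical registry mapping file extensions to reader functions.
READERS: Dict[str, Callable[[Path], Any]] = {".csv": read_csv, ".json": read_json}


def read_source(input_source: str) -> Any:
    """Pick a reader from the file extension and return the loaded data."""
    path = Path(input_source)
    try:
        reader = READERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"No reader registered for '{path.suffix}'") from None
    return reader(path)
```

Supporting a new format in this sketch means adding one entry to `READERS`, which keeps the calling code unchanged.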
Interfaces and ORM for interacting with different database systems:
- Database Interfaces: Abstract object handlers for database interactions
- ORM: Object-relational mappings to simplify data access
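The methods exposed by the toolkit's database interfaces are not listed in this document. To illustrate the general idea of an abstract handler, the sketch below defines a hypothetical `DatabaseHandler` base class and a `SQLiteHandler` implementation built on the standard `sqlite3` module. Both class names and both method names are assumptions made for illustration; they do not describe `AbstractDatabaseHandler`.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, List, Sequence, Tuple


class DatabaseHandler(ABC):
    """Hypothetical abstract interface; not the toolkit's AbstractDatabaseHandler."""

    @abstractmethod
    def execute(self, query: str, params: Sequence[Any] = ()) -> List[Tuple[Any, ...]]:
        """Run a query and return all resulting rows."""

    @abstractmethod
    def close(self) -> None:
        """Release the underlying connection."""


class SQLiteHandler(DatabaseHandler):
    """Concrete handler backed by an SQLite database (in memory by default)."""

    def __init__(self, database: str = ":memory:") -> None:
        self._connection = sqlite3.connect(database)

    def execute(self, query: str, params: Sequence[Any] = ()) -> List[Tuple[Any, ...]]:
        cursor = self._connection.execute(query, params)
        rows = cursor.fetchall()  # empty list for statements that return no rows
        self._connection.commit()
        return rows

    def close(self) -> None:
        self._connection.close()
```

Code written against `DatabaseHandler` can then switch database backends by swapping the concrete class, which is the usual motivation for this kind of interface.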
Install the package via Poetry:
poetry install
Or by creating a distribution:
poetry build
pip install dist/nrcan_etl_toolbox-*.whl
from nrcan_etl_toolbox.etl_logging import CustomLogger
logger = CustomLogger(
    name="Test Logger",
    level="INFO",
    logger_type="verbose",
    logger_file_name="test_logger.log",
)
# Logging messages
logger.info("Starting ETL process")
logger.debug("Technical details", extra={"data": {"items": 100}})
logger.error("Processing error", exc_info=True)
from nrcan_etl_toolbox.etl_toolbox.reader import ReaderFactory
from nrcan_etl_toolbox.etl_toolbox.reader.source_readers import ExcelReader
# Creating a CSV reader
csv_reader = ReaderFactory(input_source="data.csv")
data = csv_reader.data
# Creating a Shapefile reader
shp_reader = ReaderFactory(input_source="data.shp")
geo_data = shp_reader.data
# Creating a PostGIS reader
# Use the connection string for your database
postgis_reader = ReaderFactory(
    input_source="postgresql://user:password@host:port/database",
    table_name="table_name",
    schema="schema_name",
)
geo_data = postgis_reader.data
# Creating an Excel reader
reader = ReaderFactory(input_source="data.xlsx")
# Get the Reader object
excel_reader: ExcelReader = reader.reader
# If the Excel file contains multiple sheets,
# data is a dictionary with sheet names as keys and dataframes as values
data = excel_reader.dataframe
# data = {'Sheet1': df1, 'Sheet2': df2}
# To read a specific sheet, pass its name to read_sheet
data = excel_reader.read_sheet('Sheet1')
# data = df1
# TODO: Complete documentation.
from nrcan_etl_toolbox.database.interface import AbstractDatabaseHandler
# Usage example to be documented
To contribute to the project, install development dependencies:
poetry install --with dev
Run tests with:
pytest
nrcan_etl_toolbox/
├── database/                # Database interactions
│   ├── interface/           # Abstract interfaces for databases
│   └── orm/                 # Object-relational mappings
├── etl_logging/             # ETL logging module
└── etl_toolbox/             # Main ETL tools
    └── reader/              # Data source readers
        └── source_readers/  # Specific reader implementations
- NRCAN (Natural Resources Canada)
- Xavier Malet
For questions or suggestions, please use the project's GitHub issues.