awsglue
Here are 33 public repositories matching this topic...
This project offers a robust data pipeline solution designed to efficiently extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. Leveraging a blend of industry-standard tools and services, the pipeline ensures seamless data processing and integration.
Updated Jun 19, 2024 - Jupyter Notebook
This project uses an ETL (Extract, Transform, and Load) pipeline to extract data from Spotify through its API and load it into a data source (AWS Athena). The entire pipeline is built on Amazon Web Services (AWS).
Updated Jul 8, 2023 - Jupyter Notebook
This project focuses on real-time data streaming with Kinesis, using Flink for advanced processing and OpenSearch for analytics. The architecture covers the complete lifecycle of the data, from ingestion to actionable insights.
Updated Aug 4, 2024 - Java
This project demonstrates how to build a downstream data pipeline using dbt on Athena.
Updated Dec 24, 2022 - Python
In this project I used the Trending YouTube Video Statistics dataset from Kaggle, analyzing it and preparing it for use.
Updated Nov 7, 2022
Projects on Big Data Using PySpark and AWS
Updated Apr 28, 2023 - Jupyter Notebook
This project showcases a data transformation pipeline that uses AWS Glue and Amazon Athena to process Spotify data from CSV files. It loads, transforms, and stores the data in an S3 data warehouse, where it can be queried through Amazon Athena.
Updated Mar 28, 2024 - Python
I am dedicated to delivering innovative solutions that align with business objectives while ensuring optimal performance, reliability, and security. My strong analytical skills, attention to detail, and problem-solving abilities drive me to create effective and efficient solutions.
Updated Oct 11, 2024
Big data and Cloud Deployment
Updated Jan 15, 2024 - Jupyter Notebook
This project builds a pipeline on AWS to analyze Superstore sales data. It transforms the data for exploration, queries the transformed data with SQL to uncover trends and patterns, and produces easy-to-understand visualizations that give clear insight into Superstore sales performance.
Updated Apr 14, 2025
Incremental Data Load from S3 Bucket to Amazon Redshift Using AWS Glue
Updated Aug 15, 2024 - Python
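One common way to make an S3-to-Redshift load incremental is a high-water mark: remember the newest object already loaded and pick up only objects modified after it. The sketch below illustrates that idea in plain Python. It is not taken from the repository, which may instead rely on AWS Glue job bookmarks; the `S3Object` type and both function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class S3Object:
    """Minimal stand-in for one S3 listing entry (hypothetical type)."""
    key: str
    last_modified: datetime


def select_new_objects(objects, watermark):
    """Return the objects modified strictly after the watermark, oldest first."""
    fresh = [o for o in objects if o.last_modified > watermark]
    return sorted(fresh, key=lambda o: o.last_modified)


def advance_watermark(watermark, loaded):
    """Move the watermark to the newest loaded object; keep it if nothing was loaded."""
    return max([watermark] + [o.last_modified for o in loaded])
```

Each run would load only what `select_new_objects` returns and then persist the value from `advance_watermark`, so a rerun with no new files loads nothing.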
Transformed YouTube’s raw JSON data to Parquet and loaded it into an S3 bucket, using the Glue Data Catalog to store metadata and Athena to query the cleaned data. Developed an ETL process using a Lambda function that is triggered when raw data is loaded into an S3 bucket; the data is then processed and stored in an S3 bucket for analytical purposes.
Updated Feb 9, 2023 - Python
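As a rough illustration of the S3-triggered step, the sketch below shows how a Lambda handler can read the bucket and object key out of an S3 event notification. The event layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`) is the standard S3 notification shape; the helper name and the return value are assumptions, and the JSON-to-Parquet conversion itself is left out.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    """Hypothetical entry point: reports which raw objects would be processed."""
    objects = extract_s3_objects(event)
    # The JSON-to-Parquet conversion would run here; it is omitted in this sketch.
    return {"processed": [f"s3://{bucket}/{key}" for bucket, key in objects]}
```

Decoding the key matters because spaces and special characters in object names are escaped in the notification, so using the raw value can point at an object that does not exist.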
Data Engineering Specialization offered by Joe Reis in partnership with DeepLearning.AI through Coursera...
Updated Sep 29, 2024 - Jupyter Notebook
This project demonstrates a comprehensive end-to-end data engineering pipeline for real-time stock market data processing. The system seamlessly integrates Apache Kafka with AWS cloud services to provide scalable, real-time analytics capabilities for financial data streams.
Updated Jun 28, 2025 - Jupyter Notebook
ETL pipeline that transforms publicly available id304b covered entity (CE) data and loads it into a relational database with change data capture. Uses AWS services to automate the pipeline.
Updated Apr 1, 2025 - Python
A function for copying data in formats such as CSV, Parquet, and Avro from a source S3 bucket to a destination S3 bucket using AWS Glue. It includes the necessary setup for the Glue job, logging, reading data from the source bucket, and writing it to the destination bucket.
Updated Apr 20, 2025 - Python
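A copy job like this has to decide, for each source object, which format to read and where to write the result. The sketch below shows one way to plan that in plain Python, assuming the format is inferred from the file extension and the source prefix is swapped for a destination prefix. Both are assumptions rather than the repository's actual behaviour, and the function names are hypothetical.

```python
from pathlib import PurePosixPath

# File extensions mapped to the format names handled by this sketch.
SUPPORTED_FORMATS = {".csv": "csv", ".parquet": "parquet", ".avro": "avro"}


def infer_format(key):
    """Infer the data format from an object key's extension (case-insensitive)."""
    suffix = PurePosixPath(key).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file type: {key!r}")
    return SUPPORTED_FORMATS[suffix]


def destination_key(source_key, source_prefix, dest_prefix):
    """Rewrite a source key so it lands under the destination prefix."""
    if not source_key.startswith(source_prefix):
        raise ValueError(f"{source_key!r} is not under {source_prefix!r}")
    return dest_prefix + source_key[len(source_prefix):]
```

Failing early on an unknown extension keeps the job from silently writing data it could not parse.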
AWS S3 & Sentiment Analysis, Basic Plotting with Matplotlib, & Supervised Learning & Machine Learning with Sklearn.
Updated Jul 6, 2024 - Jupyter Notebook