Data Science Portfolio

Author: Samwel Munyingi
Last Updated: November 2025


Introduction

Welcome to my data science portfolio. This repository collects six projects spanning machine learning, deep learning, natural language processing, time series analysis, and business intelligence. Each project is implemented end to end, from data collection and preprocessing to model development, evaluation, and deployment.

These projects highlight my ability to solve real-world business problems using data-driven approaches, with a focus on delivering actionable insights and measurable impact.


Projects Overview

1. Customer Churn Prediction System

Domain: Machine Learning | Predictive Analytics
Technologies: Python, Scikit-learn, Pandas, Streamlit

A comprehensive churn prediction system for telecommunications companies. This project includes exploratory data analysis, feature engineering, model development (Logistic Regression, Random Forest, Gradient Boosting), and an interactive Streamlit dashboard for real-time predictions.

Key Results:

  • 82% accuracy in predicting customer churn
  • AUC-ROC of 0.85, demonstrating strong model performance
  • Projected $500K+ annual savings through targeted retention
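The model comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the data comes from scikit-learn's `make_classification` as a stand-in for a real churn table, and the helper name `compare_models` and all hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for a churn table (assumption, not project data).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.75, 0.25], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The three model families named in the project description.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}


def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit each model and return {name: (accuracy, auc_roc)} on the test split."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        churn_prob = model.predict_proba(X_test)[:, 1]  # probability of class 1
        scores[name] = (
            accuracy_score(y_test, model.predict(X_test)),
            roc_auc_score(y_test, churn_prob),
        )
    return scores


scores = compare_models(models, X_train, y_train, X_test, y_test)
for name, (acc, auc) in scores.items():
    print(f"{name}: accuracy={acc:.3f}, AUC-ROC={auc:.3f}")
```

Reporting AUC-ROC next to accuracy is useful on imbalanced churn data, where a model can reach high accuracy simply by predicting that nobody churns.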

View Project →


2. Real-Time Sales Analytics Dashboard

Domain: Business Intelligence | Data Visualization
Technologies: Python, Pandas, Plotly, Matplotlib

An interactive sales analytics dashboard providing comprehensive insights into sales performance, profitability, and customer behavior. The dashboard visualizes KPIs, trends, and regional performance to support data-driven decision-making.

Key Results:

  • Identified Technology as the top revenue-generating category
  • Revealed Q4 seasonal peaks for strategic planning
  • Highlighted West region as the top performer
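Findings like the ones above typically come from grouped aggregations. The sketch below shows one way to compute them with pandas; the rows, the column names (`order_date`, `category`, `region`, `sales`, `profit`), and the helper `summarize` are all hypothetical, and the toy rows are made up, so the rankings it prints say nothing about the real dataset.

```python
import pandas as pd

# Hand-written toy orders; the real dataset and its column names may differ.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2024-02-10", "2024-05-03", "2024-11-20", "2024-12-05", "2024-12-18"]
    ),
    "category": ["Furniture", "Technology", "Technology", "Office Supplies", "Technology"],
    "region": ["East", "West", "West", "South", "East"],
    "sales": [200.0, 450.0, 700.0, 120.0, 300.0],
    "profit": [20.0, 90.0, 150.0, 30.0, 60.0],
})


def summarize(df, key):
    """Total sales and profit per group, plus profit margin, sorted by sales."""
    out = df.groupby(key)[["sales", "profit"]].sum()
    out["margin"] = out["profit"] / out["sales"]
    return out.sort_values("sales", ascending=False)


by_category = summarize(orders, "category")
by_region = summarize(orders, "region")
# Quarter number (1-4) taken from the order date for the seasonal view.
by_quarter = orders.groupby(orders["order_date"].dt.quarter)["sales"].sum()

print(by_category)
print(by_region)
print(by_quarter)
```

The same summary tables can be passed directly to Plotly or Matplotlib to draw the dashboard charts.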

View Project →


3. Sentiment Analysis & Social Media Monitoring

Domain: Natural Language Processing | Text Analytics
Technologies: Python, Scikit-learn, NLTK, TF-IDF

An NLP-based sentiment analysis system that classifies product reviews as positive, negative, or neutral. The project demonstrates text preprocessing, feature extraction using TF-IDF, and model comparison across multiple algorithms.

Key Results:

  • 100% accuracy on synthetic test data
  • Identified key sentiment-driving terms
  • Enabled real-time brand perception monitoring

View Project →


4. Image Classification with Deep Learning

Domain: Computer Vision | Deep Learning
Technologies: Python, TensorFlow, Keras, CNN

A convolutional neural network (CNN) for classifying fashion items from the Fashion MNIST dataset. The project showcases expertise in building custom deep learning architectures, training optimization, and model evaluation.

Key Results:

  • 85.1% test accuracy on Fashion MNIST
  • 99.5% accuracy on best-performing class (Sneaker)
  • Robust CNN architecture with batch normalization and dropout
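Per-class figures such as the one quoted for the best-performing class can be derived from a confusion matrix. The sketch below assumes that "accuracy on a class" means the share of that class's test images classified correctly (per-class recall); the label arrays are toy values, not Fashion MNIST predictions, and the helper name is hypothetical.

```python
import numpy as np


def per_class_accuracy(y_true, y_pred, n_classes):
    """Fraction of samples of each true class that were predicted correctly."""
    confusion = np.zeros((n_classes, n_classes), dtype=int)
    # Rows index the true class, columns the predicted class.
    np.add.at(confusion, (y_true, y_pred), 1)
    # Assumes every class occurs at least once in y_true (no zero row sums).
    return confusion.diagonal() / confusion.sum(axis=1)


# Toy labels for three classes (assumption; real use would pass model output).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2])

acc = per_class_accuracy(y_true, y_pred, n_classes=3)
overall = (y_true == y_pred).mean()
best_class = int(acc.argmax())

print(acc, overall, best_class)
```

On these toy arrays the per-class values are 0.75, 0.5, and 1.0, with an overall accuracy of 0.75, which shows how a single class can score well above the aggregate figure.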

View Project →


5. Stock Price Forecasting with Time Series Analysis

Domain: Time Series Analysis | Financial Analytics
Technologies: Python, Pandas, NumPy, Matplotlib

A time series forecasting system for predicting stock prices. The project analyzes historical price data, identifies trends and seasonality, and builds a forecasting model using moving averages.

Key Results:

  • 6.33% MAPE (Mean Absolute Percentage Error)
  • Clear identification of seasonal patterns
  • Actionable insights for trading strategies
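A moving-average forecast and its MAPE can be sketched as below. This is an illustration under assumptions: the window length, the one-step-ahead setup, and the price series are all chosen for the example and are not taken from the project.

```python
import numpy as np
import pandas as pd


def moving_average_forecast(prices, window):
    """One-step-ahead forecast: mean of the previous `window` observations."""
    # shift(1) keeps each day's own price out of its forecast.
    return prices.rolling(window).mean().shift(1)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent, over non-missing forecasts."""
    mask = predicted.notna()
    errors = np.abs((actual[mask] - predicted[mask]) / actual[mask])
    return float(errors.mean() * 100)


# Toy closing prices (assumption; not real market data).
prices = pd.Series([100.0, 102.0, 104.0, 103.0, 105.0, 108.0])
forecast = moving_average_forecast(prices, window=3)
error = mape(prices, forecast)

print(forecast)
print(f"MAPE: {error:.2f}%")
```

The first three forecasts are missing because a 3-day window needs three prior prices; on this toy series the remaining three forecasts give a MAPE of about 2.19%.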

View Project →


6. Healthcare Analytics - Disease Prediction

Domain: Healthcare Analytics | Predictive Modeling
Technologies: Python, Scikit-learn, Pandas, Seaborn

A machine learning system for predicting diabetes risk based on patient health metrics. The project demonstrates the application of ML in healthcare, feature importance analysis, and clinical decision support.

Key Results:

  • 87.5% accuracy in diabetes risk prediction
  • AUC-ROC of 0.94, indicating excellent discrimination
  • Identified glucose level as the strongest predictor
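The AUC-ROC evaluation and predictor ranking described above can be illustrated as follows. Everything here is an assumption for the sake of the sketch: the patient table is randomly generated, the column names are hypothetical, the label is deliberately constructed so that glucose has the largest effect, and standardized logistic regression coefficients are only one of several ways to rank features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
# Randomly generated patients (assumption; not clinical data).
patients = pd.DataFrame({
    "glucose": rng.normal(120, 30, n),
    "bmi": rng.normal(30, 6, n),
    "age": rng.integers(21, 80, n).astype(float),
})
# Synthetic label built so that glucose dominates, bmi matters less, age not at all.
logit = 0.05 * (patients["glucose"] - 120) + 0.05 * (patients["bmi"] - 30)
has_diabetes = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    patients, has_diabetes, test_size=0.2, stratify=has_diabetes, random_state=0
)

# Scaling first makes the coefficient magnitudes comparable across features.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
coefs = model.named_steps["logisticregression"].coef_[0]
importance = pd.Series(np.abs(coefs), index=patients.columns).sort_values(ascending=False)

print(f"AUC-ROC: {auc:.3f}")
print(importance)
```

Because the label is generated from glucose by construction, the ranking here only demonstrates the mechanics; on real patient data the ordering has to be established empirically.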

View Project →


Technical Skills Demonstrated

Programming & Tools

  • Languages: Python
  • Libraries: Scikit-learn, TensorFlow, Keras, NLTK, Pandas, NumPy, Matplotlib, Seaborn, Plotly
  • Frameworks: Streamlit
  • Version Control: Git, GitHub

Data Science Techniques

  • Machine Learning: Classification, Regression, Ensemble Methods
  • Deep Learning: Convolutional Neural Networks (CNNs)
  • Natural Language Processing: Text Preprocessing, TF-IDF, Sentiment Analysis
  • Time Series Analysis: Trend Analysis, Seasonality, Forecasting
  • Data Visualization: Interactive Dashboards, Statistical Plots

Business Applications

  • Customer Analytics & Retention
  • Sales & Revenue Optimization
  • Brand Monitoring & Sentiment Analysis
  • Financial Forecasting
  • Healthcare Risk Prediction

How to Use This Portfolio

Each project folder contains:

  1. README.md - Detailed project documentation
  2. src/ - Source code and scripts
  3. data/ - Sample datasets (or instructions to download)
  4. notebooks/ - Jupyter notebooks with analysis
  5. visualizations/ - Generated charts and graphs
  6. screenshots/ - Dashboard and application screenshots

To run any project, navigate to its directory and follow the instructions in the project-specific README.
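Before running a project, it can help to confirm that a folder has the layout listed above. The helper below is a small hypothetical sketch (the function name and the example path are assumptions); it only checks that each expected entry exists and does not validate the contents.

```python
from pathlib import Path

# Entries each project folder is documented to contain.
EXPECTED_ENTRIES = ["README.md", "src", "data", "notebooks", "visualizations", "screenshots"]


def missing_entries(project_dir):
    """Return the expected files or folders that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


# Example with a placeholder path; replace it with a real project folder.
print(missing_entries("path/to/project"))
```

An empty list means every documented entry is present.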


Contact & Collaboration

I am actively seeking opportunities in data science, machine learning engineering, and analytics roles. If you're interested in collaborating or discussing these projects, please feel free to reach out.

Portfolio Website: samwelmunyingi.com
GitHub: github.com/samwelmunyingi
Email: [email protected]


License

This portfolio is for demonstration purposes. Individual datasets may have their own licenses. Please refer to each project's README for specific details.


Thank you for reviewing my portfolio!
