Author: Samwel Munyingi
Last Updated: November 2025
Welcome to my data science portfolio. This repository covers machine learning, deep learning, natural language processing, time series analysis, and business intelligence. Each project is implemented end to end, from data collection and preprocessing through model development, evaluation, and deployment.
These projects highlight my ability to solve real-world business problems using data-driven approaches, with a focus on delivering actionable insights and measurable impact.
Domain: Machine Learning | Predictive Analytics
Technologies: Python, Scikit-learn, Pandas, Streamlit
A comprehensive churn prediction system for telecommunications companies. This project includes exploratory data analysis, feature engineering, model development (Logistic Regression, Random Forest, Gradient Boosting), and an interactive Streamlit dashboard for real-time predictions.
Key Results:
- 82% accuracy in predicting customer churn
- AUC-ROC of 0.85, indicating good separation between churners and non-churners
- Projected $500K+ annual savings through targeted retention
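
The three-model comparison described for this project can be sketched roughly as follows. This is a minimal illustration on synthetic data from `make_classification`, not the project's actual pipeline: the dataset, the hyperparameters, and the `compare_models` helper are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, random_state=42):
    """Fit three classifiers and return their test-set AUC-ROC (hypothetical helper)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    models = {
        # Scaling helps the linear model converge; tree ensembles do not need it.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(
            n_estimators=200, random_state=random_state
        ),
        "gradient_boosting": GradientBoostingClassifier(random_state=random_state),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        churn_probability = model.predict_proba(X_test)[:, 1]
        scores[name] = roc_auc_score(y_test, churn_probability)
    return scores


# Synthetic stand-in for a churn table, with roughly 25% churned customers.
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=5,
    weights=[0.75, 0.25],
    random_state=42,
)
scores = compare_models(X, y)
for name, auc in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: AUC-ROC = {auc:.3f}")
```

AUC-ROC is used here rather than accuracy because churn data is usually imbalanced, and a model that predicts "no churn" for everyone can still score high accuracy.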
Domain: Business Intelligence | Data Visualization
Technologies: Python, Pandas, Plotly, Matplotlib
An interactive sales analytics dashboard providing comprehensive insights into sales performance, profitability, and customer behavior. The dashboard visualizes KPIs, trends, and regional performance to support data-driven decision-making.
Key Results:
- Identified Technology as the top revenue-generating category
- Revealed Q4 seasonal peaks for strategic planning
- Highlighted West region as the top performer
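
As a rough sketch of how KPIs like these can be computed before plotting, the snippet below aggregates sales and profit by category, region, and quarter with pandas. The `orders` table, its column names, and the `summarize` helper are hypothetical, and the numbers are made up for illustration only; they are not the project's data.

```python
import pandas as pd

# Hypothetical order lines; values are invented for this example.
orders = pd.DataFrame(
    {
        "order_date": pd.to_datetime(
            ["2024-02-10", "2024-05-03", "2024-11-20", "2024-12-05"]
        ),
        "category": ["Furniture", "Technology", "Technology", "Office Supplies"],
        "region": ["East", "West", "West", "South"],
        "sales": [200.0, 500.0, 700.0, 100.0],
        "profit": [20.0, 100.0, 175.0, 10.0],
    }
)


def summarize(df, by):
    """Total sales and profit per group, with profit margin, sorted by sales."""
    summary = df.groupby(by)[["sales", "profit"]].sum()
    summary["profit_margin"] = summary["profit"] / summary["sales"]
    return summary.sort_values("sales", ascending=False)


by_category = summarize(orders, "category")
by_region = summarize(orders, "region")
by_quarter = summarize(
    orders.assign(quarter=orders["order_date"].dt.quarter), "quarter"
)

print(by_category)
print(by_region)
print(by_quarter)
```

Each summary table can then be passed to a Plotly or Matplotlib bar chart.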
Domain: Natural Language Processing | Text Analytics
Technologies: Python, Scikit-learn, NLTK, TF-IDF
An NLP-based sentiment analysis system that classifies product reviews as positive, negative, or neutral. The project demonstrates text preprocessing, feature extraction using TF-IDF, and model comparison across multiple algorithms.
Key Results:
- 100% accuracy on synthetic test data
- Identified key sentiment-driving terms
- Enabled real-time brand perception monitoring
Domain: Computer Vision | Deep Learning
Technologies: Python, TensorFlow, Keras, CNN
A convolutional neural network (CNN) for classifying fashion items from the Fashion MNIST dataset. The project showcases expertise in building custom deep learning architectures, training optimization, and model evaluation.
Key Results:
- 85.1% test accuracy on Fashion MNIST
- 99.5% accuracy on best-performing class (Sneaker)
- Robust CNN architecture with batch normalization and dropout
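
A CNN with batch normalization and dropout for 28x28 grayscale images might look like the sketch below. The project's actual architecture is not specified here, so the layer counts, filter sizes, dropout rate, and the `build_cnn` helper are assumptions chosen for illustration.

```python
from tensorflow.keras import layers, models


def build_cnn(input_shape=(28, 28, 1), num_classes=10):
    """Small CNN for Fashion MNIST-sized images (hypothetical architecture)."""
    model = models.Sequential(
        [
            layers.Input(shape=input_shape),
            layers.Conv2D(32, 3, padding="same", activation="relu"),
            layers.BatchNormalization(),
            layers.MaxPooling2D(),
            layers.Conv2D(64, 3, padding="same", activation="relu"),
            layers.BatchNormalization(),
            layers.MaxPooling2D(),
            layers.Flatten(),
            layers.Dropout(0.5),  # regularization before the classifier head
            layers.Dense(num_classes, activation="softmax"),
        ]
    )
    # Sparse loss expects integer class labels (0-9), as Fashion MNIST provides.
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_cnn()
model.summary()
```

Training would then call `model.fit` on images scaled to the range 0 to 1 and reshaped to `(n, 28, 28, 1)`.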
Domain: Time Series Analysis | Financial Analytics
Technologies: Python, Pandas, NumPy, Matplotlib
A time series forecasting system for predicting stock prices. The project analyzes historical price data, identifies trends and seasonality, and builds a forecasting model using moving averages.
Key Results:
- 6.33% MAPE (Mean Absolute Percentage Error)
- Clear identification of seasonal patterns
- Actionable insights for trading strategies
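
A moving-average forecast and its MAPE can be computed in a few lines of pandas. The sketch below assumes a simple one-step-ahead forecast equal to the mean of the previous `window` prices; the window length, the toy price series, and the helper names are illustrative assumptions rather than the project's settings.

```python
import numpy as np
import pandas as pd


def moving_average_forecast(prices: pd.Series, window: int) -> pd.Series:
    """Forecast each point as the mean of the previous `window` prices.

    The shift by one step keeps the current price out of its own forecast.
    """
    return prices.rolling(window).mean().shift(1)


def mape(actual: pd.Series, forecast: pd.Series) -> float:
    """Mean absolute percentage error over points that have a forecast."""
    valid = forecast.notna()
    errors = np.abs(actual[valid] - forecast[valid]) / np.abs(actual[valid])
    return float(errors.mean() * 100)


# Toy price series, invented for the example.
prices = pd.Series([100.0, 102.0, 104.0, 106.0, 108.0])
forecast = moving_average_forecast(prices, window=2)
error = mape(prices, forecast)
print(forecast)
print(f"MAPE: {error:.2f}%")
```

The first `window` points have no forecast, so they are excluded from the error. On a steadily trending series like this one, a moving average always lags the trend, which is why the error is nonzero.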
Domain: Healthcare Analytics | Predictive Modeling
Technologies: Python, Scikit-learn, Pandas, Seaborn
A machine learning system for predicting diabetes risk based on patient health metrics. The project demonstrates the application of ML in healthcare, feature importance analysis, and clinical decision support.
Key Results:
- 87.5% accuracy in diabetes risk prediction
- AUC-ROC of 0.94, indicating excellent discrimination
- Identified glucose level as the strongest predictor
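
Feature importance analysis of this kind can be sketched with a random forest's impurity-based importances. The data below is synthetic and deliberately constructed so that glucose drives the outcome, so the ranking it prints is a property of the toy data, not a clinical finding. The column names, coefficients, and model choice are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_patients = 1000

# Synthetic patient metrics (not real clinical data).
patients = pd.DataFrame(
    {
        "glucose": rng.normal(120, 30, n_patients),
        "bmi": rng.normal(28, 5, n_patients),
        "age": rng.normal(45, 12, n_patients),
    }
)

# Outcome built so glucose has the largest effect, BMI a small one, age none.
logit = 0.06 * (patients["glucose"] - 120) + 0.05 * (patients["bmi"] - 28)
diabetic = rng.random(n_patients) < 1 / (1 + np.exp(-logit))

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(patients, diabetic)

# Impurity-based importances sum to 1 across features.
importance = pd.Series(
    model.feature_importances_, index=patients.columns
).sort_values(ascending=False)
print(importance)
```

Impurity-based importances can favor high-cardinality features, so `sklearn.inspection.permutation_importance` on a held-out set is a common cross-check.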
- Languages: Python
- Libraries: Scikit-learn, TensorFlow, Keras, Pandas, NumPy, Matplotlib, Seaborn, Plotly
- Frameworks: Streamlit
- Version Control: Git, GitHub
- Machine Learning: Classification, Regression, Ensemble Methods
- Deep Learning: Convolutional Neural Networks (CNNs)
- Natural Language Processing: Text Preprocessing, TF-IDF, Sentiment Analysis
- Time Series Analysis: Trend Analysis, Seasonality, Forecasting
- Data Visualization: Interactive Dashboards, Statistical Plots
- Customer Analytics & Retention
- Sales & Revenue Optimization
- Brand Monitoring & Sentiment Analysis
- Financial Forecasting
- Healthcare Risk Prediction
Each project folder contains:
- README.md - Detailed project documentation
- src/ - Source code and scripts
- data/ - Sample datasets (or instructions to download)
- notebooks/ - Jupyter notebooks with analysis
- visualizations/ - Generated charts and graphs
- screenshots/ - Dashboard and application screenshots
To run any project, navigate to its directory and follow the instructions in the project-specific README.
I am actively seeking opportunities in data science, machine learning engineering, and analytics roles. If you're interested in collaborating or discussing these projects, please feel free to reach out.
Portfolio Website: samwelmunyingi.com
GitHub: github.com/samwelmunyingi
Email: [email protected]
This portfolio is for demonstration purposes. Individual datasets may have their own licenses. Please refer to each project's README for specific details.
Thank you for reviewing my portfolio!