johnny-kantaros/capstone-project

Final Capstone project

Introduction

I completed this capstone project as the final requirement of IBM's Professional Data Science Certificate, applying what I had learned in the previous courses. In it, I analyze historical SpaceX rocket data to predict the outcome of future launch attempts. The project is split into 8 Jupyter notebooks, each briefly described below.

Data Collection (API)

In this module, I performed the following tasks:

  1. Used SpaceX's API to collect launch data through GET requests
  2. Cleaned and parsed the data using pandas and NumPy
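
A minimal sketch of this step, assuming the public SpaceX v4 endpoint `https://api.spacexdata.com/v4/launches/past` and the field names `flight_number`, `date_utc`, and `fairings.reused`. The notebook's actual URL and columns are not listed in this README, so treat all of them as assumptions. The sample records are hypothetical and only there so the example runs offline.

```python
import pandas as pd
import requests

# Assumed endpoint; the notebook's actual URL is not shown in this README.
SPACEX_URL = "https://api.spacexdata.com/v4/launches/past"


def fetch_launches(url: str = SPACEX_URL) -> list[dict]:
    """Send a GET request and return the decoded JSON records."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def to_frame(records: list[dict]) -> pd.DataFrame:
    """Flatten nested JSON into columns, parse dates, drop incomplete rows."""
    df = pd.json_normalize(records)  # nested keys become "parent.child" columns
    df["date_utc"] = pd.to_datetime(df["date_utc"])
    return df.dropna(subset=["flight_number"]).reset_index(drop=True)


# Hypothetical records shaped like an API response (not real results).
sample = [
    {"flight_number": 1, "date_utc": "2006-03-24T22:30:00.000Z",
     "fairings": {"reused": False}},
    {"flight_number": None, "date_utc": "2007-03-21T01:10:00.000Z",
     "fairings": {"reused": False}},
]
launches = to_frame(sample)
```

With network access, `to_frame(fetch_launches())` would run the same cleaning on live data.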

Exploratory Data Analysis

In this module, I performed the following tasks:

  1. Used pandas and NumPy to perform EDA
  2. Identified important metrics and patterns in the imported data
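
The kind of summary this step produces can be sketched as follows. The column names (`LaunchSite`, `PayloadMass`, `Landed`) and all values are hypothetical stand-ins, not the notebook's actual schema or data.

```python
import pandas as pd

# Hypothetical rows; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "LaunchSite": ["CCAFS SLC 40", "CCAFS SLC 40", "KSC LC 39A", "VAFB SLC 4E"],
    "PayloadMass": [500.0, None, 3500.0, 9600.0],
    "Landed": [False, True, True, False],
})

# Fraction of missing values in each column.
missing_share = df.isnull().mean()

# Number of launches recorded at each site.
launches_per_site = df["LaunchSite"].value_counts()

# Landing success rate per site (mean of a boolean column).
success_by_site = df.groupby("LaunchSite")["Landed"].mean()
```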

Webscraping

In this module, I performed the following tasks:

  1. Extracted launch data (HTML tables) from Wikipedia
  2. Parsed the tables and converted them to a pandas DataFrame
  3. Used HTTP GET requests to retrieve the pages
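
A rough sketch of the parsing step with BeautifulSoup. The helper name `table_to_frame` and the tiny inline table are hypothetical; the real Wikipedia tables are larger and messier, so this only shows the general shape of the approach.

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_to_frame(html: str) -> pd.DataFrame:
    """Convert the first HTML table in `html` to a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    # The first row holds the column headers (<th> cells).
    header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    # Remaining rows hold the data (<td> cells).
    body = [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in rows[1:]
    ]
    return pd.DataFrame(body, columns=header)


# Made-up stand-in for a launch table so the example runs offline.
html = """
<table>
  <tr><th>Flight No.</th><th>Launch site</th></tr>
  <tr><td>1</td><td>CCAFS</td></tr>
  <tr><td>2</td><td>VAFB</td></tr>
</table>
"""
launch_table = table_to_frame(html)
```

For a live page, the `html` string would come from the text of an HTTP GET response, for example `requests.get(url).text`.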

EDA with Visualization

In this module, I performed the following tasks:

  1. Used Matplotlib and Seaborn to visualize and compare different features of my dataset
  2. Performed feature engineering and created dummy variables
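
The dummy-variable step can be sketched with `pandas.get_dummies`. The `Orbit` and `PayloadMass` columns and their values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical frame with one categorical and one numeric feature.
df = pd.DataFrame({
    "Orbit": ["LEO", "GTO", "LEO"],
    "PayloadMass": [500.0, 3500.0, 9600.0],
})

# One-hot encode the categorical column; numeric columns pass through unchanged.
features = pd.get_dummies(df, columns=["Orbit"], dtype=float)
```

Each category becomes its own 0/1 column (`Orbit_GTO`, `Orbit_LEO`), which is the form most scikit-learn estimators expect.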

EDA with SQL

In this module, I performed the following tasks:

  1. Connected to an IBM Db2 database to run SQL queries
  2. Used SQL to perform further EDA
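
As an illustration of SQL-based EDA, the sketch below uses Python's built-in `sqlite3` with an in-memory database as a stand-in for Db2, so it runs without credentials. The table name, columns, and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used here in place of Db2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (launch_site TEXT, payload_mass_kg REAL)")
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [("CCAFS SLC 40", 500.0), ("CCAFS SLC 40", 1500.0), ("KSC LC 39A", 3500.0)],
)

# Launch count and average payload per site.
rows = conn.execute(
    """
    SELECT launch_site, COUNT(*) AS n_launches, AVG(payload_mass_kg) AS avg_payload
    FROM launches
    GROUP BY launch_site
    ORDER BY launch_site
    """
).fetchall()
conn.close()
```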

Interactive Visuals with Folium

In this module, I performed the following tasks:

  1. Used Folium to mark launch sites on a map of the USA
  2. Marked successful vs. failed launches on the map
  3. Created clusters using MarkerCluster
  4. Calculated distances from each launch site to nearby features
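
The README does not say how the distances were computed. One common choice for two latitude/longitude points is the haversine formula, sketched here with a hypothetical helper `haversine_km` and a mean Earth radius of 6371 km.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# One degree of longitude along the equator is roughly 111.19 km.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
```

The resulting value can then be shown on the map, for instance as the label of a marker or line between the two points.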

Plotly Dashboard

In this module, I performed the following tasks:

  1. Used JupyterDash to create a Dash application
  2. Designed the app layout using HTML
  3. Used callback functions within the app
  4. Ran the app on an external server
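
A small sketch of the layout-plus-callback pattern. It uses the plain `dash.Dash` class rather than JupyterDash, and the component ids, column names (`LaunchSite`, `Class`), and data are all hypothetical. The filtering logic is kept in a separate function, `outcome_counts`, so it can be checked without starting a server.

```python
import pandas as pd


def outcome_counts(df: pd.DataFrame, site: str) -> pd.DataFrame:
    """Count launches per outcome class, optionally filtered to one site."""
    if site != "ALL":
        df = df[df["LaunchSite"] == site]
    return df.groupby("Class").size().reset_index(name="count")


def build_app(df: pd.DataFrame):
    """Build a Dash app with a site dropdown driving a pie chart."""
    # Imported here so outcome_counts stays usable without Dash installed.
    import plotly.express as px
    from dash import Dash, Input, Output, dcc, html

    app = Dash(__name__)
    sites = ["ALL"] + sorted(df["LaunchSite"].unique())
    app.layout = html.Div([
        html.H1("Launch outcomes"),
        dcc.Dropdown(id="site", options=sites, value="ALL", clearable=False),
        dcc.Graph(id="pie"),
    ])

    # The callback re-renders the chart whenever the dropdown value changes.
    @app.callback(Output("pie", "figure"), Input("site", "value"))
    def update_pie(site):
        counts = outcome_counts(df, site)
        return px.pie(counts, names="Class", values="count")

    return app


# Hypothetical data: Class 1 = success, Class 0 = failure.
df = pd.DataFrame({"LaunchSite": ["A", "A", "B"], "Class": [1, 0, 1]})
all_counts = outcome_counts(df, "ALL")
# build_app(df).run(debug=True)  # uncomment to start a local server
```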

Machine Learning Pipeline

In this module, I performed the following tasks:

  1. Used the sklearn preprocessing module to standardize my data
  2. Split the data into training and test sets
  3. Applied 4 different classification algorithms to predict future launch outcomes: Logistic Regression, SVM, Decision Tree, KNN
  4. Used Grid Search to find the best parameters for each model, plotted the results as confusion matrices, and identified the best model
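
The steps above can be sketched for one of the four classifiers. The data is synthetic (`make_classification`) as a stand-in for the launch features, and the parameter grid is an assumption rather than the one used in the notebook. Placing the scaler inside a pipeline is a design choice of this sketch: it ensures the scaler is fit only on the training folds during cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the launch features and landing labels.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Standardize, then classify; the step name is derived from the class name.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Grid search over the regularization strength with 5-fold cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0]},
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the best estimator on the held-out test set.
cm = confusion_matrix(y_test, search.predict(X_test))
test_accuracy = search.score(X_test, y_test)
```

Repeating the same search with `SVC`, `DecisionTreeClassifier`, and `KNeighborsClassifier` and comparing `test_accuracy` gives a basis for choosing among the models.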

Technologies

  1. Python3
  2. Pandas
  3. Numpy
  4. BeautifulSoup
  5. Matplotlib
  6. Seaborn
  7. SQL
  8. Folium
  9. Dash
  10. HTML
  11. Sklearn
