Final Capstone project


To complete IBM's Professional Data Science Certificate, I built a capstone project that applies what I learned in the previous courses. The project analyzes historical SpaceX rocket launch data to predict the outcome of future launch attempts. It is split into 8 Jupyter notebooks, each briefly described below.

Data Collection (API)

In this module, I performed the following tasks:

  1. Used SpaceX's API to collect data through GET requests
  2. Cleaned and parsed the data using pandas and NumPy
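
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the endpoint URL, the field names, the `SAMPLE` records, and the mean imputation of missing payload mass are all assumptions made for the example.

```python
import pandas as pd

# Assumed endpoint; the notebook may query a different URL.
SPACEX_URL = "https://api.spacexdata.com/v4/launches/past"

def fetch_launches(url=SPACEX_URL, timeout=30):
    """Send a GET request and return the decoded JSON (a list of dicts)."""
    import requests  # imported lazily so the parsing helper also works offline
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.json()

def launches_to_frame(records):
    """Flatten nested JSON records and fill in missing payload masses."""
    df = pd.json_normalize(records)
    df = df.rename(columns={"payload.mass_kg": "payload_mass_kg"})
    df["date_utc"] = pd.to_datetime(df["date_utc"], utc=True)
    # Hypothetical cleaning rule: replace missing masses with the column mean.
    mass = pd.to_numeric(df["payload_mass_kg"])
    df["payload_mass_kg"] = mass.fillna(mass.mean())
    return df[["flight_number", "date_utc", "success", "payload_mass_kg"]]

# Made-up records that mimic the shape of an API response.
SAMPLE = [
    {"flight_number": 1, "date_utc": "2006-03-24T22:30:00.000Z",
     "success": False, "payload": {"mass_kg": 20.0}},
    {"flight_number": 2, "date_utc": "2007-03-21T01:10:00.000Z",
     "success": False, "payload": {"mass_kg": None}},
    {"flight_number": 3, "date_utc": "2008-08-03T03:34:00.000Z",
     "success": True, "payload": {"mass_kg": 40.0}},
]

launch_df = launches_to_frame(SAMPLE)
```

Calling `launches_to_frame(fetch_launches())` would run the same cleaning on live data, provided the response contains the fields assumed here.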

Exploratory Data Analysis

In this module, I performed the following tasks:

  1. Used pandas and NumPy to perform EDA
  2. Identified important metrics and patterns in the imported data


Data Collection (Web Scraping)

In this module, I performed the following tasks:

  1. Used HTTP GET requests to retrieve the Wikipedia page
  2. Extracted launch data (HTML tables) from the page
  3. Parsed the tables and converted them to a pandas DataFrame
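
As a rough sketch of the table-parsing step, the snippet below converts an HTML table into a pandas DataFrame with BeautifulSoup. The inline `SAMPLE_HTML` string and its column names are made up for illustration; in the notebook the HTML would come from the body of a GET response rather than a literal.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up stand-in for the HTML returned by a GET request.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Flight No.</th><th>Launch site</th><th>Outcome</th></tr>
  <tr><td>1</td><td>Site A</td><td>Success</td></tr>
  <tr><td>2</td><td>Site B</td><td>Failure</td></tr>
</table>
"""

def table_to_frame(html):
    """Parse the first HTML table in `html` into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    # The first row holds the column headers; the rest hold data cells.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    data = [
        [td.get_text(strip=True) for td in row.find_all("td")]
        for row in rows[1:]
    ]
    return pd.DataFrame(data, columns=headers)

scraped_df = table_to_frame(SAMPLE_HTML)
```

Real Wikipedia tables often contain merged cells, footnote markers, and nested links, so a production version would likely need extra cleanup per column.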

EDA with Visualization

In this module, I performed the following tasks:

  1. Used Matplotlib and Seaborn to visualize and compare different features of the dataset
  2. Performed feature engineering and created dummy variables
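
Creating dummy variables can be illustrated with `pandas.get_dummies`. This is a minimal sketch on a made-up frame; the column names `LaunchSite` and `PayloadMass` are assumptions, not necessarily those used in the notebook.

```python
import pandas as pd

# Made-up sample with one categorical and one numeric feature.
raw = pd.DataFrame({
    "PayloadMass": [500.0, 3000.0, 1500.0],
    "LaunchSite": ["Site A", "Site B", "Site A"],
})

# One-hot encode the categorical column; numeric columns pass through unchanged.
features = pd.get_dummies(raw, columns=["LaunchSite"], dtype=float)
```

Each distinct launch site becomes its own 0/1 column, which lets classifiers that expect numeric input use the categorical information.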

EDA with SQL

In this module, I performed the following tasks:

  1. Connected to an IBM Db2 database to run SQL queries
  2. Used SQL to perform further EDA
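
The kind of query used for SQL-based EDA can be sketched as below. To keep the example self-contained it uses an in-memory SQLite database as a stand-in for Db2, so the connection code differs from the notebook's; the table name, columns, and rows are made up.

```python
import sqlite3

# In-memory SQLite database standing in for the Db2 instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE launches (launch_site TEXT, payload_mass_kg REAL, outcome TEXT)"
)
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?)",
    [
        ("Site A", 500.0, "Success"),
        ("Site A", 1500.0, "Failure"),
        ("Site B", 3000.0, "Success"),
    ],
)

# Count launches and average the payload mass for each site.
query = """
    SELECT launch_site, COUNT(*) AS launches, AVG(payload_mass_kg) AS avg_payload
    FROM launches
    GROUP BY launch_site
    ORDER BY launch_site
"""
site_summary = conn.execute(query).fetchall()
conn.close()
```

The `GROUP BY` aggregate itself is standard SQL, so a similar statement should carry over to Db2 with little or no change.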

Interactive Visuals with Folium

In this module, I performed the following tasks:

  1. Used Folium to mark launch sites on a map of the USA
  2. Marked successful vs. failed launches on the map
  3. Created clusters using Marker Cluster
  4. Calculated the distance from each launch site to nearby points of interest
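
One common way to compute the distance between a site and a nearby point is the haversine formula. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the function name is hypothetical and the notebook may compute distances differently.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# One degree of longitude along the equator, roughly 111 km.
one_degree_km = haversine_km(0.0, 0.0, 0.0, 1.0)
```

The resulting value can then be shown on the map, for example as the label of a marker or of a line drawn between the two points.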

Plotly Dashboard

In this module, I performed the following tasks:

  1. Used JupyterDash to create a Dash application
  2. Designed the app layout using HTML
  3. Used callbacks to make the app interactive
  4. Ran the app on an external server

Machine Learning Pipeline

In this module, I performed the following tasks:

  1. Used the sklearn preprocessing module to standardize the data
  2. Split the data into training and test sets
  3. Applied 4 different classification algorithms to predict future launch outcomes: Logistic Regression, SVM, Decision Tree, KNN
  4. Used Grid Search to find the best parameters for each model, plotted the outcomes as confusion matrices, and identified the best model
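
The workflow above can be sketched for a single classifier as follows. This is an illustration under stated assumptions, not the notebook's code: the data is synthetic (generated with `make_classification`), and the parameter grid, split size, and random seeds are arbitrary choices. Only Logistic Regression is shown; the other three models would follow the same pattern with their own grids.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data standing in for the launch features.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

# Standardize, then classify; the scaler is refit inside each CV fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid over the regularization strength C.
search = GridSearchCV(pipeline, {"clf__C": [0.01, 0.1, 1.0]}, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
matrix = confusion_matrix(y_test, search.predict(X_test), labels=[0, 1])
```

Putting the scaler inside the pipeline is a design choice made here so that cross-validation folds never see statistics from their own validation data; scaling the full dataset before splitting is simpler but leaks that information.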


Tools Used

  1. Python 3
  2. pandas
  3. NumPy
  4. BeautifulSoup
  5. Matplotlib
  6. Seaborn
  7. SQL
  8. Folium
  9. Dash
  10. HTML
  11. scikit-learn

