jmankoff edited this page Mar 14, 2017 · 3 revisions

ML Byte Instructions

This assignment is designed to help you get comfortable applying machine learning (ML) and statistics skills to classify data.

For this assignment you will use the 2015 American Community Survey data set, which contains detailed demographics for 3 million people living in the US. Your goal is to predict a person's sex from their other demographic data. The learning goals for this assignment include:

  • Using libraries that can support statistics and machine learning rather than 'doing it yourself'
  • Using a statistical test to check for significant differences in features that might predict sex
  • Using a visualization to sanity check the results you are finding
  • Developing a feature set to be used for classification
  • Following best practices in developing an ML model
  • Using different classifiers and comparing their performance
  • Using cross validation to test your classifier
  • Understanding your results and comparing them to a baseline classifier
  • Iterating on your features to improve the results
  • Showing that your results are significantly better than baseline
  • Developing a report documenting your findings
  • Discussing the implications of the results and how ML can help us understand the data better
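
Several of these goals (cross validation, comparing against a baseline, and checking that the difference is significant) fit together in a few lines of library code. The following is a minimal sketch only, not the assignment solution. It assumes scikit-learn and SciPy are installed, uses synthetic stand-in data rather than the survey data, and the choice of logistic regression, 10 folds, and a paired t-test are illustrative assumptions.

```python
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (NOT the survey data): 500 rows, 8 numeric features,
# binary label. Replace with your own feature matrix X and label vector y.
X, y = make_classification(
    n_samples=500,
    n_features=8,
    n_informative=4,
    n_clusters_per_class=1,
    random_state=0,
)

# Use the same folds for both models so scores can be compared fold by fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Baseline: always predict the most frequent class seen in the training fold.
baseline = DummyClassifier(strategy="most_frequent")
# Candidate model (an illustrative choice; any scikit-learn classifier works here).
model = LogisticRegression(max_iter=1000)

baseline_scores = cross_val_score(baseline, X, y, cv=cv, scoring="accuracy")
model_scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# Paired t-test on per-fold accuracies. Folds share training data, so treat
# the p-value as a rough indication rather than an exact result.
t_stat, p_value = stats.ttest_rel(model_scores, baseline_scores)

print(f"baseline mean accuracy: {baseline_scores.mean():.3f}")
print(f"model mean accuracy:    {model_scores.mean():.3f}")
print(f"paired t-test p-value:  {p_value:.4g}")
```

Fixing the folds with one `StratifiedKFold` object is what makes the paired comparison meaningful: each pair of scores comes from the same held-out rows.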

Setting up Jupyter on your home machine

You will do the majority of this assignment on your local machine using a Python notebook environment. To do so, you need to have the appropriate libraries installed. We provide instructions for setting up Anaconda on your machine.
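
Once your environment is set up, a quick way to confirm that the libraries are importable is to run a check like the one below in a notebook cell. This is a sketch under assumptions: the page does not list the required packages, so the names in `REQUIRED` are a guess at a typical statistics and ML stack, and `missing_packages` is a hypothetical helper, not part of any course material.

```python
import importlib.util

# Assumed package list (not specified by the assignment); adjust as needed.
# Note that scikit-learn is imported under the name "sklearn".
REQUIRED = ["numpy", "pandas", "scipy", "sklearn", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found for import in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are available.")
```

`importlib.util.find_spec` only looks a package up without importing it, so the check runs quickly even when the libraries are large.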
