Home

ML Byte Instructions

This assignment is designed to help you get comfortable applying machine learning (ML) and statistics skills to classify data.

For this assignment you will use the 2015 American Community Survey data set, which contains detailed demographics for 3 million people living in the US. Your goal is to predict the sex of a person based on other demographic data. The learning goals for this assignment include:

Using libraries that can support statistics and machine learning rather than 'doing it yourself'
Using a statistical test to check for significant differences in things that might predict sex
Using a visualization to sanity check the results you are finding
Developing a feature set to be used for classification
Following best practices in developing an ML model
Using different classifiers and comparing their performance
Using cross validation to test your classifier
Understanding your results and comparing them to a baseline classifier
Iterating on your features to improve the results
Showing that your results are significantly better than baseline
Developing a report documenting your findings
Discussing the implications of the results and how ML can help us understand the data better
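
Several of the goals above (a statistical test on a candidate feature, cross validation, and comparison against a baseline classifier) can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses synthetic stand-in data rather than the American Community Survey data, and the feature names, the choice of logistic regression, and the 10-fold setup are hypothetical choices, not requirements of the assignment.

```python
import numpy as np
from scipy import stats
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (NOT the ACS data): a binary label,
# one feature whose mean depends on the label, and one pure-noise feature.
rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)                     # hypothetical binary label
informative = rng.normal(loc=y * 1.0, scale=1.0)   # mean shifts with the label
noise = rng.normal(size=n)                         # unrelated to the label
X = np.column_stack([informative, noise])

# Statistical test: does the candidate feature differ between the two groups?
# Welch's t-test does not assume equal variances.
t_stat, p_value = stats.ttest_ind(
    informative[y == 0], informative[y == 1], equal_var=False
)

# Cross validation: score a baseline and a real classifier on the same folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
baseline_scores = cross_val_score(
    DummyClassifier(strategy="most_frequent"), X, y, cv=cv
)
model_scores = cross_val_score(LogisticRegression(), X, y, cv=cv)

# Rough significance check: paired t-test on the per-fold accuracies.
# Folds share training data, so treat this p-value as approximate.
paired_t, paired_p = stats.ttest_rel(model_scores, baseline_scores)

print(f"feature test p-value: {p_value:.3g}")
print(f"baseline accuracy:    {baseline_scores.mean():.3f}")
print(f"model accuracy:       {model_scores.mean():.3f}")
print(f"model vs baseline p:  {paired_p:.3g}")
```

Scoring both classifiers on the same folds keeps the comparison paired, which is what makes the per-fold test meaningful.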

Setting up Jupyter on your home machine

You will do the majority of this assignment on your local machine using a Python notebook environment. To do so, you need to have the appropriate libraries installed. We provide instructions for setting up Anaconda on your machine.
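
Once the environment is set up, a quick check in a notebook cell can confirm that the libraries are importable. This is a minimal sketch: the package list is an assumption (the instructions here do not name the exact libraries), and `missing_packages` is a hypothetical helper, so adjust the list to match what the assignment actually uses.

```python
import importlib.util

# Assumed package list; the assignment text does not name the exact libraries.
REQUIRED = ["numpy", "pandas", "scipy", "sklearn", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", missing if missing else "none")
```

Anything the check lists as missing can then be installed through Anaconda before starting the assignment.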

(c) Interactive Data Pipeline, 2017
