README.md: 12 additions, 36 deletions
@@ -1,21 +1,8 @@
 ## Tutorial: Machine Learning with Text in scikit-learn

-Presented by [Kevin Markham](http://www.dataschool.io/about/) at PyCon 2016 (Portland, Oregon)
+Presented by [Kevin Markham](http://www.dataschool.io/about/) at PyCon on May 28, 2016. Watch the complete [tutorial video](https://www.youtube.com/watch?v=ZiKMIuYidY0&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=10) on YouTube.

-### Files
-
-* Tutorial: [notebook](tutorial.ipynb), [notebook with output](tutorial_with_output.ipynb), [script](tutorial.py), [SMS dataset](data/sms.tsv)
-* Exercise: [notebook](exercise.ipynb), [notebook with solution](exercise_solution.ipynb), [script](exercise.py), [script with solution](exercise_solution.py), [Yelp dataset](data/yelp.csv)
-
-### Welcome!
-
-This repository contains the data files and the notebooks/scripts that you will need for the tutorial.
-
-A detailed description of the tutorial is below, including a list of **required software** and **knowledge prerequisites**. If you need a refresher on any of the prerequisite material, I have listed my recommended resources.
-
-Due to slow Internet connections at the conference, you should plan to download this repository and install the required software **before arriving at the conference**.
-
-I look forward to meeting you on **Saturday, May 28 at 9:00am**! Please email me at [[email protected]](mailto:[email protected]) if you have any questions at all.
+[](https://www.youtube.com/watch?v=ZiKMIuYidY0&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=10 "Machine Learning with Text in scikit-learn - PyCon 2016")

 ### Description

@@ -31,6 +18,12 @@ Attendees will need to bring a laptop with [scikit-learn](http://scikit-learn.or

 I will be leading the tutorial using the IPython/Jupyter notebook, and have added a pre-written notebook to this repository. I have also created a Python script that is identical to the notebook, which you can use in the Python environment of your choice.
Attendees of this tutorial should be comfortable working in Python, should understand the basic principles of machine learning, and should have at least basic experience with both pandas and scikit-learn. However, no knowledge of advanced mathematics is required.
@@ -60,27 +53,10 @@ In this tutorial, we'll answer all of those questions, and more! We'll start by

 Kevin Markham is the founder of [Data School](http://www.dataschool.io/) and the former lead instructor for [General Assembly's Data Science course](https://github.com/justmarkham/DAT8) in Washington, DC. He is passionate about teaching data science to people who are new to the field, regardless of their educational and professional backgrounds, and he enjoys teaching both online and in the classroom. Kevin's professional focus is supervised machine learning, which led him to create the popular [scikit-learn video series](https://github.com/justmarkham/scikit-learn-videos) for Kaggle. He has a degree in Computer Engineering from Vanderbilt University.

-### Tutorial Introduction
-
-* Required files for today:
-* Clone or download this repository: [http://bit.ly/pycon2016](http://bit.ly/pycon2016)
-* IPython/Jupyter notebooks ([tutorial.ipynb](tutorial.ipynb), [exercise.ipynb](exercise.ipynb)) or Python scripts ([tutorial.py](tutorial.py), [exercise.py](exercise.py))
-* Datasets in the `data` subdirectory ([sms.tsv](data/sms.tsv), [yelp.csv](data/yelp.csv))
-* Required software for today:
-* [scikit-learn](http://scikit-learn.org/stable/install.html) and [pandas](http://pandas.pydata.org/pandas-docs/stable/install.html) (and their dependencies)
-* [Anaconda distribution of Python](https://www.continuum.io/downloads) is an easy way to install both of these
-* Both Python 2 and 3 are welcome
-* Flash drives are available with Anaconda installers and tutorial files
-* About me:
-* Founder of Data School: [blog](http://www.dataschool.io/), [YouTube](https://youtube.com/user/dataschool)
* Read Paul Graham's classic post, [A Plan for Spam](http://www.paulgraham.com/spam.html), for an overview of a basic text classification system using a Bayesian approach. (He also wrote a [follow-up post](http://www.paulgraham.com/better.html) about how he improved his spam filter.)