|
12 | 12 | "\n",
|
13 | 13 | "Now for the real work – writing the code that will perform our analysis.\n",
|
14 | 14 | "\n",
|
15 |
| - "We want to do the following:\n", |
| 15 | + "Imagine we want to do the following:\n", |
16 | 16 | "- Create a Jupyter notebook for exploratory analysis\n",
|
17 | 17 | "- Generate the following outputs using python scripts:\n",
|
18 | 18 | " - Generate a subset of `winemag-data-130k-v2.csv` containing only the following columns: `country, designation, points, price (in GBP)`. Save it in a .csv file\n",
|
|
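The subset step in the task list above could look roughly like the minimal sketch below. It is not the contents of `01_subset-data-GBP.py` from `SupportScripts`. The function name `subset_to_gbp`, the output column name `price_GBP` and the fixed `USD_TO_GBP` rate are hypothetical. It also assumes the raw `price` column is in US dollars.

```python
import csv
import io

# Hypothetical fixed exchange rate (assumption): substitute the rate you want to use.
USD_TO_GBP = 0.75

# Columns kept in the output; price is renamed because its unit changes.
OUT_FIELDS = ["country", "designation", "points", "price_GBP"]


def subset_to_gbp(src, dst, rate=USD_TO_GBP):
    """Copy country, designation and points from src to dst and convert price to GBP.

    src and dst are open text files (or io.StringIO objects). Assumes the
    input `price` column is in US dollars; empty prices stay empty.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=OUT_FIELDS)
    writer.writeheader()
    for row in reader:
        price = row["price"].strip()
        writer.writerow({
            "country": row["country"],
            "designation": row["designation"],
            "points": row["points"],
            "price_GBP": f"{float(price) * rate:.2f}" if price else "",
        })


# Example with a made-up, in-memory CSV instead of the real file.
sample = io.StringIO(
    "country,description,designation,points,price\n"
    "Portugal,Ripe and fruity,Avidagos,87,15.0\n"
)
out = io.StringIO()
subset_to_gbp(sample, out, rate=0.8)
print(out.getvalue())
```

With real files, open both the input and output with `newline=""`, as the `csv` module documentation recommends.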
32 | 32 | "You should now have a directory called `SupportScripts`.\n",
|
33 | 33 | "\n",
|
34 | 34 | "Move the files in `SupportScripts` into the appropriate directories of your newly created project:\n",
|
35 |
| - "- Noteboks\n", |
36 |
| - "- src/data\n", |
37 |
| - "- src/visualization\n", |
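| 35 | + "- Notebooks -> move the Jupyter notebook for the exploratory data analysis (`00_explore-data.ipynb`) here\n", |
| 36 | + "- src/data -> move the data-processing scripts (`01_subset-data-GBP.py`, `03_country-subset.py`) here; the raw data goes in `data/raw`\n", |
| 37 | + "- src/visualization -> move the plotting script (`02_visualize-wines.py`) here\n", |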
38 | 38 | "\n",
|
39 |
| - "Once this is done commit your changes to git\n", |
| 39 | + "Once this is done, commit your changes to git.\n", |
40 | 40 | "```bash\n",
|
41 | 41 | "$ git add .\n",
|
42 | 42 | "$ git commit -m \"Add processing scripts\"\n",
|
|
51 | 51 | }
|
52 | 52 | },
|
53 | 53 | "source": [
|
54 |
| - "Let's face it.... there are going to be files\n", |
55 |
| - "**LOTS** of files\n", |
| 54 | + "Now we are ready to start programming, analysing data, and testing. But let's face it... there are going to be files\n", |
| 55 | + "**LOTS** of files. There are always lots of files generated along the way.\n", |
56 | 56 | "\n",
|
57 | 57 | ""
|
58 | 58 | ]
|
|
70 | 70 | "The three principles for (file) names:\n",
|
71 | 71 | "- **Machine readable**: regex and globbing friendly, deliberate use of delimiters *\n",
|
72 | 72 | "- **Human readable**: contains info on content, connects to concept of slug from semantic URLs\n",
|
73 |
| - "- **Plays well with default ordering**: put something numeric first, use ISO 8601 for dates **YYYY-MM-DD**\n", |
| 73 | + "- **Plays well with default ordering**: put something numeric first, e.g. `01_data-cleaning.py`, and use ISO 8601 for dates (**YYYY-MM-DD**)\n", |
74 | 74 | "\n",
|
75 |
| - "<small>* Avoid spaced, accented characters, files 'foo' and 'Foo' </small>" |
| 75 | + "<small>* Avoid spaces, accented characters, and file names that differ only in case, like 'foo' and 'Foo'.</small>" |
76 | 76 | ]
|
77 | 77 | },
|
78 | 78 | {
|
|
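To make the three naming principles concrete, here is a small Python sketch using the file names from this tutorial. It only manipulates name strings (no files are touched); the helper `parse` and the day-first dates used for comparison are made up for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import Path

files = [
    "02_visualize-wines.py",
    "00_explore-data.ipynb",
    "03_country-subset.py",
    "01_subset-data-GBP.py",
]

# Plays well with default ordering: the numeric prefix puts files in workflow order.
ordered = sorted(files)


# Machine readable: '_' separates the step number from the slug, '-' separates words.
def parse(name):
    step, slug = Path(name).stem.split("_", 1)
    return int(step), slug.split("-")


# Globbing friendly: select only the Python scripts for steps 01-03.
scripts = [f for f in ordered if fnmatchcase(f, "0[1-3]_*.py")]

# ISO 8601 dates sort chronologically as plain strings; day-first dates do not.
iso_sorted = sorted(["2018-05-09", "2017-12-31", "2018-01-15"])
dmy_sorted = sorted(["09-05-2018", "31-12-2017", "15-01-2018"])

print(ordered)
print(parse("01_subset-data-GBP.py"))
print(scripts)
print(iso_sorted)
print(dmy_sorted)
```

Note how the day-first list ends with the 2017 date: string sorting compares the day first, so only the ISO 8601 form keeps chronological order for free.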
144 | 144 | "- **00_explore-data.ipynb**: exploratory analysis \n",
|
145 | 145 | "- **01_subset-data-GBP.py**: subset of winemag-data-130k-v2.csv containing only the following columns: country, designation, points, price (in GBP). Save in a .csv file\n",
|
146 | 146 | "- **02_visualize-wines.py**\n",
|
147 |
| - "- **03_country-subset.py**\n", |
| 147 | + "- **03_country-subset.py**" |
| 148 | + ] |
| 149 | + }, |
| 150 | + { |
| 151 | + "cell_type": "markdown", |
| 152 | + "metadata": {}, |
| 153 | + "source": [ |
| 154 | + "First things first. We need to get a sense of what the data looks like and create some additional metadata for it.\n", |
148 | 155 | "\n",
|
| 156 | + "Open the `00_explore-data.ipynb` notebook and run the cells." |
| 157 | + ] |
| 158 | + }, |
| 159 | + { |
| 160 | + "cell_type": "markdown", |
| 161 | + "metadata": { |
| 162 | + "slideshow": { |
| 163 | + "slide_type": "slide" |
| 164 | + } |
| 165 | + }, |
| 166 | + "source": [ |
149 | 167 | "From the root of your project directory you can run the scripts as follows (you might have to change `2018-05-09` to the current date):\n",
|
150 | 168 | "```\n",
|
151 | 169 | "$ python src/data/01_subset-data-GBP.py data/raw/winemag-data-130k-v2.csv \n",
|
152 | 170 | "$ python src/visualization/02_visualize-wines.py data/interim/2018-05-09-winemag_priceGBP.csv \n",
|
153 | 171 | "$ python src/data/03_country-subset.py data/interim/2018-05-09-winemag_priceGBP.csv Chile\n",
|
154 | 172 | "```\n",
|
155 | 173 | "\n",
|
156 |
| - "😕 What problems did you encounter? \n" |
| 174 | + "😕 What problems did you encounter? " |
157 | 175 | ]
|
158 | 176 | },
|
159 | 177 | {
|
|
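The commands above pass the input path (and, for step 03, a country name) on the command line and rely on date-prefixed files in `data/interim`. Below is a minimal sketch of how a script like `03_country-subset.py` could handle this with `argparse`; it is an assumption, not the script shipped in `SupportScripts`. The `--out-dir` option, the output name pattern and the helpers `dated_name` and `filter_country` are hypothetical.

```python
import argparse
import csv
from datetime import date
from pathlib import Path


def dated_name(stem, day=None, suffix=".csv"):
    """Prefix a file stem with an ISO 8601 date, e.g. 2018-05-09-<stem>.csv."""
    day = day or date.today()
    return f"{day.isoformat()}-{stem}{suffix}"


def filter_country(rows, country):
    """Keep only rows whose 'country' column matches `country` exactly."""
    return [row for row in rows if row["country"] == country]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Subset the GBP wine table by country.")
    parser.add_argument("input_csv", type=Path, help="e.g. data/interim/<date>-winemag_priceGBP.csv")
    parser.add_argument("country", help="e.g. Chile")
    # Hypothetical option: where to write the result (defaults to data/interim).
    parser.add_argument("--out-dir", type=Path, default=Path("data/interim"))
    args = parser.parse_args(argv)

    with args.input_csv.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = filter_country(reader, args.country)

    args.out_dir.mkdir(parents=True, exist_ok=True)
    out_path = args.out_dir / dated_name(f"winemag_{args.country}")
    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

To use it as a script, you would add an `if __name__ == "__main__": main()` guard at the bottom. Building the date prefix from `date.today()` is what makes the output name change from day to day, which is why the commands above may need a different date than `2018-05-09`.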