Commit ae313c1

Add details about schema and envs
1 parent 72fd856 commit ae313c1

5 files changed: +64 -26 lines changed

00_Setup.ipynb

Lines changed: 2 additions & 2 deletions
@@ -295,8 +295,8 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## 📊 Getting the data\n",
- "We will be using some data sets, make sure to download them before the session using the following link [https://drive.google.com/drive/folders/1b2B0KWS0UAVQqFgzx2R2qMNeiiB98lMe?usp=sharing](https://drive.google.com/drive/folders/1b2B0KWS0UAVQqFgzx2R2qMNeiiB98lMe?usp=sharing)"
+ "## 📊 Getting the data and some support scripts\n",
+ "We will be using some data sets as well as some support scripts, make sure to download them before the session using the following link [https://drive.google.com/drive/folders/1b2B0KWS0UAVQqFgzx2R2qMNeiiB98lMe?usp=sharing](https://drive.google.com/drive/folders/1b2B0KWS0UAVQqFgzx2R2qMNeiiB98lMe?usp=sharing)"
  ]
  },
  {

01_ProjectStructure.ipynb

Lines changed: 3 additions & 0 deletions
@@ -34,6 +34,9 @@
  "``` bash\n",
  "$ source activate reproPython\n",
  "```\n",
+ " \n",
+ "You will notice that your terminal displays the activated conda environment:\n",
+ "![terminal](assets/terminal_source.png)\n",
  "\n",
  "Now we can create the project structure, again from the shell:\n",
  "```bash\n",

02_WorkingWithData.ipynb

Lines changed: 30 additions & 13 deletions
@@ -115,21 +115,38 @@
  "</table>\n",
  "\n",
  "<table>\n",
+ "<tr>\n",
+ " <th>Where</th>\n",
+ " <th>Such as..</th>\n",
+ "</tr>\n",
+ "<tr>\n",
+ "<td>Where was the data collected?</td>\n",
+ "<td>France, Rhones Alpes</td>\n",
+ "</tr>\n",
+ "<tr>\n",
+ "<td>Where does the data live?</td>\n",
+ "<td>Theurel A, Gentaz E (2018) Data from: The regulation of emotions\n",
+ " in adolescents: age differences and emotion-specific patterns.\n",
+ " Dryad Digital Repository. <a href=\"https://doi.org/10.5061/dryad.n230404\">\n",
+ " https://doi.org/10.5061/dryad.n230404</a> </td>\n",
+ "</tr>\n",
+ "</table>\n",
+ "\n",
+ "<table>\n",
+ " <tr>\n",
+ " <th>Who</th>\n",
+ " <th>Such as..</th>\n",
+ " </tr>\n",
  " <tr>\n",
- " <th>Where</th>\n",
- " <th>Such as..</th>\n",
+ " <td>Who is responsible for the data?</td>\n",
+ " <td>Dr Theurel, Anne</td>\n",
+ " </tr>\n",
+ " <th>When</th>\n",
+ " <th>Such as..</th>\n",
+ " <tr>\n",
+ " <td>Was the data collected? What time span does the data cover?\t</td>\n",
+ " <td>Collected: June 2015. Data coverage: 1932-1944</td>\n",
  " </tr>\n",
- " <tr>\n",
- " <td>Where was the data collected?</td>\n",
- " <td>France, Rhones Alpes</td>\n",
- " </tr>\n",
- " <tr>\n",
- " <td>Where does the data live?</td>\n",
- " <td>Theurel A, Gentaz E (2018) Data from: The regulation of emotions\n",
- " in adolescents: age differences and emotion-specific patterns.\n",
- " Dryad Digital Repository. <a href=\"https://doi.org/10.5061/dryad.n230404\">\n",
- " https://doi.org/10.5061/dryad.n230404</a> </td>\n",
- " </tr>\n",
  "</table>"
  ]
  },

03_ProcessData.ipynb

Lines changed: 29 additions & 11 deletions
@@ -12,7 +12,7 @@
  "\n",
  "Now for the real work &ndash; writing the code that will perform out analysis.\n",
  "\n",
- "We want to do the following:\n",
+ "Imagine we want to do the following:\n",
  "- Create a Jupyter notebook for exploratory analysis\n",
  "- Generate the following outputs using python scripts:\n",
  " - Generate a subset of `winemag-130k-v2.csv` containing only the following columns: `country, designation, points, price (in GBP)`. Save in a .csv file\n",
@@ -32,11 +32,11 @@
  "You should now have a directory called `SupportScripts`.\n",
  "\n",
  "You need to make sure that all scripts from the directory are in the appropriate directory inside your newly created project.\n",
- "- Noteboks\n",
- "- src/data\n",
- "- src/visualization\n",
+ "- Noteboks -> move the Jupyter notebooks for the exploratory data analysis\n",
+ "- src/data -> move the raw data here\n",
+ "- src/visualization (this should be left empty)\n",
  "\n",
- "Once this is done commit your changes to git\n",
+ "Once this is done commit your changes to git.\n",
  "```bash\n",
  "$ git add .\n",
  "$ git commit -m \"Add processing scripts\"\n",
@@ -51,8 +51,8 @@
  }
  },
  "source": [
- "Let's face it.... there are going to be files\n",
- "**LOTS** of files\n",
+ "Now we are ready to start programming, analysing data, and testing. But let's face it.... there are going to be files\n",
+ "**LOTS** of files. There are always lots of files generated along the way.\n",
  "\n",
  "![files](assets/allthefiles.png)"
  ]
@@ -70,9 +70,9 @@
  "The three principles for (file) names:\n",
  "- **Machine readable **: regex and globbing friendly, deliberate use of delimiters *\n",
  "- **Human readable**: contains info on content, connects to concept of slug from semantic URLs\n",
- "- **Plays well with default ordering**: put something numeric first, use ISO 8601 for dates **YYYY-MM-DD**\n",
+ "- **Plays well with default ordering**: put something numeric first e.g. `01_data-cleaning.py`, use ISO 8601 for dates **YYYY-MM-DD**\n",
  "\n",
- "<small>* Avoid spaced, accented characters, files 'foo' and 'Foo' </small>"
+ "<small>* Avoid spaced, accented characters, files like 'foo' and 'Foo' </small>"
  ]
  },
  {
@@ -144,16 +144,34 @@
  "- **00_explore-data.ipynb**: exploratory analysis \n",
  "- **01_subset-data-GBP.py**: subset of winemag-130k-v2.csv containing only the following columns: country, designation, points, price (in GBP). Save in a .csv file\n",
  "- **02_visualize-wines.py**\n",
- "- **03_country-subset.py**\n",
+ "- **03_country-subset.py**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "First things first. We need to get a sense of what the data looks like and create some additional metadata for it.\n",
  "\n",
+ "Open the `00_Explore-data.ipynb` notebook and run the cells."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
  "From the root of your file system you can run the scripts as follow (you might have to change `2018-05-09` to the current date):\n",
  "```\n",
  "$ python src/data/01_subset-data-GBP.py data/raw/winemag-data-130k-v2.csv \n",
  "$ python src/visualization/02_visualize-wines.py data/interim/2018-05-09-winemag_priceGBP.csv \n",
  "$ python src/data/03_country-subset.py data/interim/2018-05-09-winemag_priceGBP.csv Chile\n",
  "```\n",
  "\n",
- "😕 What problems did you encounter? \n"
+ "😕 What problems did you encounter? "
  ]
  },
  {

assets/terminal_source.png

58.3 KB