Commit c76b847

second version
1 parent b566120 commit c76b847

18 files changed (+9638, -593 lines)

README.md (+25, -6)
@@ -5,16 +5,35 @@ This repository is an adjunct to the "Ten Simple Rules for Reproducible Research
 The example notebooks demonstrate some of the rules.
 
 ## Example 1
-This example demonstrates a 4-step workflow for predicting the protein fold type using a Machine Learning approach.
+This example demonstrates a reproducible 4-step workflow for predicting a protein fold classification using a Machine Learning approach.
 
-You can launch the top level notebook directly in your web browser: [0-Workflow.ipynb](https://mybinder.org/v2/gh/jupyter-guide/ten-rules-jupyter/master?filepath=example1%2F0-Workflow.ipynb).
+---
 
-Then follow the steps in the notebook to run the 4 steps of the workflow.
+**Rule 8: Prepare Your Notebooks to Be Read, Run, and Explored.** The nbviewer links provide a non-interactive preview of the notebooks, and the ![Binder](https://mybinder.org/badge.svg) buttons launch
+the notebooks in your web browser using the Binder ([mybinder.org](https://mybinder.org/)) server (which may be slow). All notebooks can also be launched from the links in the top-level notebook 0-Workflow.ipynb.
+
+---
+
+| Nbviewer | Jupyter Notebook | Jupyter Lab | PDF |
+| --- | --- | --- | --- |
+| [0-Workflow.ipynb](https://nbviewer.jupyter.org/github/jupyter-guide/ten-rules-jupyter/blob/master/example1/0-Workflow.ipynb) | [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/jupyter-guide/ten-rules-jupyter/master?filepath=example1%2F0-Workflow.ipynb) | [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/jupyter-guide/ten-rules-jupyter/master?urlpath=lab/tree/example1%2F0-Workflow.ipynb) | pdf |
+| [1-CreateDataset.ipynb](https://nbviewer.jupyter.org/github/jupyter-guide/ten-rules-jupyter/blob/master/example1/1-CreateDataset.ipynb) | [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/jupyter-guide/ten-rules-jupyter/master?filepath=example1%2F1-CreateDataset.ipynb) | [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/jupyter-guide/ten-rules-jupyter/master?urlpath=lab/tree/example1%2F1-CreateDataset.ipynb) | pdf |
+| [2-CalculateFeatures.ipynb](https://nbviewer.jupyter.org/github/jupyter-guide/ten-rules-jupyter/blob/master/example1/2-CalculateFeatures.ipynb) | [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/jupyter-guide/ten-rules-jupyter/master?filepath=example1%2F2-CalculateFeatures.ipynb) | [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/jupyter-guide/ten-rules-jupyter/master?urlpath=lab/tree/example1%2F2-CalculateFeatures.ipynb) | pdf |
+| [3-FitModel.ipynb](https://nbviewer.jupyter.org/github/jupyter-guide/ten-rules-jupyter/blob/master/example1/3-FitModel.ipynb) | [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/jupyter-guide/ten-rules-jupyter/master?filepath=example1%2F3-FitModel.ipynb) | [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/jupyter-guide/ten-rules-jupyter/master?urlpath=lab/tree/example1%2F3-FitModel.ipynb) | pdf |
+| [4-Predict.ipynb](https://nbviewer.jupyter.org/github/jupyter-guide/ten-rules-jupyter/blob/master/example1/4-Predict.ipynb) | [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/jupyter-guide/ten-rules-jupyter/master?filepath=example1%2F4-Predict.ipynb) | [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/jupyter-guide/ten-rules-jupyter/master?urlpath=lab/tree/example1%2F4-Predict.ipynb) | pdf |
+
+---
+
+**Rule 7: Share Your Data and Explain How to Use It.** To enable reproducibility, we provide an example1/data directory with all the data required to run the workflow. A description of the data, including the download location and download date, is [available](./example1/data/Datasets.md).
+
+---
 
 ## Example 2
 
+Example 2 goes here ...
+
 
-## How do I run a Juypter Notebook from this site?
-The Jupyter notebook links on this page launch the notebooks in your web browser without software installation using Binder ([mybinder.org](https://mybinder.org/)), an experimental platform for reproducible research (The Binder servers can be slow or may not be available).
+## How do I run Notebooks from this Site?
+The Launch Binder links on this page launch the notebooks in your web browser, without any software installation, using Binder ([mybinder.org](https://mybinder.org/)), an experimental platform for reproducible research (the Binder servers may be slow or intermittently unavailable).
 
-After you click on a notebook link above, you see a spinning Binder logo. Wait until the notebook launches (this may take a few minutes). Then click the Run ">>" button to execute the cells in the notebook.
+After you click on a launch link above, you will see a spinning Binder logo. Wait until the notebook launches (this may take a few minutes). Then click the Run ">>" button to execute the cells in the notebook.
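The nbviewer and Binder links in the table follow one fixed URL pattern per column, so they can be generated from a single template instead of being typed by hand. Generating them helps keep the rows consistent (for example, every nbviewer link should end in `.ipynb`). This is a minimal sketch, assuming the repository, branch, and URL patterns shown in the table; the helper name `notebook_links` is hypothetical.

```python
from urllib.parse import quote

# Repository coordinates as they appear in the README table.
REPO = "jupyter-guide/ten-rules-jupyter"
BRANCH = "master"
NBVIEWER_BASE = f"https://nbviewer.jupyter.org/github/{REPO}/blob/{BRANCH}"
BINDER_BASE = f"https://mybinder.org/v2/gh/{REPO}/{BRANCH}"
BADGE = "https://mybinder.org/badge.svg"


def notebook_links(notebook_path: str) -> dict:
    """Build the nbviewer, classic-notebook Binder, and JupyterLab Binder links (hypothetical helper)."""
    # safe="" percent-encodes "/" as %2F, matching the Binder links in the table
    encoded = quote(notebook_path, safe="")
    return {
        "nbviewer": f"{NBVIEWER_BASE}/{notebook_path}",
        "notebook": f"{BINDER_BASE}?filepath={encoded}",
        "lab": f"{BINDER_BASE}?urlpath=lab/tree/{encoded}",
    }


if __name__ == "__main__":
    steps = ["0-Workflow.ipynb", "1-CreateDataset.ipynb", "2-CalculateFeatures.ipynb",
             "3-FitModel.ipynb", "4-Predict.ipynb"]
    # Print one Markdown table row per notebook, in the same column order as the README.
    for name in steps:
        links = notebook_links(f"example1/{name}")
        print(f"| [{name}]({links['nbviewer']}) "
              f"| [![Binder]({BADGE})]({links['notebook']}) "
              f"| [![Binder]({BADGE})]({links['lab']}) | pdf |")
```

The `filepath` form opens the classic notebook interface, and `urlpath=lab/tree/...` opens JupyterLab, which mirrors the two Binder columns above.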

example1/0-Workflow.ipynb (+30, -14)
@@ -11,15 +11,15 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"**The notebooks in this directory were developed to demonstrate the \"Ten Rules for Reproducible Research with Jupyter Notebooks\". Throughout the notebooks we mention the rules we applied.**\n",
+"**The notebooks in this directory were developed to demonstrate the \"Ten Rules for Reproducible Research with Jupyter Notebooks\". Throughout the notebooks we refer to some of the rules we applied.**\n",
 "\n",
 "**For example, this notebook demonstrates:**\n",
 "\n",
 "---\n",
 "\n",
 "**Rule 1: Tell a Story for a Specific Audience.** This notebook was developed for biologists to learn how to apply a simple machine learning model to protein sequences.\n",
 "\n",
-"**Rule 3: Document the Entire Workflow.** This top-level notebook links to 3 notebooks that represent the steps of a workflow. This modularity makes it easy to replace one of the steps, for example, use a different method to calculate features or apply a different machine learning model.\n",
+"**Rule 3: Document the Entire Workflow.** This top-level notebook links to 4 notebooks that represent the steps of a workflow. This modularity makes it easy to replace one of the steps, for example, to use a different method to calculate features or to apply a different machine learning model.\n",
 "\n",
 "---"
 ]
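Because each step lives in its own notebook, the whole workflow can also be re-run end to end from the command line. This checks that the steps still execute in order, each in a fresh kernel, without hidden interactive state. This is a sketch that assumes Jupyter's `nbconvert` is installed and that the step notebooks sit in `example1/`. The helper names `execute_command` and `run_workflow` are hypothetical.

```python
import subprocess
from pathlib import Path

# The four step notebooks linked from 0-Workflow.ipynb, in execution order.
STEPS = [
    "1-CreateDataset.ipynb",
    "2-CalculateFeatures.ipynb",
    "3-FitModel.ipynb",
    "4-Predict.ipynb",
]


def execute_command(notebook: Path) -> list:
    """Command that executes a notebook in a new kernel and saves the outputs in place."""
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(notebook)]


def run_workflow(directory: str = "example1", dry_run: bool = False) -> list:
    """Run every step in order; check=True stops at the first notebook that fails."""
    commands = [execute_command(Path(directory) / step) for step in STEPS]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed.
    for command in run_workflow(dry_run=True):
        print(" ".join(command))
```

Swapping out one step, as Rule 3 suggests, then only means replacing one entry in `STEPS`.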
@@ -47,15 +47,15 @@
 "We can classify proteins into three major fold types based on their predominant secondary structure content:\n",
 "* alpha: contains predominantly alpha helices\n",
 "* beta: contains predominantly beta sheets\n",
-"* alpha+beta: contains both alpha helices and beta sheets"
+"* alpha+beta: contains alpha helices and beta sheets"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Goal\n",
-"This notebook serves as an example of using machine learning techniques applied to protein sequences. The goal is to create a simple machine learning model to predict the fold type of a protein given its protein sequence. We train the model on a representative set of 3D structure from the Protein Data Bank.\n",
+"This notebook demonstrates how to create a reproducible record of building a machine learning model. We train a simple model to predict the fold class of a protein from its sequence, using a representative set of 3D structures from the Protein Data Bank.\n",
 "\n",
 "Run the following notebooks to work through this example."
 ]
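You can illustrate the three fold classes with a rule that compares the helix and strand fractions in a per-residue secondary-structure string. This is only a sketch, because the diff does not show the notebook's actual assignment criteria. The DSSP-style codes (`H` for helix, `E` for strand) and the cutoffs `major` and `minor` are assumptions chosen for illustration.

```python
from typing import Optional


def fold_class(secondary_structure: str, major: float = 0.35, minor: float = 0.05) -> Optional[str]:
    """Assign alpha, beta, or alpha+beta from a per-residue secondary-structure string.

    Assumes DSSP-style codes: 'H' = alpha helix, 'E' = beta strand, anything else = coil.
    The cutoffs are illustrative assumptions, not values taken from 1-CreateDataset.ipynb.
    """
    n = len(secondary_structure)
    if n == 0:
        return None
    alpha = secondary_structure.count("H") / n
    beta = secondary_structure.count("E") / n
    if alpha >= major and beta < minor:
        return "alpha"        # predominantly helices
    if beta >= major and alpha < minor:
        return "beta"         # predominantly sheets
    if alpha >= minor and beta >= minor and alpha + beta >= major:
        return "alpha+beta"   # substantial amounts of both
    return None               # too little regular structure to classify


print(fold_class("HHHHHHHHCC"))  # alpha
print(fold_class("HHHEEECCCC"))  # alpha+beta
```

Chains that match none of the rules return `None`, so they can be dropped from the dataset instead of being forced into a class.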
@@ -71,7 +71,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"First, we need to create a dataset with protein secondary structure information obtained from 3D protein chains.\n",
+"First, we create a dataset with protein secondary structure information obtained from 3D protein chains.\n",
 "\n",
 "Run the following notebook to extract secondary structure information from a representative set of protein chains downloaded from the RCSB Protein Data Bank and assign a fold type to each protein chain."
 ]
@@ -87,7 +87,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The notebook saves the dataset in the file `secondaryStructure.json`."
+"The notebook saves the dataset in the file `./intermediate_data/foldClassification.json`."
 ]
 },
 {
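Writing each step's output to `./intermediate_data/` lets the next notebook start from a saved file instead of recomputing it. This is a minimal sketch of saving and reloading a list of records as JSON with the standard library. The helper names `save_records` and `load_records` are hypothetical, and the real column layout is defined in 1-CreateDataset.ipynb.

```python
import json
from pathlib import Path


def save_records(records: list, path: str) -> Path:
    """Write a list of dict records as JSON, creating the parent directory if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", encoding="utf-8") as handle:
        # sort_keys and a fixed indent make the output deterministic for the same data
        json.dump(records, handle, indent=1, sort_keys=True)
    return target


def load_records(path: str) -> list:
    """Read records written by save_records."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)
```

With the same data, deterministic output produces identical files on every rerun, so real changes stand out in version control. For example, call `save_records(rows, "./intermediate_data/foldClassification.json")` with a hypothetical `rows` list.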
@@ -117,7 +117,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The notebook saves the dateset in the file `features.json`."
+"This notebook saves the dataset with feature vectors in the file `./intermediate_data/features.json`."
 ]
 },
 {
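The diff suggests that the Word2Vec features (and the `gensim` dependency) were dropped, but it does not show what 2-CalculateFeatures.ipynb computes now. As a stand-in, here is a sketch of one simple, dependency-free feature vector, amino-acid composition. It is an assumption for illustration, not the notebook's method.

```python
# The 20 standard amino acids in one-letter code; a fixed order gives a stable vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> list:
    """Return the fraction of each standard amino acid in a protein sequence.

    Non-standard letters (e.g. X, U) are ignored; a sequence with none of the
    standard letters yields an all-zero vector.
    """
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


print(composition("AAAC")[:3])  # [0.75, 0.25, 0.0]
```

Every sequence maps to a vector of the same length (20), whatever its length. That is the property the next step needs in order to fit a classifier.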
@@ -131,7 +131,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Next, we fit a 3-state classification model using the feature vectors as inputs and the known fold types from the Protein Data Bank dataset.\n",
+"Next, we fit a 3-state classification model using the feature vectors and the given fold classification from the Protein Data Bank dataset.\n",
 "\n",
 "Run the following notebook to fit a machine learning model on a training set and evaluate its performance on a test set."
 ]
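The fit-and-evaluate step can be sketched without external libraries. `sklearn` appears in the watermark output, so the notebook presumably uses scikit-learn. The nearest-centroid classifier below is a deliberately simple stand-in, and all function names are hypothetical. The detail it illustrates is reproducibility: the train/test split uses a fixed random seed, so every rerun evaluates on the same test set.

```python
import random
from math import dist


def train_test_split(X, y, test_fraction=0.2, seed=42):
    """Shuffle indices with a fixed seed so the split is identical on every run."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_fraction)
    test, train = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


def fit_centroids(X, y):
    """Nearest-centroid model: the mean feature vector of each fold class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        total = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            total[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest (Euclidean distance) to x."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


def accuracy(centroids, X, y):
    """Fraction of samples whose predicted class matches the given class."""
    correct = sum(predict(centroids, x) == label for x, label in zip(X, y))
    return correct / len(y)
```

Typical use: `X_train, X_test, y_train, y_test = train_test_split(X, y)`, then `accuracy(fit_centroids(X_train, y_train), X_test, y_test)`.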
@@ -143,6 +143,13 @@
 "[3-FitModel.ipynb](./3-FitModel.ipynb)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"This notebook saves the classification model in the file `./intermediate_data/classifier`."
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -154,7 +161,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Finally, we use the Word2Vec model and the trained classifier to predict the fold class from a protein sequence."
+"Finally, we use the trained classifier to predict the fold class from a protein sequence."
 ]
 },
 {
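The prediction step depends on the classifier that 3-FitModel.ipynb saves to `./intermediate_data/classifier`. The serialization format is not visible in this diff. This is a minimal sketch using the standard-library `pickle` module, which is an assumption (scikit-learn models are also commonly persisted with `joblib`). The helper names `save_model` and `load_model` are hypothetical.

```python
import pickle
from pathlib import Path

# Path taken from the notebook text; the on-disk format here is an assumption.
DEFAULT_MODEL_PATH = Path("intermediate_data") / "classifier"


def save_model(model, path=DEFAULT_MODEL_PATH) -> Path:
    """Serialize a fitted model so the prediction step can reuse it without refitting."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(model, handle)
    return path


def load_model(path=DEFAULT_MODEL_PATH):
    """Load a model written by save_model. Only unpickle files you trust."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

A pickled scikit-learn model is only reliably loadable with the library versions it was saved with. That is one more reason the workflow records package versions with `%watermark`.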
@@ -184,19 +191,17 @@
 },
 {
 "cell_type": "code",
-"execution_count": 2,
+"execution_count": 1,
 "metadata": {},
 "outputs": [
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"The watermark extension is already loaded. To reload it, use:\n",
-" %reload_ext watermark\n",
 "CPython 3.6.3\n",
 "IPython 6.3.1\n",
 "\n",
-"gensim 3.6.0\n",
+"ipywidgets 7.4.0\n",
 "matplotlib 2.2.2\n",
 "numpy 1.14.5\n",
 "pandas 0.22.0\n",
@@ -214,7 +219,18 @@
 ],
 "source": [
 "%load_ext watermark\n",
-"%watermark -v -m -p gensim,matplotlib,numpy,pandas,sklearn"
+"%watermark -v -m -p ipywidgets,matplotlib,numpy,pandas,sklearn"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"---\n",
+"\n",
+"**Authors:** Peter W. Rose, Shih-Cheng Huang, UC San Diego, October 1, 2018\n",
+"\n",
+"---"
 ]
 }
 ],
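The `%watermark -v -m -p ...` cell records the interpreter, machine, and package versions inside the notebook. The same record is sometimes needed outside IPython, for example in a script or a CI log. A rough standard-library equivalent can be sketched as follows. It is not the watermark extension's implementation, its output format differs, and the function name `environment_report` is hypothetical.

```python
import platform
from importlib import metadata


def environment_report(distributions):
    """Record interpreter, OS, and package versions, similar in spirit to %watermark -v -m -p."""
    lines = [
        f"{platform.python_implementation()} {platform.python_version()}",
        f"system {platform.system()} {platform.release()}",
        f"machine {platform.machine()}",
    ]
    for name in distributions:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name} {version}")
    return "\n".join(lines)


print(environment_report(["ipywidgets", "matplotlib", "numpy", "pandas", "scikit-learn"]))
```

`importlib.metadata` looks up installed distribution names, so scikit-learn is queried as `scikit-learn` rather than by its import name `sklearn`.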
