
Commit 747f1ac

Merge pull request #66 from STRIDES/reformat_notebooks_kao
Reformat notebooks to align to common standard
2 parents ea6f571 + ef498ee commit 747f1ac

30 files changed: +763 −8050 lines changed

notebooks/ElasticBLAST/run_elastic_blast.ipynb (+47 −16)
@@ -10,18 +10,41 @@
    },
    {
     "cell_type": "markdown",
-    "id": "aee3b229",
     "metadata": {},
     "source": [
-     "This notebook is based on [this tutorial](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/quickstart-aws.html). Make sure you select a kernel with Python 3.7 for the Elastic BLAST install. One good option is `conda_mxnet_latest_p37`."
+     "## Overview\n",
+     "This notebook helps you run BLAST in a scalable manner using AWS Batch. The script will spin up and later tear down your cluster to execute the BLAST jobs. This notebook is based on [this tutorial](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/quickstart-aws.html). Make sure you select a kernel with Python 3.7 for the Elastic BLAST install. One good option is `conda_mxnet_latest_p37`."
     ]
    },
    {
     "cell_type": "markdown",
-    "id": "38dfb579",
     "metadata": {},
     "source": [
-     "### 1) Install elastic blast"
+     "## Prerequisites\n",
+     "You need to make sure you have permissions to use CloudFormation, Batch, and SageMaker."
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "## Learning Objectives\n",
+     "+ Learn to use Batch to scale compute jobs.\n",
+     "+ Learn how to use BLAST in the cloud."
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "## Get Started"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "### Install packages"
     ]
    },
    {
@@ -31,7 +54,7 @@
     "metadata": {},
     "outputs": [],
     "source": [
-     "!pip3 install elastic-blast"
+     "! pip3 install elastic-blast"
     ]
    },
    {
@@ -49,16 +72,16 @@
     "metadata": {},
     "outputs": [],
     "source": [
-     "!elastic-blast --version\n",
-     "!elastic-blast --help"
+     "! elastic-blast --version\n",
+     "! elastic-blast --help"
     ]
    },
    {
     "cell_type": "markdown",
     "id": "58b59cb0",
     "metadata": {},
     "source": [
-     "### 2) Optionally, create a bucket for this tutorial if one does not yet exist"
+     "### Create a bucket for this tutorial if one does not yet exist; make sure to pick a unique name"
     ]
    },
    {
@@ -68,15 +91,15 @@
     "metadata": {},
     "outputs": [],
     "source": [
-     "!aws s3 mb s3://elasticblast-sagemaker"
+     "! aws s3 mb s3://elasticblast-sagemaker"
     ]
    },
    {
     "cell_type": "markdown",
     "id": "449d7511",
     "metadata": {},
     "source": [
-     "### 3) Create a config file that defines the job parameters"
+     "### Create a config file that defines the job parameters"
     ]
    },
    {
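Editor's note on the bucket step above: S3 bucket names are globally unique, so a fixed name like `elasticblast-sagemaker` may already be taken (hence the commit's "pick a unique name" wording). A minimal sketch of generating a unique name; the timestamp-suffix scheme and the prefix reuse are illustrative only, not part of the notebook:

```shell
# Build a likely-unique bucket name by appending a timestamp suffix.
# "elasticblast-sagemaker" is the tutorial's example prefix; any
# DNS-safe prefix works.
BUCKET="elasticblast-sagemaker-$(date +%s)"
echo "$BUCKET"
# In the notebook you would then run: aws s3 mb "s3://$BUCKET"
```
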
@@ -86,7 +109,7 @@
     "metadata": {},
     "outputs": [],
     "source": [
-     "!touch BDQA.ini"
+     "! touch BDQA.ini"
     ]
    },
    {
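Editor's note: the cell above only creates an empty `BDQA.ini`; the hunk that fills it in is collapsed in this diff. A sketch of what an ElasticBLAST config can look like, with section names following the ElasticBLAST quickstart; every value below (region, node count, database, query and results paths) is a placeholder, not the notebook's actual configuration:

```shell
# Write an illustrative ElasticBLAST config file.
# All values are placeholders; substitute your own region,
# query path, and results bucket before submitting.
cat > BDQA.ini <<'EOF'
[cloud-provider]
aws-region = us-east-1

[cluster]
num-nodes = 2
labels = owner=your-username

[blast]
program = blastn
db = ref_viruses_rep_genomes
queries = s3://your-query-bucket/queries.fa
results = s3://elasticblast-sagemaker/results/BDQA
EOF
```
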
@@ -122,7 +145,7 @@
     "id": "9a9f8192",
     "metadata": {},
     "source": [
-     "### 4) Submit the job"
+     "### Submit the job"
     ]
    },
    {
@@ -132,15 +155,15 @@
     "metadata": {},
     "outputs": [],
     "source": [
-     "!elastic-blast submit --cfg BDQA.ini"
+     "! elastic-blast submit --cfg BDQA.ini"
     ]
    },
    {
     "cell_type": "markdown",
     "id": "9a8e7716",
     "metadata": {},
     "source": [
-     "### 5) Check results and troubleshoot"
+     "### Check results and troubleshoot"
     ]
    },
    {
@@ -153,12 +176,20 @@
     "+ Finally, to view your outputs, look at the files in your S3 output bucket, something like `aws s3 ls s3://elasticblast-sagemaker/results/BDQA/`."
     ]
    },
+   {
+    "cell_type": "markdown",
+    "metadata": {},
+    "source": [
+     "## Conclusions\n",
+     "Here we submitted a parallel BLAST job to an AWS Batch cluster, using CloudFormation to handle provisioning and teardown of resources."
+    ]
+   },
    {
     "cell_type": "markdown",
     "id": "292947f1-5247-4da5-81bd-7fc8fc420ca4",
     "metadata": {},
     "source": [
-     "### 6) Clean up cloud resources"
+     "## Clean Up"
     ]
    },
    {
@@ -168,7 +199,7 @@
     "metadata": {},
     "outputs": [],
     "source": [
-     "!elastic-blast delete --cfg BDQA.ini"
+     "! elastic-blast delete --cfg BDQA.ini"
     ]
    }
   ],
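Editor's note on "Check results and troubleshoot": between `elastic-blast submit` and listing results, you typically poll job status (`elastic-blast status --cfg BDQA.ini` is the subcommand for this). The loop below is a generic polling sketch with the status command injected as a parameter; `fake_status` is a stand-in stub for illustration, not part of elastic-blast:

```shell
# Generic polling loop. In the notebook the real check would be
# `elastic-blast status --cfg BDQA.ini`; here $1 is any command
# that prints DONE when the job has finished.
poll_status() {
  check_cmd=$1
  max_tries=${2:-5}
  i=0
  while [ "$i" -lt "$max_tries" ]; do
    status=$($check_cmd)
    echo "attempt $((i + 1)): $status"
    if [ "$status" = "DONE" ]; then
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}

# Stand-in for the real status command, for illustration only.
fake_status() { echo "DONE"; }
poll_status fake_status
```
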

notebooks/GWAS/GWAS_coat_color.ipynb (+86 −6)
@@ -6,17 +6,46 @@
     "metadata": {},
     "source": [
      "# GWAS in the cloud\n",
+     "## Overview\n",
      "We adapted the NIH CFDE tutorial from [here](https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud/background/) and fit it to a notebook. We have greatly simplified the instructions, so if you need or want more details, look at the full tutorial to find out more.\n",
-     "Most of this notebook is bash, but expects that you are using a Python kernel, until step 3, plotting, you will need to switch your kernel to R."
+     "\n",
+     "Most of this notebook is written in Bash but expects that you are using a Python kernel; at step 3, plotting, you will need to switch your kernel to R."
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "id": "3edafe63",
+    "metadata": {},
+    "source": [
+     "## Learning Objectives\n",
+     "The goal is to learn how to execute a GWAS analysis in a cloud environment."
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "id": "5d7ef396",
+    "metadata": {},
+    "source": [
+     "## Prerequisites\n",
+     "+ You only need access to a SageMaker notebook environment to run this notebook."
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "id": "39ee9668",
+    "metadata": {},
+    "source": [
+     "## Get Started"
     ]
    },
    {
     "cell_type": "markdown",
     "id": "8fbf6304",
     "metadata": {},
     "source": [
-     "## 1. Setup\n",
-     "### Download the data\n",
+     "### Install packages and set up environment\n",
+     "\n",
+     "#### Download the data\n",
      "Use %%bash to denote a bash block. You can also use '!' to denote a single bash command within a Python notebook."
     ]
    },
@@ -68,7 +97,31 @@
     "tags": []
    },
    "source": [
-     "## 1. Install dependencies"
+     "### Install dependencies"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "9f5032d7",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# install mamba\n",
+     "! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n",
+     "! bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "1a5bd340",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# add to your path\n",
+     "import os\n",
+     "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"] + \"/mambaforge/bin\""
     ]
    },
    {
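Editor's note on the `os.environ["PATH"]` cell above: it updates PATH in the Python kernel process, which `!` and `%%bash` subshells then inherit. The equivalent in plain shell (as you would write it inside a `%%bash` cell or a terminal; `$HOME/mambaforge` is the install prefix chosen in the preceding cell) is a simple PATH prepend:

```shell
# Prepend the mambaforge bin directory to PATH so `mamba`, and later
# `plink`/`vcftools`, resolve without full paths.
export PATH="$HOME/mambaforge/bin:$PATH"
# Show the first PATH entry to confirm the prepend took effect.
echo "$PATH" | tr ':' '\n' | head -n 1
```
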
@@ -78,6 +131,7 @@
     "metadata": {},
     "outputs": [],
     "source": [
+     "# install everything else\n",
      "! mamba install -y -c bioconda plink vcftools"
     ]
    },
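Editor's note: after a conda/mamba install it is worth verifying that the tools actually resolve on PATH before starting the analysis. A small check loop over the tool names from the cell above (this verification step is an addition, not part of the notebook):

```shell
# Report whether each required tool is on PATH and count the missing ones.
missing=0
for tool in plink vcftools; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
    missing=$((missing + 1))
  fi
done
echo "missing: $missing"
```
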
@@ -86,7 +140,7 @@
     "id": "3de2fc4c",
     "metadata": {},
     "source": [
-     "## 2. Analyze"
+     "## Analyze"
     ]
    },
    {
@@ -266,7 +320,7 @@
     "id": "1f52e97c",
     "metadata": {},
     "source": [
-     "## 3. Plotting\n",
+     "## Plotting\n",
      "In this tutorial, plotting is done in R, so at this point you can change your kernel to R in the top right. Wait for it to say 'idle' in the bottom left, then continue. You could also plot using Python native packages and maintain the Python notebook kernel."
     ]
    },
@@ -359,6 +413,32 @@
      "\n",
      "The top associated mutation is a nonsense SNP in the gene MC1R known to control pigment production. The MC1R allele encoding yellow coat color contains a single base change (from C to T) at the 916th nucleotide."
     ]
+   },
+   {
+    "cell_type": "markdown",
+    "id": "2f6e1ef6",
+    "metadata": {},
+    "source": [
+     "### Conclusion\n",
+     "Here we learned how to run a simple GWAS analysis in the cloud."
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "id": "044a04d8",
+    "metadata": {},
+    "source": [
+     "## Clean up\n",
+     "Make sure you shut down this VM, or delete it if you don't plan to use it further.\n",
+     "\n",
+     "You can also [delete the buckets](https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-bucket.html) if you don't want to pay for the data: `aws s3 rb s3://bucket-name --force`"
+    ]
+   },
+   {
+    "cell_type": "markdown",
+    "id": "c1e7be16",
+    "metadata": {},
+    "source": []
    }
   ],
   "metadata": {
