|
6 | 6 | "metadata": {},
|
7 | 7 | "source": [
|
8 | 8 | "# GWAS in the cloud\n",
|
| 9 | + "## Overview\n", |
9 | 10 | "This notebook adapts the [NIH CFDE GWAS in the cloud tutorial](https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud/background/). The instructions are greatly simplified here, so see the full tutorial if you need more detail.\n",
|
10 |
| - "Most of this notebook is bash, but expects that you are using a Python kernel, until step 3, plotting, you will need to switch your kernel to R." |
| 11 | + "\n", |
| 12 | + "Most of this notebook is written in Bash and expects a Python kernel. For the plotting step, you will need to switch your kernel to R." |
| 13 | + ] |
| 14 | + }, |
| 15 | + { |
| 16 | + "cell_type": "markdown", |
| 17 | + "id": "3edafe63", |
| 18 | + "metadata": {}, |
| 19 | + "source": [ |
| 20 | + "## Learning Objectives\n", |
| 21 | + "The goal is to learn how to run a genome-wide association study (GWAS) in a cloud environment." |
| 22 | + ] |
| 23 | + }, |
| 24 | + { |
| 25 | + "cell_type": "markdown", |
| 26 | + "id": "5d7ef396", |
| 27 | + "metadata": {}, |
| 28 | + "source": [ |
| 29 | + "## Prerequisites\n", |
| 30 | + "+ Access to a SageMaker notebook environment is all you need to run this notebook." |
| 31 | + ] |
| 32 | + }, |
| 33 | + { |
| 34 | + "cell_type": "markdown", |
| 35 | + "id": "39ee9668", |
| 36 | + "metadata": {}, |
| 37 | + "source": [ |
| 38 | + "## Get Started" |
11 | 39 | ]
|
12 | 40 | },
|
13 | 41 | {
|
14 | 42 | "cell_type": "markdown",
|
15 | 43 | "id": "8fbf6304",
|
16 | 44 | "metadata": {},
|
17 | 45 | "source": [
|
18 |
| - "## 1. Setup\n", |
19 |
| - "### Download the data\n", |
| 46 | + "### Install packages and set up environment\n", |
| 47 | + "\n", |
| 48 | + "#### Download the data\n", |
20 | 49 | "Put `%%bash` on the first line of a cell to run the whole cell as Bash. You can also prefix a single line with `!` to run one shell command from a Python notebook."
|
21 | 50 | ]
|
22 | 51 | },
|
|
68 | 97 | "tags": []
|
69 | 98 | },
|
70 | 99 | "source": [
|
71 |
| - "## 1. Install dependencies" |
| 100 | + "### Install dependencies" |
| 101 | + ] |
| 102 | + }, |
| 103 | + { |
| 104 | + "cell_type": "code", |
| 105 | + "execution_count": null, |
| 106 | + "id": "9f5032d7", |
| 107 | + "metadata": {}, |
| 108 | + "outputs": [], |
| 109 | + "source": [ |
| 110 | + "# install mamba\n", |
| 111 | + "! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n", |
| 112 | + "! bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge" |
| 113 | + ] |
| 114 | + }, |
| 115 | + { |
| 116 | + "cell_type": "code", |
| 117 | + "execution_count": null, |
| 118 | + "id": "1a5bd340", |
| 119 | + "metadata": {}, |
| 120 | + "outputs": [], |
| 121 | + "source": [ |
| 122 | + "# add to your path\n", |
| 123 | + "import os\n", |
| 124 | + "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/mambaforge/bin\"" |
72 | 125 | ]
|
73 | 126 | },
|
74 | 127 | {
|
|
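The cell above appends `$HOME/mambaforge/bin` to `PATH` through `os.environ`, which adds a duplicate entry every time the cell is re-run. A minimal sketch of an idempotent variant, with a check that the tools are actually found afterwards. The helper name `add_to_path` is hypothetical and not part of the notebook; the install location is the one used by the installer command above.

```python
import os
import shutil


def add_to_path(directory, env=os.environ):
    """Append `directory` to PATH only if it is not already listed.

    Returns True if PATH was changed, False otherwise.
    `env` defaults to the live process environment, so shell commands
    started later from this kernel (with `!` or `%%bash`) inherit it.
    """
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory in parts:
        return False
    env["PATH"] = os.pathsep.join(parts + [directory])
    return True


# Assumed install prefix: the -p argument passed to the Mambaforge installer.
mamba_bin = os.path.join(os.path.expanduser("~"), "mambaforge", "bin")
add_to_path(mamba_bin)

# shutil.which returns the resolved path, or None if the tool is not on PATH.
for tool in ("mamba", "plink", "vcftools"):
    print(tool, "->", shutil.which(tool))
```

If any tool prints `None`, the install step has not run yet or the prefix differs from the one assumed here.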
78 | 131 | "metadata": {},
|
79 | 132 | "outputs": [],
|
80 | 133 | "source": [
|
| 134 | + "# install everything else\n", |
81 | 135 | "! mamba install -y -c bioconda plink vcftools"
|
82 | 136 | ]
|
83 | 137 | },
|
|
86 | 140 | "id": "3de2fc4c",
|
87 | 141 | "metadata": {},
|
88 | 142 | "source": [
|
89 |
| - "## 2. Analyze" |
| 143 | + "## Analyze" |
90 | 144 | ]
|
91 | 145 | },
|
92 | 146 | {
|
|
266 | 320 | "id": "1f52e97c",
|
267 | 321 | "metadata": {},
|
268 | 322 | "source": [
|
269 |
| - "## 3. Plotting\n", |
| 323 | + "## Plotting\n", |
270 | 324 | "Plotting in this tutorial is done in R, so switch your kernel to R using the kernel selector in the top right. Wait for the status in the bottom left to show 'idle', then continue. Alternatively, you can plot with Python packages and keep the Python kernel."
|
271 | 325 | ]
|
272 | 326 | },
|
|
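The plotting note above mentions that the plot could also be made in Python without switching kernels. A minimal sketch of that route, assuming the association results have already been read into `(chromosome, base-pair position, p-value)` tuples with integer chromosome labels. The function names `manhattan_coords` and `plot_manhattan` and the output file name are hypothetical, and `matplotlib` is assumed to be installed (the notebook does not install it).

```python
import math
from collections import defaultdict


def manhattan_coords(records):
    """Turn (chrom, bp, p) records into (x, y, chrom) points.

    x is a cumulative position so chromosomes sit side by side;
    y is -log10(p). Records with a missing or non-positive p are skipped.
    """
    # Largest base-pair position seen on each chromosome.
    chrom_len = defaultdict(int)
    for chrom, bp, _ in records:
        chrom_len[chrom] = max(chrom_len[chrom], bp)

    # Offset of a chromosome = summed lengths of the chromosomes before it.
    offsets, running = {}, 0
    for chrom in sorted(chrom_len):
        offsets[chrom] = running
        running += chrom_len[chrom]

    points = []
    for chrom, bp, p in records:
        if p is None or p <= 0:
            continue
        points.append((offsets[chrom] + bp, -math.log10(p), chrom))
    return points


def plot_manhattan(points, out_png="manhattan.png"):
    """Scatter the points and save them to a PNG (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file
    import matplotlib.pyplot as plt

    xs = [x for x, _, _ in points]
    ys = [y for _, y, _ in points]
    # Alternate two colors so neighboring chromosomes are distinguishable.
    colors = ["tab:blue" if c % 2 else "tab:gray" for _, _, c in points]
    plt.figure(figsize=(10, 4))
    plt.scatter(xs, ys, s=4, c=colors)
    plt.xlabel("Cumulative genomic position")
    plt.ylabel("-log10(p)")
    plt.savefig(out_png, dpi=150, bbox_inches="tight")
    plt.close()
```

Typical use would be `plot_manhattan(manhattan_coords(records))`, where `records` is built from the chromosome, position, and p-value columns of the association output.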
359 | 413 | "\n",
|
360 | 414 | "The top associated mutation is a nonsense SNP in the gene MC1R known to control pigment production. The MC1R allele encoding yellow coat color contains a single base change (from C to T) at the 916th nucleotide."
|
361 | 415 | ]
|
| 416 | + }, |
| 417 | + { |
| 418 | + "cell_type": "markdown", |
| 419 | + "id": "2f6e1ef6", |
| 420 | + "metadata": {}, |
| 421 | + "source": [ |
| 422 | + "## Conclusion\n", |
| 423 | + "In this notebook, we learned how to run a simple GWAS in the cloud." |
| 424 | + ] |
| 425 | + }, |
| 426 | + { |
| 427 | + "cell_type": "markdown", |
| 428 | + "id": "044a04d8", |
| 429 | + "metadata": {}, |
| 430 | + "source": [ |
| 431 | + "## Clean up\n", |
| 432 | + "Make sure you shut down this VM, or delete it if you don't plan to use it further.\n", |
| 433 | + "\n", |
| 434 | + "You can also [delete the buckets](https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-bucket.html) if you don't want to pay for the data: `aws s3 rb s3://bucket-name --force`" |
| 435 | + ] |
| 436 | + }, |
| 437 | + { |
| 438 | + "cell_type": "markdown", |
| 439 | + "id": "c1e7be16", |
| 440 | + "metadata": {}, |
| 441 | + "source": [] |
362 | 442 | }
|
363 | 443 | ],
|
364 | 444 | "metadata": {
|
|