app/content/anvil-cmg/guides.mdx
Lines changed: 10 additions & 47 deletions
@@ -11,7 +11,7 @@ We are excited to introduce users of the [AnVIL Portal]({portalURL}) to the [AnV

 ## What is the AnVIL Data Explorer?

-Until now, the way to browse datasets through the AnVIL portal has been to use the [AnVIL Dataset Catalog]({portalURL}/data/consortia), which allows you to organize your search results based on the workspaces that contain a subset of a dataset by its consent code. The AnVIL Dataset Catalog provides summary level information of the dataset and study.
+Until now, the way to browse datasets through the AnVIL Portal has been to use the [AnVIL Dataset Catalog]({portalURL}/data/consortia), which allows you to organize your search results based on the workspaces that contain a subset of a dataset by its consent code. The AnVIL Dataset Catalog provides summary level information of the dataset and study.

 With the addition of the [AnVIL Data Explorer]({browserURL}), you’ll be able to sort the datasets you’re browsing based on the following managed access categories:
@@ -31,7 +31,7 @@ When working with NIH data, particularly managed access data, users must follow

 ## How do I use the AnVIL Data Explorer?

-Below you’ll find step-by-step instructions for navigating the AnVIL Data Explorer and exporting data to Terra. In order to successfully import data from the AnVIL Data Explorer to Terra, you’ll need to make your selection in the AnVIL Data Explorer, export this selection to a Terra workspace, and (as an interim stopgap) run a final step in the form of a Jupyter Notebook in order to retrieve all of the data relevant to your selection.
+Below you’ll find step-by-step instructions for navigating the AnVIL Data Explorer and exporting data to Terra.

 ### Step 1: Finding the AnVIL Data Explorer
@@ -61,37 +61,20 @@ The facets are visible in the screen above in the column on the left. If you cli

 When you select multiple facets, only data matching all selected facets is displayed (e.g. filter by Anatomical Site AND BioSample Type). When you select multiple values within a facet, data matching any of the facet values is displayed (e.g. selecting both Blood and Tissue above will list studies that include Blood OR Tissue samples).

-#### Exploring Studies
+#### Exploring Datasets

-Note: Currently the summary pages of studies in the AnVIL Data Explorer are not populated with information, yet the information described below will be populated for all of the studies as development progresses.
+When you click on a dataset, you’ll be taken to a summary page where you can find a variety of information and helpful links, including but not limited to:

-When you click on a study, you’ll be taken to a summary page where you can find a variety of information and helpful links, including but not limited to:
-
-- What consortium the data is associated with
+- What consortium the dataset is associated with
 - The quantity and types of data
 - Links to APIs for accessing the data programmatically
 - Links to request access
-- A button for exporting the data to a Terra workspace
+- A button for exporting the dataset to a Terra workspace
The format of exported study data is currently being updated. Exporting study
-data is temporarily disabled by default while this work is in progress. Export
-will be reenabled when the format is stable and working well.
-
-</Alert>
-
-To use data you’ve found through the AnVIL Data Explorer within your Terra workspace, you’ll need to follow two steps:
-
-First, you export the data from the AnVIL Data Explorer to Terra.
-
-Second, currently, you will need to use a publicly available notebook to fill in some of the metadata that doesn’t populate automatically.
-
 #### Exporting from The AnVIL Data Explorer

 Once you’re ready to export the data, you can click Export to Terra from within a particular study, or you can also click the Export button at the top right of your screen when you are on the AnVIL Data Explorer’s main page.
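The facet rule stated in the guide text above (multiple facets are combined with AND, multiple values within one facet are combined with OR) can be illustrated with a short TypeScript sketch. The `Sample` and `FacetSelection` shapes, the `matches` and `applyFilters` helpers, the facet keys `biosampleType` and `anatomicalSite`, and the sample values are all hypothetical, chosen only for illustration; this is not the AnVIL Data Explorer's actual implementation or data model.

```typescript
// Hypothetical data shape: each sample lists its value(s) for each facet.
interface Sample {
  id: string;
  facets: { [facet: string]: string[] };
}

// Hypothetical selection shape: facet name -> values ticked for that facet.
type FacetSelection = { [facet: string]: string[] };

// A sample passes if, for EVERY facet with ticked values (AND across facets),
// it carries AT LEAST ONE of those values (OR within a facet).
function matches(sample: Sample, selection: FacetSelection): boolean {
  return Object.keys(selection).every((facet) => {
    const selected = selection[facet];
    if (selected.length === 0) {
      return true; // nothing ticked for this facet: no constraint
    }
    const values = sample.facets[facet] || []; // facet absent on sample: no values
    return values.some((v) => selected.indexOf(v) !== -1);
  });
}

function applyFilters(items: Sample[], selection: FacetSelection): Sample[] {
  return items.filter((s) => matches(s, selection));
}

// Illustrative data only.
const samples: Sample[] = [
  { id: "a", facets: { biosampleType: ["Blood"], anatomicalSite: ["Heart"] } },
  { id: "b", facets: { biosampleType: ["Tissue"], anatomicalSite: ["Liver"] } },
  { id: "c", facets: { biosampleType: ["Saliva"], anatomicalSite: ["Heart"] } },
];

// Blood OR Tissue within one facet: expected ids a, b
console.log(applyFilters(samples, { biosampleType: ["Blood", "Tissue"] }).map((s) => s.id));

// (Blood OR Tissue) AND Liver: expected id b
console.log(
  applyFilters(samples, {
    biosampleType: ["Blood", "Tissue"],
    anatomicalSite: ["Liver"],
  }).map((s) => s.id),
);
```

In this sketch a facet with nothing ticked imposes no constraint, while a sample that has no value at all for a ticked facet is excluded; both are assumptions about edge cases the guide does not spell out.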
@@ -113,15 +96,13 @@ After you click this button, you will be prompted to wait while the system gener
 src="/guides/export-to-terra2.png"
 />

-####Working with the Data in Terra
+### Working with the Data in Terra

 Until recently, AnVIL data has been hosted and shared from multiple Terra workspaces making it hard to generate cohorts across differing studies. To resolve this, we created the AnVIL Data Explorer enabling you to create custom cohorts and then hand them off to your own Terra workspaces.

 Depending on the AnVIL dataset/study, the data in question have varying schemas (different columns and structure to the data). In an effort to ingest all of the AnVIL datasets, the Broad's Data Sciences Platform created a common subset schema across all AnVIL datasets. When you use the AnVIL Data Explorer, it actually searches through a specialized subset - called the Findability Subset (FSS) - that only contains the attributes which are most commonly used by researchers across a broad range of study data types and for diverse analyses.

-At this point in development, the data that you hand off from the AnVIL Data Explorer to your Terra workspace is incomplete - it only contains the columns that were deemed relevant for the FSS used in the AnVIL Data Explorer. Currently, if you want the complete data, you will need to perform one additional step after exporting to your workspace. This step is performed by running a publicly available Jupyter Notebook, as instructed below.
-
-##### Working with NIH Data in Terra
+#### Working with NIH Data in Terra

 When working with NIH data in Terra, we require users to import data to workspaces with the checkbox for protected data marked. Optionally, an Authorization Domain may be applied and is highly recommended if working with controlled access data.
@@ -132,7 +113,7 @@ In the data handoff from the AnVIL Data Explorer, you will transition to the Ter
 src="/guides/working-with-data.png"
 />

-#####Selecting Workspace
+#### Selecting Workspace

 Next, you’ll see a workspace selection screen where you can either choose an existing workspace or create a new workspace to receive the data.
@@ -150,21 +131,3 @@ If you choose to “start with a new workspace”, you will see that the import
 <Figure alt="Create New Workspace" src="/guides/create-new-workspace.png" />

 Once you’ve completed this step, your workspace will spin up and you can go to the Data tab of your workspace to see a set of tables have been successfully imported into the workspace.
-
-##### Using the Notebook
-
-The data you now see in your workspace is incomplete - to retrieve the complete data, you’ll need to run a Jupyter Notebook created for this purpose called \*get_non_findability_subset_data_v7.ipynb\*. We’ve published this notebook in a [public workspace](https://app.terra.bio/#workspaces/anvil-datastorage/AnVIL_Explorer_FSS_Tool) for your convenience. To complete the import process, all you need to do is copy the notebook from the public workspace into the workspace to which you’ve imported your data, and run the notebook from within that workspace.
-
-##### Copy the Notebook
-
-Go to the public workspace containing the AnVIL FSS tool notebook, navigate to the Analyses section, and use the three-dot menu to the right of the notebook name to find the option to copy the notebook to another workspace:
-
-<Figure alt="Copy the Notebook" src="/guides/copy-notebook.png" />
-
-##### Run the Notebook
-
-Once the notebook is in the same workspace as the data, set up a Clou open the notebook in edit mode (the default environment is sufficient), and select “Run All” from the Cell menu:
-
-<Figure alt="Run the Notebook" src="/guides/run-notebook.png" />
-
-Once you’ve completed this step, you should be able to see a new set of tables in the Data tab of your workspace.