vignettes/bringing-portal-data-to-other-platforms-cbioportal.Rmd
Lines changed: 19 additions & 15 deletions
@@ -14,12 +14,17 @@ knitr::opts_chunk$set(
 )
 ```

-**Document Status:** Draft
+**Document Status:** Working
 **Estimated Reading Time:** 8 min

 ## Special acknowledgments

-Functionality demonstrated in this vignette benefited greatly from code originally written by [hhunterzinck](https://github.com/hhunterzinck).
+Utils demonstrated in this vignette benefited greatly from code originally written by [hhunterzinck](https://github.com/hhunterzinck).
+
+## Important note
+
+The requirements for cBioPortal change, just like with any software or database.
+The package is updated to keep up on a yearly submission basis, but there may be occasional points in time when the workflow is out-of-date with this external system.

 ## Intro
@@ -47,7 +52,10 @@ syn_login()
 ## Create a new study dataset

 First create the study dataset "package" where we can put together the data.
-Each study dataset combines multiple data types -- clinical, gene expression, gene variants, etc.
+Each study dataset combines multiple data types -- clinical, gene expression, gene variants, etc.
+Meta can be edited after the file has been created.
+This will also set the working directory to the new study directory.
+

 ```{r cbp_new_study, eval=FALSE}
@@ -64,15 +72,15 @@ These functions download data files and create the meta for them.

 Note that:

-- These should be run with the working directory set to the study dataset directory as set up above to ensure consistent metadata.
+- These should be run with the working directory set to the study directory as set up above to ensure consistent metadata.
 - **Defaults are for known NF-OSI processed data outputs**.
 - If these defaults don't apply because of changes in the scenario, take a look at the lower-level utils `make_meta_*` or edit the files manually after.
 - Data types can vary in how much additional work is needed in remapping, reformatting, custom sanity checks, etc.

 ### Add mutations data

-- `maf_data` references a final merged maf output file from the NF-OSI processing pipeline OK for public release.
-- This data file type requires no further modifications except renaming.
+- `maf_data` references a final merged maf output file from the NF-OSI processing pipeline (vcf2maf) OK for public release.
+- Under the hood, a required case list file is also generated.
+- Clinical data **should be added last**, after all other data has been added, for sample checks to work properly.
 - `clinical_data` is prepared from an existing Synapse table. The table can be a subsetted version of those released in the study dataset, or pass in a query that can be used for getting the subset. For example, the full clinical cohort comprises patients 1-50, but the dataset can only release data for patients 1-20 for expression data and data patients 15-20 for cna data. Here, `clinical_data` can be a smaller table of just those 1-30, or it can be the original table but pass in a suitable additional filter, e.g. `where release = 'batch1'`.
 - Clinical data requires mapping to be as consistent with other public datasets as possible. `ref_map` defines the mapping of clinical variables from the NF-OSI data dictionary to cBioPortal's. Only variables in the mapping are exported to cBioPortal. Follow link below to inspect the default file and format used.
-- Clinical data **should be added last**, after all other data has been added, for sample checks to work properly.
-There are additional steps such as generating case lists and validation that have to be done _outside_ of the package with a cBioPortal backend, where each portal may have specific configurations (such as genomic reference) to validate against.
-See the [general docs for dataset validation](https://docs.cbioportal.org/using-the-dataset-validator/).
+Validation has to be done with a cBioPortal instance. Each portal may have specific configurations (such as genomic reference) to validate against.

-For the _public_ portal, the suggested step using the public server is given below.
-
-Assuming your present working directory is `~/datahub/public` and a study folder called `npst_nfosi_ntap_2022` has been placed into it, mount the dataset into the container and run validation like:
+For an example simple *offline* validation, assuming you are at `~/datahub/public` and a study folder called `npst_nfosi_ntap_2022` has been placed into it, mount the dataset into the container and run validation like: