Commit 3690afc

Documentation
1 parent a14a1a3 commit 3690afc

2 files changed: +31 -15 lines changed

man/make_case_list_maf.Rd

Lines changed: 12 additions & 0 deletions
Some generated files are not rendered by default.

vignettes/bringing-portal-data-to-other-platforms-cbioportal.Rmd

Lines changed: 19 additions & 15 deletions
@@ -14,12 +14,17 @@ knitr::opts_chunk$set(
 )
 ```

-**Document Status:** Draft
+**Document Status:** Working
 **Estimated Reading Time:** 8 min

 ## Special acknowledgments

-Functionality demonstrated in this vignette benefited greatly from code originally written by [hhunterzinck](https://github.com/hhunterzinck).
+Utils demonstrated in this vignette benefited greatly from code originally written by [hhunterzinck](https://github.com/hhunterzinck).
+
+## Important note
+
+The requirements for cBioPortal change, just like with any software or database.
+The package is updated to keep up on a yearly submission basis, but there may be occasional points in time when the workflow is out-of-date with this external system.

 ## Intro

@@ -47,7 +52,10 @@ syn_login()
 ## Create a new study dataset

 First create the study dataset "package" where we can put together the data.
-Each study dataset combines multiple data types -- clinical, gene expression, gene variants, etc.
+Each study dataset combines multiple data types -- clinical, gene expression, gene variants, etc.
+Meta can be edited after the file has been created.
+This will also set the working directory to the new study directory.
+

 ```{r cbp_new_study, eval=FALSE}

@@ -64,15 +72,15 @@ These functions download data files and create the meta for them.

 Note that:

-- These should be run with the working directory set to the study dataset directory as set up above to ensure consistent metadata.
+- These should be run with the working directory set to the study directory as set up above to ensure consistent metadata.
 - **Defaults are for known NF-OSI processed data outputs**.
 - If these defaults don't apply because of changes in the scenario, take a look at the lower-level utils `make_meta_*` or edit the files manually after.
 - Data types can vary in how much additional work is needed in remapping, reformatting, custom sanity checks, etc.

 ### Add mutations data

-- `maf_data` references a final merged maf output file from the NF-OSI processing pipeline OK for public release.
-- This data file type requires no further modifications except renaming.
+- `maf_data` references a final merged maf output file from the NF-OSI processing pipeline (vcf2maf) OK for public release.
+- Under the hood, a required case list file is also generated.

 ```{r add_maf, eval=FALSE}

@@ -109,10 +117,8 @@ cbp_add_expression(mrna_data,

 ### Add clinical data

-- Clinical data **should be added last**, after all other data has been added, for sample checks to work properly.
 - `clinical_data` is prepared from an existing Synapse table. The table can be a subsetted version of those released in the study dataset, or pass in a query that can be used for getting the subset. For example, the full clinical cohort comprises patients 1-50, but the dataset can only release data for patients 1-20 for expression data and data patients 15-20 for cna data. Here, `clinical_data` can be a smaller table of just those 1-30, or it can be the original table but pass in a suitable additional filter, e.g. `where release = 'batch1'`.
 - Clinical data requires mapping to be as consistent with other public datasets as possible. `ref_map` defines the mapping of clinical variables from the NF-OSI data dictionary to cBioPortal's. Only variables in the mapping are exported to cBioPortal. Follow link below to inspect the default file and format used.
-- Clinical data **should be added last**, after all other data has been added, for sample checks to work properly.

 ```{r add_clinical, eval=FALSE}

@@ -124,15 +130,13 @@ cbp_add_clinical(clinical_data, ref_map)

 ## Validation

-There are additional steps such as generating case lists and validation that have to be done _outside_ of the package with a cBioPortal backend, where each portal may have specific configurations (such as genomic reference) to validate against.
-See the [general docs for dataset validation](https://docs.cbioportal.org/using-the-dataset-validator/).
+Validation has to be done with a cBioPortal instance. Each portal may have specific configurations (such as genomic reference) to validate against.

-For the _public_ portal, the suggested step using the public server is given below.
-
-Assuming your present working directory is `~/datahub/public` and a study folder called `npst_nfosi_ntap_2022` has been placed into it, mount the dataset into the container and run validation like:
+For an example simple *offline* validation, assuming you are at `~/datahub/public` and a study folder called `npst_nfosi_ntap_2022` has been placed into it, mount the dataset into the container and run validation like:
 ```
 STUDY=npst_nfosi_ntap_2022
-sudo docker run --rm -v $(pwd):/datahub cbioportal/cbioportal:5.4.7 validateStudies.py -d /datahub -l $STUDY -u http://cbioportal.org -html /datahub/$STUDY/html_report
+sudo docker run --rm -v $(pwd):/datahub cbioportal/cbioportal:6.0.25 validateData.py -s datahub/$STUDY -n -v
 ```

-The html report will list issues by data types to help with any corrections needed.
+**See the [general docs for dataset validation](https://docs.cbioportal.org/using-the-dataset-validator) for more examples.**
+

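Reviewer sketch (not part of the commit): the offline validation step in the last hunk assumes the study folder already exists under the mounted directory. A small pre-flight check along these lines could catch a misplaced folder before the container starts. The function name `check_study` is hypothetical, and the only file it looks for is `meta_study.txt`, which the cBioPortal study format requires; the printed command simply mirrors the one added in the vignette.

```shell
#!/usr/bin/env bash
# Sketch only: pre-flight check before running the offline validator.
# check_study is a hypothetical helper, not part of the package or of cBioPortal.
set -u

# Print the validation command for a study, or fail if the folder looks incomplete.
#   $1: directory that will be mounted as /datahub (e.g. ~/datahub/public)
#   $2: study folder name (e.g. npst_nfosi_ntap_2022)
check_study() {
  local datahub_dir="$1" study="$2"
  if [ ! -d "$datahub_dir/$study" ]; then
    echo "missing study folder: $datahub_dir/$study" >&2
    return 1
  fi
  # Assumption: a study without meta_study.txt is not ready for validation.
  if [ ! -f "$datahub_dir/$study/meta_study.txt" ]; then
    echo "missing meta_study.txt in $datahub_dir/$study" >&2
    return 1
  fi
  # Same command as in the vignette, with the mount directory made explicit.
  echo "sudo docker run --rm -v $datahub_dir:/datahub cbioportal/cbioportal:6.0.25 validateData.py -s datahub/$study -n -v"
}

# When called with a study name, check it relative to the current directory.
if [ "$#" -ge 1 ]; then
  check_study "$(pwd)" "$1"
fi
```

Run from `~/datahub/public` with the study name as the only argument; the script prints the command rather than executing it, so it can be reviewed first.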