Documentation
anngvu committed Feb 14, 2025
1 parent a14a1a3 commit 3690afc
Showing 2 changed files with 31 additions and 15 deletions.
12 changes: 12 additions & 0 deletions man/make_case_list_maf.Rd


34 changes: 19 additions & 15 deletions vignettes/bringing-portal-data-to-other-platforms-cbioportal.Rmd
@@ -14,12 +14,17 @@ knitr::opts_chunk$set(
)
```

**Document Status:** Working
**Estimated Reading Time:** 8 min

## Special acknowledgments

Utils demonstrated in this vignette benefited greatly from code originally written by [hhunterzinck](https://github.com/hhunterzinck).

## Important note

The requirements for cBioPortal change, just like with any software or database.
The package is updated to keep pace on a yearly submission cycle, but there may be occasional points in time when the workflow is out of date with this external system.

## Intro

@@ -47,7 +52,10 @@ syn_login()
## Create a new study dataset

First create the study dataset "package" where we can put together the data.
Each study dataset combines multiple data types -- clinical, gene expression, gene variants, etc.
The study meta can be edited after the file has been created.
This will also set the working directory to the new study directory.


```{r cbp_new_study, eval=FALSE}
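# A minimal sketch -- the argument names below are illustrative assumptions rather
# than the exact signature; check ?cbp_new_study in nfportalutils for current usage.
# This sets up the study "package" directory with its study meta and makes it the
# working directory for the steps that follow.
cbp_new_study(cancer_study_identifier = "npst_nfosi_ntap_2022",
              name = "NF Study Dataset (NF-OSI, 2022)",
              citation = "TBD")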
```

@@ -64,15 +72,15 @@ These functions download data files and create the meta for them.

Note that:

- These should be run with the working directory set to the study directory as set up above to ensure consistent metadata.
- **Defaults are for known NF-OSI processed data outputs**.
- If these defaults don't apply to your scenario, take a look at the lower-level utils `make_meta_*` or edit the generated files manually afterwards (see the example meta file below).
- Data types can vary in how much additional work is needed in remapping, reformatting, custom sanity checks, etc.
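
For orientation, each meta file written by these utils is a small key-value text file that cBioPortal reads alongside the corresponding data file. A mutations meta, for example, looks roughly like the following (values are illustrative, not necessarily the package defaults):

```
cancer_study_identifier: npst_nfosi_ntap_2022
genetic_alteration_type: MUTATION_EXTENDED
datatype: MAF
stable_id: mutations
show_profile_in_analysis_tab: true
profile_name: Mutations
profile_description: Somatic mutations from the merged MAF
data_filename: data_mutations.txt
```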

### Add mutations data

- `maf_data` references a final merged maf output file from the NF-OSI processing pipeline (vcf2maf) that is OK for public release.
- Under the hood, a required case list file is also generated.

```{r add_maf, eval=FALSE}
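# Minimal sketch with assumed inputs -- see ?cbp_add_maf for the exact interface.
# maf_data points to the merged MAF released on Synapse; the util downloads it,
# writes the mutations data and meta files, and generates the required case list.
maf_data <- "syn00000000"  # placeholder Synapse id for the merged MAF file
cbp_add_maf(maf_data)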
```

@@ -109,10 +117,8 @@ cbp_add_expression(mrna_data,

### Add clinical data

- Clinical data **should be added last**, after all other data has been added, for sample checks to work properly.
- `clinical_data` is prepared from an existing Synapse table. The table can be a subsetted version of those released in the study dataset, or you can pass in a query that selects the subset. For example, suppose the full clinical cohort comprises patients 1-50, but the dataset can only release expression data for patients 1-20 and cna data for patients 15-30. Here, `clinical_data` can be a smaller table of just those patients 1-30, or it can be the original table with a suitable additional filter passed in, e.g. `where release = 'batch1'`.
- Clinical data requires mapping to be as consistent with other public datasets as possible. `ref_map` defines the mapping of clinical variables from the NF-OSI data dictionary to cBioPortal's. Only variables in the mapping are exported to cBioPortal. Follow the link below to inspect the default file and format used.

```{r add_clinical, eval=FALSE}
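# Illustrative setup (assumed values -- substitute your own). clinical_data can be a
# Synapse table id or a query selecting the releasable subset, and ref_map points to
# the default mapping file and format referenced in the notes above.
clinical_data <- "select * from syn00000000 where release = 'batch1'"  # placeholder query
ref_map <- "path/to/cbp_ref_map.yaml"  # placeholder path to the NF-OSI -> cBioPortal mapping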
cbp_add_clinical(clinical_data, ref_map)
```

@@ -124,15 +130,13 @@

## Validation

Validation has to be done with a cBioPortal instance. Each portal may have specific configurations (such as genomic reference) to validate against.

For an example of a simple *offline* validation, assuming you are at `~/datahub/public` and a study folder called `npst_nfosi_ntap_2022` has been placed into it, mount the dataset into the container and run validation like:
```
STUDY=npst_nfosi_ntap_2022
sudo docker run --rm -v $(pwd):/datahub cbioportal/cbioportal:6.0.25 validateData.py -s /datahub/$STUDY -n -v
```

**See the [general docs for dataset validation](https://docs.cbioportal.org/using-the-dataset-validator) for more examples.**
