diff --git a/docs/yaml_docs/index.rst b/docs/yaml_docs/index.rst index 94ab5126..e2c3adf4 100644 --- a/docs/yaml_docs/index.rst +++ b/docs/yaml_docs/index.rst @@ -10,4 +10,5 @@ Workflows configuration files pipeline_integration_yml spatial_qc spatial_preprocess - spatial_deconvolution \ No newline at end of file + spatial_deconvolution + pipeline_refmap_yml.md diff --git a/docs/yaml_docs/pipeline_refmap_yml.md b/docs/yaml_docs/pipeline_refmap_yml.md new file mode 100644 index 00000000..7ac364c6 --- /dev/null +++ b/docs/yaml_docs/pipeline_refmap_yml.md @@ -0,0 +1,138 @@ + + +# Refmap workflow +In this documentation, the parameters of the `refmap` configuration yaml file are explained. +This file is generated running `panpipes refmap config`.
The individual steps run by the pipeline are described in the [Reference Mapping workflow](https://github.com/DendrouLab/panpipes/blob/main/docs/workflows/refmap.md). + + +When running the refmap workflow, panpipes provides a basic `pipeline.yml` file. +To run the workflow on your own data, you need to specify the parameters described below in the `pipeline.yml` file to meet the requirements of your data. +However, we do provide pre-filled versions of the `pipeline.yml` file for individual [tutorials](https://panpipes-tutorials.readthedocs.io/en/latest/refmap_pancreas/Reference_mapping.html). + +For more information on functionalities implemented in `panpipes` to read the configuration files, such as reading blocks of parameters and reusing blocks with `&anchors` and `*scalars`, please check [our documentation](./useful_info_on_yml.md). + +You can download the different refmap `pipeline.yml` files here: +- Basic `pipeline.yml` file (not prefilled) that is generated when calling `panpipes refmap config`: [Download here](https://github.com/DendrouLab/panpipes/blob/main/panpipes/panpipes/pipeline_refmap/pipeline.yml) +- `pipeline.yml` file for [Reference Mapping Tutorial](https://panpipes-tutorials.readthedocs.io/en/latest/refmap_pancreas/Reference_mapping.html): [Download here](https://panpipes-tutorials.readthedocs.io/en/latest/_downloads/cfb2a3d64a5e7b2cabe7ee8e1ac5fe61/pipeline.yml) + + +## Compute resources options + +resources
+Computing resources to use, specifically the number of threads used for parallel jobs. +Specified by the following three parameters: + - threads_high `Integer`, Default: 1
+Number of threads used for high intensity computing tasks. +For each thread, there must be enough memory to load all your input files at once and create the MuData object. + + - threads_medium `Integer`, Default: 1
+Number of threads used for medium intensity computing tasks. +For each thread, there must be enough memory to load your mudata and do computationally light tasks. + + - threads_low `Integer`, Default: 1
+Number of threads used for low intensity computing tasks. +For each thread, there must be enough memory to load text files and do plotting; this requires much less memory than the other two settings. + + - condaenv `String` (Path)
+Path to conda environment that should be used to run panpipes. + + - queues: `String` (Path)
+Specify a queue here if a special queue is required for long jobs or if the user has access to a GPU-specific queue. Otherwise, leave it blank. + - long: `String` (Path)
+ - gpu: `String` (Path)
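+
+As a minimal sketch, the options above might be laid out in `pipeline.yml` as follows. The nesting (the three thread settings under `resources`, with `condaenv` and `queues` at the top level) is an assumption inferred from the descriptions above, so check the generated file for the exact layout.
+
+```yaml
+# Assumed layout; the thread values are the defaults listed above.
+resources:
+  threads_high: 1
+  threads_medium: 1
+  threads_low: 1
+# Path to the conda environment used to run panpipes (left blank here).
+condaenv:
+# Leave both entries blank unless a dedicated queue is needed.
+queues:
+  long:
+  gpu:
+```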
+ + +## Loading data options +### Query Dataset + +- query `String`, Default: path/to/data
+ Path to the query data. Accepted input formats are raw10x, or a preprocessed, quality-filtered mudata or anndata object. +- modality `String`, Default: rna
+If a mudata object is provided, specify the modality to be used. Currently, only the RNA modality is supported. +- query_batch `String`, Default:
+Fill in only if the provided data had a batch correction; if so, specify the column that holds the batch information. If not, leave blank. +- query_celltype `String`, Default:
+If the provided query has celltype annotations that should be compared to the transferred labels, specify the column that contains them. If not, leave blank. + +## Scvi tools parameters + +- reference_data `String`, Default: path/to/mudata
+Specify one or more reference models to be used as reference. Users can also specify their own reference built using `pipeline_integration`. +Leave blank for no model specification. + +- totalvi: `String`, Default: path/to/totalvi
+Provide the path to a saved totalvi model. Multiple paths can be provided as a list: +```yaml +totalvi: + - path_to_totalvi1 + - path_to_totalvi2 + +``` + +- impute_proteins `Boolean`, Default: False
+- transform_batch `String`, Default:
+`transform_batch` is a batch covariate specific to totalvi; it allows the model to use the batch information in the query to mitigate +differences in protein sequencing depth. +- scvi `String`, Default: path/to/scvi (mandatory). Provide the path to the scvi model. Multiple paths can be provided as a list:
+ +```yaml +scvi: + - path_to_scvi1 + - path_to_scvi2 + +``` + +- scanvi `String`, Default: path/to/scanvi (mandatory). Provide the path to the scanvi model.
+- run_randomforest `Boolean`, Default: False
+Set to true if the reference model has a trained random forest classifier to transfer the labels. + +## Training parameters +To reuse the same parameters in multiple locations, use anchors (`&`) and scalars (`*`, called aliases in the YAML specification) in the relevant places. For example, if a block is marked with `&rna_neighbors`, the same parameters are reused wherever `*rna_neighbors` is referenced. Check our documentation for more information on using anchors and scalars. + +- training_plan:
+ - totalvi: Default: array of training parameters.
For the full list of parameters, check [here](https://docs.scvi-tools.org/en/0.14.1/api/reference/scvi.model.TOTALVI.train.html). To reuse the same parameters in other locations, use an anchor: for example, writing `totalvi: &totalvitraining` ensures that the same array is reused wherever it is referenced as `*totalvitraining`. In this example, the `&totalvitraining` array contains the two parameters `max_epochs` and `weight_decay`. + - max_epochs `Integer`, Default: 200
+ - weight_decay `Float`, Default: 0.0
+ The recommended weight decay is 0.0. This ensures that the latent representation of the reference cells remains exactly the same when they are passed through the new query model. + - scvi Array of training parameters, Default: `*totalvitraining` (reuse the same array as specified above)
+ - scanvi Array of training parameters, Default: `*totalvitraining` (reuse the same array as specified above)
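+
+Put together, the training plan described above could be written as the sketch below. The key names and defaults come from the list above; the exact indentation is an assumption, so compare it against the generated `pipeline.yml`.
+
+```yaml
+training_plan:
+  # The anchor names this block so that it can be reused below.
+  totalvi: &totalvitraining
+    max_epochs: 200
+    weight_decay: 0.0
+  # Both entries reuse the totalvi parameters unchanged.
+  scvi: *totalvitraining
+  scanvi: *totalvitraining
+```
+
+To give one model its own settings, replace its `*totalvitraining` reference with an explicit block of parameters.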
+ +## Neighbors parameters to calculate umaps +This can be computed on either the query alone or the query + reference dataset. + +- neighbors:
+ - npcs `Integer`, Default: 30
+Number of principal components to calculate for neighbours and UMAP. If no correction is applied, PCA will be calculated and used to run UMAP and clustering. +If Harmony is the method of choice, it will use these components to create a corrected dimensionality reduction. + - k `Integer`, Default: 30
+This is the number of neighbours. + - metric `String`, Default: euclidean
+Options here include cosine and euclidean. + - method `String`, Default: scanpy
+Options here include scanpy and hnsw (from scvelo). + +## Run scib metrics on query +Run scib on the query data after transferring labels, where available (with the totalvi and scanvi models), or using the default leiden clustering after training the VAE model (scvi). +Check the [scib documentation](https://scib.readthedocs.io/en/latest/) for the metrics used. +- scib:
+ - run `Boolean`, Default: False
+ - cluster_key `String`, Default: predictions
+Used for ARI and NMI. If left empty, it will default to the leiden clustering calculated on the new latent representation after reference mapping. + - batch_key `String`, Default:
+ Used for clisi_graph_embed; if no batch is present, the metrics will not be included in the results. If left blank, it will default to the cluster_key defaults. + - celltype_key `String`, Default: celltype
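+
+For illustration, a scib block using the defaults listed above might look like the sketch below. Treat the layout as an assumption; `batch_key` is left blank because no default is documented.
+
+```yaml
+# Assumed layout; values are the defaults listed above.
+scib:
+  run: False
+  cluster_key: predictions
+  batch_key:
+  celltype_key: celltype
+```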
+ + + + + + diff --git a/panpipes/panpipes/pipeline_refmap/pipeline.yml b/panpipes/panpipes/pipeline_refmap/pipeline.yml index d82a4206..da5e784e 100644 --- a/panpipes/panpipes/pipeline_refmap/pipeline.yml +++ b/panpipes/panpipes/pipeline_refmap/pipeline.yml @@ -53,7 +53,7 @@ reference_data: path_to_mudata totalvi: - path_to_totalvi1 - path_to_totalvi2 -impute_proteins: True +impute_proteins: False # transform_batch is a batch-covariate specific to totalvi, allows the model to use the batch information in the query to mitigate # differences in protein sequencing depth transform_batch: