diff --git a/docs/yaml_docs/index.rst b/docs/yaml_docs/index.rst
index 94ab5126..e2c3adf4 100644
--- a/docs/yaml_docs/index.rst
+++ b/docs/yaml_docs/index.rst
@@ -10,4 +10,5 @@ Workflows configuration files
pipeline_integration_yml
spatial_qc
spatial_preprocess
- spatial_deconvolution
\ No newline at end of file
+ spatial_deconvolution
+ pipeline_refmap_yml.md
diff --git a/docs/yaml_docs/pipeline_refmap_yml.md b/docs/yaml_docs/pipeline_refmap_yml.md
new file mode 100644
index 00000000..7ac364c6
--- /dev/null
+++ b/docs/yaml_docs/pipeline_refmap_yml.md
@@ -0,0 +1,138 @@
+
+
+# Refmap workflow
+This documentation explains the parameters of the `refmap` configuration yaml file.
+This file is generated by running `panpipes refmap config`.
+The individual steps run by the pipeline are described in the [Reference Mapping workflow](https://github.com/DendrouLab/panpipes/blob/main/docs/workflows/refmap.md).
+
+
+When running the refmap workflow, panpipes provides a basic `pipeline.yml` file.
+To run the workflow on your own data, you need to specify the parameters described below in the `pipeline.yml` file to meet the requirements of your data.
+We also provide pre-filled versions of the `pipeline.yml` file for individual [tutorials](https://panpipes-tutorials.readthedocs.io/en/latest/refmap_pancreas/Reference_mapping.html).
+
+For more information on the functionalities implemented in `panpipes` to read the configuration files, such as reading blocks of parameters and reusing blocks with `&anchors` and `*aliases`, please check [our documentation](./useful_info_on_yml.md).
+
+You can download the different refmap `pipeline.yml` files here:
+- Basic `pipeline.yml` file (not prefilled) that is generated when calling `panpipes refmap config`: [Download here](https://github.com/DendrouLab/panpipes/blob/main/panpipes/panpipes/pipeline_refmap/pipeline.yml)
+- `pipeline.yml` file for [Reference Mapping Tutorial](https://panpipes-tutorials.readthedocs.io/en/latest/refmap_pancreas/Reference_mapping.html): [Download here](https://panpipes-tutorials.readthedocs.io/en/latest/_downloads/cfb2a3d64a5e7b2cabe7ee8e1ac5fe61/pipeline.yml)
+
+
+## Compute resources options
+
+resources
+Computing resources to use, specifically the number of threads used for parallel jobs.
+Specified by the following three parameters:
+ - threads_high `Integer`, Default: 1
+Number of threads used for high intensity computing tasks.
+For each thread, there must be enough memory to load all your input files at once and create the MuData object.
+
+ - threads_medium `Integer`, Default: 1
+Number of threads used for medium intensity computing tasks.
+For each thread, there must be enough memory to load your MuData object and run computationally light tasks.
+
+ - threads_low `Integer`, Default: 1
+Number of threads used for low intensity computing tasks.
+For each thread, there must be enough memory to load text files and do plotting. This requires much less memory than the other two.
+
+ - condaenv `String` (Path)
+Path to conda environment that should be used to run panpipes.
+
+ - queues: `String` (Path)
+Specify a queue here if a special queue is required for long jobs or if you have access to a GPU-specific queue. Otherwise, leave it blank.
+ - long: `String` (Path)
+ - gpu: `String` (Path)
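+
+As a rough sketch, the options above might appear in `pipeline.yml` as follows. The nesting (in particular whether `condaenv` and `queues` sit at the top level) and the blank values are assumptions for illustration, so check them against the file generated by `panpipes refmap config`.
+
+```yaml
+# Sketch only: nesting is assumed, values are the documented defaults or blanks.
+resources:
+  threads_high: 1     # tasks that load all input files and create the MuData object
+  threads_medium: 1   # tasks that load the MuData object and do light computation
+  threads_low: 1      # tasks that load text files and do plotting
+condaenv:             # path to the conda environment, if one is used
+queues:
+  long:               # leave blank unless a special queue is needed for long jobs
+  gpu:                # leave blank unless a GPU-specific queue is available
+```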
+
+
+## Loading data options
+### Query Dataset
+
+- query `String`, Default: path/to/data
+    Path to the query data. Accepted input formats are raw 10x data, or a preprocessed, quality-filtered MuData or AnnData object.
+- modality `String`, Default: rna
+If a MuData object is provided, specify the modality to use. Currently, only the RNA modality is supported.
+- query_batch `String`, Default:
+Fill in only if the query data contains batches; if so, specify the column that stores the batch information. If not, leave blank.
+- query_celltype `String`, Default:
+Column with the cell type annotations, if the query has annotations that should be compared to the transferred labels. If not, leave blank.
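+
+A minimal sketch of a filled-in query block is shown below. The file name and the `sample_id` column are hypothetical placeholders, and the top-level placement of the keys is an assumption to verify against the generated `pipeline.yml`.
+
+```yaml
+# Hypothetical values; replace with your own path and column names.
+query: path/to/query.h5mu
+modality: rna
+query_batch: sample_id   # hypothetical column holding the batch information
+query_celltype:          # left blank: no existing annotations to compare
+```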
+
+## scvi-tools parameters
+
+- reference_data `String`, Default: path/to/mudata
+Specify one or more reference models to be used as reference. Users can also specify their own reference built using `pipeline_integration`.
+Leave blank for no model specification.
+
+- totalvi: `String`, Default: path/to/totalvi
+Provide the path to a saved totalvi model. Multiple paths can be provided as a list:
+```yaml
+totalvi:
+ - path_to_totalvi1
+ - path_to_totalvi2
+
+```
+
+- impute_proteins `Boolean`, Default: False
+- transform_batch `String`, Default:
+`transform_batch` is a batch covariate specific to totalvi. It allows the model to use the batch information in the query to mitigate
+differences in protein sequencing depth.
+- scvi `String`, Default: path/to/scvi
+Mandatory. Provide a path to the scvi model. Multiple paths can be provided as a list:
+
+```yaml
+scvi:
+  - path_to_scvi1
+  - path_to_scvi2
+
+```
+
+- scanvi `String`, Default: path/to/scanvi
+Mandatory. Provide a path to the scanvi model.
+- run_randomforest `Boolean`, Default: False
+Set to true if the reference model has a trained random forest classifier to transfer the labels.
+
+## Training parameters
+To reuse the same parameters in multiple locations, use anchors (`&`) and aliases (`*`). For example, if you define `&rna_neighbors`, the same parameters are reused wherever `*rna_neighbors` is referenced. See [our documentation](./useful_info_on_yml.md) for more information on anchors and aliases.
+
+- training_plan:
+    - totalvi: Default: array of training parameters.
+    For the full list of parameters, check [here](https://docs.scvi-tools.org/en/0.14.1/api/reference/scvi.model.TOTALVI.train.html). To reuse the same parameters in other locations, use an anchor: writing `totalvi: &totalvitraining` ensures that the same array is reused wherever it is referenced as `*totalvitraining`. In this example, the `&totalvitraining` array contains the two parameters `max_epochs` and `weight_decay`.
+        - max_epochs `Integer`, Default: 200
+        - weight_decay `Float`, Default: 0.0
+        Recommended weight decay is 0.0. This ensures the latent representation of the reference cells remains exactly the same when they are passed through the new query model.
+    - scvi Array of training parameters, Default: `*totalvitraining` (reuse the same array as specified above)
+    - scanvi Array of training parameters, Default: `*totalvitraining` (reuse the same array as specified above)
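+
+The anchor reuse described above can be sketched as follows, using the documented defaults. The exact indentation in the generated `pipeline.yml` may differ.
+
+```yaml
+training_plan:
+  # "&totalvitraining" names this block so that it can be reused below.
+  totalvi: &totalvitraining
+    max_epochs: 200
+    weight_decay: 0.0
+  # "*totalvitraining" inserts the same two parameters for scvi and scanvi.
+  scvi: *totalvitraining
+  scanvi: *totalvitraining
+```
+
+To give one model its own settings, replace its `*totalvitraining` reference with an explicit block of parameters.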
+
+## Neighbors parameters to calculate UMAPs
+UMAPs can be calculated on either the query alone or the combined query and reference dataset.
+
+- neighbors:
+    - npcs `Integer`, Default: 30
+Number of principal components to calculate for neighbours and UMAP. If no correction is applied, PCA will be calculated and used to run UMAP and clustering.
+If Harmony is the method of choice, it will use these components to create a corrected dimensionality reduction.
+    - k `Integer`, Default: 30
+Number of neighbours.
+    - metric `String`, Default: euclidean
+Options here include cosine and euclidean.
+    - method `String`, Default: scanpy
+Options here include scanpy and hnsw (from scvelo).
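+
+With the defaults listed above, the block would look roughly like this. The nesting is inferred from the list structure and should be checked against the generated file.
+
+```yaml
+neighbors:
+  npcs: 30            # number of principal components
+  k: 30               # number of neighbours
+  metric: euclidean   # or cosine
+  method: scanpy      # or hnsw
+```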
+
+## Run scib metrics on query
+Run scib on the query data after transferring labels, where available (with the totalvi and scanvi models), or using default leiden clustering after training the VAE model (scvi).
+Check the [documentation](https://scib.readthedocs.io/en/latest/) for the metrics used.
+- scib:
+    - run `Boolean`, Default: False
+    - cluster_key `String`, Default: predictions
+Used for ARI and NMI. If left empty, it defaults to the leiden clustering calculated on the new latent representation after reference mapping.
+    - batch_key `String`, Default:
+    Used for clisi_graph_embed. If no batch is present, the metrics will not be included in the results. If left blank, it defaults to the `cluster_key` default.
+    - celltype_key `String`, Default: celltype
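+
+As an illustration, the scib block with the documented defaults might be written as below. The nesting is an assumption based on the list above.
+
+```yaml
+scib:
+  run: False                # set to True to compute the metrics
+  cluster_key: predictions  # used for ARI and NMI
+  batch_key:                # blank by default
+  celltype_key: celltype
+```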
+
+
+
+
+
+
diff --git a/panpipes/panpipes/pipeline_refmap/pipeline.yml b/panpipes/panpipes/pipeline_refmap/pipeline.yml
index d82a4206..da5e784e 100644
--- a/panpipes/panpipes/pipeline_refmap/pipeline.yml
+++ b/panpipes/panpipes/pipeline_refmap/pipeline.yml
@@ -53,7 +53,7 @@ reference_data: path_to_mudata
totalvi:
- path_to_totalvi1
- path_to_totalvi2
-impute_proteins: True
+impute_proteins: False
# transform_batch is a batch-covariate specific to totalvi, allows the model to use the batch information in the query to mitigate
# differences in protein sequencing depth
transform_batch: