diff --git a/docs/yaml_docs/spatial_deconvolution.md b/docs/yaml_docs/spatial_deconvolution.md
index 1e3fef01..61e1ffa2 100644
--- a/docs/yaml_docs/spatial_deconvolution.md
+++ b/docs/yaml_docs/spatial_deconvolution.md
@@ -1,125 +1,206 @@
+
# Spatial Deconvolution YAML
-In this documentation, the parameters of the `deconvolution_spatial` yaml file are explained.
-This file is generated running `panpipes deconvolution config`.
-In general, the user can leave parameters empty to use defaults.
The individual steps run by the pipeline are described in the [spatial deconvolution workflow](../workflows/deconvolute_spatial.md).
+In this documentation, the parameters of the `deconvolution_spatial` configuration yaml file are explained.
+This file is generated by running `panpipes deconvolution_spatial config`.
The individual steps run by the pipeline are described in the [spatial deconvolution workflow](../workflows/deconvolute_spatial.md).
+
+When running the deconvolution workflow, panpipes provides a basic `pipeline.yml` file.
+To run the workflow on your own data, you need to specify the parameters described below in the `pipeline.yml` file to meet the requirements of your data.
+However, we do provide pre-filled versions of the `pipeline.yml` file for individual [tutorials](https://panpipes-pipelines.readthedocs.io/en/latest/tutorials/index.html).
+You can download the different deconvolution pipeline.yml files here:
+- Basic `pipeline.yml` file (not prefilled) that is generated when calling `panpipes deconvolution_spatial config`: [Download here](https://github.com/DendrouLab/panpipes/blob/main/panpipes/panpipes/pipeline_deconvolution_spatial/pipeline.yml)
+- `pipeline.yml` file for [Deconvoluting spatial data Tutorial](https://panpipes-tutorials.readthedocs.io/en/latest/deconvolution/deconvoluting_spatial_data_with_panpipes.html): [Download here](https://github.com/DendrouLab/panpipes-tutorials/blob/main/docs/deconvolution/pipeline.yml)
## 0. Compute Resource Options
-| `resources` | |
-| --- | --- |
-| `threads_high` | __`int`__ (default: 1)
Number of threads used for high intensity computing tasks. |
-| `threads_medium` | __`int`__ (default: 1)
Number of threads used for medium intensity computing tasks. For each thread, there must be enough memory to load your mudata and do computationally light tasks. |
-| `threads_low` | __`int`__ (default: 1)
Number of threads used for low intensity computing tasks. For each thread, there must be enough memory to load text files and do plotting, requires much less memory than the other two.|
+resources
+Computing resources to use, specifically the number of threads used for parallel jobs.
+Specified by the following three parameters:
+ - threads_high `Integer`, Default: 1
+ Number of threads used for high intensity computing tasks.
+
+ - threads_medium `Integer`, Default: 1
+ Number of threads used for medium intensity computing tasks.
+ For each thread, there must be enough memory to load your mudata and do computationally light tasks.
+
+ - threads_low `Integer`, Default: 1
+ Number of threads used for low intensity computing tasks.
+ For each thread, there must be enough memory to load text files and do plotting; this requires much less memory than the other two.
+
+condaenv `String`
+ Path to conda environment that should be used to run panpipes.
+ Leave blank if running natively or if your cluster automatically inherits the login node environment.
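+
+As a minimal sketch, the compute resource options described above might be filled in as follows. The thread counts and the conda path are illustrative placeholders, not recommendations:
+
+```yaml
+resources:
+  threads_high: 4      # hypothetical value
+  threads_medium: 2    # hypothetical value
+  threads_low: 1       # default
+condaenv: /path/to/conda/envs/my_env   # hypothetical path; leave blank to use the current environment
+```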
-| | |
-| ---- | --- |
-| `condaenv` | __`str`__ (default: None)
Path to conda environment that should be used to run `Panpipes`. Leave blank if running native or your cluster automatically inherits the login node environment. |
## 1. Input Options
With the `deconvolution_spatial` workflow, one or multiple spatial slides can be deconvoluted in one run. For that, a `MuData` object for each slide is expected, with the spatial data saved in `mdata.mod["spatial"]`. The spatial slides are deconvoluted **using the same reference**. For the reference, one `MuData` with the gene expression data saved in `mdata.mod["rna"]` is expected as input. Please note, that the same parameter setting is used for each slide.
For the **spatial** input, the workflow, therefore, reads in **all `.h5mu` objects of a directory** (see below). **The spatial and single-cell data thus need to be saved in different folders.**
+
+
+input
+ - spatial `String`, Mandatory parameter
+ Path to the folder containing one or multiple `MuData` objects of spatial data. The pipeline reads in all `MuData` files in that folder and assumes that they are `MuData` objects of spatial slides.
+
+ - singlecell `String`, Mandatory parameter
+ Path to the MuData **file** (not folder) of the reference single-cell data.
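+
+For illustration, the input block could look like this. Both paths are hypothetical placeholders; the only requirement taken from the description above is that `spatial` points to a folder and `singlecell` to a single file in a different folder:
+
+```yaml
+input:
+  spatial: ./data/spatial_slides              # hypothetical folder with one or more spatial .h5mu files
+  singlecell: ./data/reference/sc_ref.h5mu    # hypothetical reference file, stored outside the spatial folder
+```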
-| `input` | |
-| ---- | --- |
-| `spatial` | __`str`__ (not optional)
Path to folder containing one or multiple `MuDatas` of spatial data. The pipeline is reading in all `MuData` files in that folder and assuming that they are `MuDatas` of spatial slides.|
-| `singlecell` | __`str`__ (not optional)
Path to the MuData **file** (not folder) of the reference single-cell data.|
## 2. Cell2Location Options
For each deconvolution method you can specify whether to run it or not:
-| | |
-| ---- | --- |
-| `run` | __`bool`__ (default: None)
Whether to run Cell2location|
+
+
+run `Boolean`, Default: None
+ Whether to run Cell2Location.
-### Feature Selection
+
+### 2.1 Feature Selection
You can select genes that are used for deconvolution in two ways. The first option is to provide a reduced feature set as a csv-file that is then used for deconvolution. The second option is to perform gene selection [according to Cell2Location](https://cell2location.readthedocs.io/en/latest/cell2location.utils.filtering.html).
Please note, that gene selection is **not optional**. If no csv-file is provided, feature selection [according to Cell2Location](https://cell2location.readthedocs.io/en/latest/cell2location.utils.filtering.html) is performed.
+
+
+feature_selection
+ - gene_list `String`, Default: None
+ Path to a csv file containing a reduced feature set. A header in the csv is expected in the first row. All genes of that gene list need to be present in both the spatial slides and the scRNA-Seq reference.
+
+ - remove_mt `Boolean`, Default: True
+ Whether to remove mitochondrial genes from the dataset. This step is performed **before** running gene selection.
+
+ - cell_count_cutoff `Integer`, Default: 15
+ All genes detected in fewer than `cell_count_cutoff` cells will be excluded. Parameter of the [Cell2Location's gene selection function.](https://cell2location.readthedocs.io/en/latest/cell2location.utils.filtering.html)
+
+ - cell_percentage_cutoff2 `Float`, Default: 0.05
+ All genes detected in at least this percentage of cells will be included. Parameter of the [Cell2Location's gene selection function.](https://cell2location.readthedocs.io/en/latest/cell2location.utils.filtering.html)
+
+ - nonz_mean_cutoff `Float`, Default: 1.12
+ Genes detected in the number of cells between the above-mentioned cutoffs are selected only when their average expression in non-zero cells is above this cutoff. Parameter of the [Cell2Location's gene selection function.](https://cell2location.readthedocs.io/en/latest/cell2location.utils.filtering.html)
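+
+A sketch of this block using the default values listed above. The fragment is shown without the enclosing method-level key, and `gene_list` is left empty on the assumption that the Cell2Location gene selection should be used:
+
+```yaml
+feature_selection:
+  gene_list:                       # empty: fall back to the Cell2Location gene selection
+  remove_mt: True                  # mitochondrial genes are removed before gene selection
+  cell_count_cutoff: 15
+  cell_percentage_cutoff2: 0.05
+  nonz_mean_cutoff: 1.12
+```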
+
+
+### 2.2 Reference Model
+
+reference
+ - labels_key `String`, Default: None
+ Key in `.obs` for label (cell type) information.
+
+ - batch_key `String`, Default: None
+ Key in `.obs` for batch information.
+
+ - layer `String`, Default: None
+ Layer in `.layers` to use for the reference model. If None, `.X` will be used. Please note, that Cell2Location expects raw counts as input.
+ - categorical_covariate_keys `String`, Default: None
+ Comma-separated without spaces, e.g. _key1,key2,key3_. Keys in `.obs` that correspond to categorical data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
-| `feature_selection` | |
-| ---- | --- |
-| `gene_list` | __`str`__ (default: None)
Path to a csv file containing a reduced feature set. A header in the csv is expected in the first row. All genes of that gene list need to be present in both, spatial slides and scRNA-Seq reference.|
-| `remove_mt` | __`bool`__ (default: True)
Whether to remove mitochondrial genes from the dataset. This step is performed **before** running gene selection. |
-| `cell_count_cutoff` | __`int`__ (default: 15)
All genes detected in less than cell_count_cutoff cells will be excluded. Parameter of the [Cell2Location's gene selection function.](https://cell2location.readthedocs.io/en/latest/cell2location.utils.filtering.html)|
-| `cell_percentage_cutoff2` | __`float`__ (default: 0.05)
All genes detected in at least this percentage of cells will be included. Parameter of the [Cell2Location's gene selection function.](https://cell2location.readthedocs.io/en/latest/cell2location.utils.filtering.html)|
-| `nonz_mean_cutoff` | __`float`__ (default: 1.12)
Genes detected in the number of cells between the above-mentioned cutoffs are selected only when their average expression in non-zero cells is above this cutoff. Parameter of the [Cell2Location's gene selection function.](https://cell2location.readthedocs.io/en/latest/cell2location.utils.filtering.html) |
+ - continuous_covariate_keys `String`, Default: None
+ Comma-separated without spaces, e.g. _key1,key2,key3_. Keys in `.obs` that correspond to continuous data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
+ - max_epochs `Integer`, Default: _np.min([round((20000 / n_cells) * 400), 400])_
+ Number of epochs.
-### Reference Model
+ - use_gpu `Boolean`, Default: True
+ Whether to use GPU for training.
+
-| `reference` | |
-| ---- | --- |
-| `labels_key` | __`str`__ (default: None)
Key in `.obs` for label (cell type) information. |
-| `batch_key` | __`str`__ (default: None)
Key in `.obs` for batch information. |
-| `layer` | __`float`__ (default: None)
Layer in `.layers` to use for the reference model. If None, `.X` will be used. Please note, that Cell2Location expects raw counts as input.|
-| `categorical_covariate_keys` | __`str`__ (default: None)
Comma-separated without spaces, e.g. _key1,key2,key3_. Keys in `.obs` that correspond to categorical data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).|
-| `continuous_covariate_keys` | __`str`__ (default: None)
Comma-separated without spaces, e.g. _key1,key2,key3_. Keys in `.obs` that correspond to continuous data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space)|
-| `max_epochs` | __`int`__ (default: _np.min([round((20000 / n_cells) * 400), 400])_)
Number of epochs.|
-| `use_gpu` | __`bool`__ (default: True)
Whether to use GPU for training. |
+### 2.3 Spatial Model
-### Spatial Model
+spatial
+ - batch_key `String`, Default: None
+ Key in `.obs` for batch information.
+ - layer `String`, Default: None
+ Layer in `.layers` to use for the spatial model. If None, `.X` will be used. Please note, that Cell2Location expects raw counts as input.
-| `spatial` | |
-| ---- | --- |
-| `batch_key` | __`str`__ (default: None)
Key in `.obs` for batch information. |
-| `layer` | __`float`__ (default: None)
Layer in `.layers` to use for the reference model. If None, `.X` will be used. Please note, that Cell2Location expects raw counts as input.|
-| `categorical_covariate_keys` | __`str`__ (default: None)
Comma-separated without spaces, e.g. _key1,key2,key3_. Keys in `.obs` that correspond to categorical data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).|
-| `continuous_covariate_keys` | __`str`__ (default: None)
Comma-separated without spaces, e.g. _key1,key2,key3_. Keys in `.obs` that correspond to continuous data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space)|
-| `N_cells_per_location` | __`int`__ (not optional)
Expected cell abundance per voxel. Please refer to the [Cell2Location documentation](https://cell2location.readthedocs.io/en/latest/index.html) for more information. |
-| `detection_alpha` | __`float`__ (not optional)
Regularization of with-in experiment variation in RNA detection sensitivity. Please refer to the [Cell2Location documentation](https://cell2location.readthedocs.io/en/latest/index.html) for more information. |
-| `max_epochs` | __`int`__ (default: _np.min([round((20000 / n_cells) * 400), 400])_)
Number of epochs.|
-| `use_gpu` | __`bool`__ (default: True)
Whether to use GPU for training. |
+ - categorical_covariate_keys `String`, Default: None
+ Comma-separated without spaces, e.g. _key1,key2,key3_. Keys in `.obs` that correspond to categorical data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
+
+ - continuous_covariate_keys `String`, Default: None
+ Comma-separated without spaces, e.g. _key1,key2,key3_. Keys in `.obs` that correspond to continuous data. These covariates can be added in addition to the batch covariate and are also treated as nuisance factors (i.e., the model tries to minimize their effects on the latent space).
+
+ - N_cells_per_location `Integer`, Mandatory parameter
+ Expected cell abundance per voxel. Please refer to the [Cell2Location documentation](https://cell2location.readthedocs.io/en/latest/index.html) for more information.
+
+ - detection_alpha `Float`, Mandatory parameter
+ Regularization of within-experiment variation in RNA detection sensitivity. Please refer to the [Cell2Location documentation](https://cell2location.readthedocs.io/en/latest/index.html) for more information.
+
+ - max_epochs `Integer`, Default: _np.min([round((20000 / n_cells) * 400), 400])_
+ Number of epochs.
+
+ - use_gpu `Boolean`, Default: True
+ Whether to use GPU for training.
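+
+A partial sketch of the spatial model block, showing only some of the parameters above. The values for the two mandatory parameters and the `.obs` column name are illustrative placeholders; appropriate values depend on the tissue and technology, so please consult the Cell2Location documentation before choosing them:
+
+```yaml
+spatial:
+  batch_key: sample            # hypothetical column name in .obs
+  layer:                       # empty: .X is used (raw counts expected)
+  N_cells_per_location: 30     # placeholder value, mandatory parameter
+  detection_alpha: 20          # placeholder value, mandatory parameter
+  use_gpu: True
+```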
-###
-You can specify whether both models should be saved with the following parameter:
-| | |
-| ---- | --- |
-| `save_models` | __`bool`__ (default: False)
Whether to save the reference & spatial mapping models|
+You can specify whether both models (spatial and reference) should be saved with the following parameter:
+
+
+save_models `Boolean`, Default: False
+ Whether to save the reference & spatial mapping models.
## 3. Tangram Options
For each deconvolution method you can specify whether to run it or not:
-| | |
-| ---- | --- |
-| `run` | __`bool`__ (default: None)
Whether to run Tangram|
+
+run `Boolean`, Default: None
+ Whether to run Tangram.
-### Feature Selection
-You can select genes that are used for deconvolution in two ways. The first option is to provide a reduced feature set as a csv-file that is then used for deconvolution. The second option is to perform gene selection via [scanpy.tl.rank_genes_groups](https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.rank_genes_groups.html) **on the reference scRNA-Seq data**, as [suggested by Tangram](https://tangram-sc.readthedocs.io/en/latest/tutorial_sq_link.html#Pre-processing). The top `n_genes` of each group make up the reduced gene set.
Please note, that gene selection is **not optional**. If no csv-file is provided, feature selection via [scanpy.tl.rank_genes_groups](https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.rank_genes_groups.html) is performed.
+### 3.1 Feature Selection
+You can select genes that are used for deconvolution in two ways. The first option is to provide a reduced feature set as a csv-file that is then used for deconvolution. The second option is to perform gene selection via [scanpy.tl.rank_genes_groups](https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.rank_genes_groups.html) **on the reference scRNA-Seq data**, as [suggested by Tangram](https://tangram-sc.readthedocs.io/en/latest/tutorial_sq_link.html#Pre-processing). The top `n_genes` of each group make up the reduced gene set.
Please note, that gene selection is **not optional**. If no csv-file is provided, feature selection via [scanpy.tl.rank_genes_groups](https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.rank_genes_groups.html) is performed.
+
-| `feature_selection` | |
-| ---- | --- |
-| `gene_list` | __`str`__ (default: None)
Path to a csv file containing a reduced feature set. A header in the csv is expected in the first row. All genes of that gene list need to be present in both, spatial slides and scRNA-Seq reference.|
+feature_selection
+ - gene_list `String`, Default: None
+ Path to a csv file containing a reduced feature set. A header in the csv is expected in the first row. All genes of that gene list need to be present in both the spatial slides and the scRNA-Seq reference.
___Parameters for `scanpy.tl.rank_genes_groups` gene selection___
-| `rank_genes` | |
-| ---- | --- |
-| `labels_key` | __`str`__ (default: None)
Which column in `.obs` of the reference to use for the `groupby` parameter of [scanpy.tl.rank_genes_groups](https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.rank_genes_groups.html) .|
-| `layer` | __`str`__ (default: None)
Which layer of the reference to use for [scanpy.tl.rank_genes_groups](https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.rank_genes_groups.html). If None, `.X` is used.|
-| `n_genes` | __`int`__ (default: 100)
How many top genes to select of each `groupby` group|
-| `test_method` | __`str ['logreg', 't-test', 'wilcoxon', 't-test_overestim_var']`__ (default: 't-test_overestim_var')
Which test method to use.|
-| `correction_method` | __`str ['benjamini-hochberg', 'bonferroni']`__ (default: ' benjamini-hochberg')
Which p-value correction method to use. Used only for 't-test', 't-test_overestim_var', and 'wilcoxon'. |
-
-### Model
-
-| `model` | |
-| ---- | --- |
-| `labels_key` | __`str`__ (default: None)
Key in `.obs` for label (cell type) information. |
-| `num_epochs` | __`int`__ (default: 1000)
Number of epochs. |
-| `device` | __`str`__ (default: 'cpu')
Which device to use. |
-| `kwargs` | In `kwargs`, the user has the possibility to specify parameters for [tangram.mapping_utils.map_cells_to_space](https://tangram-sc.readthedocs.io/en/latest/classes/tangram.mapping_utils.map_cells_to_space.html?highlight=mapping_utils%20map_cells_to_space#tangram.mapping_utils.map_cells_to_space). You can add or remove any parameters of the function.|
+ - rank_genes
+ - labels_key `String`, Default: None
+ Which column in `.obs` of the reference to use for the `groupby` parameter of [scanpy.tl.rank_genes_groups](https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.rank_genes_groups.html).
+
+ - layer `String`, Default: None
+ Which layer of the reference to use for [scanpy.tl.rank_genes_groups](https://scanpy.readthedocs.io/en/stable/generated/scanpy.tl.rank_genes_groups.html). If None, `.X` is used.
+
+ - n_genes `Integer`, Default: 100
+ How many top genes to select from each `groupby` group.
+
+ - test_method `['logreg', 't-test', 'wilcoxon', 't-test_overestim_var']`, Default: 't-test_overestim_var'
+ Which test method to use.
+
+ - correction_method `['benjamini-hochberg', 'bonferroni']`, Default: 'benjamini-hochberg'
+ Which p-value correction method to use. Used only for 't-test', 't-test_overestim_var', and 'wilcoxon'.
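+
+A sketch of the Tangram feature selection block with the defaults listed above. The fragment is shown without the enclosing method-level key, and `cell_type` is a hypothetical column name in `.obs` of the reference:
+
+```yaml
+feature_selection:
+  gene_list:                              # empty: genes are selected via rank_genes_groups
+  rank_genes:
+    labels_key: cell_type                 # hypothetical .obs column used for groupby
+    layer:                                # empty: .X is used
+    n_genes: 100
+    test_method: t-test_overestim_var
+    correction_method: benjamini-hochberg
+```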
+
+
+### 3.2 Model
+
+model
+ - labels_key `String`, Default: None
+ Key in `.obs` for label (cell type) information.
+
+ - num_epochs `Integer`, Default: 1000
+ Number of epochs.
+
+ - device `String`, Default: 'cpu'
+ Which device to use.
+
+ - kwargs
+ In `kwargs`, you can specify parameters for [tangram.mapping_utils.map_cells_to_space](https://tangram-sc.readthedocs.io/en/latest/classes/tangram.mapping_utils.map_cells_to_space.html?highlight=mapping_utils%20map_cells_to_space#tangram.mapping_utils.map_cells_to_space). You can add or remove any parameters of the function.
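+
+A sketch of the model block. `learning_rate` and `lambda_d` are used here only as examples of arguments passed through `kwargs`, and `cell_type` is a hypothetical `.obs` column; please check the linked Tangram documentation for the argument names and defaults of your installed version:
+
+```yaml
+model:
+  labels_key: cell_type     # hypothetical .obs column with cell type labels
+  num_epochs: 1000
+  device: cpu
+  kwargs:
+    learning_rate: 0.1      # example argument forwarded to map_cells_to_space
+    lambda_d: 0             # example argument forwarded to map_cells_to_space
+```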
+
diff --git a/docs/yaml_docs/spatial_preprocess.md b/docs/yaml_docs/spatial_preprocess.md
index b443d4e2..1e2dac66 100644
--- a/docs/yaml_docs/spatial_preprocess.md
+++ b/docs/yaml_docs/spatial_preprocess.md
@@ -1,41 +1,69 @@
+
# Spatial Preprocessing YAML
-In this documentation, the parameters of the `preprocess_spatial` yaml file are explained.
-This file is generated running `panpipes preprocess_spatial config`. In general, the user can leave parameters empty to use defaults.
The individual steps run by the pipeline are described in the [spatial preprocess workflow](../workflows/preprocess_spatial.md).
+In this documentation, the parameters of the `preprocess_spatial` configuration yaml file are explained.
+This file is generated by running `panpipes preprocess_spatial config`.
The individual steps run by the pipeline are described in the [spatial preprocessing workflow](../workflows/preprocess_spatial.md).
+When running the preprocess workflow, panpipes provides a basic `pipeline.yml` file.
+To run the workflow on your own data, you need to specify the parameters described below in the `pipeline.yml` file to meet the requirements of your data.
+However, we do provide pre-filled versions of the `pipeline.yml` file for individual [tutorials](https://panpipes-pipelines.readthedocs.io/en/latest/tutorials/index.html).
+You can download the different preprocess pipeline.yml files here:
+- Basic `pipeline.yml` file (not prefilled) that is generated when calling `panpipes preprocess_spatial config`: [Download here](https://github.com/DendrouLab/panpipes/blob/main/panpipes/panpipes/pipeline_preprocess_spatial/pipeline.yml)
+- `pipeline.yml` file for [Preprocessing spatial data Tutorial](https://panpipes-tutorials.readthedocs.io/en/latest/preprocess_spatial_data/preprocess_spatial_data_with_panpipes.html): [Download here](https://github.com/DendrouLab/panpipes-tutorials/blob/main/docs/preprocess_spatial_data/pipeline.yml)
## 0. Compute Resource Options
-| `resources` | |
-| --- | --- |
-| `threads_high` | __`int`__ (default: 1)
Number of threads used for high intensity computing tasks. |
-| `threads_medium` | __`int`__ (default: 1)
Number of threads used for medium intensity computing tasks. For each thread, there must be enough memory to load your mudata and do computationally light tasks. |
-| `threads_low` | __`int`__ (default: 1)
Number of threads used for low intensity computing tasks. For each thread, there must be enough memory to load text files and do plotting, requires much less memory than the other two.|
+resources
+Computing resources to use, specifically the number of threads used for parallel jobs.
+Specified by the following three parameters:
+ - threads_high `Integer`, Default: 1
+ Number of threads used for high intensity computing tasks.
-| | |
-| ---- | --- |
-| `condaenv` | __`str`__ (default: None)
Path to conda environment that should be used to run `Panpipes`. Leave blank if running native or your cluster automatically inherits the login node environment. |
+ - threads_medium `Integer`, Default: 1
+ Number of threads used for medium intensity computing tasks.
+ For each thread, there must be enough memory to load your mudata and do computationally light tasks.
+
+ - threads_low `Integer`, Default: 1
+ Number of threads used for low intensity computing tasks.
+ For each thread, there must be enough memory to load text files and do plotting; this requires much less memory than the other two.
+
+condaenv `String`
+ Path to conda environment that should be used to run panpipes.
+ Leave blank if running natively or if your cluster automatically inherits the login node environment.
## 1. Input Options
With the preprocess_spatial workflow, one or multiple `MuData` objects can be preprocessed in one run. The workflow **reads in all `.h5mu` objects of a directory**. The `MuData` objects in the directory need to be of the same assay (vizgen or visium). The workflow then runs the preprocessing of each `MuData` object separately with the same parameters that are specified in the yaml file.
+
+
+input_dir `String`, Mandatory parameter
+ Path to the folder containing all input `h5mu` files.
+
+assay [`'visium'`, `'vizgen'`], Default: `'visium'`
+ Spatial transcriptomics assay of the `h5mu` files in `input_dir`.
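+
+For illustration, the two input options could be set like this. The folder path is a hypothetical placeholder:
+
+```yaml
+input_dir: ./data/spatial_qc_output   # hypothetical folder containing all input .h5mu files
+assay: visium                         # or vizgen; all files in input_dir must share one assay
+```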
-| | |
-| ---- | --- |
-| `input_dir` | __`str`__ (not optional)
Path to the folder containing all input `h5mu` files. |
-| `assay` | __`str` [`'visium'`, `'vizgen'`]__ (default: 'visium')
Spatial transcriptomics assay of the `h5mu` files in `input_dir`.|
## 2. Filtering Options
+filtering
+ - run `Boolean`, Default: False
+ Whether to run filtering. **If `False`, will not filter the data and will not produce post-filtering plots.**
-| `filtering` | |
-| --- | --- |
-| `run` | __`bool`__ (default: False)
Whether to run filtering. **If `False`, will not filter the data and will not produce post-filtering plots.** |
-| `keep_barcodes` | __`str`__ (default: None)
Path to a csv-file that has **no header** containing barcodes you want to keep. Barcodes that are not in the file, will be removed from the dataset before filtering the dataset with the thresholds specified below. |
+ - keep_barcodes `String`, Default: None
+ Path to a csv-file that has **no header** containing barcodes you want to keep. Barcodes that are not in the file, will be removed from the dataset before filtering the dataset with the thresholds specified below.
+
With the parameters below you can specify thresholds for filtering. The filtering is fully customisable to any columns in `.obs` or `.var`. You are not restricted by the columns given as default. When specifying a column name, please make sure it exactly matches the column name in the h5mu object.
Please also make sure that the specified metrics are present in all `h5mu` objects of the `input_dir`, i.e. the `MuData` objects for which the preprocessing is run.
@@ -60,51 +88,65 @@ With the parameters below you can specify thresholds for filtering. The filterin
## 3. Post-Filter Plotting
The parameters below specify which metrics of the filtered data to plot. As for the [QC](./spatial_qc.md), violin and spatial embedding plots are generated for each slide separately.
+
-| `plotqc` | |
-| --- | --- |
-| `grouping_var` | __`str`__ (default: None)
Comma-separated string without spaces, e.g. _sample_id,batch_ of categorical columns in `.obs`. One violin will be created for each group in the violin plot. Not mandatory, can be left empty. |
-| `spatial_metrics` | __`str`__ (default: None)
Comma-separated string without spaces, e.g. _total_counts,n_genes_by_counts_ of columns in `.obs` or `.var`.
Specifies which metrics to plot. If metric is present in both, `.obs` and `.var`, **both will be plotted.** |
+plotqc
+ - grouping_var `String`, Default: None
+ Comma-separated string without spaces, e.g. _sample_id,batch_ of categorical columns in `.obs`. One violin will be created for each group in the violin plot. Not mandatory, can be left empty.
+ - spatial_metrics `String`, Default: None
+ Comma-separated string without spaces, e.g. _total_counts,n_genes_by_counts_ of columns in `.obs` or `.var`.
Specifies which metrics to plot. If metric is present in both, `.obs` and `.var`, **both will be plotted.**
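+
+A sketch of the plotting block, reusing the example values from the parameter descriptions above. Whether these columns exist depends on your data:
+
+```yaml
+plotqc:
+  grouping_var: sample_id,batch                     # comma-separated, no spaces
+  spatial_metrics: total_counts,n_genes_by_counts   # columns in .obs or .var
+```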
+
## 4. Normalization, HVG Selection, and PCA Options
-### **Normalization and HVG Selection**
+### **4.1 Normalization and HVG Selection**
+`Panpipes` offers two different normalization and HVG selection flavours, `'seurat'` and `'squidpy'`.
The `'seurat'` flavour first selects HVGs on the raw counts using analytic Pearson residuals, i.e. [scanpy.experimental.pp.highly_variable_genes](https://scanpy.readthedocs.io/en/stable/generated/scanpy.experimental.pp.highly_variable_genes.html). Afterwards, analytic Pearson residual normalization is applied, i.e. [scanpy.experimental.pp.normalize_pearson_residuals](https://scanpy.readthedocs.io/en/stable/generated/scanpy.experimental.pp.normalize_pearson_residuals.html). Parameters of both functions can be specified by the user in the yaml file.
The `'squidpy'` flavour runs the basic scanpy normalization and HVG selection functions, i.e. [scanpy.pp.normalize_total](https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.normalize_total.html), [scanpy.pp.log1p](https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.log1p.html), and [scanpy.pp.highly_variable_genes](https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html).
+
+
+norm_hvg_flavour [`'squidpy'`, `'seurat'`], Default: None
+ Normalization and HVG selection flavour to use. If None, neither normalization nor HVG selection is run.
+
+
+___Parameters for `norm_hvg_flavour` == `'squidpy'`___
+
+squidpy_hvg_flavour [`'seurat'`, `'cellranger'`, `'seurat_v3'`], Default: 'seurat'
+ Flavour to select HVGs, i.e. the `flavor` parameter of the function [scanpy.pp.highly_variable_genes](https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html).
+
+min_mean `Float`, Default: 0.05
+ Parameter in [scanpy.pp.highly_variable_genes](https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html).
+
+max_mean `Float`, Default: 1.5
+ Parameter in [scanpy.pp.highly_variable_genes](https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html).
+
+min_disp `Float`, Default: 0.5
+ Parameter in [scanpy.pp.highly_variable_genes](https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html).
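+
+A sketch of the `'squidpy'` flavour settings with the defaults listed above, assuming the keys sit at the nesting level shown on this page:
+
+```yaml
+norm_hvg_flavour: squidpy
+squidpy_hvg_flavour: seurat   # passed as flavor to scanpy.pp.highly_variable_genes
+min_mean: 0.05
+max_mean: 1.5
+min_disp: 0.5
+```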
+
+___Parameters for `norm_hvg_flavour` == `'seurat'`___
-`Panpipes` offers two different normalization and HVG selection flavours, `'seurat'` and `'squidpy'`.
The `'seurat'` flavour first selects HVGs on the raw counts using analytic Pearson residuals, i.e. [scanpy.experimental.pp.highly_variable_genes](https://scanpy.readthedocs.io/en/stable/generated/scanpy.experimental.pp.highly_variable_genes.html). Afterwards, analytic Pearson residual normalization is applied, i.e. [scanpy.experimental.pp.normalize_pearson_residuals](https://scanpy.readthedocs.io/en/stable/generated/scanpy.experimental.pp.normalize_pearson_residuals.html). Parameters of both functions can be specified by the user in the yaml file.
The `'squidpy'` flavour runs the basic scanpy normalization and HVG selection functions, i.e. [scanpy.pp.normalize_total](https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.normalize_total.html), [scanpy.pp.log1p](https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.log1p.html), and [scanpy.pp.highly_variable_genes](https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html).
+theta `Float`, Default: 100
+ The negative binomial overdispersion parameter for Pearson residuals. The same value is used for [HVG selection](https://scanpy.readthedocs.io/en/stable/generated/scanpy.experimental.pp.highly_variable_genes.html) and [normalization](https://scanpy.readthedocs.io/en/stable/generated/scanpy.experimental.pp.normalize_pearson_residuals.html).
+clip `Float`, Default: None
+ Specifies clipping of the residuals.
`clip` can be specified as: