Merge branch 'main' of github.com:vadimnazarov/panpipes
vadimnazarov committed Feb 4, 2025
2 parents b76bfa9 + 304a9f8 commit 410c897
Showing 16 changed files with 307 additions and 9 deletions.
1 change: 1 addition & 0 deletions docs/yaml_docs/index.rst
@@ -14,5 +14,6 @@ Workflows configuration files
spatial_deconvolution
pipeline_visualization_yml
pipeline_refmap_yml
threads_tasks_panpipes


2 changes: 1 addition & 1 deletion docs/yaml_docs/pipeline_clustering_yml.md
@@ -31,7 +31,7 @@ You can download the different clustering pipeline.yml files here:
## Compute resources options

- <span class="parameter">resources</span><br>
Computing resources to use, specifically the number of threads used for parallel jobs.
Computing resources to use, specifically the number of threads used for parallel jobs. Check [threads_tasks_panpipes](./threads_tasks_panpipes.md) for more information on which threads each specific task requires.
Specified by the following three parameters:
- <span class="parameter">threads_high</span> `Integer`, Default: 2<br>
Number of threads used for high intensity computing tasks.
2 changes: 1 addition & 1 deletion docs/yaml_docs/pipeline_ingestion_yml.md
@@ -31,7 +31,7 @@ You can download the different ingestion `pipeline.yml` files here:
## Compute resources options

<span class="parameter">resources</span><br>
Computing resources to use, specifically the number of threads used for parallel jobs.
Computing resources to use, specifically the number of threads used for parallel jobs. Check [threads_tasks_panpipes](./threads_tasks_panpipes.md) for more information on which threads each specific task requires.
Specified by the following three parameters:
- <span class="parameter">threads_high</span> `Integer`, Default: 1<br>
Number of threads used for high intensity computing tasks.
2 changes: 1 addition & 1 deletion docs/yaml_docs/pipeline_integration_yml.md
@@ -23,7 +23,7 @@ For more information on functionalities implemented in `panpipes` to read the co
## Compute resources options

<span class="parameter">resources</span><br>
Computing resources to use, specifically the number of threads used for parallel jobs.
Computing resources to use, specifically the number of threads used for parallel jobs. Check [threads_tasks_panpipes](./threads_tasks_panpipes.md) for more information on which threads each specific task requires.
Specified by the following parameters:
- <span class="parameter">threads_high</span> `Integer`, Default: 1<br>
Number of threads used for high intensity computing tasks.
2 changes: 1 addition & 1 deletion docs/yaml_docs/pipeline_preprocess_yml.md
@@ -27,7 +27,7 @@ You can download the different preprocess `pipeline.yml` files here:
## Compute resources options

<span class="parameter">resources</span><br>
Computing resources to use, specifically the number of threads used for parallel jobs.
Computing resources to use, specifically the number of threads used for parallel jobs. Check [threads_tasks_panpipes](./threads_tasks_panpipes.md) for more information on which threads each specific task requires.
Specified by the following three parameters:
- <span class="parameter">threads_high</span> `Integer`, Default: 2<br>
Number of threads used for high intensity computing tasks.
2 changes: 1 addition & 1 deletion docs/yaml_docs/pipeline_refmap_yml.md
@@ -27,7 +27,7 @@ You can download the different refmap `pipeline.yml` files here:
## Compute resources options

<span class="parameter">resources</span><br>
Computing resources to use, specifically the number of threads used for parallel jobs.
Computing resources to use, specifically the number of threads used for parallel jobs. Check [threads_tasks_panpipes](./threads_tasks_panpipes.md) for more information on which threads each specific task requires.
Specified by the following three parameters:
- <span class="parameter">threads_high</span> `Integer`, Default: 1<br>
Number of threads used for high intensity computing tasks.
2 changes: 1 addition & 1 deletion docs/yaml_docs/pipeline_visualization_yml.md
@@ -24,7 +24,7 @@ You can download the different ingestion `pipeline.yml` files here:

## Compute resources options
<span class="parameter">resources</span><br>
Computing resources to use, specifically the number of threads used for parallel jobs.
Computing resources to use, specifically the number of threads used for parallel jobs. Check [threads_tasks_panpipes](./threads_tasks_panpipes.md) for more information on which threads each specific task requires.
Specified by the following three parameters:
- <span class="parameter">threads_high</span> `Integer`, Default: 1<br>
Number of threads used for high intensity computing tasks.
2 changes: 1 addition & 1 deletion docs/yaml_docs/spatial_deconvolution.md
@@ -26,7 +26,7 @@ For more information on functionalities implemented in `panpipes` to read the co
## 0. Compute Resource Options

<span class="parameter">resources</span><br>
Computing resources to use, specifically the number of threads used for parallel jobs.
Computing resources to use, specifically the number of threads used for parallel jobs. Check [threads_tasks_panpipes](./threads_tasks_panpipes.md) for more information on which threads each specific task requires.
Specified by the following three parameters:
- <span class="parameter">threads_high</span> `Integer`, Default: 1<br>
Number of threads used for high intensity computing tasks.
2 changes: 1 addition & 1 deletion docs/yaml_docs/spatial_preprocess.md
@@ -28,7 +28,7 @@ You can download the different preprocess pipeline.yml files here:
## 0. Compute Resource Options

<span class="parameter">resources</span><br>
Computing resources to use, specifically the number of threads used for parallel jobs.
Computing resources to use, specifically the number of threads used for parallel jobs. Check [threads_tasks_panpipes](./threads_tasks_panpipes.md) for more information on which threads each specific task requires.
Specified by the following three parameters:
- <span class="parameter">threads_high</span> `Integer`, Default: 1<br>
Number of threads used for high intensity computing tasks.
1 change: 1 addition & 0 deletions docs/yaml_docs/spatial_qc.md
@@ -29,6 +29,7 @@ For more information on functionalities implemented in `panpipes` to read the co

<span class="parameter">resources</span><br>
Computing resources to use, specifically the number of threads used for parallel jobs.
Check [threads_tasks_panpipes](./threads_tasks_panpipes.md) for more information on which threads each specific task requires.
Specified by the following three parameters:
- <span class="parameter">threads_high</span> `Integer`, Default: 1<br>
Number of threads used for high intensity computing tasks.
271 changes: 271 additions & 0 deletions docs/yaml_docs/threads_tasks_panpipes.md
@@ -0,0 +1,271 @@
# Threads for individual workflow tasks

<table>
<tr>
<th colspan="3">Task ingest</th>
</tr>
<tr>
<th>threads_high</th>
<th>threads_medium</th>
<th>threads_low</th>
</tr>
<tr>
<td>Creating h5mu from filtered data files</td>
<td>load_mudatas</td>
<td>run_repertoire_qc</td>
</tr>
<tr>
<td>Creating h5mu from bg data files</td>
<td>load_bg_mudatas</td>
<td>run_atac_qc</td>
</tr>
<tr>
<td>rna QC</td>
<td>downsample_bg_mudatas</td>
<td>plot_qc</td>
</tr>
<tr>
<td>prot QC</td>
<td>run_scrublet</td>
<td>10X metrics plotting</td>
</tr>
<tr>
<td>prot QC</td>
<td></td>
<td></td>
</tr>
<tr>
<th colspan="3">Task preprocess</th>
</tr>
<tr>
<th>threads_high</th>
<th>threads_medium</th>
<th>threads_low</th>
</tr>
<tr>
<td>assess background</td>
<td></td>
<td>filter_mudata</td>
</tr>
<tr>
<td>rna_preprocess</td>
<td></td>
<td>downsample</td>
</tr>
<tr>
<td>prot_preprocess</td>
<td></td>
<td>postfilterplot</td>
</tr>
<tr>
<td>atac_preprocess</td>
<td></td>
<td></td>
</tr>
<tr>
<th colspan="3">Task integration</th>
</tr>
<tr>
<th>threads_high</th>
<th>threads_medium</th>
<th>threads_low</th>
</tr>
<tr>
<td>run_no_batch_correct_rna</td>
<td>Evaluation</td>
<td>run_lisi</td>
</tr>
<tr>
<td>run_bbknn_rna</td>
<td>plot_umaps</td>
<td></td>
</tr>
<tr>
<td>run_harmony_rna</td>
<td>run_scib_metrics</td>
<td></td>
</tr>
<tr>
<td>run_combat_rna</td>
<td></td>
<td></td>
</tr>
<tr>
<td>run_scanorama_rna</td>
<td></td>
<td></td>
</tr>
<tr>
<td>run_scvi_rna</td>
<td></td>
<td></td>
</tr>
<tr>
<td>run_no_batch_correct_prot</td>
<td></td>
<td></td>
</tr>
<tr>
<td>run_harmony_prot</td>
<td></td>
<td></td>
</tr>
<tr>
<td>run_bbknn_prot</td>
<td></td>
<td></td>
</tr>
<tr>
<td>run_combat_prot</td>
<td></td>
<td></td>
</tr>
<tr>
<td>run_no_batch_correct_atac</td>
<td></td>
<td></td>
</tr>
<tr>
<td>run_harmony_atac</td>
<td></td>
<td></td>
</tr>
<tr>
<td>run_bbknn_atac</td>
<td></td>
<td></td>
</tr>
<tr>
<td>run_totalvi</td>
<td></td>
<td></td>
</tr>
<tr>
<td>run_multivi</td>
<td></td>
<td></td>
</tr>
<tr>
<td>run_mofa</td>
<td></td>
<td></td>
</tr>
<tr>
<td>run_wnn</td>
<td></td>
<td></td>
</tr>
<tr>
<td>merge_integration</td>
<td></td>
<td></td>
</tr>
<tr>
<th colspan="3">Task clustering</th>
</tr>
<tr>
<th>threads_high</th>
<th>threads_medium</th>
<th>threads_low</th>
</tr>
<tr>
<td>run_neighbors</td>
<td>run_clustering</td>
<td>plot_clustree</td>
</tr>
<tr>
<td>run_umap</td>
<td>collate_mdata</td>
<td>aggregate_clusters</td>
</tr>
<tr>
<td>find_markers</td>
<td>plot_cluster_umaps</td>
<td></td>
</tr>
<tr>
<td></td>
<td>plot_markers</td>
<td></td>
</tr>
<tr>
<th colspan="3">Task vis</th>
</tr>
<tr>
<th>threads_high</th>
<th></th>
<th>threads_low</th>
</tr>
<tr>
<td>plot_custom_markers_per_group</td>
<td></td>
<td>plot_metrics</td>
</tr>
<tr>
<td>plot_custom_markers_umap</td>
<td></td>
<td></td>
</tr>
<tr>
<td>plot_categorical_umaps</td>
<td></td>
<td></td>
</tr>
<tr>
<td>write_obs</td>
<td></td>
<td></td>
</tr>
<tr>
<td>plot_scatters</td>
<td></td>
<td></td>
</tr>
<tr>
<th colspan="3">Task refmap</th>
</tr>
<tr>
<th>threads_high</th>
<th></th>
<th></th>
</tr>
<tr>
<td>run_refmap_scvi</td>
<td></td>
<td></td>
</tr>
<tr>
<td>run_scib_refmap</td>
<td></td>
<td></td>
</tr>
<tr>
<th colspan="3">Task preprocess spatial</th>
</tr>
<tr>
<th>threads_high</th>
<th></th>
<th>threads_low</th>
</tr>
<tr>
<td>spatial_preprocess</td>
<td></td>
<td>filter_mudata</td>
</tr>
<tr>
<th colspan="3">Task Spatial</th>
</tr>
<tr>
<th>threads_high</th>
<th></th>
<th>threads_low</th>
</tr>
<tr>
<td>load_mudata</td>
<td></td>
<td>plotQC_spatial</td>
</tr>
</table>
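
The table above can be read as a lookup from task name to thread tier. The following is a minimal sketch of that idea, not panpipes code: `TASK_TIERS`, `threads_for`, and the `resources` dictionary are hypothetical names, only a few rows of the clustering section are copied in, and the `threads_medium` and `threads_low` values are illustrative.

```python
# Hypothetical mapping, copied from a few rows of the clustering section above.
TASK_TIERS = {
    "run_neighbors": "threads_high",
    "run_umap": "threads_high",
    "find_markers": "threads_high",
    "run_clustering": "threads_medium",
    "plot_clustree": "threads_low",
}


def threads_for(task, resources):
    """Return the thread count configured for the tier a task belongs to."""
    tier = TASK_TIERS[task]
    return resources[tier]


# Assumed values shaped like the `resources` block of a pipeline.yml;
# threads_high=2 is the documented clustering default, the others are made up.
resources = {"threads_high": 2, "threads_medium": 1, "threads_low": 1}

print(threads_for("run_neighbors", resources))
print(threads_for("plot_clustree", resources))
```

Under these assumptions, raising `threads_high` affects every task listed in the first column of a section and leaves the other columns untouched.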
6 changes: 6 additions & 0 deletions panpipes/panpipes/pipeline_ingest.py
@@ -104,6 +104,12 @@ def unfilt_file():
def gen_load_filtered_anndata_jobs():
    caf = pd.read_csv(PARAMS["submission_file"], sep="\t")

    duplicated_rows = caf.duplicated()

    if duplicated_rows.any():
        print(f"Duplicated rows found and removed: {duplicated_rows.sum()} rows.")
        caf = caf.drop_duplicates()

    return gen_load_anndata_jobs(
        caf,
        load_raw=False,
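
The duplicate check added in this hunk can be tried in isolation. The sketch below assumes a made-up two-column submission table (`sample_id` and `rna_path` are hypothetical column names, not taken from panpipes) and uses only `DataFrame.duplicated` and `DataFrame.drop_duplicates`, the same pandas calls as the diff.

```python
import io

import pandas as pd

# Hypothetical tab-separated submission file in which the first row is repeated.
tsv = "sample_id\trna_path\nS1\ta.h5\nS2\tb.h5\nS1\ta.h5\n"
caf = pd.read_csv(io.StringIO(tsv), sep="\t")

# duplicated() marks every row that repeats an earlier row in all columns.
duplicated_rows = caf.duplicated()
n_dupes = int(duplicated_rows.sum())

if duplicated_rows.any():
    print(f"Duplicated rows found and removed: {n_dupes} rows.")
    caf = caf.drop_duplicates()

print(len(caf))
```

Only rows that match in every column are dropped; two rows sharing a sample ID but pointing at different paths would both be kept.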
2 changes: 2 additions & 0 deletions panpipes/panpipes/pipeline_integration.py
@@ -817,6 +817,7 @@ def plot_umaps(infile, outfile):

#this can follow now any mtd generation, but it will collate only RNA jobs for lisi
@follows(collate_integration_outputs)
@active_if(PARAMS['lisi_run'])
@transform(collate_integration_outputs,
formatter(), 'logs/7_lisi.log')
def run_lisi(infile, outfile):
@@ -834,6 +835,7 @@ def run_lisi(infile, outfile):


@follows(collate_integration_outputs)
@active_if(PARAMS['scib_run'])
@transform(collate_integration_outputs, formatter(), 'logs/scib.log')
def run_scib_metrics(infile, outfile):
    cell_mtd_file = sprefix + "_cell_mtd.csv"
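
The two `@active_if` lines in this file make `run_lisi` and `run_scib_metrics` conditional on configuration flags. The sketch below illustrates the gating effect in plain Python; `run_if` is a hypothetical stand-in written for this example, not the ruffus decorator, and the `PARAMS` values are assumed.

```python
import functools

# Hypothetical parameters; in the diff they are read from PARAMS.
PARAMS = {"lisi_run": True, "scib_run": False}


def run_if(flag):
    """Stand-in for the gating effect of @active_if (not the real decorator)."""
    def decorator(task):
        @functools.wraps(task)
        def wrapper(*args, **kwargs):
            if not flag:
                return None  # the task body is skipped entirely
            return task(*args, **kwargs)
        return wrapper
    return decorator


@run_if(PARAMS["lisi_run"])
def run_lisi():
    return "lisi done"


@run_if(PARAMS["scib_run"])
def run_scib_metrics():
    return "scib done"


results = [run_lisi(), run_scib_metrics()]
print(results)
```

In this sketch a missing key raises `KeyError` at decoration time; whether the `PARAMS` object in panpipes behaves the same way for an older `pipeline.yml` without `lisi_run` or `scib_run` is not shown in the diff.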