Added VDJ concatenation modules and added MuData object #435
saraterzo wants to merge 30 commits into nf-core:dev
Hi @saraterzo, one first thought: would it make sense to add these modules to nf-core directly instead of adding them locally? That way they would already be tested as standalone modules, and we would know they work before including them in the pipeline. Or is there a specific reason for making them local modules?
Hi @fmalmeida
Hi @saraterzo, thank you for working on this! I'll try to find time this week to give it a proper review.
grst left a comment
Sorry for the delay, finally had the time to look at it.
Found mostly minor things. Since the Python parts of the pipeline are growing, I am also adding the ruff linter/formatter in #464. Once that's in, please apply it to your Python scripts as well.
| for run, vdj in zip(input_run_id,vdj_files): | ||
| # Read folders with the filtered contigue annotation and store datasets in a dictionary | ||
| print("\n===== READING CONTIGUE ANNOTATION MATRIX =====") | ||
| print("\nProcessing filtered contigue table in folder ... ", end ='') |
Suggested change:
| - print("\nProcessing filtered contigue table in folder ... ", end ='') |
| + print("\nProcessing filtered contig table in folder ... ", end ='') |
| if len(adata_vdj_list) == 1: | ||
| adata_vdj_concatenated = adata_vdj_list[0] | ||
| print("Only one non-empty file found. Saving the file as is without concatenation.") | ||
| else: |
Is it necessary to special-case this? Wouldn't ad.concat work fine with a single file?
| import anndata as ad # store annotated matrix as anndata object | ||
| warnings.filterwarnings("ignore") |
It would be better to filter only specific (expected) warnings, e.g. by category or message.
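A minimal sketch of what targeted filtering could look like, using only the standard-library warnings module. The warning messages and the names noisy_step and run_with_expected_warnings_silenced are hypothetical stand-ins, not the warnings the scripts in this PR actually emit.

```python
import warnings


def noisy_step():
    # Stand-in for a library call; all three warnings are hypothetical examples.
    warnings.warn("obs names are not unique", UserWarning)
    warnings.warn("this argument will change", FutureWarning)
    warnings.warn("something unexpected happened", UserWarning)
    return "done"


def run_with_expected_warnings_silenced(func):
    # catch_warnings restores the previous filters on exit,
    # so the suppression stays local to this call.
    with warnings.catch_warnings():
        # Filter by category: drop every FutureWarning.
        warnings.filterwarnings("ignore", category=FutureWarning)
        # Filter by message: the pattern is a regex matched from the
        # start of the warning text, hence the leading ".*".
        warnings.filterwarnings(
            "ignore",
            message=r".*names are not unique",
            category=UserWarning,
        )
        return func()
```

With this in place, the two expected warnings are dropped while the unexpected UserWarning still reaches the log, which a blanket filterwarnings("ignore") would hide.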
| from mudata import MuData | ||
| warnings.filterwarnings("ignore") |
It would be better to filter only specific (expected) warnings, e.g. by category or message.
| modalities["gex"] = adata[:, adata.var["feature_types"] == "Gene Expression"] | ||
| # Add 'pro' modality if defined | ||
| if adata[:, adata.var["feature_types"] == "Antibody Capture"].shape[1] > 0: | ||
| modalities["pro"] = adata[:, adata.var["feature_types"] == "Antibody Capture"] |
Suggested change:
| - modalities["pro"] = adata[:, adata.var["feature_types"] == "Antibody Capture"] |
| + modalities["protein"] = adata[:, adata.var["feature_types"] == "Antibody Capture"] |
| def desired_files = outs.findAll { it.name == "filtered_contig_annotations.csv" } | ||
| if (desired_files.size() > 0) { | ||
| [ meta, desired_files ] | ||
| } |
Suggested change:
| - def desired_files = outs.findAll { it.name == "filtered_contig_annotations.csv" } |
| - if (desired_files.size() > 0) { |
| - [ meta, desired_files ] |
| - } |
| + def desired_files = outs.findAll { it.name == "filtered_contig_annotations.csv" } |
| + if (desired_files.size() > 0) { |
| + [ meta, desired_files ] |
| + } |
Couldn't you also use the parse_demultiplexed_output_channels function for this? Would your code still work for VDJ in combination with demultiplexing?
| def meta = [] | ||
| def files = [] | ||
| list.collate(2).each { pair -> | ||
| meta << pair[0] | ||
| files << pair[1] | ||
| } | ||
| return [meta, files.flatten()] |
| } | ||
| ch_vdj_files_collect = ch_vdj_files.collect() | ||
| ch_transformed_channel = ch_vdj_files_collect.map { list -> |
Suggested change:
| - ch_transformed_channel = ch_vdj_files_collect.map { list -> |
| + ch_vdj = ch_vdj_files_collect.map { list -> |
| //{assert workflow.trace.tasks().size() == 59}, | ||
| // How many results were produced? | ||
| {assert path("${outputDir}/results_cellrangermulti").list().size() == 4}, | ||
| {assert path("${outputDir}/results_cellrangermulti/cellrangermulti").list().size() == 5}, | ||
| {assert path("${outputDir}/results_cellrangermulti/cellrangermulti/mtx_conversions").list().size() == 16}, | ||
| {assert path("${outputDir}/results_cellrangermulti/cellrangermulti/count").list().size() == 4}, | ||
| {assert path("${outputDir}/results_cellrangermulti/fastqc").list().size() == 48}, | ||
| {assert path("${outputDir}/results_cellrangermulti/multiqc").list().size() == 3}, | ||
| //{assert path("${outputDir}/results_cellrangermulti").list().size() == 6}, | ||
| //{assert path("${outputDir}/results_cellrangermulti/cellrangermulti").list().size() == 5}, | ||
| //{assert path("${outputDir}/results_cellrangermulti/cellrangermulti/mtx_conversions").list().size() == 16}, | ||
| //{assert path("${outputDir}/results_cellrangermulti/cellrangermulti/count").list().size() == 4}, | ||
| //{assert path("${outputDir}/results_cellrangermulti/fastqc").list().size() == 48}, | ||
| //{assert path("${outputDir}/results_cellrangermulti/multiqc").list().size() == 3}, |
Let's not forget to re-enable the test case before merging.
| ch_vdj | ||
| ) | ||
| ch_versions = ch_versions.mix(CONVERT_MUDATA.out.versions) | ||
| } else {'nothing to convert to MuData'} |
Suggested change:
| - } else {'nothing to convert to MuData'} |
| + } |
Added a VDJ concatenation module that concatenates the "filtered_contig_annotations.csv" files with the scirpy package.
Added a MuData module that creates MuData objects to handle the VDJ and CITE-seq modalities. The MuData object is built only from the filtered (not raw) count matrices of the GEX and CITE-seq modalities.
- Make sure your code lints (nf-core pipelines lint).
- Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
- Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
- docs/usage.md is updated.
- docs/output.md is updated.
- CHANGELOG.md is updated.
- README.md is updated (including new tool citations and authors/contributors).