Skip to content

Conversation

@mattheww95
Copy link
Collaborator

@mattheww95 mattheww95 commented Dec 4, 2025

This PR addresses:

STRY0017736 - Add field and logic for qc_status_read_length to mikrokondo

  • New QC metric for read length added, options for short and long read lengths are available.

STRY0018931 - Implement this as a passed/failed flag for qc_wgmlst_loci_count

  • Very straight forward, new qc metric added.

STRY0019400 - Punctuation in mikrokondo pass/fail messages

  • Re-organizes the final QCMessage as people were reading the "FAILED Species ID" to mean the species identification failed. This is the new format: "FAILED; Passed Tests: 5/6; Species ID: Escherichia coli; Organism QC Criteria: Escherichia coli"

@github-actions
Copy link

github-actions bot commented Dec 4, 2025

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit e7a4eca

+| ✅ 238 tests passed       |+
#| ❔  20 tests were ignored |#
!| ❗  81 tests had warnings |!

❗ Test warnings:

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 3.4.1
  • Run at 2025-12-12 19:17:16

@mattheww95 mattheww95 changed the title Add new QC metrics and Re-organize QC Message for clarity Add new QC metrics and re-organized QC Message for clarity Dec 4, 2025
@mattheww95 mattheww95 marked this pull request as ready for review December 5, 2025 14:01
max_length = 6000000 // The maximum genome length the organism in the search field is allowed to have
max_checkm_contamination = 3.0 // The maximum level of allowed contamination allowed by CheckM
min_average_coverage = 30 // The minimum average coverage allowed
min_wgmlst_loci: The minimum number of wgMLST loci required per a sample
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per sample

max_checkm_contamination = 3.0 // The maximum level of allowed contamination allowed by CheckM
min_average_coverage = 30 // The minimum average coverage allowed
min_wgmlst_loci: The minimum number of wgMLST loci required per a sample
min_illumina_read_length: The lowest mean illumina read length you can tolerated for your data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

capitalize Illumina, "tolerate".
Probably reword to something like: "The lowest tolerable mean Illumina read length"

min_average_coverage = 30 // The minimum average coverage allowed
min_wgmlst_loci: The minimum number of wgMLST loci required per a sample
min_illumina_read_length: The lowest mean illumina read length you can tolerated for your data
max_illumina_read_length: The highest mean illumina read length you can tolerate for your sample
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above

min_wgmlst_loci: The minimum number of wgMLST loci required per a sample
min_illumina_read_length: The lowest mean illumina read length you can tolerated for your data
max_illumina_read_length: The highest mean illumina read length you can tolerate for your sample
min_long_read_length: The minimum mean read length allowed for long read data like pacbio and nanopore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PacBio, Nanopore

reisolate = 1
resequence = 1
failed_p = true
failed_p = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is failed_p "failed probability"? Otherwise, can the p be changed to something more descriptive.

Comment on lines 470 to +471
def vals = [qc_data[fields[0]], qc_data[fields[1]]].sort()
if(vals[0] == null){
if(vals[0] == null || vals[1] == null){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could vals have a more descriptive name?

"meta": {
"nf-test": "0.9.2",
"nextflow": "25.04.8"
"nextflow": "25.04.7"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pointing out downgrade here.

Comment on lines +767 to +770
min_wgmlst_loci: The minimum number of wgMLST loci required per a sample
min_illumina_read_length: The lowest mean illumina read length you can tolerated for your data
max_illumina_read_length: The highest mean illumina read length you can tolerate for your sample
min_long_read_length: The minimum mean read length allowed for long read data like pacbio and nanopore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer to comments above (near the top) about these.

max_length =
max_checkm_contamination = 1.0
average_coverage =
min_average_coverage =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing after the equals sign... is that expected?

Comment on lines +1260 to +1261
low_msg = "Combined mean Illumina read length is lower than expected."
high_msg = "Combined mean Illumina read length is much higher than expected."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low says "lower", high says "much higher", I wonder if these should be symmetric, or is it intended?

Copy link
Contributor

@sgsutcliffe sgsutcliffe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, nothing really to add, I just wanted to be a part of the PR. Looks good.

max_length = null
max_checkm_contamination = 3.0
min_average_coverage = 30
min_illumina_read_length = 120
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to a be more loose on the max/min? Or are these coming from somewhere specific?

min_average_coverage = 40
min_wgmlst_loci = 3800
min_illumina_read_length = 120
max_illumina_read_length = 310
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some quick googling 300 is the max, but I still like 310 vs 300 but maybe not, but I am more concerned about a minimum of 120 because something like NovaSeq 6000 can be under 100bp or is NovaSeq not used for metagenomics?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants