Add new QC metrics and re-organize QC message for clarity #202
base: QCMessageUpdate
    max_length = 6000000 // The maximum genome length the organism in the search field is allowed to have
    max_checkm_contamination = 3.0 // The maximum level of allowed contamination allowed by CheckM
    min_average_coverage = 30 // The minimum average coverage allowed
    min_wgmlst_loci: The minimum number of wgMLST loci required per a sample
per sample
    min_illumina_read_length: The lowest mean illumina read length you can tolerated for your data
Capitalize "Illumina", and "tolerated" should be "tolerate".
Probably reword to something like: "The lowest tolerable mean Illumina read length".
    max_illumina_read_length: The highest mean illumina read length you can tolerate for your sample
as above
    min_long_read_length: The minimum mean read length allowed for long read data like pacbio and nanopore
Capitalize as "PacBio" and "Nanopore".
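The parameters documented above are thresholds that each sample is checked against. A minimal Python sketch of that idea, assuming inclusive bounds and a flat dictionary of per-sample metrics. The class QCThresholds, the function evaluate_sample, and the sample keys are hypothetical; the default values are illustrative numbers collected from different diffs in this PR, not the pipeline's actual settings. Only a subset of the parameters is covered (min_long_read_length is left out).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QCThresholds:
    # Hypothetical container; names mirror the config keys in this PR,
    # values are illustrative only.
    max_length: Optional[int] = 6_000_000
    max_checkm_contamination: float = 3.0
    min_average_coverage: float = 30
    min_wgmlst_loci: int = 3800
    min_illumina_read_length: float = 120
    max_illumina_read_length: float = 310


def evaluate_sample(sample: dict, t: QCThresholds) -> dict:
    """Return a pass (True) / fail (False) flag per check.

    A threshold of None (e.g. max_length = null) skips that check.
    All bounds are treated as inclusive, which is an assumption.
    """
    results = {}
    if t.max_length is not None:
        results["length"] = sample["length"] <= t.max_length
    results["contamination"] = (
        sample["checkm_contamination"] <= t.max_checkm_contamination
    )
    results["coverage"] = sample["average_coverage"] >= t.min_average_coverage
    results["wgmlst_loci"] = sample["wgmlst_loci"] >= t.min_wgmlst_loci
    # Chained comparison: passes only if the mean lies within [min, max].
    results["read_length"] = (
        t.min_illumina_read_length
        <= sample["mean_illumina_read_length"]
        <= t.max_illumina_read_length
    )
    return results
```

Because each flag is a bool, `sum(results.values())` counts the passing checks, which could supply the "Passed Tests: n/m" part of a summary message.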
    reisolate = 1
    resequence = 1
    failed_p = true
    failed_p = false
Is the "p" in failed_p short for "failed probability"? If not, could it be changed to something more descriptive?
    def vals = [qc_data[fields[0]], qc_data[fields[1]]].sort()
    if(vals[0] == null){
    if(vals[0] == null || vals[1] == null){
Could vals have a more descriptive name?
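The revised condition rejects the pair when either value is missing, not only the first. A minimal Python sketch of the same guard, using a more descriptive name as suggested; the helper sorted_field_pair and the field names in the tests are hypothetical. In this Python version the missing-value check has to come before the sort, because sorting a list that mixes None with numbers raises TypeError in Python 3.

```python
from typing import Optional


def sorted_field_pair(qc_data: dict, fields: list) -> Optional[list]:
    """Fetch two QC fields and return them sorted, or None if either is missing.

    Hypothetical helper mirroring `vals[0] == null || vals[1] == null`.
    """
    # dict.get returns None for absent keys, similar to a Groovy map lookup.
    pair = [qc_data.get(fields[0]), qc_data.get(fields[1])]
    # Reject the pair if either value is missing, before attempting to sort.
    if pair[0] is None or pair[1] is None:
        return None
    return sorted(pair)
```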
    "meta": {
    "nf-test": "0.9.2",
    "nextflow": "25.04.8"
    "nextflow": "25.04.7"
Pointing out the Nextflow version downgrade here (25.04.8 to 25.04.7).
    min_long_read_length: The minimum mean read length allowed for long read data like pacbio and nanopore
Refer to comments above (near the top) about these.
    max_length =
    max_checkm_contamination = 1.0
    average_coverage =
    min_average_coverage =
There's nothing after the equals sign... is that expected?
    low_msg = "Combined mean Illumina read length is lower than expected."
    high_msg = "Combined mean Illumina read length is much higher than expected."
The low message says "lower" but the high message says "much higher". Should these be symmetric, or is the difference intended?
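The two messages correspond to the two ends of a read-length range check. A minimal Python sketch, assuming the bounds are inclusive and using min_illumina_read_length = 120 and max_illumina_read_length = 310 from one of the config diffs as defaults; the function read_length_message is hypothetical and returns None when no warning applies.

```python
from typing import Optional

LOW_MSG = "Combined mean Illumina read length is lower than expected."
HIGH_MSG = "Combined mean Illumina read length is much higher than expected."


def read_length_message(mean_length: float,
                        min_len: float = 120,
                        max_len: float = 310) -> Optional[str]:
    """Pick the warning for a combined mean Illumina read length.

    Hypothetical helper; values equal to a bound count as acceptable.
    """
    if mean_length < min_len:
        return LOW_MSG
    if mean_length > max_len:
        return HIGH_MSG
    return None  # within [min_len, max_len]: no warning
```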
sgsutcliffe left a comment
Sorry, nothing really to add, I just wanted to be a part of the PR. Looks good.
    max_length = null
    max_checkm_contamination = 3.0
    min_average_coverage = 30
    min_illumina_read_length = 120
Do you want to be looser on the max/min, or are these values coming from somewhere specific?
    min_average_coverage = 40
    min_wgmlst_loci = 3800
    min_illumina_read_length = 120
    max_illumina_read_length = 310
I did some quick googling, and 300 bp seems to be the max. I still slightly prefer 310 over 300, though I could go either way. I am more concerned about the minimum of 120, because instruments such as the NovaSeq 6000 can produce reads under 100 bp. Or is NovaSeq not used for metagenomics?
This PR addresses:
STRY0017736 - Add field and logic for qc_status_read_length to mikrokondo
STRY0018931 - Implement this as a passed/failed flag for qc_wgmlst_loci_count
STRY0019400 - Punctuation in mikrokondo pass/fail messages, e.g.:
"FAILED; Passed Tests: 5/6; Species ID: Escherichia coli; Organism QC Criteria: Escherichia coli"
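The example message separates its fields with "; ". A minimal Python sketch that reproduces that layout from its parts; the function format_qc_message and its parameters are hypothetical and only illustrate the punctuation, not mikrokondo's actual code.

```python
def format_qc_message(status: str, passed: int, total: int,
                      species: str, criteria: str) -> str:
    """Join the QC summary fields with "; " (hypothetical helper)."""
    parts = [
        status,
        f"Passed Tests: {passed}/{total}",
        f"Species ID: {species}",
        f"Organism QC Criteria: {criteria}",
    ]
    return "; ".join(parts)
```

Building the fields as a list and joining them once keeps the separator consistent and avoids stray leading or trailing punctuation.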