Currently, seqkit seq filters reads by quality based on an average quality score. However, other tools, such as FASTX's fastq_quality_filter, let the user specify what percentage of nucleotides must have a minimum Phred score of X. Example:
We have 5 sequences, each 100 nucleotides long. In each sequence:
50 nucleotides have a Phred score of 20.
50 nucleotides have a Phred score of 40.
With an arithmetic mean Phred score of 30, these sequences might be acceptable using seqkit seq --min-qual 30. But if we want to limit how many nucleotides have low quality (let's say at most 20% of nucleotides may be below a Phred score of 30), all of these sequences would be discarded, because 50% of their nucleotides are below 30. This is currently not possible with seqkit, but it is possible (albeit slower) with other tools.
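To make the requested behaviour concrete, here is a minimal Python sketch of a percentage-based filter applied to the example read above. It is only an illustration of the idea, not how seqkit or FASTX implement it; the function names `percent_at_or_above` and `keep_read` are made up for this example, and Phred+33 encoding is assumed.

```python
def percent_at_or_above(quality: str, min_score: int, offset: int = 33) -> float:
    """Return the percentage of bases whose Phred score is >= min_score.

    `quality` is a FASTQ quality string; `offset` is the ASCII offset
    (33 is assumed here, i.e. Phred+33 encoding).
    """
    if not quality:
        return 0.0
    passing = sum(1 for ch in quality if ord(ch) - offset >= min_score)
    return 100.0 * passing / len(quality)


def keep_read(quality: str, min_score: int, min_percent: float, offset: int = 33) -> bool:
    """Keep a read only if at least `min_percent` % of its bases reach `min_score`."""
    return percent_at_or_above(quality, min_score, offset) >= min_percent


# Example read from above: 50 bases at Phred 20 ('5') and 50 bases at Phred 40 ('I').
read_quality = "5" * 50 + "I" * 50

print(percent_at_or_above(read_quality, 30))                   # 50.0
print(keep_read(read_quality, min_score=30, min_percent=80))   # False
```

Requiring at least 80% of bases at Phred 30 or higher is the same as allowing at most 20% below 30, so the example read is discarded even though its arithmetic mean score is 30.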
Now, knowing how flexible and fast seqkit is, I would love to see this feature included!
I am aware that fastp is capable of doing this; however, I use seqkit for several steps, and it would be great if seqkit offered this feature as well.
About the average quality score: you are correct, the average score is ~23. Thanks for pointing it out!
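For anyone wondering where ~23 comes from, here is a small sketch of the arithmetic. It assumes the average is taken over per-base error probabilities and then converted back to a Phred score, which is consistent with the ~23 figure; the helper name `mean_phred_via_error_prob` is made up for this example.

```python
import math


def mean_phred_via_error_prob(scores):
    """Average Phred scores by averaging their error probabilities.

    Each score Q maps to an error probability 10^(-Q/10); the mean
    probability is then converted back to the Phred scale.
    """
    probs = [10 ** (-q / 10) for q in scores]
    return -10 * math.log10(sum(probs) / len(probs))


# Example read: 50 bases at Phred 20 and 50 bases at Phred 40.
scores = [20] * 50 + [40] * 50

arithmetic_mean = sum(scores) / len(scores)
print(arithmetic_mean)                                # 30.0
print(round(mean_phred_via_error_prob(scores), 2))    # 22.97
```

The low-quality half dominates the mean error probability, which is why the result sits much closer to 20 than to 40.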