Seqkit seq could filter by a % of nucleotides above a specific quality threshold (both user-defined) #472

Open
MiguelHK opened this issue Jun 27, 2024 · 2 comments

@MiguelHK

Currently, seqkit seq filters by quality using the average quality score. However, other tools, such as FASTX's fastq_quality_filter, let the user specify what percentage of nucleotides must have a minimum PHRED score of X. Example:

We have 5 sequences, each 100 nucleotides long, and in each of them:

  • 50 nucleotides have a phred score of 20.
  • 50 nucleotides have a phred score of 40.

With an average phred score of 30, these sequences might be acceptable using seqkit seq --min-qual 30. But if we want to make sure that only a small percentage of the nucleotides have low quality (let's say we want at most 20% of nucleotides to be below a phred score of 30), all of these sequences would be discarded. This is currently not possible with seqkit, but it is possible (albeit more slowly) with other tools.
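To make the requested behaviour concrete, here is a minimal Python sketch of a percentage-based filter. It is only an illustration, not seqkit, fastp, or FASTX code: the function names and parameters (fraction_below, passes_percent_filter, threshold, max_low_percent) are hypothetical, and it assumes Phred+33 encoded quality strings.

```python
def fraction_below(quality_line, threshold, offset=33):
    """Return the fraction of bases whose PHRED score is below `threshold`.

    Assumes Phred+33 encoding by default (an assumption for this sketch).
    """
    scores = [ord(ch) - offset for ch in quality_line]
    low = sum(1 for q in scores if q < threshold)
    return low / len(scores)


def passes_percent_filter(quality_line, threshold=30, max_low_percent=20.0, offset=33):
    """Keep a read only if at most `max_low_percent` % of bases are below `threshold`."""
    if not quality_line:
        return False  # an empty read carries no quality information
    low_percent = 100.0 * fraction_below(quality_line, threshold, offset)
    return low_percent <= max_low_percent


# The read from the example: 50 bases at Q20 followed by 50 bases at Q40.
read_quality = chr(20 + 33) * 50 + chr(40 + 33) * 50

# Half of the bases are below Q30, so the read is rejected at a 20% limit.
low_fraction = fraction_below(read_quality, 30)
keep_read = passes_percent_filter(read_quality, threshold=30, max_low_percent=20.0)
```

With these settings, half of the bases fall below Q30, so the example read would be discarded.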

Now, knowing how flexible and fast seqkit is, I would love to see this feature included!

@shenwei356
Owner

I'd recommend fastp, which supports this. Look here: https://github.com/OpenGene/fastp?tab=readme-ov-file#quality-filter . Maybe you can use -q 30 -u 20.

BTW, for a read with 50 bp at score 20 and 50 bp at score 40, the average quality score is not 30.

@MiguelHK
Author

I am aware that fastp is capable of doing this; however, I use seqkit for several steps, and it would be great if seqkit also had this feature.

About the average quality score, you are correct: the average score is ~23. Thanks for pointing it out!
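For anyone wondering where ~23 comes from, here is a small Python sketch. It assumes the average is taken over per-base error probabilities rather than over the raw scores, which is consistent with the ~23 figure; the helper name mean_phred is made up for this illustration and is not a seqkit function.

```python
import math


def mean_phred(scores):
    """Average PHRED scores by way of their error probabilities.

    Each score Q maps to an error probability P = 10 ** (-Q / 10);
    the mean probability is then converted back to a PHRED score.
    """
    probs = [10 ** (-q / 10) for q in scores]
    mean_p = sum(probs) / len(probs)
    return -10 * math.log10(mean_p)


# 50 bases at Q20 and 50 bases at Q40, as in the example above.
scores = [20] * 50 + [40] * 50

arithmetic_mean = sum(scores) / len(scores)  # naive mean of the raw scores
error_based_mean = mean_phred(scores)        # mean computed over error probabilities
```

The naive mean of the raw scores is 30, while the error-probability mean comes out just under 23, because the Q20 bases dominate the average error rate.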
