FastQC still showed "GGGGGGG" as over-represented sequence with "No Hit" Source even after --trim_poly_g in fastp. #146

Closed
denvercal1234GitHub opened this issue Dec 14, 2024 · 1 comment

Comments

@denvercal1234GitHub

Hi there,

Thanks for the tool.

I have some paired-end bulk RNA-seq data. I ran fastp with --trim_poly_g, as shown below, but the FastQC report for one sample still flags GGGGGGGG... as an over-represented sequence in the R2 reads, with "No Hit" as the Source.

Would you mind giving me some pointers to address this issue?

Thank you for your help.

[Three screenshots attached: 2024-12-14 at 18 12 33, 18 12 50, and 18 09 39]

fastp \
    --adapter_fasta /ceph/project/borrowlab/qnguyen/RAW_bulkRNASeq_TMNCTFHTREGCD8_QNN2024Aug19/X204SC24072759-Z01-F001_02/01.RawData/for_Trimming.fasta \
    --adapter_sequence AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
    --adapter_sequence_r2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
    --qualified_quality_phred 5 \
    --unqualified_percent_limit 50 \
    --n_base_limit 15 \
    --overlap_len_require 30 \
    --overlap_diff_limit 1 \
    --overlap_diff_percent_limit 10 \
    --length_required 150 \
    --length_limit 150 \
    --trim_poly_g \
    -i "$file" \
    -I "$r2_file" \
    -o "$output_r1" \
    -O "$output_r2"

Referenced at OpenGene/fastp#589

@s-andrews
Owner

By default, the overrepresented sequences module in FastQC only examines the first 50bp of each sequence. You have 150bp sequences.

PolyG trimming works from the 3' end of the read backwards, so it only removes a polyG run that sits at the end of the read.
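To make the two points above concrete, here is a minimal toy sketch in Python. It is not fastp's or FastQC's actual code: `trim_poly_g_tail` and `fastqc_like_prefix` are hypothetical helpers, the 10-base minimum run is an assumption, and real polyG detection is more tolerant than a plain exact-match strip. It only shows that a tail-only trim leaves a read that starts with polyG untouched, so its first 50bp are still all G.

```python
def trim_poly_g_tail(seq: str, min_len: int = 10) -> str:
    """Toy model of tail-only polyG trimming (hypothetical helper).

    Removes a run of G bases at the 3' end if the run is at least
    min_len long. min_len=10 is an assumed threshold for illustration.
    """
    stripped = seq.rstrip("G")
    run_length = len(seq) - len(stripped)
    return stripped if run_length >= min_len else seq


def fastqc_like_prefix(seq: str, n: int = 50) -> str:
    """Return the first n bases, mimicking a check limited to the read start."""
    return seq[:n]


# Two made-up 150bp reads: polyG at the 5' end versus polyG at the 3' end.
body = "ACGT" * 22 + "AC"           # 90 bases that do not end in G
read_polyg_start = "G" * 60 + body  # starts with polyG
read_polyg_end = body + "G" * 60    # ends with polyG

trimmed_start = trim_poly_g_tail(read_polyg_start)
trimmed_end = trim_poly_g_tail(read_polyg_end)

# The 5' polyG read is unchanged, and its first 50 bases are still all G.
print(len(trimmed_start), set(fastqc_like_prefix(trimmed_start)))
# The 3' polyG read loses its tail.
print(len(trimmed_end))
```

In this toy model the first read keeps all 150 bases, so a check restricted to the read start would still report a run of Gs after trimming.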

The most likely explanation, assuming you've run fastp correctly, is that you have sequences which start with polyG but move into something else before the end of their 150bp length. If that's the case then neither program is doing anything wrong here.
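One way to check this explanation on the trimmed R2 file is to count how many reads begin with a run of Gs. The snippet below is a rough sketch under assumptions: it expects standard 4-line FASTQ records, the helper names are made up for illustration, and the 10-base threshold is arbitrary.

```python
import gzip
from itertools import islice


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz (hypothetical helper)."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def leading_g_run(seq: str) -> int:
    """Length of the uninterrupted run of G bases at the start of seq."""
    return len(seq) - len(seq.lstrip("G"))


def count_leading_poly_g(lines, min_run: int = 10):
    """Count reads whose sequence starts with at least min_run Gs.

    `lines` is any iterable of FASTQ lines (a file handle or a list).
    Assumes strict 4-line records; the sequence is the 2nd line of each.
    Returns (reads_starting_with_polyG, total_reads).
    """
    total = hits = 0
    for seq in islice(lines, 1, None, 4):
        total += 1
        if leading_g_run(seq.strip()) >= min_run:
            hits += 1
    return hits, total


# Small in-memory demo with two made-up records.
demo = [
    "@read1", "GGGGGGGGGGGGACGT", "+", "IIIIIIIIIIIIIIII",
    "@read2", "ACGTGGGGGGGGGGGG", "+", "IIIIIIIIIIIIIIII",
]
hits, total = count_leading_poly_g(demo)
print(f"{hits} of {total} reads start with polyG")
```

For a real file, something like `with open_fastq(output_r2) as fh: print(count_leading_poly_g(fh))` would apply the same count, where `output_r2` stands for the trimmed R2 path from the command above. A large fraction of hits would be consistent with reads that start with polyG and then move into other sequence.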
