`FastQC` still showed "GGGGGGG" as over-represented sequence with "No Hit" Source even after `--trim_poly_g` in `fastp`. #146

denvercal1234GitHub · 2024-12-14T18:17:58Z

Hi there,

Thanks for the tool.

I have some paired-end bulk RNA-seq data. I ran fastp as shown below with `--trim_poly_g`, but the FastQC report for one sample still flagged an over-represented sequence of GGGGGGGG... in the R2 reads, with "No Hit" as the Source.

Would you mind giving me some pointers to address this issue?

Thank you for your help.

fastp \
    --adapter_fasta /ceph/project/borrowlab/qnguyen/RAW_bulkRNASeq_TMNCTFHTREGCD8_QNN2024Aug19/X204SC24072759-Z01-F001_02/01.RawData/for_Trimming.fasta \
    --adapter_sequence AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
    --adapter_sequence_r2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
    --qualified_quality_phred 5 \
    --unqualified_percent_limit 50 \
    --n_base_limit 15 \
    --overlap_len_require 30 \
    --overlap_diff_limit 1 \
    --overlap_diff_percent_limit 10 \
    --length_required 150 \
    --length_limit 150 \
    --trim_poly_g \
    -i "$file" \
    -I "$r2_file" \
    -o "$output_r1" \
    -O "$output_r2"

Referenced at OpenGene/fastp#589


s-andrews · 2024-12-16T12:11:46Z

By default, the overrepresented sequences check in FastQC searches only the first 50bp of each sequence. You have 150bp sequences.
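To make the 50bp point concrete, here is a minimal Python sketch of a prefix-based overrepresentation count. It is not FastQC's actual code: the function name `overrepresented_prefixes`, the toy reads, and the `min_fraction` cutoff are all assumptions made for illustration.

```python
from collections import Counter


def overrepresented_prefixes(reads, prefix_len=50, min_fraction=0.001):
    """Count reads by their first `prefix_len` bases and return the
    prefixes whose share of all reads exceeds `min_fraction`.

    Illustrative only: prefix_len=50 follows the comment above;
    min_fraction is an assumed cutoff, not taken from FastQC's source.
    """
    counts = Counter(read[:prefix_len] for read in reads)
    total = len(reads)
    return {seq: n for seq, n in counts.items() if n / total > min_fraction}


# Toy 150 bp reads (hypothetical data).
poly_g_start = "G" * 60 + "ACGT" * 22 + "AC"  # polyG at the 5' end only
ordinary = "ACGTTGCA" * 18 + "ACGTTG"         # no polyG at all
reads = [poly_g_start] * 3 + [ordinary]

# A high cutoff is used here only because the toy set is tiny.
hits = overrepresented_prefixes(reads, min_fraction=0.5)
print(hits)
```

In this toy set the only prefix reported is a run of 50 Gs, even though none of those reads is polyG along its whole length.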

When you trim polyG it will trim from the 3' end back.

The most likely explanation, assuming you've run fastp correctly, is that you have sequences which start with polyG but move into something else before the end of their 150bp length. If that's the case then neither program is doing anything wrong here.
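The scenario described above can be sketched in a few lines of Python. This is a simplified stand-in for 3' polyG trimming, not fastp's implementation: `trim_poly_g_3prime` is a hypothetical helper, it tolerates no mismatches, and the `min_len=10` threshold is an assumption chosen for the example.

```python
def trim_poly_g_3prime(read, min_len=10):
    """Remove a trailing run of G from the 3' end if it is at least
    `min_len` bases long. Simplified: exact Gs only, no mismatches.
    """
    stripped = read.rstrip("G")
    run_length = len(read) - len(stripped)
    return stripped if run_length >= min_len else read


# Hypothetical 150 bp reads.
g_at_5prime = "G" * 60 + "ACGT" * 22 + "AC"  # starts with polyG, ends with other bases
g_at_3prime = "ACGT" * 22 + "AC" + "G" * 60  # ends with polyG

trimmed_5 = trim_poly_g_3prime(g_at_5prime)
trimmed_3 = trim_poly_g_3prime(g_at_3prime)

# The 3' polyG tail is removed, but the 5' polyG run is untouched,
# so the first 50 bp of that read are still all G.
print(len(trimmed_3), len(trimmed_5), trimmed_5[:50] == "G" * 50)
```

A read like `g_at_5prime` passes through 3' trimming unchanged, so a check that looks only at the first 50bp would still see it as polyG.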

s-andrews closed this as completed Dec 17, 2024
