This repository contains a shell script for running a complete NGS data analysis pipeline using several bioinformatics tools, including SRA Toolkit, Trimmomatic, FastQC, STAR, and featureCounts. The pipeline performs the following steps:
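- Data Download: Downloads the raw reads for an SRA run (SRR8986990 in the example outputs) from the NCBI Sequence Read Archive (SRA) using the SRA Toolkit.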
- Quality Control: Performs quality control on raw FASTQ files using FastQC.
- Trimming: Trims adapters and low-quality bases from the reads using Trimmomatic.
- Mapping: Maps the trimmed reads to a reference genome using STAR.
- Feature Counting: Counts the mapped reads that overlap annotated features (for example, genes) in the GTF file using featureCounts.
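The five steps can be sketched as a single Bash function. This is a minimal sketch, not the repository's script: the STAR index directory, GTF path, adapter FASTA, thread count, Trimmomatic settings, and the `trimmomatic` wrapper command (as installed by Bioconda; a standalone download is run with `java -jar`) are all assumptions. The hypothetical `DRY_RUN=1` switch prints each command instead of executing it.

```shell
#!/usr/bin/env bash
# Sketch of the five pipeline steps. Every path and parameter below is an
# assumption; adjust them to match your data and installation.
set -euo pipefail

SRR="${SRR:-SRR8986990}"                # accession used in this README
GENOME_DIR="${GENOME_DIR:-star_index}"  # hypothetical STAR index directory
GTF="${GTF:-annotation.gtf}"            # hypothetical GTF annotation file
ADAPTERS="${ADAPTERS:-TruSeq3-PE.fa}"   # hypothetical adapter FASTA path
THREADS="${THREADS:-4}"                 # assumed thread count

# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

pipeline() {
  # 1. Download paired-end reads; produces ${SRR}_1.fastq and ${SRR}_2.fastq.
  run fasterq-dump --split-files "$SRR"
  # 2. Quality control on the raw reads (writes *_fastqc.html reports).
  run fastqc "${SRR}_1.fastq" "${SRR}_2.fastq"
  # 3. Paired-end adapter and quality trimming (example settings, assumed).
  run trimmomatic PE -threads "$THREADS" \
    "${SRR}_1.fastq" "${SRR}_2.fastq" \
    "${SRR}_1_trimmed_paired.fastq" "${SRR}_1_trimmed_unpaired.fastq" \
    "${SRR}_2_trimmed_paired.fastq" "${SRR}_2_trimmed_unpaired.fastq" \
    "ILLUMINACLIP:${ADAPTERS}:2:30:10" LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
  # 4. Map the trimmed pairs with STAR; the prefix "mapping" yields
  #    mappingAligned.sortedByCoord.out.bam.
  run STAR --runThreadN "$THREADS" --genomeDir "$GENOME_DIR" \
    --readFilesIn "${SRR}_1_trimmed_paired.fastq" "${SRR}_2_trimmed_paired.fastq" \
    --outSAMtype BAM SortedByCoordinate --outFileNamePrefix mapping
  # 5. Count reads per feature with featureCounts (-p: paired-end input).
  run featureCounts -T "$THREADS" -p -a "$GTF" -o counts_file.txt \
    mappingAligned.sortedByCoord.out.bam
}
```

After sourcing the file, `DRY_RUN=1 pipeline` previews the five commands and `pipeline` runs them. Recent Subread releases also accept `--countReadPairs` to count read pairs rather than individual reads; check the documentation for your installed version.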
Before running the script, ensure that the following tools are installed on your system:
- SRA Toolkit
- FastQC
- Trimmomatic
- STAR
- featureCounts (part of the Subread package)
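A quick pre-flight check can confirm that the tools are on your `PATH`. This is a sketch under assumptions: the executable names in the hypothetical `REQUIRED_TOOLS` array may differ on your system (for example, Trimmomatic is often run as `java -jar trimmomatic-<version>.jar` instead of a `trimmomatic` wrapper).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report each tool as found or missing and return the
# number of missing tools (0 means everything was found).
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Assumed executable names; fasterq-dump ships with the SRA Toolkit and
# featureCounts with Subread.
REQUIRED_TOOLS=(fasterq-dump fastqc trimmomatic STAR featureCounts)
```

Running `check_tools "${REQUIRED_TOOLS[@]}" || echo "install the missing tools first"` lists every tool and stops short of a long run when something is absent.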
The script generates several output files, including:
- FASTQ files (SRR8986990_1.fastq, SRR8986990_2.fastq)
- FastQC reports (SRR8986990_1_fastqc.html, SRR8986990_2_fastqc.html)
- Trimmed FASTQ files (SRR8986990_1_trimmed_paired.fastq, SRR8986990_2_trimmed_paired.fastq)
- BAM file (mappingAligned.sortedByCoord.out.bam)
- Feature counts file (counts_file.txt)
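After a run, the listed outputs can be checked in one pass. The hypothetical `verify_outputs` function below is a sketch that assumes the files are written to the current directory with the names listed above; pass a different accession as the first argument for other runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm each expected output exists and is non-empty.
# Returns 0 if all files are present, 1 otherwise.
verify_outputs() {
  local srr="${1:-SRR8986990}" status=0 f
  local expected=(
    "${srr}_1.fastq" "${srr}_2.fastq"
    "${srr}_1_fastqc.html" "${srr}_2_fastqc.html"
    "${srr}_1_trimmed_paired.fastq" "${srr}_2_trimmed_paired.fastq"
    mappingAligned.sortedByCoord.out.bam
    counts_file.txt
  )
  for f in "${expected[@]}"; do
    if [ -s "$f" ]; then
      echo "ok:      $f"
    else
      echo "missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

An empty file counts as missing here, which catches steps that started but failed before writing any output.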
Before running, check that the paths to the STAR reference genome index and the GTF annotation file set in the script are correct.
Modify the script as necessary to fit your specific data and analysis needs.
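Path mistakes usually surface only after the download and trimming steps have finished, so it can help to validate the references first. This hypothetical `check_reference` function is a sketch that assumes the index was built with STAR's `--runMode genomeGenerate` (which writes `Genome` and `SA` files into the index directory) and that the GTF uses the standard nine tab-separated columns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the STAR index directory and GTF file.
# Usage: check_reference <star_index_dir> <annotation.gtf>
check_reference() {
  local genome_dir="$1" gtf="$2" status=0
  # A generated STAR index contains (among others) Genome and SA files.
  if [ ! -f "$genome_dir/Genome" ] || [ ! -f "$genome_dir/SA" ]; then
    echo "STAR index not found in: $genome_dir" >&2
    status=1
  fi
  # GTF records have 9 tab-separated columns; inspect the first non-comment line.
  if [ ! -s "$gtf" ]; then
    echo "GTF file missing or empty: $gtf" >&2
    status=1
  elif ! awk -F'\t' '!/^#/ { exit (NF != 9) }' "$gtf"; then
    echo "GTF does not look like 9 tab-separated columns: $gtf" >&2
    status=1
  fi
  return "$status"
}
```

Only the first data line of the GTF is inspected, which keeps the check fast on large annotation files at the cost of missing problems further down.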