Chromatin accessibility profiling with small molecule intercalation and long-read sequencing

baigal628/smaddseq_manuscript

SMAdd-seq: Small Molecule Adduct Sequencing

Probing Chromatin Accessibility with Small Molecule DNA Intercalation and Nanopore Sequencing

Installation

To install dependencies, use the following command:

conda env create -f addseq.yml

Repository Contents

This repository contains scripts and data processing tools for analyzing chromatin accessibility using SMAdd-seq. Below is a breakdown of the directory structure:

  • ./scripts/bash/ - Bash scripts for data preprocessing, model training, and modification prediction using NEMO.
  • ./scripts/ - Jupyter notebooks for data processing and figure generation:
    • [0-7] - Notebooks for generating data output.
    • [Figure1-3] - Notebooks for generating figures in the manuscript.
    • [Supp_data] - Notebooks for generating supplemental data in the manuscript.
  • ./figures/ - Figures generated from the analysis.

Data Preprocessing Pipeline

  1. Basecalling. Basecall raw sequencing data using Dorado:
~/dorado-0.7.3-linux-x64/bin/dorado basecaller \
    /private/groups/brookslab/gabai/tools/[email protected] chrom.pod5 \
    --emit-moves \
    --device cuda:all \
    --reference ./data/ref/sacCer3.fa > chrom.bam
  2. Quality Control (QC). Keep only primary alignments (drop secondary and supplementary records), then sort, index, and summarize the BAM file using Samtools:
samtools view -b -F SECONDARY,SUPPLEMENTARY chrom.bam | samtools sort -o chrom_pass.sorted.bam
samtools index chrom_pass.sorted.bam
samtools stats chrom_pass.sorted.bam > chrom_pass.sorted.stats
  3. Event Alignment. Align signal events to the reference genome using Uncalled4:
uncalled4 align \
    --ref ./data/ref/yst/sacCer3.fa \
    --reads chrom.pod5 \
    --bam-in chrom_pass.sorted.bam \
    -p 8 \
    --eventalign-out chrom_pass.sorted.tsv \
    --eventalign-flags print-read-names,signal-index,samples \
    --pore-model dna_r9.4.1_400bps_6mer
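
The QC step derives every output name from the same sample prefix (`chrom` in the commands above). As a minimal sketch of that naming convention, the function below prints the QC commands for an arbitrary prefix without running them. The name `qc_commands` is a hypothetical helper introduced here for illustration and is not part of this repository; it assumes the basecalled BAM is named `<sample>.bam`.

```shell
# Hypothetical helper (not part of this repository): print the QC commands
# for one sample prefix. Nothing is executed; this is a dry run only.
qc_commands() {
    sample="$1"                              # e.g. "chrom"
    bam="${sample}.bam"                      # assumed basecaller output name
    sorted="${sample}_pass.sorted.bam"       # filtered and sorted alignments
    stats="${sample}_pass.sorted.stats"      # samtools stats summary

    printf '%s\n' \
        "samtools view -b -F SECONDARY,SUPPLEMENTARY ${bam} | samtools sort -o ${sorted}" \
        "samtools index ${sorted}" \
        "samtools stats ${sorted} > ${stats}"
}
```

Calling `qc_commands chrom` prints the three Samtools commands from step 2. Review the printed commands before running them by hand or piping them to a shell.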

Notes

  • Ensure that Dorado, Samtools, and Uncalled4 are properly installed before running the pipeline.
  • Modify the paths to match your data storage and reference genome locations.
  • The pipeline is optimized for Nanopore sequencing with R9.4.1 flow cells.
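
One way to confirm that the tools are available before starting is a small check like the sketch below. `check_tools` is a hypothetical function name introduced for illustration, not something this repository ships. It relies only on the POSIX `command -v` builtin, which accepts either a command name on `PATH` or an explicit path to an executable.

```shell
# Hypothetical helper (not part of this repository): report each tool that
# cannot be found, and return non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools samtools uncalled4 ~/dorado-0.7.3-linux-x64/bin/dorado` matches how the commands above invoke each tool. Adjust the Dorado path to your own install location.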
