The conceptual innovation of DeepLoop is to handle systematic biases and random noise separately: we used HiCorr to improve the rigor of bias correction, and then applied deep-learning techniques for noise reduction and loop signal enhancement. DeepLoop significantly improves the sensitivity, robustness, and quantitation of Hi-C loop analyses, and can be used to reanalyze most published low-depth Hi-C datasets.
- Zhang, S., Plummer, D., et al. Robust mapping of DNA loops at kilobase resolution from low depth allele-resolved or single-cell Hi-C data (under review).
Forty Hi-C datasets processed by DeepLoop can be visualized on the website. The "DeepLoop_top300K.tar.gz" archive can be downloaded with:
wget --no-check-certificate https://hiview.case.edu/ssz20/tmp.HiCorr.ref/DeepLoop_top300K.tar.gz
DeepLoop requires HiCorr; please install HiCorr first.
DeepLoop was developed and tested using Python 3.5 and the following Python packages:
- numpy
- scipy
- pandas
- matplotlib
- opencv-python
- tensorflow
The packages can be installed by running the following command:
pip3 install -r requirements.txt
This will also install optional visualization and analysis tools that we use, such as:
- cooler
- jupyter
- higlass-python
If you plan on training your own model, you will want a GPU-enabled version of TensorFlow to avoid intractably long training times. We used tensorflow-gpu==2.3.1,
but any TF2 version should work. For prediction, a GPU is not necessary, but it will be faster than a CPU.
We have trained a series of LoopEnhance models (depths from 100k to 250M mid-range contacts) on human cortex Hi-C data. We also trained LoopDenoise models on human cortex and H9 cell line data separately. You can download them with:
cd DeepLoop/
wget --no-check-certificate https://hiview.case.edu/ssz20/tmp.HiCorr.ref/DeepLoop_models.tar.gz
tar -xvf DeepLoop_models.tar.gz
After decompressing, the "DeepLoop_models/" directory includes the "CPGZ_trained" and "H9_trained" models, plus "ref", which contains anchor bed files for HiCorr output.
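The README does not prescribe a rule for matching a LoopEnhance model to a dataset. As a minimal sketch, assuming you already know the number of mid-range contacts in your data, one heuristic is to pick the trained depth closest on a log scale. The function `pick_model_depth` and the `AVAILABLE_DEPTHS` list are hypothetical placeholders; replace the list with the depths of the models actually present in "DeepLoop_models/".

```python
import math


def pick_model_depth(data_depth, trained_depths):
    """Return the trained depth closest to data_depth on a log10 scale (a heuristic)."""
    if data_depth <= 0:
        raise ValueError("data_depth must be positive")
    target = math.log10(data_depth)
    return min(trained_depths, key=lambda d: abs(math.log10(d) - target))


# Hypothetical depths spanning the 100k-250M range; replace with the real models.
AVAILABLE_DEPTHS = [100000, 1000000, 10000000, 100000000, 250000000]
```

For example, `pick_model_depth(3000000, AVAILABLE_DEPTHS)` returns `1000000`, because 3M is closer to 1M than to 10M on a log scale.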
DeepLoop models are trained on HiCorr output. We provide several tutorials showing how to process raw Hi-C data through HiCorr and DeepLoop starting from fastq files, bam files, or "validPairs" from HiC-Pro; see Hi-C data preprocessing.
HiCorr is a fragment-based bias correction method. We highly recommend that users run HiCorr with fragment pairs instead of bin pairs.
DeepLoop input files contain fragment/anchor-based contact pairs, one file per chromosome, in the following format:
anchor_id_1 | anchor_id_2 | observed_reads_count | expected_reads_from_HiCorr |
DeepLoop output files have the following format:
anchor_id_1 | anchor_id_2 | LoopStrength_from_DeepLoop |
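As a minimal sketch of the input format, the hypothetical helpers below parse HiCorr output lines and compute the observed/expected ratio for each anchor pair. They assume whitespace-separated columns in the order shown above and ignore any extra trailing columns (such as the extra column implied by `--val_cols obs exp pval` for the ".p_val" files).

```python
def parse_hicorr_lines(lines):
    """Parse input lines into (anchor_id_1, anchor_id_2, observed, expected) tuples."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        pairs.append((fields[0], fields[1], float(fields[2]), float(fields[3])))
    return pairs


def obs_exp_ratio(pairs):
    """Map each anchor pair to observed / expected, skipping zero expectations."""
    return {(a1, a2): obs / exp for a1, a2, obs, exp in pairs if exp > 0}
```

A file can be passed directly, since iterating over an open file yields lines: `with open("HiCorr_output/anchor_2_anchor.loop.chr11") as f: ratios = obs_exp_ratio(parse_hicorr_lines(f))`.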
- Step 1: Download the example data to reproduce the following processing and plots:
wget http://hiview.case.edu/ssz20/tmp.HiCorr.ref/HiCorr_test_data/HiCorr_output.tar.gz
tar -xvf HiCorr_output.tar.gz
ls
ls HiCorr_output
You will see "anchor_2_anchor.loop.chr11" and "anchor_2_anchor.loop.chr11.p_val" in "HiCorr_output/"
- Step 2: Run DeepLoop (LoopDenoise) on the "HiCorr_output/" directory:
HiCorr_path=<Path to HiCorr_output>
DeepLoop_outPath=<Path to DeepLoop output>
chr=chr11
python3 DeepLoop/prediction/predict_chromosome.py --full_matrix_dir $HiCorr_path/ \
--input_name anchor_2_anchor.loop.$chr.p_val \
--h5_file DeepLoop/DeepLoop_models/CPGZ_trained/LoopDenoise.h5 \
--out_dir $DeepLoop_outPath/ \
--anchor_dir DeepLoop/DeepLoop_models/ref/hg19_HindIII_anchor_bed/ \
--chromosome $chr \
--small_matrix_size 128 \
--step_size 128 \
--dummy 5 \
--val_cols obs exp pval
Check the output in $DeepLoop_outPath:
ls $DeepLoop_outPath
head $DeepLoop_outPath/$chr.denoised.anchor.to.anchor
You will see "chr11.denoised.anchor.to.anchor" with the following format:
anchor_id_1 | anchor_id_2 | LoopStrength_from_DeepLoop |
- Step 3: Visualize contact heatmaps from raw, HiCorr, and DeepLoop data for a genomic region (chr, start, end):
chr=chr11
start=130000000
end=130800000
outplot="./test"
./DeepLoop/lib/generate.matrix.from_HiCorr.pl DeepLoop/DeepLoop_models/ref/hg19_HindIII_anchor_bed/$chr.bed $HiCorr_path/anchor_2_anchor.loop.$chr $chr $start $end ./${chr}_${start}_${end}
./DeepLoop/lib/generate.matrix.from_DeepLoop.pl DeepLoop/DeepLoop_models/ref/hg19_HindIII_anchor_bed/$chr.bed $DeepLoop_outPath/$chr.denoised.anchor.to.anchor $chr $start $end ./${chr}_${start}_${end}
./DeepLoop/lib/plot.multiple.r $outplot 1 3 ${chr}_${start}_${end}.raw.matrix ${chr}_${start}_${end}.ratio.matrix ${chr}_${start}_${end}.denoise.matrix
Check "test.plot.png", which contains the "raw", "HiCorr", and "DeepLoop" heatmaps.
An example plot is available at https://github.com/JinLabBioinfo/DeepLoop/blob/master/images/test.plot.png
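The Perl scripts in Step 3 extract a region matrix from the anchor-pair files. The sketch below only illustrates that idea in Python; it is not a port of `generate.matrix.from_HiCorr.pl`, and it assumes (hypothetically) that each anchor bed line has the columns chrom, start, end, anchor_id. Values come from a `{(anchor_id_1, anchor_id_2): value}` mapping, such as obs/exp ratios or LoopStrength scores.

```python
def anchors_in_region(bed_lines, chrom, start, end):
    """Return anchor ids overlapping [start, end) in genomic order.

    Assumes hypothetical BED-like columns: chrom, start, end, anchor_id.
    """
    hits = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] != chrom:
            continue
        a_start, a_end = int(fields[1]), int(fields[2])
        if a_end > start and a_start < end:  # half-open interval overlap
            hits.append((a_start, fields[3]))
    hits.sort()
    return [anchor_id for _, anchor_id in hits]


def region_matrix(anchor_ids, values):
    """Build a symmetric dense matrix; pairs outside the region are ignored."""
    index = {a: i for i, a in enumerate(anchor_ids)}
    n = len(anchor_ids)
    matrix = [[0.0] * n for _ in range(n)]
    for (a1, a2), v in values.items():
        if a1 in index and a2 in index:
            i, j = index[a1], index[a2]
            matrix[i][j] = v
            matrix[j][i] = v
    return matrix
```

Missing pairs default to 0.0, which is one simple choice for a heatmap; the actual scripts may handle them differently.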
Note:
- Choose the DeepLoop model according to the depth of your data.
- To run DeepLoop for whole genome, repeat the process above for each chromosome.
- The heatmap color scale can be adjusted in the script "lib/plot.multiple.r"
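To run the whole genome, the Step 2 command can be repeated per chromosome. The sketch below rebuilds exactly the `predict_chromosome.py` arguments from Step 2 for each chromosome. The chromosome list `HG19_CHROMS` (chr1 to chr22 plus chrX) and the helper names are assumptions; adjust the list to the files present in your HiCorr output. With `dry_run=True` the commands are only printed.

```python
import subprocess

# Assumed hg19 chromosome set; adjust to the chromosomes in your HiCorr output.
HG19_CHROMS = ["chr" + str(i) for i in range(1, 23)] + ["chrX"]


def predict_command(chrom, hicorr_path, out_path, h5_file, anchor_dir,
                    deeploop_dir="DeepLoop"):
    """Return the Step 2 predict_chromosome.py argument list for one chromosome."""
    return [
        "python3", deeploop_dir + "/prediction/predict_chromosome.py",
        "--full_matrix_dir", hicorr_path + "/",
        "--input_name", "anchor_2_anchor.loop." + chrom + ".p_val",
        "--h5_file", h5_file,
        "--out_dir", out_path + "/",
        "--anchor_dir", anchor_dir,
        "--chromosome", chrom,
        "--small_matrix_size", "128",
        "--step_size", "128",
        "--dummy", "5",
        "--val_cols", "obs", "exp", "pval",
    ]


def run_genome(chroms, dry_run=True, **paths):
    """Print (dry run) or execute the prediction command for each chromosome."""
    for chrom in chroms:
        cmd = predict_command(chrom, **paths)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing chromosome
```

For example: `run_genome(HG19_CHROMS, dry_run=False, hicorr_path="HiCorr_output", out_path="DeepLoop_out", h5_file="DeepLoop/DeepLoop_models/CPGZ_trained/LoopDenoise.h5", anchor_dir="DeepLoop/DeepLoop_models/ref/hg19_HindIII_anchor_bed/")`, where "DeepLoop_out" is a hypothetical output directory.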
The heatmap visualization in Step 3 above can also be done with the script "plot.sh" in "lib/".
It takes eight parameters:
- DeepLoopInstallPath: directory in which "DeepLoop/" is installed (e.g. "$myhome" in the example below)
- DeepLoopRefbed: path to the anchor bed directory, e.g. "DeepLoop/DeepLoop_models/ref/hg19_HindIII_anchor_bed/" in the test examples
- HiCorr_path: path to "HiCorr_output/"
- DeepLoop_outPath: path to the DeepLoop output, i.e. where "chr*.denoised.anchor.to.anchor" is stored
- chr: chromosome of the genomic region
- start: start position of the genomic region
- end: end position of the genomic region; the region must be shorter than 2 Mb
- outPath: path to store the heatmap png files
For example, if DeepLoop is installed in the home directory "$myhome" and outPath is the current directory ("./") where you run the script:
bash $myhome/DeepLoop/lib/plot.sh $myhome \
$myhome/DeepLoop/DeepLoop_models/ref/hg19_HindIII_anchor_bed/ \
$HiCorr_path/ \
$DeepLoop_outPath/ \
chr11 130000000 130800000 ./
The heatmap png file named "chr11_130000000_130800000.plot.png" will be in the current directory.
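The same call can be scripted from Python. The hypothetical wrapper below assembles the eight `plot.sh` parameters in the order listed above, mirrors the example's script location under "$myhome/DeepLoop/lib/", and enforces the 2 Mb region limit before anything is run. The helper names are illustrative only.

```python
MAX_REGION_BP = 2000000  # plot.sh expects a region shorter than 2 Mb


def plot_command(install_path, ref_bed_dir, hicorr_path, deeploop_out,
                 chrom, start, end, out_path="./"):
    """Build the plot.sh call with its eight parameters, validating the region first."""
    if end <= start:
        raise ValueError("end must be greater than start")
    if end - start >= MAX_REGION_BP:
        raise ValueError("region must be shorter than 2 Mb")
    script = install_path + "/DeepLoop/lib/plot.sh"
    return ["bash", script, install_path, ref_bed_dir, hicorr_path,
            deeploop_out, chrom, str(start), str(end), out_path]


def expected_png(chrom, start, end):
    """File name plot.sh writes into outPath, following the example above."""
    return "{}_{}_{}.plot.png".format(chrom, start, end)
```

The returned list can be passed to `subprocess.run(..., check=True)`; building it separately keeps the region check testable without running R.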
- As mentioned in the Hi-C data preprocessing section, HiCorr can take HiC-Pro output.
- The output of HiCorr and DeepLoop can be converted to cooler format; see the cooler walkthrough notebook.
- The converted cooler file can then be visualized with HiGlass.
If you wish to train a new model, make sure you have access to a machine with a GPU, and refer to the training walkthrough notebook.
Because DeepLoop generates clean loop signals, we merge the DeepLoop output from all chromosomes, rank anchor pairs by "LoopStrength" (3rd column), and take confident loops from the top-ranked contact pairs.
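A minimal sketch of this merge-and-rank step, assuming the per-chromosome output files are named "chr*.denoised.anchor.to.anchor" as in Step 2 and are whitespace-separated. The helper names and the cutoff `n` are hypothetical; the README does not fix how many top-ranked pairs to keep.

```python
import glob
import heapq
import os


def iter_loops(out_dir):
    """Yield (chrom, anchor_id_1, anchor_id_2, loop_strength) from all chromosome outputs."""
    pattern = os.path.join(out_dir, "chr*.denoised.anchor.to.anchor")
    for path in sorted(glob.glob(pattern)):
        chrom = os.path.basename(path).split(".")[0]  # e.g. "chr11"
        with open(path) as handle:
            for line in handle:
                fields = line.split()
                if len(fields) >= 3:
                    yield chrom, fields[0], fields[1], float(fields[2])


def top_loops(records, n):
    """Return the n records with the highest LoopStrength (4th tuple field)."""
    return heapq.nlargest(n, records, key=lambda rec: rec[3])
```

For example, `top_loops(iter_loops("DeepLoop_out"), 100000)` streams every chromosome file and keeps only the strongest pairs in memory; "DeepLoop_out" and 100000 are placeholder values.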