Commit a1fa299

Merge branch 'master' of https://github.com/nservant/HiC-Pro
2 parents bca4634 + fd23860 commit a1fa299

3 files changed

+17
-3
lines changed

README.rst

+1-1
@@ -58,7 +58,7 @@ Bowtie >2.2.2 is strongly recommended for allele specific analysis.
 +---------------+-----------------------------------------------------------------------------+
 | SYSTEM CONFIGURATION |
 +===============+=============================================================================+
-| PREFIX | Path to installation folder |
+| PREFIX | Path to installation folder |
 +---------------+-----------------------------------------------------------------------------+
 | BOWTIE2_PATH | Full path to the bowtie2 installation directory |
 +---------------+-----------------------------------------------------------------------------+

doc/MANUAL.rst

+10-1
@@ -238,13 +238,22 @@ See the :ref:`results <RES>` section for more information.

 * *bowtie_results*

-The *bowtie_results* folder contains the results of the reads mapping. The results of the first mapping step are available in the *bwt2_glob* folder, and those of the second step in the *bwt2_loc* folder. Final BAM files, reads pairing, and mapping statistics are available in the *bwt2* folder.
+The *bowtie_results* folder contains the results of the reads mapping. The results of the first mapping step are available in the *bwt2_glob* folder, and those of the second step in the *bwt2_loc* folder. Final BAM files, reads pairing, and mapping statistics are available in the *bwt2* folder. Note that once HiC-Pro has been run, all files in the *bwt2_glob* and *bwt2_loc* folders can be removed. These files take up a significant amount of disk space and are no longer needed.

 * *hic_results*

 | This folder contains all Hi-C processed data, and is further divided into several sub-folders.
 | The *data* folder is used to store the valid interaction products (*.validPairs*), as well as other statistics files.
+
+| The *validPairs* are stored using a simple tab-delimited text format:
+| read name / chr_reads1 / pos_reads1 / strand_reads1 / chr_reads2 / pos_reads2 / strand_reads2 / fragment_size [/ allele_specific_tag]
+| One *validPairs* file is generated per reads chunk. These files are then merged into the *allValidPairs* file, and duplicates are removed if specified in the configuration file.
+
 | The contact maps are then available in the *matrix* folder. The *matrix* folder is organized with *raw* and *iced* contact maps for all resolutions.
+| Contact maps are stored in a triplet sparse format:
+| bin_i / bin_j / counts_ij
+| Only non-zero values are stored. BED files describing the genomic bins are also generated. Note that *abs* and *ord* files are identical in the context of Hi-C data as the contact maps are symmetric.
+
 | Finally, the *pic* folder contains graphical outputs of the quality control checks.

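The *validPairs* column layout added in this hunk can be read with a few lines of Python. The following is a minimal sketch, not part of HiC-Pro: the names `ValidPair` and `parse_valid_pairs` are hypothetical, the sample records (including the tag value `1-0`) are made up, and the code assumes exactly the column order listed above, with the allele-specific tag as an optional ninth field. Real files may carry additional columns, which this sketch ignores.

```python
import csv
import io
from typing import Iterator, NamedTuple, Optional, TextIO


class ValidPair(NamedTuple):
    """One record of a validPairs file (hypothetical helper type)."""
    read_name: str
    chr1: str
    pos1: int
    strand1: str
    chr2: str
    pos2: int
    strand2: str
    fragment_size: int
    allele_tag: Optional[str]  # only present for allele-specific runs


def parse_valid_pairs(handle: TextIO) -> Iterator[ValidPair]:
    """Yield one ValidPair per tab-delimited line, skipping short lines."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 8:
            continue  # blank or truncated line
        tag = row[8] if len(row) > 8 else None
        yield ValidPair(
            row[0], row[1], int(row[2]), row[3],
            row[4], int(row[5]), row[6], int(row[7]), tag,
        )


# Made-up records, for illustration only.
example = (
    "read1\tchr1\t1000\t+\tchr1\t5000\t-\t250\n"
    "read2\tchr2\t300\t-\tchr5\t42\t+\t310\t1-0\n"
)
pairs = list(parse_valid_pairs(io.StringIO(example)))
```

With a real file, `io.StringIO(example)` would be replaced by an open file handle; streaming through a generator keeps memory use flat even for large merged *allValidPairs* files.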

doc/RESULTS.rst

+6-1
@@ -30,7 +30,12 @@ List of valid interaction products
 ----------------------------------

 | The *hic_results* folder contains all Hi-C processed data, and is further divided into several sub-folders.
+
 | The *hic_results/data* folder is used to store the valid interaction products (*'.validPairs'*), as well as other statistics files.
+| The *validPairs* are stored using a simple tab-delimited text format:
+| read name / chr_reads1 / pos_reads1 / strand_reads1 / chr_reads2 / pos_reads2 / strand_reads2 / fragment_size [/ allele_specific_tag]
+| One *validPairs* file is generated per reads chunk. These files are then merged into the *allValidPairs* file, and duplicates are removed if specified in the configuration file.
+
 | Statistics about read pairs filtering are available in the *'.RSstat'* files, and combined in the *'SAMPLE_NAME.mRSstat'* file.
 | The ligation efficiency can be assessed using the filtering of valid and invalid pairs. As the ligation is a random process, 25% of each valid ligation class is expected. In the same way, a high level of dangling-end or self-circle read pairs is associated with a low quality experiment, and reveals a problem during the digestion, fill-in or ligation steps.
 | In the context of a Hi-C protocol without restriction enzyme, this analysis step is skipped. The aligned pairs are therefore directly used to generate the contact maps. A filter on short-range contacts (typically <1kb) is recommended as these pairs are likely to be self-ligation products.
@@ -69,7 +74,7 @@ Intra and inter-chromosomal contact maps
 (...)


-This format is memory efficient, and is compatible with other analysis software such as the `HiTC Bioconductor package <http://bioconductor.org/packages/release/bioc/html/HiTC.html>`_.
+This format is memory efficient, and is compatible with other analysis software such as the `HiTC Bioconductor package <http://bioconductor.org/packages/release/bioc/html/HiTC.html>`_ or the `HiCPlotter software <https://github.com/kcakdemir/HiCPlotter>`_.


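The triplet sparse format documented in this commit (bin_i / bin_j / counts_ij, non-zero values only) is straightforward to load outside of HiC-Pro. The sketch below is illustrative only: `load_triplets` and `get_count` are hypothetical helper names, the sample values are made up, and the code assumes whitespace-separated fields with integer bin indices. Because the documentation states that the maps are symmetric, the lookup checks both orientations rather than assuming which triangle a file stores.

```python
import io
from typing import Dict, TextIO, Tuple

Contacts = Dict[Tuple[int, int], float]


def load_triplets(handle: TextIO) -> Contacts:
    """Read 'bin_i bin_j count' lines into a dict keyed by (bin_i, bin_j)."""
    contacts: Contacts = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        bin_i, bin_j = int(fields[0]), int(fields[1])
        # Counts are parsed as float so normalized maps also load.
        contacts[(bin_i, bin_j)] = float(fields[2])
    return contacts


def get_count(contacts: Contacts, bin_i: int, bin_j: int) -> float:
    """Symmetric lookup; bin pairs that are not stored count as zero."""
    if (bin_i, bin_j) in contacts:
        return contacts[(bin_i, bin_j)]
    return contacts.get((bin_j, bin_i), 0.0)


# Made-up values, for illustration only.
example = "1\t1\t10\n1\t2\t3\n2\t5\t1.5\n"
contacts = load_triplets(io.StringIO(example))
```

A dictionary keeps the example dependency-free; for genome-wide maps at fine resolution, a dedicated sparse matrix type would likely be a better fit.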
