Bismark

This page describes how to use Bismark to pre-process bisulfite-sequencing data and which QC metrics to look for in the report files.

Creating a pre-converted genome

Bismark aligns reads against a pre-converted genome (an in silico bisulfite-converted copy of the reference), not the standard reference sequence you'd use for other genomics assays.

If this has already been done, skip ahead to the next section. If not, run the following:

bismark_genome_preparation <path_to_genome_folder>

The path_to_genome_folder should contain the reference FASTA file that will be converted. This is typically named something like genome.fa.
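Before starting the conversion, it can help to confirm that the folder actually holds a FASTA file. The following is a minimal sketch, not part of Bismark: has_fasta is a hypothetical helper, and it assumes the reference uses a .fa or .fasta extension.

```shell
# has_fasta DIR
# Hypothetical helper: succeeds if DIR contains at least one regular file
# ending in .fa or .fasta (assumed extensions for the reference FASTA).
has_fasta() {
  for f in "$1"/*.fa "$1"/*.fasta; do
    # An unmatched glob stays literal and fails the -f test.
    [ -f "$f" ] && return 0
  done
  return 1
}

# Example usage (left commented so the sketch runs without Bismark installed):
# if has_fasta ./genome; then
#   bismark_genome_preparation ./genome
# else
#   echo "No .fa or .fasta file found in ./genome" >&2
# fi
```

Here ./genome is a placeholder for your own genome folder.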

Quality control

bismark_methylation_extractor generates a summary report with a few QC metrics. These metrics include:

  • % of CpG methylation
  • % of CHG methylation
  • % of CHH methylation
  • Duplication rates
  • M-bias
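The per-context percentages can be pulled out of a report with a small helper. This is only a sketch: context_percent is a hypothetical function, and it assumes the report contains tab-separated lines of the form "C methylated in CpG context:" followed by a percentage. Check one of your own report files and adjust the pattern if the layout differs.

```shell
# context_percent REPORT CONTEXT
# Hypothetical helper: prints the methylation percentage (without the % sign)
# for CONTEXT (CpG, CHG or CHH).
# Assumed line format: "C methylated in CpG context:<TAB>58.9%"
context_percent() {
  awk -F'\t' -v ctx="$2" '
    $1 == ("C methylated in " ctx " context:") {
      sub(/%$/, "", $2)   # drop the trailing percent sign
      print $2
    }' "$1"
}

# Example usage (sample_report.txt is a placeholder file name):
# context_percent sample_report.txt CHH
```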

Percent methylation

In mammals, non-CpG methylation is negligible in most cell types; neuronal cells are a well-known exception. In plants, non-CpG methylation (in the CHG and CHH contexts) is common and shouldn't be disregarded.

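For mammals, the non-CpG methylation rate is used as a measure of bisulfite conversion efficiency in the protocol: since almost no non-CpG cytosines are truly methylated, those that are called methylated most likely escaped conversion. Most cell types appear to have 40-60% methylated Cs in the CpG context, and non-CpG methylation rates of < 2%.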
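As a rough illustration of that reasoning, the conversion efficiency can be approximated as 100% minus the non-CpG methylation rate. The sketch below assumes a mammalian sample with no genuine non-CpG methylation; conversion_estimate is a hypothetical helper, and the default 2% cutoff is simply the figure quoted above.

```shell
# conversion_estimate NON_CPG_PERCENT [MAX_PERCENT]
# Hypothetical helper: prints the estimated conversion efficiency (in %)
# followed by OK if the non-CpG rate is below MAX_PERCENT (default 2),
# otherwise CHECK.
conversion_estimate() {
  awk -v m="$1" -v max="${2:-2}" 'BEGIN {
    flag = (m < max) ? "OK" : "CHECK"
    printf "%.1f %s\n", 100 - m, flag
  }'
}

conversion_estimate 0.6   # prints "99.4 OK"
conversion_estimate 3.5   # prints "96.5 CHECK"
```

This estimate is not meaningful for samples where non-CpG methylation is expected, such as neuronal cells or plants.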

Duplication rates

This is a general quality metric for sequencing data, but it is particularly relevant for bisulfite-sequencing data, where conversion leaves most molecules composed largely of 3 bases instead of 4. This reduced complexity can cause issues with sequencing machines and library preparation, so duplicates are more common, especially with restriction enzyme-based protocols like RRBS.
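If only read counts are at hand, the duplication rate is the share of alignments that are removed as duplicates. A minimal sketch of that arithmetic follows; duplication_rate is a hypothetical helper and the counts in the example are made up for illustration.

```shell
# duplication_rate TOTAL_ALIGNMENTS UNIQUE_ALIGNMENTS
# Hypothetical helper: prints the percentage of alignments that are duplicates,
# or NA if TOTAL_ALIGNMENTS is not positive.
duplication_rate() {
  awk -v total="$1" -v unique="$2" 'BEGIN {
    if (total <= 0) { print "NA"; exit }
    printf "%.1f\n", 100 * (total - unique) / total
  }'
}

duplication_rate 1000000 850000   # prints "15.0"
```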

M-bias

This is a measure of whether methylation calls depend on their position within a read. A priori, there's no reason why Cs near the 5' end of a read should have a different average methylation rate than those at the 3' end, or anywhere else in a read. The summary report generated by Bismark includes an "M-bias plot", showing the average methylation at each position along the reads.

Completely unbiased data will produce a flat line across the entire read. In some cases, the line deviates at the 5' and 3' ends. If this is the case, bismark_methylation_extractor has parameters to ignore a given number of bases from the 5' or 3' end when calculating methylation totals for each C.
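One way to decide how many bases to ignore is to flag read positions whose average methylation is far from the median across the read. The sketch below is an assumption-laden illustration: flag_mbias_positions is a hypothetical helper, it expects a two-column table (read position, percent methylation) on standard input that you have extracted yourself, and the threshold is arbitrary.

```shell
# flag_mbias_positions THRESHOLD < table
# Hypothetical helper: reads "position<TAB>percent_methylation" lines and
# prints every position whose methylation differs from the median by more
# than THRESHOLD percentage points.
flag_mbias_positions() {
  threshold="$1"
  table=$(cat)
  # Median of the methylation column.
  median=$(printf '%s\n' "$table" | awk '{ print $2 }' | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR % 2) print v[(NR + 1) / 2]
      else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }')
  printf '%s\n' "$table" | awk -v med="$median" -v t="$threshold" '
    { d = $2 - med; if (d < 0) d = -d; if (d > t) print $1 }'
}

# Example: the first two positions deviate from the plateau near 70%.
printf '1\t40\n2\t55\n3\t70\n4\t70\n5\t71\n6\t69\n' | flag_mbias_positions 5

# The flagged positions could then be skipped during extraction, for example
# (sample.bam is a placeholder; confirm the options for your version with
# bismark_methylation_extractor --help):
# bismark_methylation_extractor --ignore 2 sample.bam
```

The --ignore option applies to the 5' end of reads; --ignore_3prime is the counterpart for the 3' end.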

References

A more thorough description of every facet of the tool can be found on GitHub.