R-loop-project

This project comes from my biostatistics class. The goal is to use statistical analysis in R to evaluate a published data set.

Objective

Transcription is the essential process that produces RNA from DNA, but it is also a source of genomic instability and DNA damage. DNA:RNA hybrid structures (R-loops) are one of the major causes. In yeast, only 283 out of 6000 genes contain introns. The function of introns was not clear until 2017, when Bonnet et al. published a study in Molecular Cell indicating a novel function of introns in decreasing R-loop formation and preventing genomic instability. Here I aim to confirm their conclusion, and I predict that both a high transcription rate and the presence of introns have critical effects on R-loop formation.

Data collection summary

My samples come from a published paper in Molecular Cell (Supplemental Table S1; see Reference), and the subjects are individual yeast genes. The original dataset contains 5908 genes. I removed 5464 genes without a response variable, 98 outliers, and 35 genes whose transcription frequency was not determined (n.d.), leaving a final sample size of 311 genes. The removal of the 98 outliers and the 35 n.d. genes is discussed in the next section. Removing genes without a response variable focuses the analysis on genes that are potentially involved in R-loop formation.
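The filtering steps above can be sketched roughly as follows. This is a minimal illustration on a toy data frame, not the code in RunAnalysis.R: the column names (`rloop_signal`, `transcription_freq`) and all values are hypothetical, and the 1.5 * IQR outlier rule is an assumption, since the rule actually used is not stated here.

```r
# Bookkeeping check on the counts reported above.
stopifnot(5908 - 5464 - 98 - 35 == 311)

# Toy stand-in for Table S1; column names and values are hypothetical.
genes <- data.frame(
  gene = c("G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"),
  rloop_signal = c(1.2, NA, 0.8, 1.1, 25.0, 0.9, 1.0, 1.3),
  transcription_freq = c("3.1", "0.5", "n.d.", "2.2", "4.0", "1.7", "2.9", "0.8"),
  stringsAsFactors = FALSE
)

# Step 1: drop genes without a response value.
step1 <- genes[!is.na(genes$rloop_signal), ]

# Step 2: drop outliers in the response.
# ASSUMPTION: a 1.5 * IQR rule; the rule used in the project may differ.
q <- unname(quantile(step1$rloop_signal, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
keep <- step1$rloop_signal >= q[1] - fence &
  step1$rloop_signal <= q[2] + fence
step2 <- step1[keep, ]

# Step 3: drop genes with undetermined transcription frequency ("n.d."),
# then convert the remaining values to numeric.
step3 <- step2[step2$transcription_freq != "n.d.", ]
step3$transcription_freq <- as.numeric(step3$transcription_freq)

# Number of genes left after all three filters.
print(nrow(step3))
```

Keeping each step in its own data frame makes it easy to count how many genes each filter removes and to compare those counts with the numbers reported above.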

Research questions

Q1: Is there an effect of gene transcription rate on R-loop formation?
Q2: Do intron-containing genes have less R-loop formation?

Analysis Method

I used a GLM and the graphing functions in R to analyze my data. The GLM was fit with lm() and examined with summary().
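As a rough illustration of how lm() and summary() fit together for the two research questions, here is a minimal sketch on simulated data. The variable names (`rloop`, `transcription_rate`, `has_intron`), the model formula, and the simulated effect sizes are all assumptions made for the example; they are not taken from the project's data or from RunAnalysis.R.

```r
set.seed(1)

# Simulated stand-in data; names and effect sizes are hypothetical.
n <- 100
toy <- data.frame(
  transcription_rate = runif(n, 0, 10),
  has_intron = factor(sample(c("no", "yes"), n, replace = TRUE))
)
toy$rloop <- 1 + 0.5 * toy$transcription_rate -
  0.8 * (toy$has_intron == "yes") + rnorm(n, sd = 0.5)

# Q1 and Q2 in one model: a continuous predictor (transcription rate)
# and a categorical predictor (intron presence).
fit <- lm(rloop ~ transcription_rate + has_intron, data = toy)

# summary() reports estimates, standard errors, t values and p values.
fit_summary <- summary(fit)
print(fit_summary$coefficients)
```

In this setup the `transcription_rate` row of the coefficient table speaks to Q1, and the `has_intronyes` row (the difference between intron-containing and intronless genes) speaks to Q2.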

Assumptions

Normality
Linearity
Equal variance
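One possible way to check these three assumptions with base R is sketched below. The data are simulated and the variable names are hypothetical. The choice of a Shapiro-Wilk test for normality and a Bartlett test for equal variance is an assumption for this example, not necessarily the checks used in the project.

```r
set.seed(2)

# Simulated stand-in data; names and values are hypothetical.
n <- 80
x <- runif(n, 0, 10)
group <- factor(rep(c("no", "yes"), each = n / 2))
y <- 2 + 0.4 * x + rnorm(n, sd = 1)
fit <- lm(y ~ x + group)
res <- residuals(fit)

# Normality: Shapiro-Wilk test on the model residuals.
normality <- shapiro.test(res)

# Equal variance: Bartlett test of residual variance between the two groups.
equal_var <- bartlett.test(res, group)

# Linearity and equal variance, visually. pdf(NULL) draws to a null device
# so the sketch also runs non-interactively; omit it (and dev.off()) in an
# interactive session to see the plots.
pdf(NULL)
plot(fit, which = 1)  # residuals vs fitted: look for curvature
plot(fit, which = 2)  # normal Q-Q plot of residuals
plot(fit, which = 3)  # scale-location: look for a trend in spread
invisible(dev.off())

print(c(shapiro_p = normality$p.value, bartlett_p = equal_var$p.value))
```

Large p values in the two tests are consistent with the normality and equal-variance assumptions, while the residuals-vs-fitted plot is the usual place to spot a departure from linearity.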

Reference

Amandine Bonnet, Ana R. Grosso, Abdessamad Elkaoutari, Emeline Coleno, Adrien Presle, Sreerama C. Sridhara, Guilhem Janbon, Vincent Géli, Sérgio F. de Almeida, and Benoit Palancade, 2017, Molecular Cell 67, 608-621 (http://www.cell.com/molecular-cell/abstract/S1097-2765(17)30496-3)
Lamia Wahba et al., S1-DRIP-seq identifies high expression and polyA tracts as major contributors to R-loop formation, 2016, Genes & Development (http://genesdev.cshlp.org/content/30/11/1327)
Frank C.P. Holstege et al., Dissecting the Regulatory Circuitry of a Eukaryotic Genome, 1998, Cell (http://www.sciencedirect.com/science/article/pii/S0092867400816414)
