Commit 4a29c91

Author: Ryan Williams
Message: added to visualization lesson
1 parent: 8b99a47

2 files changed (+39, -0 lines)

2 files changed

+39
-0
lines changed

R_visualization.md

+2 lines

````diff
@@ -74,4 +74,6 @@ ggplot(dataset) # note the error
 ggplot(dataset)+geom_boxplot(aes(x=categorical_variable, y= variable))
 ```
 
+With this plot we can see the distribution of the data (median and quartiles) for each level of our categorical variable. Besides showing how the data are distributed (are they skewed?), the plot lets you begin comparing categories (is the variable larger in one category than in another?).
+
 
````

lessons/visualizing_data.md

+37 lines (line 1 was already present; lines 2 to 38 are new):
# This is a lesson on how to visualize your data

*The text below is rough; much of it is just a start.*

First, we read the data into R:

```
# sep is the column separator, e.g. "," for CSV or "\t" for tab-delimited files
dataset<-read.table("file/name/here",sep=...,header=TRUE)
```
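
To try the lesson without a data file, here is a minimal sketch that passes `read.table()` a small made-up table through its `text` argument instead of a file path. The column names reuse the lesson's placeholders `categorical_variable` and `variable`; the values and the `genome_A`/`genome_B` labels are hypothetical.

```r
# Hypothetical file contents (made up for illustration). read.table(text = ...)
# parses a string instead of a file, so nothing is written to disk.
example_text <- "categorical_variable,variable
genome_A,0.41
genome_A,0.43
genome_B,0.50
genome_B,0.52"

# sep = "," splits comma-separated columns; header = TRUE takes the column
# names from the first line.
dataset <- read.table(text = example_text, sep = ",", header = TRUE)

str(dataset)  # check column names and types before summarising
```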

There are several different summary statistics that we can compute:

```
mean(dataset$variable)                      # arithmetic mean
sd(dataset$variable)                        # standard deviation
quantile(dataset$variable, c(0.025,0.975))  # 2.5% and 97.5% quantiles
```
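
A small worked example with made-up numbers shows what each call returns. Note that `sd()` divides by n - 1, that `quantile()` interpolates between sorted values by default (type 7), so the 2.5% and 97.5% quantiles of a small sample need not be observed values, and that a single `NA` makes `mean()` return `NA` unless `na.rm = TRUE` is set.

```r
# Made-up values standing in for dataset$variable (illustration only).
variable <- c(2, 4, 4, 4, 5, 5, 7, 9)

m <- mean(variable)                       # 40 / 8 = 5
s <- sd(variable)                         # sample SD, n - 1 denominator: sqrt(32 / 7)
q <- quantile(variable, c(0.025, 0.975))  # interpolated: 2.35 and 8.65
print(c(mean = m, sd = s))
print(q)

# Missing values propagate unless removed explicitly.
with_na <- c(variable, NA)
mean(with_na)                # NA
mean(with_na, na.rm = TRUE)  # 5
```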

These statistics describe how a single variable is distributed, but we may have measured this variable in several genomes and want to know how its distribution differs among them. To compute the statistics separately for each group, we can use the `ddply()` function from the `plyr` package.

```
library(plyr)  # install.packages("plyr") first if needed
# split dataset by categorical_variable, then summarise each piece
ddply(dataset, .(categorical_variable), summarise,
      mean=mean(variable),
      sd=sd(variable),
      hi_95=quantile(variable, 0.975),
      lo_95=quantile(variable, 0.025))
```
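
If `plyr` is not installed (it is retired, and its authors recommend `dplyr` instead), a base-R sketch using `split()` and `sapply()` produces the same four columns per group. The data frame below is made up, and the group labels `genome_A`/`genome_B` are hypothetical.

```r
# Made-up grouped data standing in for `dataset` (illustration only).
dataset <- data.frame(
  categorical_variable = rep(c("genome_A", "genome_B"), each = 4),
  variable = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# split() returns a list with one numeric vector per category;
# sapply() then applies each statistic to every vector in that list.
groups <- split(dataset$variable, dataset$categorical_variable)

summary_table <- data.frame(
  categorical_variable = names(groups),
  mean  = sapply(groups, mean),
  sd    = sapply(groups, sd),
  hi_95 = sapply(groups, quantile, probs = 0.975),
  lo_95 = sapply(groups, quantile, probs = 0.025),
  row.names = NULL
)
print(summary_table)
```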
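
To plot the data we will use the `ggplot2` package. We start with a blank plot and then add layers to it; in each layer, `aes()` maps columns of the data onto the plot (here, the category onto the x-axis and the measured variable onto the y-axis).

```
library(ggplot2)  # install.packages("ggplot2") first if needed
ggplot(dataset)   # no layers yet: recent ggplot2 draws an empty panel (older versions stopped with "No layers in plot")
ggplot(dataset)+geom_boxplot(aes(x=categorical_variable, y=variable))
```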

With this plot we can see the distribution of the data (median and quartiles) for each level of our categorical variable. Besides showing how the data are distributed (are they skewed?), the plot lets you begin comparing categories (is the variable larger in one category than in another?).
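
To see what the boxes and whiskers encode, the sketch below computes them by hand for made-up data, following the rules given in the `geom_boxplot()` documentation: hinges at the 25th and 75th percentiles, whiskers reaching the most extreme points within 1.5 * IQR of the hinges, and anything beyond drawn as an outlier. The helper `box_summary()` is hypothetical, not part of ggplot2, and base R's `boxplot()` computes its hinges slightly differently, so its boxes can differ for small samples.

```r
# Made-up grouped data standing in for `dataset` (illustration only).
dataset <- data.frame(
  categorical_variable = rep(c("genome_A", "genome_B"), each = 6),
  variable = c(1, 2, 3, 4, 5, 20,       # genome_A: 20 lies far above the rest
               10, 12, 14, 16, 18, 20)
)

# Hypothetical helper: summarise one group's values as a boxplot would.
box_summary <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.5, 0.75)))  # lower hinge, median, upper hinge
  iqr <- q[3] - q[1]
  lo_fence <- q[1] - 1.5 * iqr
  hi_fence <- q[3] + 1.5 * iqr
  inside <- x[x >= lo_fence & x <= hi_fence]    # points the whiskers can reach
  c(lower_whisker = min(inside), lower_hinge = q[1], median = q[2],
    upper_hinge = q[3], upper_whisker = max(inside),
    n_outliers = sum(x < lo_fence | x > hi_fence))
}

groups <- split(dataset$variable, dataset$categorical_variable)
box_table <- t(sapply(groups, box_summary))  # one row per category
print(box_table)
```

For `genome_A` the value 20 falls above the upper fence, so it would be drawn as a separate point and the upper whisker stops at 5.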
