With this plot we can see the distribution of the data (quantiles and median) within each level of our categorical variable. Besides showing how your data are distributed (are they skewed?), it lets you compare between categories (is my variable greater under one category than under another?).
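As a rough illustration of this kind of plot, the sketch below draws a boxplot with ggplot2's `geom_boxplot`. The built-in `iris` data and the choice of `Sepal.Length` by `Species` are stand-ins (assumptions), not the dataset used in this tutorial.

```r
library(ggplot2)

# Stand-in data (an assumption): the built-in iris dataset.
# x is the categorical variable, y is the numeric variable being compared.
p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  labs(x = "Species", y = "Sepal length (cm)")

print(p)
```

Each box spans the first to third quartile and the line inside it marks the median, so a median that sits off-center in its box hints at skew.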

Applying Visualization to a Dataset
-----------------------------------

First we will install the following packages:
```
# Install the packages (only needed once)
install.packages("ggplot2")
install.packages("reshape2")
install.packages("plyr")
install.packages("vegan")

# Load them into the current session
library(reshape2)
library(plyr)
library(ggplot2)
library(vegan)
```
These packages include many visualization, statistical, and data management tools that can be used to summarize your data and produce publication-ready plots.

First we will read in the data (it will also be available in the GitHub repo) and remove some nonsensical labels for ease of visualization. We will also add a variable that lets us count up SNPs in the database.

This dataset records SNPs among multiple E. coli strains. We are interested in the total number of SNPs per E. coli genome, which we will compute using ddply.
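To make the counting step concrete, here is a minimal sketch of the split-and-sum pattern with `ddply`. The data frame `snps`, its columns `genome`, `position`, and `count`, and the genome labels are all hypothetical; they are not taken from the tutorial's dataset.

```r
library(plyr)

# Hypothetical toy table: one row per SNP call, labelled by genome.
snps <- data.frame(
  genome   = c("genome_A", "genome_A", "genome_A",
               "genome_B", "genome_B", "genome_C"),
  position = c(101, 2050, 3999, 101, 7321, 2050)
)

# Add a helper column of ones so that summing it counts the SNPs.
snps$count <- 1

# Split by genome, sum the helper column, and recombine into one data frame.
snp_totals <- ddply(snps, .(genome), summarise, total_snps = sum(count))
print(snp_totals)
```

The result has one row per genome, which is a convenient shape for passing straight to ggplot2.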
```
p.error.bars + labs(x = "Average Number of SNPs", y = "E. Coli Genomes")
```
We may also be interested in a multivariate analysis of these data. We can ask the question, "Which genomes are most similar based on SNPs?" This is analogous to asking, "Which of my samples are the most similar?" We will do this using non-metric multidimensional scaling (NMDS).

First we will transform the data to presence-absence:
```
library(vegan)
# Convert every column except the first to presence-absence (1/0)
transformed_data <- decostand(dataset[, -1], "pa")
```
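To show what the `"pa"` standardization does, the sketch below applies `decostand` to a small matrix. The matrix `counts`, its row and column names, and the orientation (genomes in rows) are assumptions made for illustration.

```r
library(vegan)

# Hypothetical count matrix: rows are genomes, columns are SNP sites.
counts <- matrix(
  c(0, 3, 1,
    2, 0, 0,
    5, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("genome_A", "genome_B", "genome_C"),
                  c("snp_1", "snp_2", "snp_3"))
)

# "pa" maps every value greater than zero to 1 and leaves zeros as 0.
pa <- decostand(counts, method = "pa")
print(pa)
```

After this step only the presence of a SNP matters, not how often it was counted.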

Then we will produce the NMDS and look at one of the outputs.

This visualization shows where points lie relative to one another in multidimensional space, compressed into two dimensions. Points that are close together are similar, while points that are far apart are dissimilar.
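A minimal sketch of such an ordination, using vegan's `metaMDS`, might look like the following. It rests on several assumptions: vegan's bundled `dune` table stands in for the SNP data, Jaccard distance is one reasonable choice for presence-absence data (not necessarily the one used in this tutorial), and the seed is arbitrary.

```r
library(vegan)

# Stand-in data (an assumption): vegan's bundled dune table,
# 20 sites (rows) by 30 species (columns), converted to presence-absence.
data(dune)
pa <- decostand(dune, method = "pa")

# NMDS uses random starting configurations, so fix the seed.
set.seed(42)
fit <- metaMDS(pa, distance = "jaccard", k = 2,
               autotransform = FALSE, trace = 0)

# Extract the 2D coordinates of each row (site) and plot them.
pts <- as.data.frame(scores(fit, display = "sites"))
plot(pts$NMDS1, pts$NMDS2, xlab = "NMDS1", ylab = "NMDS2", pch = 19)
text(pts$NMDS1, pts$NMDS2, labels = rownames(pts), pos = 3, cex = 0.7)
```

The stress value stored in `fit$stress` indicates how faithfully the 2D layout preserves the original dissimilarities; lower is better.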