---
title: '(user template) Intro to the Tidyverse: R-Ladies SB May 2019'
author: "Your Name Here"
date: "15 May 2019"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## **Don't have the tidyverse yet?**
### Install using the following code:
```{r, eval = FALSE}
install.packages("tidyverse")
```
### Load the tidyverse:
```{r, message = FALSE, warning = FALSE}
library(tidyverse)
```
## **Data wrangling cheat sheet:**
Let's first create some completely hypothetical data about the number of pizzas eaten by Sam, An, Allison, Julie, and Jamie over the past 3 years :)
```{r}
pizza_data <- tribble(
~name, ~`2017`, ~`2018`, ~`2019`,
"Sam", 25, 20, 16,
"An", 20, 15, 11,
"Allison", 18, 17, 10,
"Julie", 19, 10, 14,
"Jamie", 21, 13, 14
)
```
It's a great habit to explore and familiarize yourself with your data before you start wrangling it:
```{r}
```
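One way to take that first look, as a minimal sketch rather than the workshop's official answer: the code below rebuilds the same hypothetical `pizza_data` so it runs on its own, then calls a few common inspection functions.

```r
library(tidyverse)

# Same hypothetical pizza data as above, rebuilt so this sketch is self-contained
pizza_data <- tribble(
  ~name,     ~`2017`, ~`2018`, ~`2019`,
  "Sam",     25,      20,      16,
  "An",      20,      15,      11,
  "Allison", 18,      17,      10,
  "Julie",   19,      10,      14,
  "Jamie",   21,      13,      14
)

dim(pizza_data)       # number of rows, then number of columns
names(pizza_data)     # column names
head(pizza_data, 3)   # first three rows
glimpse(pizza_data)   # column types plus a preview of the values
summary(pizza_data)   # quick per-column summaries
```

Notice that the years are stored as column names rather than as values in a column, which is the hint that this data is in wide format.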
**`gather()`** transforms data from wide to long format
```{r}
```
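A possible way to fill in the chunk above, sketched under a couple of assumptions: the new column names `year` and `pizzas_eaten` are made up for illustration (pick whatever names you like), and the data is rebuilt here so the example runs on its own.

```r
library(tidyverse)

# Same hypothetical pizza data as above
pizza_data <- tribble(
  ~name,     ~`2017`, ~`2018`, ~`2019`,
  "Sam",     25,      20,      16,
  "An",      20,      15,      11,
  "Allison", 18,      17,      10,
  "Julie",   19,      10,      14,
  "Jamie",   21,      13,      14
)

# Stack the three year columns into key/value pairs:
# `year` receives the old column names, `pizzas_eaten` receives the cell values.
# Both names are illustrative assumptions, not part of the original template.
tidy_pizza <- pizza_data %>%
  gather(key = "year", value = "pizzas_eaten", `2017`:`2019`)

tidy_pizza  # one row per person per year
```

Because the years started out as column names, `year` comes back as a character column. In tidyr 1.0.0 and later, `gather()` still works but is superseded by `pivot_longer()`.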
**`spread()`** transforms data from long to wide format
```{r}
```
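Going the other direction might look like the sketch below. It uses a small long-format subset of the pizza numbers (Sam and An, 2017 and 2018), and the column names `year` and `pizzas_eaten` are illustrative assumptions.

```r
library(tidyverse)

# A small long-format subset of the hypothetical pizza data
# (column names `year` and `pizzas_eaten` are assumed for illustration)
tidy_pizza <- tribble(
  ~name, ~year,  ~pizzas_eaten,
  "Sam", "2017", 25,
  "Sam", "2018", 20,
  "An",  "2017", 20,
  "An",  "2018", 15
)

# Each distinct value of `year` becomes its own column,
# filled with the matching `pizzas_eaten` value
wide_pizza <- tidy_pizza %>%
  spread(key = year, value = pizzas_eaten)

wide_pizza  # one row per person, one column per year
```

In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`, though it continues to work.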
From here on, we'll work with our **tidy data**, i.e., **`tidy_pizza`**, to practice some useful wrangling functions.
### Subsetting data:
**`select()`** select columns to retain and specify their order
```{r}
```
**`filter()`** keep only the observations (rows) whose values meet the conditions you specify
```{r}
```
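The two subsetting verbs above can be sketched as follows. This is just one possible illustration: it uses a long-format subset of the pizza data, and the column names `year` and `pizzas_eaten` are assumptions.

```r
library(tidyverse)

# Long-format subset of the hypothetical pizza data (column names assumed)
tidy_pizza <- tribble(
  ~name, ~year,  ~pizzas_eaten,
  "Sam", "2017", 25,
  "Sam", "2018", 20,
  "Sam", "2019", 16,
  "An",  "2017", 20,
  "An",  "2018", 15,
  "An",  "2019", 11
)

# select(): keep only the listed columns, in the order given
year_first <- tidy_pizza %>% select(year, pizzas_eaten)

# select() with a minus sign: drop a column and keep everything else
no_names <- tidy_pizza %>% select(-name)

# filter(): keep rows where the condition is TRUE
sams_pizza <- tidy_pizza %>% filter(name == "Sam")

# Conditions can be combined; comma-separated conditions must all hold
big_early_years <- tidy_pizza %>% filter(pizzas_eaten >= 20, year != "2019")
```

A handy way to remember the difference: `select()` works on columns, `filter()` works on rows.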
### Manipulating/adding variables:
**`arrange()`** order observations as specified (default = alphabetical or ascending)
```{r}
```
**`rename()`** rename a column
```{r}
```
**`mutate()`** a versatile function that adds new columns or changes existing ones, usually computed from other columns
```{r}
```
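Here is a rough sketch of the three verbs in this section, again on a long-format subset of the pizza data. The column names `year` and `pizzas_eaten`, the new name `person`, and the 8-slices-per-pizza figure are all assumptions made up for the example.

```r
library(tidyverse)

# Long-format subset of the hypothetical pizza data (column names assumed)
tidy_pizza <- tribble(
  ~name, ~year,  ~pizzas_eaten,
  "Sam", "2017", 25,
  "Sam", "2018", 20,
  "Sam", "2019", 16,
  "An",  "2017", 20,
  "An",  "2018", 15,
  "An",  "2019", 11
)

# arrange(): sort rows; wrap a column in desc() for descending order
most_first <- tidy_pizza %>% arrange(desc(pizzas_eaten))

# rename(): the syntax is new_name = old_name
renamed <- tidy_pizza %>% rename(person = name)

# mutate(): change an existing column and add a new one in a single call
# (8 slices per pizza is an assumption for illustration only)
with_slices <- tidy_pizza %>%
  mutate(
    year = as.integer(year),
    slices_eaten = pizzas_eaten * 8
  )
```

None of these calls modify `tidy_pizza` itself; each returns a new data frame, which is why the results are assigned to new names.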
### Summarizing data:
**`group_by()`** groups observations such that data operations are performed at the level of the group
```{r}
```
**`summarize()`** calculate summary statistics
```{r}
```
**`tally()`** count the observations in each group (or sum a variable within each group if you supply `wt`)
```{r}
```
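The summarizing verbs usually travel together, so here is one sketch that chains them. It uses a long-format subset of the pizza data; the column names `year` and `pizzas_eaten` and the summary column names are assumptions for illustration.

```r
library(tidyverse)

# Long-format subset of the hypothetical pizza data (column names assumed)
tidy_pizza <- tribble(
  ~name, ~year,  ~pizzas_eaten,
  "Sam", "2017", 25,
  "Sam", "2018", 20,
  "Sam", "2019", 16,
  "An",  "2017", 20,
  "An",  "2018", 15,
  "An",  "2019", 11
)

# group_by() + summarize(): one output row per person
pizza_summary <- tidy_pizza %>%
  group_by(name) %>%
  summarize(
    total_pizzas = sum(pizzas_eaten),
    mean_pizzas  = mean(pizzas_eaten)
  )

# tally() with no arguments counts the rows in each group (column `n`)
rows_per_person <- tidy_pizza %>%
  group_by(name) %>%
  tally()

# tally(wt = ...) sums the weighting column within each group instead
pizzas_per_person <- tidy_pizza %>%
  group_by(name) %>%
  tally(wt = pizzas_eaten)
```

On its own, `group_by()` does not change how the data looks; it only affects what the verbs after it do.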
## **Now let's practice!**
### Load the tidyverse and any additional required packages:
```{r, message = FALSE, warning = FALSE}
```
### Load the data:
In celebration of this year's superbloom, we'll be exploring phenometric data of flowering California plants from the [USA -- National Phenology Network](https://www.usanpn.org/home).
```{r, message = FALSE, warning = TRUE}
```
Let's pretend we're planning a getaway to Joshua Tree National Park and want to time our trip so that we have the best chance of seeing plants in full bloom.
### Explore:
We should first familiarize ourselves with the data.
```{r, eval = FALSE}
```
### Wrangle:
This dataset is *huge*, so we'll wrangle it down to only the information we're interested in. We will:
a.
b.
c.
d.
e.
f.
g.
```{r}
```
With this simplified and cleaned dataset, we're ready to explore a subset of the desert species we're most interested in. We love **Joshua trees** (*Yucca brevifolia*), **creosote bushes** (*Larrea tridentata*), and **Mojave yucca** (*Yucca schidigera*) and want to know when these plants are blooming. Let's first isolate data for these species by:
a.
b.
c.
```{r}
```
### Plot:
a.
b.
c.
```{r, fig.align = 'center', fig.width = 15, fig.height = 10}
```