-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathLily McMullen Mod3Assign1.Rmd
155 lines (109 loc) · 6.16 KB
/
Lily McMullen Mod3Assign1.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
---
title: "Module 3 Assignment 1"
author: 'Lily McMullen'
output: html_document
date: "2022-10-25"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Assignment Details
### Purpose
The goal of this assignment is to assess your ability to produce and interpret histograms and scatter plots using `ggplot2`.
### Task
Write R code (using `ggplot2`, specifically) which produces the correct answers and correctly interpret the plots produced.
### Criteria for Success
- Code is within the provided code chunks
- Code is commented with brief descriptions of what the code does
- Code chunks run without errors
- Code produces the correct result
- Code that produces the correct answer will receive full credit
- Code attempts with logical direction will receive partial credit
- Written answers address the questions in sufficient detail
### Due Date
November 8 at midnight MST
# Assignment Questions
For this assignment, we are going to be making plots! Specifically, we are going to be reproducing plots that we made in base R with `ggplot2`. If you want a refresher about the data and plots we are working with, take a gander at Module2_Assignment 1.
As before, we are going to use the data set called `penguins` from the `palmerpenguins` package.
Most of the code you will need to complete this assignment is code we used in the first 3 lessons of this module.
### Data
1. Load both the `palmerpenguins` package and the `tidyverse` package into the workspace. (2 points)
```{r}
library(tidyverse)
library(palmerpenguins)
```
When we use data from a data package, it doesn't automatically show up in our environment. Run this code chunk so it does show up in the environment.
```{r}
penguins <- penguins
```
2. Use the `head()` function to refresh yourself on the `penguins` data frame. (1 points)
```{r}
head(penguins)
```
### Histogram (or Density Plot)
3. Use `ggplot2` to make a histogram of the body mass column for all species combined (AKA don't set color or fill yet). You can create either a true histogram or a density plot, whichever you prefer. (2 points)
*Note: it will produce a warning saying it removed some rows. That's fine!*
```{r}
ggplot(penguins, aes(body_mass_g)) +
geom_density()
```
4. Now, let's produce the same plot (histogram or density plot of the body mass column) but instead of a plot of all species combined, set the `color` and `fill` arguments to be determined by the species of penguin. Change the transparency of the fill (`alpha`) so we can see all three species. (2 points)
(Hint: if you make a histogram, you'll want to set `position = "identity"` so you can see the histograms for all 3 species)
```{r}
ggplot(penguins, aes(body_mass_g, color = species, fill = species)) +
geom_density(alpha = 0.5) +
labs(color = "species",
fill="species") +
theme(legend.position = "top") +
theme(legend.title = element_text(colour="blue", size=10,
face="bold")) +
theme(legend.text = element_text(colour="blue", size=10,
face="bold")) +
theme(legend.background = element_rect(fill="lightblue",
size=0.5, linetype="solid")) +
theme(legend.background = element_rect(fill="lightblue",
size=0.5, linetype="solid",
colour ="darkblue"))
#I had some extra fun with theme commands :)
```
5. In 2-3 sentences, describe what the histogram is telling you. How are the two histograms from questions 3 and 4 different? What different bits of information can you glean from presenting the histogram these two different ways? I'm not necessarily looking for technical answers, but I want you to practice interpreting what histograms are telling you. (3 points)
The first density histogram is showing us the distribution of data for body mass weight over all three species of penguins. The second density histogram is showing us the distribution of data for body mass weight per each species. This allows us to see what the most common weight range is for each species of penguin.
*Answer:*
6. Let's spruce up the plot from question 4 a bit. (3 points)
Make the following changes:
- edit the x-axis and y-axis labels to be capitalized and easier to read
- capitalize the legend title ("Species" instead of "species")
- choose a pre-programmed theme for your plot; I recommend `theme_bw()` or `theme_classic()`, but you can choose whichever one you like, as long as the axes titles and legend remain!
```{r}
ggplot(penguins, aes(body_mass_g, color = species, fill = species)) +
geom_density(alpha = 0.5) +
labs(x="Body Mass (g)",
y="Density",
color = "Species",
fill="Species") +
theme(legend.position = "top") +
theme_classic()
```
### Scatter Plot
7. Make a scatter plot with flipper length on the x-axis (horizontal) and bill length on the y-axis (vertical) for all penguins, regardless of species. (2 points)
```{r}
ggplot(penguins, aes(flipper_length_mm, bill_length_mm)) +
geom_point()
```
8. Now, let's make this plot shine. Make the following edits. (3 point)
- show a different marker color depending on the penguin species.
- edit the x-axis and y-axis labels to be capitalized and easier to read
- capitalize the legend title ("Species" instead of "species")
- choose a pre-programmed theme for your plot; I recommend `theme_bw()` or `theme_classic()`, but you can choose whichever one you like, as long as the axes titles and legend remain!
```{r}
ggplot(penguins, aes(flipper_length_mm, bill_length_mm, color = species, fill = species)) +
geom_point() +
labs(x="Flipper Length (mm)",
y="Bill Length (mm)",
color = "Species",
fill="Species") +
theme_classic()
```
9. Write 1-2 sentences discussing why including color based on species is important, based on the two plots above. (2 points)
*Answer:*
Sorting our data distribution by color allows us to see a clear pattern between bill length and flipper length trends by species. For example, the Adelle species of penguin tends to have the shortest bill length and flipper length, and we would not be able to see this trend without color coding.