---
title: "Module 3 Assignment 1"
author: 'Lily McMullen'
output: html_document
date: "2022-10-25"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Assignment Details

### Purpose

The goal of this assignment is to assess your ability to produce and interpret histograms and scatter plots using `ggplot2`.

### Task

Write R code (using `ggplot2` specifically) that produces the correct answers, and correctly interpret the plots produced.

### Criteria for Success

-   Code is within the provided code chunks
-   Code is commented with brief descriptions of what the code does
-   Code chunks run without errors
-   Code produces the correct result
    -   Code that produces the correct answer will receive full credit
    -   Code attempts with logical direction will receive partial credit
-   Written answers address the questions in sufficient detail

### Due Date

November 8 at midnight MST

# Assignment Questions

For this assignment, we are going to be making plots! Specifically, we are going to be reproducing plots that we made in base R with `ggplot2`. If you want a refresher about the data and plots we are working with, take a gander at Module2_Assignment 1.

As before, we are going to use the data set called `penguins` from the `palmerpenguins` package.

Most of the code you will need to complete this assignment is code we used in the first 3 lessons of this module.

### Data

1.  Load both the `palmerpenguins` package and the `tidyverse` package into the workspace. (2 points)

```{r}
library(tidyverse)
library(palmerpenguins)
```

When we use data from a data package, it doesn't automatically show up in our environment. Run this code chunk so it does show up in the environment.

```{r}
penguins <- penguins
```

2.  Use the `head()` function to refresh yourself on the `penguins` data frame. (1 point)

```{r}
head(penguins)
```

### Histogram (or Density Plot)

3.  Use `ggplot2` to make a histogram of the body mass column for all species combined (AKA don't set color or fill yet). You can create either a true histogram or a density plot, whichever you prefer. (2 points)

    *Note: it will produce a warning saying it removed some rows. That's fine!*

```{r}
# Density plot of body mass for all species combined
ggplot(penguins, aes(body_mass_g)) +
  geom_density()
```
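
The warning mentioned in the note comes from rows where body mass is missing. As a small sketch (the names `n_missing` and `p_density` are made up for illustration), the missing values can be counted directly, and passing `na.rm = TRUE` to the geom drops them without printing the warning:

```r
library(ggplot2)
library(palmerpenguins)

# Count the rows with a missing body mass; these are the rows ggplot2 drops
n_missing <- sum(is.na(penguins$body_mass_g))
n_missing

# na.rm = TRUE removes the same rows silently, so no warning is printed
p_density <- ggplot(penguins, aes(body_mass_g)) +
  geom_density(na.rm = TRUE)
p_density
```

The plot itself is unchanged; only the warning goes away.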

4.  Now, let's produce the same plot (histogram or density plot of the body mass column) but instead of a plot of all species combined, set the `color` and `fill` arguments to be determined by the species of penguin. Change the transparency of the fill (`alpha`) so we can see all three species. (2 points)

    (Hint: if you make a histogram, you'll want to set `position = "identity"` so you can see the histograms for all 3 species)

```{r}
# Density plot of body mass, colored and filled by species;
# alpha = 0.5 makes the fills semi-transparent so all three species show
ggplot(penguins, aes(body_mass_g, color = species, fill = species)) +
  geom_density(alpha = 0.5) +
  labs(color = "species",
       fill = "species") +
  # Legend on top, bold blue legend text, and a light blue legend box
  # with a dark blue border (linewidth needs ggplot2 >= 3.4.0;
  # older versions call this argument size)
  theme(legend.position = "top",
        legend.title = element_text(colour = "blue", size = 10, face = "bold"),
        legend.text = element_text(colour = "blue", size = 10, face = "bold"),
        legend.background = element_rect(fill = "lightblue", colour = "darkblue",
                                         linewidth = 0.5, linetype = "solid"))

# I had some extra fun with theme commands :)
```
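
For comparison, here is a rough sketch of the histogram route that the hint describes. The object name `p_hist` and the choice of `bins = 30` are assumptions made for illustration, not part of the assignment:

```r
library(ggplot2)
library(palmerpenguins)

# Histogram alternative to the density plot, one set of bars per species.
# position = "identity" overlays the bars instead of stacking them;
# bins = 30 is an arbitrary choice for this sketch.
p_hist <- ggplot(penguins, aes(body_mass_g, color = species, fill = species)) +
  geom_histogram(alpha = 0.5, position = "identity", bins = 30, na.rm = TRUE)
p_hist
```

Without `position = "identity"`, `geom_histogram()` stacks the species on top of each other, which makes the individual distributions harder to read.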

5.  In 2-3 sentences, describe what the histogram is telling you. How are the two histograms from questions 3 and 4 different? What different bits of information can you glean from presenting the histogram these two different ways? I'm not necessarily looking for technical answers, but I want you to practice interpreting what histograms are telling you. (3 points)

*Answer:*

The first density plot shows the distribution of body mass for all three penguin species combined. The second density plot shows a separate body mass distribution for each species. This lets us see the most common body mass range for each species, which the combined plot hides.
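
One way to check a reading of the per-species plot is to summarize body mass by species. This is only a sketch; the names `mass_by_species`, `n`, and `median_mass_g` are made up for illustration:

```r
library(dplyr)
library(palmerpenguins)

# Number of non-missing measurements and median body mass for each species
mass_by_species <- penguins %>%
  group_by(species) %>%
  summarise(
    n = sum(!is.na(body_mass_g)),
    median_mass_g = median(body_mass_g, na.rm = TRUE)
  )
mass_by_species
```

The medians should line up roughly with where each species' curve peaks in the density plot.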

6.  Let's spruce up the plot from question 4 a bit. (3 points)

Make the following changes:

-   edit the x-axis and y-axis labels to be capitalized and easier to read
-   capitalize the legend title ("Species" instead of "species")
-   choose a pre-programmed theme for your plot; I recommend `theme_bw()` or `theme_classic()`, but you can choose whichever one you like, as long as the axes titles and legend remain!

```{r}
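# Density plot of body mass by species with readable labels
ggplot(penguins, aes(body_mass_g, color = species, fill = species)) +
  geom_density(alpha = 0.5) +
  labs(x = "Body Mass (g)",
       y = "Density",
       color = "Species",
       fill = "Species") +
  # Apply the complete theme first; adding it after theme() would reset
  # the legend position back to the default
  theme_classic() +
  theme(legend.position = "top")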
```
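
Order matters when a pre-programmed theme is combined with `theme()` tweaks, because a complete theme such as `theme_classic()` replaces any theme settings added before it. A minimal sketch (the names `p_lost` and `p_kept` are made up for illustration):

```r
library(ggplot2)

# theme() first, complete theme second: the complete theme wins
p_lost <- ggplot() +
  theme(legend.position = "top") +
  theme_classic()

# Complete theme first, theme() second: the tweak is kept
p_kept <- ggplot() +
  theme_classic() +
  theme(legend.position = "top")

# Inspect the legend position stored in each plot's theme
p_lost$theme$legend.position
p_kept$theme$legend.position
```

Only `p_kept` keeps the legend on top, so custom `theme()` calls are safest placed after the pre-programmed theme.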

### Scatter Plot

7.  Make a scatter plot with flipper length on the x-axis (horizontal) and bill length on the y-axis (vertical) for all penguins, regardless of species. (2 points)

```{r}
# Scatter plot of bill length against flipper length for all penguins
ggplot(penguins, aes(flipper_length_mm, bill_length_mm)) +
  geom_point()
```

8.  Now, let's make this plot shine. Make the following edits. (3 points)

-   show a different marker color depending on the penguin species.
-   edit the x-axis and y-axis labels to be capitalized and easier to read
-   capitalize the legend title ("Species" instead of "species")
-   choose a pre-programmed theme for your plot; I recommend `theme_bw()` or `theme_classic()`, but you can choose whichever one you like, as long as the axes titles and legend remain!

```{r}
# Scatter plot colored by species; the default point shape only uses
# color, so no fill mapping is needed
ggplot(penguins, aes(flipper_length_mm, bill_length_mm, color = species)) +
  geom_point() +
  labs(x = "Flipper Length (mm)",
       y = "Bill Length (mm)",
       color = "Species") +
  theme_classic()
```

9.  Write 1-2 sentences discussing why including color based on species is important, based on the two plots above. (2 points)

*Answer:*

Coloring the points by species lets us see how the relationship between bill length and flipper length differs from one species to the next. For example, Adelie penguins tend to have the shortest bills and flippers, and we would not be able to see this pattern without color coding.
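
The pattern described above can also be checked with a quick summary by species. This is a sketch only; `size_by_species` and its column names are made up for illustration:

```r
library(dplyr)
library(palmerpenguins)

# Mean flipper length and bill length for each species, ignoring missing values
size_by_species <- penguins %>%
  group_by(species) %>%
  summarise(
    mean_flipper_mm = mean(flipper_length_mm, na.rm = TRUE),
    mean_bill_mm = mean(bill_length_mm, na.rm = TRUE)
  )
size_by_species
```

The species with the smallest value in both columns is the one whose points sit in the lower left of the colored scatter plot.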