Lily McMullen Module2_Assignment3.Rmd

---
title: "Module 2, Assignment 3"
author: "Ellen Bledsoe" 'Lily McMullen'
date: "2022-10-04"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Assignment Details

### Purpose

The goal of this assignment is to practice developing successful workflows to get from the starting data to a plot.

### Task

Write R code which produces the correct answers and reflect on workflow.

### Criteria for Success

-   Code is within the provided code chunks
-   Code is commented with brief descriptions of what the code does
-   Code chunks run without errors
-   Code produces the correct result
    -   Code that produces the correct answer will receive full credit
    -   Code attempts with logical direction will receive partial credit
-   Written answers address the questions in sufficient detail

### Due Date

October 11 at midnight MST

# Assignment Questions

Our goal for this assignment is to start with a data frame of data, summarize the data in constructive ways, and plot the data to answer some questions. To get from point a to point b to point c requires some planning.

In this assignment, you will make a plan, we'll execute how we would actually go about the process, and then you will evaluate how well your original plan matches with the path we actually used.

## Data Summary

The aquaculture scientists on Team Antarctica have been working on developing a new diet for tilapia based on soy-protein, and they are interested in whether incorporating this into fish diets will result in faster growth rates. They also want to know if those growth rates are related to average tank temperature. They've provided us with this data to analyze in the "tilapia_growth.csv" file.

First, let's take a look at the data we will be using for the assignment.

1.  As always, we start by loading the `tidyverse`. (1 point)

```{r}
library(tidyverse)
```

2.  Next, we need to load in our tilapia growth dataset (use the `tidyverse` function, `read_csv`, to do this). Call the data frame `growth`. Then use both the `head` and `tail` functions to take a look at the data. (1 point)

```{r}
growth <- tilapia_growth
head(growth)
tail(growth)
print(growth)
```

3.  Before we can plan any strategy, we need to understand the data that we have. Answer the following questions about the data. (0.5 points each, 2 points total)

    a\. What does one row represent? (One tank? One temperature? One treatment? One fish?)

    One row represents one fish.

    b\. How many tanks of fish were sampled?

    16 tanks of fish were sampled.

    c\. How many fish were sampled?

    320 fish were sampled.

    d\. How many different "percent soy protein" levels (treatments) are there?

    Four different "percent soy protein" levels: 0.2%, 0.4%, 0.6% and 0.8%.

## Our Task

We have been asked by our aquaculture specialists to provide them with the following:

a.  a data frame with the average growth (`day_30_weight`) per treatment (`perc_soy_protein`)
b.  a scatter plot that shows the relationship (or lack thereof) between percent soy protein in the diet and the weight at 30 days
c.  a data frame with the average growth per treatment (`perc_soy_protein`) *and* if tanks are warm or cold
d.  a scatter plot that show the relationship (or lack thereof) between average tank temperature and the weight at 30 days

They've also asked that we provide all weights in kilograms instead of grams (the `day_30_weight` is currently in grams).

## Prediction

4.  Spend some time thinking about each one of these steps. What steps will you need to take to produce the end result? What data frame will you use? What functions will you use?

    For each of the 4 tasks listed above, *describe* (do NOT code) how you will get from the starting point (a data frame) to the result (another data frame or a plot).

    This question will be grade ***only on completion***, not on whether or not your plan is *correct*. (2 points)

    *Task (a)*: Create a new data frame and group by perc_soy_protein, then find the average day 30 weight.

    *Task (b)*: Create a scatter plot with per_soy_protein on the x-axis and day 30 weight on the y-axis.

    *Task (c)*: Create a new column for tank category. Write a for loop and an if else statement with conditions: if the average tank temp is above or equal to 75 degrees, it is classified warm, and if it is below it is classified cold.

    *Task (d)*: create a scatter plot with avg tank temperature vs weight at 30 days from the growth data frame.

## Execution

Now let's actually go ahead and complete our tasks with code.

Because the aquaculture team has asked for everything to use kilograms instead of grams, it makes sense for us to add a `day_30_weight_kg` column before we tackle any of the specific tasks.

5.  Add a `day_30_weight_kg` column to the growth data frame. You can do this however you want, but using the `mutate()` column is likely the easiest route. (Hint: divide grams by 1000 to get kg)

    Remember to "overwrite" the growth data frame (use the assignment operator) so the new column is permanently added to the `growth` data frame for us to use in the rest of the assignment. (2 points)

```{r}
g_to_kg <- function(g = NULL){
  kg <- (g/1000)
  return(kg)
}

g_to_kg(5)

growth <- growth %>%
  mutate(day_30_weight_kg = g_to_kg(day_30_weight))

head(growth)
```

### Task (a)

**A data frame with the average growth (`day_30_weight_kg`) per treatment (`perc_soy_protein`)**

6.  Create a new data frame called `growth_by_treatment`. We first want to group the data by treatment (`perc_soy_protein`) then calculate the average weight in kg. (2 points)

```{r}
growth_by_treatment <- growth %>%
  group_by(perc_soy_protein) %>%
  summarize(mean_weight_kg = mean(day_30_weight_kg))

print(growth_by_treatment)

```

### Task (b)

**A scatter plot that shows the relationship (or lack thereof) between percent soy protein in the diet and the weight at 30 days**

Based on the values we calculated in task (a), it looks like there is probably a positive relationship between the percent of soy protein in tilapia diet and growth. Let's plot the data to confirm.

7.  Make a scatter plot with `perc_soy_protein` on the x-axis (horizontal) and `day_30_weight_kg` on the y-axis (vertical). Change the axis labels to be more easily understood. (1 point)

    (Hint: because we want to plot *all* of the values, not just the mean values, we need to use the original `growth` data frame)

```{r}
plot(x = growth$perc_soy_protein, y = growth$day_30_weight_kg, 
     xlab = "Percentage of Soy Protein", 
     ylab = "Day 30 Weight (kg)")
```

### Task (c)

**A data frame with the average growth per treatment (`perc_soy_protein`) *and* if tanks are warm or cold**

8.  Before we can calculate averages based on whether tanks are warm or cold, let's create a new column called `tank_category` in the `growth` data frame. If the average tank temp is equal to or above 75 degrees, we want "warm" in the `tank_category` column; if it is below, we want "cold". (2 points)

    You can accomplish this task however you would like (`if_else` function in a `mutate` function or a `for loop`).

    Again, remember to use the assignment arrow to "overwrite" the `growth` data frame so we can save this new column to use later.

```{r}

growth <- growth %>%
  mutate(tank_category = if_else(condition = growth$avg_tank_temp >= 75,
                                 true = "warm",
                                 false = "cold"))

print(growth)
```

9.  Now that we have our `tank_category` column, we can use it in our `group_by` function, along with `perc_soy_protein`, to calculate the average `day_30_weight_kg` for warm and cold tanks in each treatment. Call this new data frame `growth_by_temp_treatment`. (2 points)

```{r}

growth_by_temp_treatment <- growth %>%
  group_by(tank_category, perc_soy_protein) %>%
  summarize(mean_weight_kg = mean(day_30_weight_kg)) %>%


print(growth_by_treatment)
```

### Task (d)

**A scatter plot that show the relationship (or lack thereof) between average tank temperature and the weight at 30 days**

Based on our results above, do we think the tank being warm or cold has much of an influence? Let's plot our data to confirm.

10. Make a scatter plot with `avg_tank_temp` on the x-axis (horizontal) and `day_30_weight_kg` on the y-axis (vertical). Change the axis labels to be more easily understood. (1 point)

    (Hint: because we want to plot *all* of the values, not just the mean values, we need to use the original `growth` data frame)

```{r}
plot(x = growth$avg_tank_temp, y = growth$day_30_weight,
     xlab = "Average Tank Temperature", ylab = "Day 30 Weight (kg)")
```

As we suspected, there does not seem to be a relationship between tank temperature and weight.

## Reflection

11. Write 3-5 sentences about if and how your predictions and execution differed and what you learned through the process. (4 points)

    *Examples of questions to answer: How did your initial prediction of how you expected to accomplish the 4 tasks match up with how we actually went about doing it? Were they similar? Were there common mistakes that you made beforehand? Did you plan a different execution from what we did above that you think would also work? What did you learn from this process?*

My initial predictions for these tasks were very similar to my results. However, there were some things I changed and some things I missed in my predictions. For example, in task B, I did not write about changing the labels for my x and y-axis so that the plot is easier to read. Also, in task C, I ended up using the if else statement within a mutate function instead of a for loop like a predicted I would. Both of these ways would have worked, but using a mutate function seemed like the easiest method at the time. In this assignment, I learned that there are many ways to code things and achieve the same result, and I just have to try to be as efficient as possible while problem solving in the way that makes the most sense to me.