Lily_McMullen_Module3_Assignment2.Rmd

---
title: "Module 3 Assignment 2"
author: "Ellen Bledsoe" 'Lily McMullen'
date: '2022-10-27'
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Assignment Details

### Purpose

The goal of this assignment is to assess your ability to compare means numerically, visually, and statistically.

### Task

Write R code which produces the correct answers and correctly interpret the results of visualizations and statistical tests.

### Criteria for Success

-   Code is within the provided code chunks
-   Code is commented with brief descriptions of what the code does
-   Code chunks run without errors
-   Code produces the correct result
    -   Code that produces the correct answer will receive full credit
    -   Code attempts with logical direction will receive partial credit
-   Written answers address the questions in sufficient detail

### Due Date

November 8 at midnight MST

# Assignment Questions

In this assignment, we're going to explore another data set on wind turbines that generate a significant portion of the energy for us down here in Antarctica.

### Set-Up

Let's load the `tidyverse` and read in the data set. Call the data `turbines`.

```{r}
library(tidyverse)
turbines <- read_csv("wind_turbines.csv")
```

1.  Explore the data set, either through the environment or through code. Answer the following questions (2 point):

    a.  How many turbine makers are there?

    There are two turbine makers.

    a.  What does each row of data represent

    Each row of data represents a different turbine.

    ```{r}
    head(turbines)
    # optional; only if you want space for coding
    ```

### Numeric

2.  Generate a summary of the data set that calculates the mean wind speed and mean power output for each wind turbine company. (2 point)

```{r}
turbine_summary <- turbines %>%
  group_by(manufacturer) %>%
  summarize(wind_speed_mean = mean(wind_speed),
            power_output_mean = mean(power_output))
head(turbine_summary)
```

### Visual

3.  Create a density plot for the power output variable. (3 points)
    -   be sure to have a density plot for each turbine producer; the color and the fill should be determined by the maker of the turbine
    -   add in vertical lines for the mean values in the same color as the turbine makers
    -   make sure the x-axis, y-axis, and legend labels are capitalized and easier to understand (power output in measured in kilowatts, or kWh)
    -   use the `theme_classic()` function

```{r}
ggplot(turbines, aes(power_output, color = manufacturer, fill = manufacturer)) +
  geom_density(alpha = 0.5) +
  labs(x="Power Output (kWh)", 
       y="Density",
       color = "Manufacturer",
       fill="Manufacturer") +
  theme(legend.position = "top") +
  geom_vline(data = turbine_summary, aes(xintercept=power_output_mean, color = manufacturer),
             linetype="dashed") +
  theme_classic()
```

4.  Generate a box-and-whisker plot using `ggplot2` that compares the wind speed between different turbine makers (3 points).

    The plot should:

    -   have capitalized and more descriptive axis labels (hint: wind speed is measured in kilometers per hour---km/hr)
    -   show raw data points in addition to the boxes. The points should be jittered.
    -   use the `theme_classic()` function

```{r}
ggplot(turbines, aes(manufacturer, wind_speed)) +
  geom_boxplot() +
  labs(x="Manufacturer", 
       y="Wind Speed (km/hr)") +
  geom_jitter(alpha = 0.5, width = 0.1) +
  theme_classic()
```

### Statistic

5.  Write a null hypothesis and an alternative hypothesis for the question we are asking and that we will be using statistics to answer. (2 points)

    **Null Hypothesis** (H~0~): The means for power output and wind speed of the turbines from both Turbo Turbines and Windmill Inc will be equal.\
    **Alternative Hypothesis** (H~A~): The means for power output and wind speed of the turbines from Turbo Turbines will be higher than the means from Windmill Inc.

6.  Based on the mean values in the `turbine_summary` data frame and the plots you've created above, predict the outcome of each t-test (graded for completion, not accuracy). Explain your reasoning (1-2 sentences for each t-test is fine). (2 points)

    *Answer:*

    *I predict that the p-values for power output by turbine maker will be above 0.05, meaning there is a meaningful difference between makers for power output values. This is because there is a clear difference in the mean v-line on our data visualization.*

    I predict that the p-values for wind speed by turbine maker will be below 0.05, meaning there is no meaningful difference between makers for wind speed values. This is because the means look very similar on our data visualization.

7.  Perform a t-test on the power output by turbine maker. (1 point)

```{r}
t.test(data = turbines, power_output ~ manufacturer)
```

8.  In 2-3 sentences, interpret the output from question 7. Focus on what the p-value is in reference to the cutoff of 0.05, what that means, and whether that means we accept or reject the null hypothesis. (2 points)

    *Answer: The p-value for this test (power output vs manufacturer) is less than 0.05. This means there is a meaningful difference in power output per manufacturer and we can reject our null hypothesis.*

9.  Perform another t-test, this time on the wind_speed variable by manufacturer. (1 point)

```{r}
t.test(data = turbines, wind_speed ~ manufacturer)
```

10. In 2-3 sentences, interpret the output from question 9 (focus on the same ideas as question 8). (2 points)

    *Answer: Our p-value is above 0.05 for wind speed vs manufacturer. This means that there is no meaningful difference between wind speed between the different manufacturers. We can reject our alternate hypothesis.*