RepData_PeerAssesment1-Santos #503

Open: wants to merge 8 commits into base: master
95 changes: 90 additions & 5 deletions PA1_template.Rmd
@@ -1,25 +1,110 @@
---
title: "Reproducible Research: Peer Assessment 1"
output:
  html_document:
    keep_md: true
---
## Loading and preprocessing the data
In this section, I read the data from the shared folder.

```{r, echo=FALSE}
library(tidyverse) # Needed for the pipes (%>%) and dplyr verbs used in the next steps
library(knitr)
options(digits=0) # Show numbers rounded to whole values in the output
options(scipen=999) # Avoid scientific notation
```

```{r, echo=TRUE}
activity <- read.csv("~/Reproducible Research/week2/activity.csv")
```
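
The absolute path above only resolves on the author's machine. A minimal sketch of a more portable load step follows, assuming the CSV has the three columns `steps`, `date`, and `interval` used later in this report. The helper name `read_activity` and the tiny stand-in file are hypothetical and exist only to make the sketch runnable on its own.

```r
# Hypothetical helper: read the activity data from any path with explicit column types.
read_activity <- function(path) {
  read.csv(
    path,
    colClasses = c("integer", "character", "integer"), # steps, date, interval
    na.strings = "NA"
  )
}

# Tiny stand-in file so the sketch runs without the real data set.
sample_path <- tempfile(fileext = ".csv")
writeLines(
  c('"steps","date","interval"',
    'NA,"2012-10-01",0',
    '5,"2012-10-02",0',
    '7,"2012-10-02",5'),
  sample_path
)

sample_activity <- read_activity(sample_path)
str(sample_activity)
```

Passing a path relative to the project folder to a helper like this would let the report knit on other machines without editing the chunk.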

### Histogram of number of steps
Here, I plot a histogram of the total number of steps taken each day.

```{r, echo=TRUE}
# Total steps per day; days containing missing values get NA and are dropped by hist()
data <- activity %>%
  group_by(date) %>%
  summarize(Steps_per_day = sum(steps))

hist(data$Steps_per_day)
```
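
Because `sum(steps)` is called without `na.rm`, any day that contains a missing value gets `NA` as its total and is left out of the histogram. The sketch below illustrates the difference on a made-up three-day data frame (`toy` is hypothetical). It uses base R's `tapply` rather than the dplyr pipeline so that it runs without extra packages.

```r
# Made-up data: d1 is complete, d2 is entirely missing, d3 is partly missing.
toy <- data.frame(
  date  = c("d1", "d1", "d2", "d2", "d3", "d3"),
  steps = c(10, 20, NA, NA, 5, NA)
)

# Default behaviour: a single NA makes the whole day's total NA.
totals_keep_na <- tapply(toy$steps, toy$date, sum)

# With na.rm = TRUE an all-missing day becomes 0 rather than NA,
# which would add a spurious zero-step day to a histogram.
totals_drop_na <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)

print(totals_keep_na)
print(totals_drop_na)
```

Keeping the `NA` totals, as the chunk above does, avoids counting days without measurements as days with zero steps.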

## What is the mean and median number of steps taken each day?
As instructed, I use `na.rm = TRUE` to ignore the missing values.
```{r ,echo=TRUE}
# Named mean_daily/median_daily to avoid masking base::mean() and base::median()
mean_daily <- mean(data$Steps_per_day, na.rm = TRUE)
median_daily <- median(data$Steps_per_day, na.rm = TRUE)
```

The mean number of steps per day is `r mean_daily` and the median is `r median_daily`.

## What is the average daily activity pattern?
To study this, I will make a time series plot of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis). For this, I aggregate the data per interval.

```{r , echo=TRUE}
# Average steps per 5-minute interval across all days, ignoring missing values
data2 <- activity %>%
  group_by(interval) %>%
  summarize(Steps_per_interval = mean(steps, na.rm = TRUE))

plot(data2$interval, data2$Steps_per_interval, type = "l")

# Highest average step count and the interval where it occurs
max_steps <- max(data2$Steps_per_interval)

max2 <- subset(data2, Steps_per_interval == max_steps)
```

Interval `r max2$interval` has the highest average number of steps across all days, at about `r max2$Steps_per_interval` steps.
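
The interval identifiers appear to encode clock time as hour and minute digits, so 835 would read as 08:35. That reading is an assumption and is not stated in this report. Under that assumption, a small sketch of picking the peak row with `which.max()` and turning its identifier into a clock label; `interval_to_clock` and `toy_avg` are hypothetical, and the averages are made up.

```r
# Hypothetical helper: turn an interval id such as 835 into "08:35".
interval_to_clock <- function(interval) {
  padded <- sprintf("%04d", as.integer(interval)) # 835 -> "0835"
  paste0(substr(padded, 1, 2), ":", substr(padded, 3, 4))
}

# Made-up averages, only to show which.max() selecting the peak row.
toy_avg <- data.frame(
  interval  = c(0, 5, 830, 835, 840),
  avg_steps = c(2, 1, 150, 200, 180)
)

peak <- toy_avg[which.max(toy_avg$avg_steps), ]
peak_label <- interval_to_clock(peak$interval)
print(peak_label)
```

`which.max()` returns only the first maximum, whereas the `subset()` call in the report would return every interval tied for the highest value.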

## Imputing missing values
```{r determine missing, echo=TRUE}
# Flag rows where the step count is missing (1 = missing, 0 = observed)
activity$missing <- ifelse(is.na(activity$steps), 1, 0)

# Count the flagged rows
data3 <- activity %>%
  summarize(Missing = sum(missing))

total_missing <- data3$Missing

rm(data3)
```

The total number of observations with a missing value is `r total_missing`.

```{r , echo=TRUE}
# Work on a copy so the original data set is not overwritten
activity2 <- activity

# Overall mean of steps per interval, used to fill every missing value
mean_steps <- mean(activity2$steps, na.rm = TRUE)

activity2$steps_imp <- ifelse(is.na(activity2$steps), mean_steps, activity2$steps)
```
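
The chunk above fills every gap with the single overall mean. One common alternative, shown here only as a sketch and not as the method used in this report, is to fill each gap with the mean of the same 5-minute interval. The names `toy`, `interval_mean`, and `steps_imp_interval` are hypothetical, and the data are made up.

```r
# Made-up data: two intervals observed on three days, first day missing.
toy <- data.frame(
  interval = c(0, 5, 0, 5, 0, 5),
  steps    = c(NA, NA, 2, 10, 4, 30)
)

# ave() returns, for every row, the mean of its own interval group.
interval_mean <- ave(
  toy$steps, toy$interval,
  FUN = function(x) mean(x, na.rm = TRUE)
)

# Keep observed values; use the interval mean only where steps is missing.
toy$steps_imp_interval <- ifelse(is.na(toy$steps), interval_mean, toy$steps)
print(toy)
```

Interval-level imputation preserves the shape of the daily pattern, while the overall mean spreads the same value across the whole day.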

Now, I create the new dataset per day, but using the imputed data.

```{r, echo=TRUE}
data_imp <- activity2 %>%
  group_by(date) %>%
  summarize(Steps_per_day_imp = sum(steps_imp))

hist(data_imp$Steps_per_day_imp)
```

## Are there differences in activity patterns between weekdays and weekends?
```{r, echo=TRUE}
activity3 <- activity2
activity3$date_new <- as.Date(activity3$date)
activity3$day <- weekdays(activity3$date_new) # day names depend on the system locale

activity3$weekend <- ifelse(activity3$day %in% c("Saturday", "Sunday"),
                            "Weekend", "Weekday")
```
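
`weekdays()` returns day names in the current locale, so comparing against "Saturday" and "Sunday" only works when R runs in an English locale. A sketch of a locale-independent variant, assuming the dates are in `YYYY-MM-DD` form; `is_weekend` is a hypothetical helper.

```r
# Hypothetical helper: TRUE for Saturdays and Sundays, regardless of locale.
is_weekend <- function(dates) {
  # POSIXlt stores the weekday as a number: 0 = Sunday, ..., 6 = Saturday.
  as.POSIXlt(as.Date(dates))$wday %in% c(0, 6)
}

# Friday, Saturday, Sunday, Monday in October 2012.
dates <- c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08")
day_type <- ifelse(is_weekend(dates), "Weekend", "Weekday")
print(day_type)
```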

```{r, echo=TRUE}
library(lattice)
# Average imputed steps per interval, separately for weekdays and weekends
weekday_series <- activity3 %>%
  group_by(weekend, interval) %>%
  summarize(Steps_per_int_imp = mean(steps_imp))

xyplot(Steps_per_int_imp ~ interval | weekend,
       data = weekday_series,
       type = "l")
```
501 changes: 501 additions & 0 deletions PA1_template.html

Large diffs are not rendered by default.

164 changes: 164 additions & 0 deletions PA1_template.md
@@ -0,0 +1,164 @@
---
title: "Reproducible Research: Peer Assessment 1"
output:
  html_document:
    keep_md: true
---
## Loading and preprocessing the data
In this section, I read the data from the shared folder.


```
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
```

```
## ✓ ggplot2 3.3.1 ✓ purrr 0.3.4
## ✓ tibble 3.0.1 ✓ dplyr 1.0.0
## ✓ tidyr 1.1.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
```

```
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
```


```r
activity <- read.csv("~/Reproducible Research/week2/activity.csv")
```

### Histogram of number of steps
Here, I plot a histogram of the total number of steps taken each day.


```r
# Total steps per day; days containing missing values get NA and are dropped by hist()
data <- activity %>%
  group_by(date) %>%
  summarize(Steps_per_day = sum(steps))
```

```
## `summarise()` ungrouping output (override with `.groups` argument)
```

```r
hist(data$Steps_per_day)
```

![](PA1_template_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

## What is the mean and median number of steps taken each day?
As instructed, I use `na.rm = TRUE` to ignore the missing values.

```r
# Named mean_daily/median_daily to avoid masking base::mean() and base::median()
mean_daily <- mean(data$Steps_per_day, na.rm = TRUE)
median_daily <- median(data$Steps_per_day, na.rm = TRUE)
```

The mean number of steps per day is 10766 and the median is 10765.

## What is the average daily activity pattern?
To study this, I will make a time series plot of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis). For this, I aggregate the data per interval.


```r
# Average steps per 5-minute interval across all days, ignoring missing values
data2 <- activity %>%
  group_by(interval) %>%
  summarize(Steps_per_interval = mean(steps, na.rm = TRUE))
```

```
## `summarise()` ungrouping output (override with `.groups` argument)
```

```r
plot(data2$interval,data2$Steps_per_interval,type="l")
```

![](PA1_template_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

```r
# Highest average step count and the interval where it occurs
max_steps <- max(data2$Steps_per_interval)

max2 <- subset(data2, Steps_per_interval == max_steps)
```

Interval 835 has the highest average number of steps across all days, at about 206 steps.

## Imputing missing values

```r
# Flag rows where the step count is missing (1 = missing, 0 = observed)
activity$missing <- ifelse(is.na(activity$steps), 1, 0)

# Count the flagged rows
data3 <- activity %>%
  summarize(Missing = sum(missing))

total_missing <- data3$Missing

rm(data3)
```

The total number of observations with a missing value is 2304.


```r
# Work on a copy so the original data set is not overwritten
activity2 <- activity

# Overall mean of steps per interval, used to fill every missing value
mean_steps <- mean(activity2$steps, na.rm = TRUE)

activity2$steps_imp <- ifelse(is.na(activity2$steps), mean_steps, activity2$steps)
```

Now, I create the new dataset per day, but using the imputed data.


```r
data_imp <- activity2 %>%
  group_by(date) %>%
  summarize(Steps_per_day_imp = sum(steps_imp))
```

```
## `summarise()` ungrouping output (override with `.groups` argument)
```

```r
hist(data_imp$Steps_per_day_imp)
```

![](PA1_template_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

## Are there differences in activity patterns between weekdays and weekends?

```r
activity3 <- activity2
activity3$date_new <- as.Date(activity3$date)
activity3$day <- weekdays(activity3$date_new) # day names depend on the system locale

activity3$weekend <- ifelse(activity3$day %in% c("Saturday", "Sunday"),
                            "Weekend", "Weekday")
```


```r
library(lattice)
# Average imputed steps per interval, separately for weekdays and weekends
weekday_series <- activity3 %>%
  group_by(weekend, interval) %>%
  summarize(Steps_per_int_imp = mean(steps_imp))
```

```
## `summarise()` regrouping output by 'weekend' (override with `.groups` argument)
```

```r
xyplot(Steps_per_int_imp ~ interval | weekend,
       data = weekday_series,
       type = "l")
```

![](PA1_template_files/figure-html/unnamed-chunk-9-1.png)<!-- -->
Binary file added figure-html/steps per day with imputed data-1.png
Binary file added figure-html/steps per day-1.png
Binary file added figure-html/unnamed-chunk-3-1.png
Binary file added figure-html/unnamed-chunk-6-1.png