Commit adb3826

1 parent 1beed0d commit adb3826

2 files changed, with 220 additions and 1 deletion

PA1_template.Rmd

Lines changed: 3 additions & 1 deletion
@@ -1,5 +1,7 @@
11
---
2-
output: html_document
2+
output:
3+
  md_document:
4+
    variant: markdown_github
35
---
46
# Reproducible Research: Peer Assessment 1
57

PA1_template.md

Lines changed: 217 additions & 0 deletions
@@ -0,0 +1,217 @@
1+
Reproducible Research: Peer Assessment 1
2+
========================================
3+
4+
### 1. Load the data from the course website (i.e. read.csv())
5+
6+
activity <- read.csv("activity.csv")
7+
8+
### 2. Process/transform the data (if necessary) into a format suitable for your analysis
9+
10+
activity$date <- as.Date(activity$date, format = "%Y-%m-%d")
11+
activity$interval <- as.factor(activity$interval)
12+
13+
What is mean total number of steps taken per day?
14+
-------------------------------------------------
15+
16+
### 1. Calculate the total number of steps taken per day.
17+
18+
steps_daily <- aggregate(steps ~ date, activity, sum)
19+
colnames(steps_daily) <- c("date","steps")
20+
head(steps_daily)
21+
22+
## date steps
23+
## 1 2012-10-02 126
24+
## 2 2012-10-03 11352
25+
## 3 2012-10-04 12116
26+
## 4 2012-10-05 13294
27+
## 5 2012-10-06 15420
28+
## 6 2012-10-07 11015
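
The first row above is 2012-10-02, although the data start on 2012-10-01. A likely reason is that the formula interface of `aggregate()` drops rows containing `NA` before summing (its default `na.action` is `na.omit`), so a day with no recorded steps disappears from the result. A minimal sketch on made-up data (the `toy` data frame is hypothetical, not part of the assignment):

```r
# Toy data (hypothetical): day 1 is entirely missing, day 2 is complete.
toy <- data.frame(
  date  = as.Date(c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02")),
  steps = c(NA, NA, 10, 20)
)

# Formula interface: rows with NA are removed first, so day 1 vanishes.
totals <- aggregate(steps ~ date, toy, sum)
print(totals)

# tapply() with na.rm = TRUE keeps day 1, but reports a total of 0 for it.
totals_with_zero <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)
print(totals_with_zero)
```

Dropping the empty days, as the report does, keeps spurious zero totals out of the histogram and the mean.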
29+
30+
### 2. Make a histogram of the total number of steps taken each day.
31+
32+
library(ggplot2)
33+
34+
## Warning: package 'ggplot2' was built under R version 3.2.5
35+
36+
ggplot(steps_daily, aes(x = steps)) +
37+
geom_histogram(fill = "green", binwidth = 1500) +
38+
labs(title="Summary of Steps Taken per Day (Oct-Nov 2012)",
39+
x = "Total Number of Steps per Day", y = "Number of Days") + theme_bw()
40+
41+
![](PA1_template_files/figure-markdown_strict/unnamed-chunk-4-1.png)
42+
43+
### 3. Calculate and report the mean and median of the total number of steps taken per day
44+
45+
mean_steps_daily <- mean(steps_daily$steps, na.rm=TRUE)
46+
median_steps_daily <- median(steps_daily$steps, na.rm=TRUE)
47+
48+
Print:
49+
50+
mean_steps_daily
51+
52+
## [1] 10766.19
53+
54+
median_steps_daily
55+
56+
## [1] 10765
57+
58+
The mean total number of steps taken per day is 10766.19, while the median is
59+
10765.
60+
61+
What is the average daily activity pattern?
62+
-------------------------------------------
63+
64+
Prior to plotting, average the steps within each 5-minute interval and
65+
convert the interval labels from a factor back to integers
66+
67+
interval_steps <- aggregate(activity$steps,
68+
by = list(interval = activity$interval),
69+
FUN=mean, na.rm=TRUE)
70+
71+
interval_steps$interval <- as.integer(levels(interval_steps$interval)[interval_steps$interval])
72+
head(interval_steps$interval)
73+
74+
## [1]  0  5 10 15 20 25
75+
76+
colnames(interval_steps) <- c("interval", "steps")
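
The conversion above goes through `levels()` because `as.integer()` applied directly to a factor returns the internal level codes, not the labels. The idiom also only works while the column is still a factor: applied to an integer vector, `levels()` returns `NULL` and the result is `integer(0)`. A small sketch with a made-up factor `f`:

```r
# A factor built from character labels; its levels sort as text: "0" "10" "5" "835".
f <- factor(c("0", "5", "10", "835"))

# Direct conversion yields the internal codes (positions in levels(f)).
codes <- as.integer(f)

# Going through the levels recovers the labels as integers.
labels <- as.integer(levels(f))[f]

# Repeating the idiom on the already-converted integer vector returns integer(0).
again <- as.integer(levels(labels)[labels])

print(codes)
print(labels)
print(again)
```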
77+
78+
### 1. Make a time series plot (i.e. type = "l") of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)
79+
80+
ggplot(interval_steps, aes(x=interval, y=steps)) +
81+
geom_line(color="green", size=1.5) +
82+
labs(title="Average Daily Activity Pattern (Oct-Nov 2012)", x="Interval (5-Minutes)", y="Average Number of steps") +
83+
theme_bw()
84+
85+
![](PA1_template_files/figure-markdown_strict/unnamed-chunk-9-1.png)
86+
87+
### 2. Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?
88+
89+
max_interval_steps <- interval_steps[which.max(interval_steps$steps),]
90+
print(max_interval_steps)
91+
92+
## interval steps
93+
## 104 835 206.1698
94+
95+
The interval labeled 835 (the 104th interval of the day) has the highest average, at about 206 steps.
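
The interval labels appear to encode clock time as HHMM and not a running count: row 104 carries the label 835, and 103 × 5 = 515 minutes is 08:35. Under that assumption, a hypothetical helper (`interval_to_clock` is not part of the original analysis) makes the labels easier to read:

```r
# Convert HHMM-style interval labels (e.g. 835) into "HH:MM" strings.
# Assumes labels are non-negative integers below 2400.
interval_to_clock <- function(interval) {
  padded <- sprintf("%04d", as.integer(interval))
  paste0(substr(padded, 1, 2), ":", substr(padded, 3, 4))
}

clock <- interval_to_clock(c(0, 5, 835, 2355))
print(clock)
```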
96+
97+
Imputing Missing Values
98+
-----------------------
99+
100+
To address possible bias caused by missing values (NA), identify the
101+
number of missing values and devise a strategy to fill in these values.
102+
Create a new (modified) dataset that integrates the new values.
103+
104+
### 1. Total number of missing values:
105+
106+
Calculate and report the total number of missing values in the dataset
107+
(i.e. the total number of rows with NAs)
108+
109+
missing_steps <- sum(is.na(activity$steps))
110+
print(missing_steps)
111+
112+
## [1] 2304
113+
114+
There are a total of 2304 missing values.
115+
116+
### 2. Fill the missing values and create a new dataset with the proxy values
117+
118+
Devise a strategy for filling in all of the missing values in the
119+
dataset. The strategy does not need to be sophisticated. For example,
120+
you could use the mean/median for that day, or the mean for that
121+
5-minute interval, etc.
122+
123+
na_steps_proxy <- function(data, pervalue) {
124+
na_list <- which(is.na(data$steps))
125+
na_proxy <- unlist(lapply(na_list, FUN=function(idx){
126+
interval = data[idx,]$interval
127+
pervalue[pervalue$interval == interval,]$steps
128+
}))
129+
proxy_na_steps <- data$steps
130+
proxy_na_steps[na_list] <- na_proxy
131+
proxy_na_steps
132+
}
133+
134+
activity_proxy <- data.frame(
135+
steps = na_steps_proxy(activity, interval_steps),
136+
date = activity$date,
137+
interval = activity$interval)
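
The same interval-mean imputation can also be written without the per-row `lapply()`. The sketch below is an alternative and is not the function used in this report; `impute_by_interval` and the toy data are hypothetical. It looks up every row's interval mean once with `match()` and overwrites only the missing entries:

```r
# Replace missing steps with the mean for the same interval.
# `per_interval` must have columns `interval` and `steps`.
impute_by_interval <- function(data, per_interval) {
  # Compare labels as text so factor and integer intervals both work.
  idx <- match(as.character(data$interval), as.character(per_interval$interval))
  filled <- data$steps
  missing <- is.na(filled)
  filled[missing] <- per_interval$steps[idx[missing]]
  filled
}

# Toy data: one missing value in each of two intervals.
toy <- data.frame(steps = c(NA, 4, 6, NA), interval = factor(c(0, 5, 0, 5)))
toy_means <- aggregate(steps ~ interval, toy, mean)
toy_filled <- impute_by_interval(toy, toy_means)
print(toy_filled)
```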
138+
139+
Check whether there are still missing values in the new dataset, activity\_proxy.
140+
141+
missing_steps_new <- sum(is.na(activity_proxy$steps))
142+
print(missing_steps_new)
143+
144+
## [1] 0
145+
146+
str(activity_proxy)
147+
148+
## 'data.frame': 17568 obs. of 3 variables:
149+
## $ steps : num 1.717 0.3396 0.1321 0.1509 0.0755 ...
150+
## $ date : Date, format: "2012-10-01" "2012-10-01" ...
151+
## $ interval: Factor w/ 288 levels "0","5","10","15",..: 1 2 3 4 5 6 7 8 9 10 ...
152+
153+
### 3. Plot a histogram of the total number of steps taken each day
154+
155+
Plot a histogram of the total number of steps taken per day with the
156+
proxy values
157+
158+
steps_daily_proxy <- aggregate(steps ~ date, activity_proxy, sum)
159+
colnames(steps_daily_proxy) <- c("date","steps")
160+
161+
ggplot(steps_daily_proxy, aes(x = steps)) +
162+
geom_histogram(fill = "green", binwidth = 1500) +
163+
labs(title="Summary of Steps Taken per Day (Oct-Nov 2012)",
164+
x = "Total Number of Steps per Day", y = "Number of Days") + theme_bw()
165+
166+
![](PA1_template_files/figure-markdown_strict/unnamed-chunk-14-1.png)
167+
168+
Calculate the new mean and median of the data set with proxy values
169+
170+
mean_steps_daily_proxy <- mean(steps_daily_proxy$steps, na.rm=TRUE)
171+
median_steps_daily_proxy <- median(steps_daily_proxy$steps, na.rm=TRUE)
172+
print(mean_steps_daily_proxy)
173+
174+
## [1] 10766.19
175+
176+
print(median_steps_daily_proxy)
177+
178+
## [1] 10766.19
179+
180+
Note that the new mean and median are now equal at 10766.19, whereas the
181+
dataset with missing values had a mean of 10766.19 and a median of 10765.
182+
The mean has not changed, so replacing the missing values with interval
183+
means has no effect on the estimated mean of the total daily number of
184+
steps, and it shifts the median only slightly.
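
One way to see why the mean stays put: if the missing values fall on whole days (2304 = 8 × 288 is consistent with that, though the output above does not confirm it), each imputed day receives the sum of the interval means, which equals the mean of the observed daily totals. A worked sketch on made-up numbers:

```r
# Toy data: three days, two intervals per day; day 3 is entirely missing.
toy <- data.frame(
  date     = rep(as.Date("2012-10-01") + 0:2, each = 2),
  interval = rep(c(0, 5), times = 3),
  steps    = c(10, 30, 20, 60, NA, NA)
)

# Mean of the daily totals before imputation (day 3 is NA and ignored).
totals_before <- as.numeric(tapply(toy$steps, toy$date, sum))
mean_before <- mean(totals_before, na.rm = TRUE)

# Impute with interval means computed from the observed days.
interval_means <- tapply(toy$steps, toy$interval, mean, na.rm = TRUE)
filled <- toy$steps
missing <- is.na(filled)
filled[missing] <- interval_means[as.character(toy$interval[missing])]

# Day 3 now totals exactly mean_before, so the mean is unchanged
# and the median is pulled onto the mean.
totals_after <- as.numeric(tapply(filled, toy$date, sum))
mean_after <- mean(totals_after)
median_after <- median(totals_after)
print(c(mean_before, mean_after, median_after))
```

With a complete panel of observed days the equality is exact; if missing values were scattered within days, the mean could shift.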
185+
186+
Are there differences in activity patterns between weekdays and weekends?
187+
-------------------------------------------------------------------------
188+
189+
### 1. Using the new dataset, add a new column indicating the type of day on which each observation was taken (weekday or weekend)
190+
191+
activity_proxy$dateType <- ifelse(as.POSIXlt(activity_proxy$date)$wday %in% c(0,6), 'weekend', 'weekday')
192+
193+
head(activity_proxy)
194+
195+
## steps date interval dateType
196+
## 1 1.7169811 2012-10-01 0 weekday
197+
## 2 0.3396226 2012-10-01 5 weekday
198+
## 3 0.1320755 2012-10-01 10 weekday
199+
## 4 0.1509434 2012-10-01 15 weekday
200+
## 5 0.0754717 2012-10-01 20 weekday
201+
## 6 2.0943396 2012-10-01 25 weekday
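
The classification relies on `as.POSIXlt()` numbering weekdays from 0 (Sunday) to 6 (Saturday), which, unlike `weekdays()`, does not depend on the locale. A quick check on four dates around the first weekend in the data (the variable names are illustrative only):

```r
# Friday through Monday of the first week in the dataset.
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# $wday is 0 for Sunday and 6 for Saturday.
wday <- as.POSIXlt(dates)$wday

day_type <- ifelse(wday %in% c(0, 6), "weekend", "weekday")
print(data.frame(date = dates, wday = wday, day_type = day_type))
```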
202+
203+
### 2. Plot a graph showing whether activity differs by type of day.
204+
205+
activitybydaytype <- aggregate(steps ~ interval + dateType, activity_proxy, mean)
206+
207+
library(lattice)
208+
xyplot(steps ~ interval|dateType, data = activitybydaytype,
209+
type = "l", layout = c(1,2),
210+
grid = TRUE,
211+
xlab="5-Minute Time Interval", ylab = "Average Number of Steps Taken",
212+
main = "Average Steps Taken on Weekdays vs Weekends")
213+
214+
![](PA1_template_files/figure-markdown_strict/unnamed-chunk-17-1.png)
215+
216+
The graph shows that activity during weekdays has the highest peak but
217+
weekends have more frequent peaks.
