Commit f6852e1

submission rdpeng#3
1 parent d3ad087 commit f6852e1

49 files changed: 10278 additions and 2 deletions

PA1_template.Rmd

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ knitr::opts_chunk$set(echo=TRUE)
 
 ```{r,echo=TRUE}
 library(data.table)
-setwd("./repdata-data-activity")
+#setwd("./repdata-data-activity")
 data <- read.csv("activity.csv",head=TRUE,sep=",")
 data$date <- as.Date(as.character(data$date, "%Y-%m-%d"))
 data <- data.table(data)
@@ -101,4 +101,4 @@ setnames(avgdata ,3,"AvgSteps")
 library(lattice)
 xyplot(AvgSteps ~interval | date, typ="l" , ylab="Number of Steps", data = avgdata, layout = c(1, 2))
 
-```
+```

PA1_template.html

Lines changed: 315 additions & 0 deletions
Large diffs are not rendered by default.

PA1_template.md

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
---
title: "Reproducible Research: Peer Assessment 1"
output:
  html_document:
    keep_md: true
---


```r
knitr::opts_chunk$set(echo=TRUE)
```
## Loading and preprocessing the data


```r
library(data.table)
#setwd("./repdata-data-activity")
data <- read.csv("activity.csv", header=TRUE, sep=",")
# Convert to character first, then parse with an explicit date format
data$date <- as.Date(as.character(data$date), "%Y-%m-%d")
data <- data.table(data)
```
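
As a small illustration of the date conversion step, the sketch below parses ISO-formatted strings with an explicit format. The `raw_dates` vector is made up for this example and only stands in for the `date` column of `activity.csv`.

```r
# Toy stand-in for the date column (hypothetical values, not read from activity.csv)
raw_dates <- factor(c("2012-10-01", "2012-10-02", "2012-10-02"))

# Convert factor -> character first, then parse with an explicit format string
parsed <- as.Date(as.character(raw_dates), format = "%Y-%m-%d")

class(parsed)           # "Date"
parsed[2] - parsed[1]   # Time difference of 1 days
```

Passing the format to `as.Date` rather than to `as.character` keeps the parsing explicit instead of relying on the default ISO format.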

## What is mean total number of steps taken per day?
1. Find the total number of steps per day and plot a histogram.
Note: Days for which every interval is NA sum to 0 total steps under `na.rm=TRUE`. The filter that would drop those days is commented out in the code below, so they remain in the histogram and in the mean and median.

```r
sumdata <- data[,sum(steps,na.rm=TRUE), by=date]
setnames(sumdata,2, "TotalSteps")
# Filter that would drop the 0-step (all-NA) days; left disabled
#sumdata <- sumdata[sumdata$TotalSteps>0,]
hist(sumdata$TotalSteps, col="blue", xlab = "Total Steps Each Day",main="Total Number of Steps Taken Each Day")
```
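
The 0-step days discussed in the note come from how `sum` treats a vector that is entirely `NA`. A minimal base R sketch with made-up values follows; the `toy` data frame is hypothetical and not part of the assignment data.

```r
# Hypothetical two-day sample: the first day has no recorded steps at all
toy <- data.frame(
  date  = as.Date(c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02")),
  steps = c(NA, NA, 10, 20)
)

# With na.rm = TRUE an all-NA day sums to 0 instead of NA
totals <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)
totals

# Keeping the 0-step day lowers the mean of the daily totals
mean(totals)               # 15
mean(totals[totals > 0])   # 30
```

Whether such days are kept or dropped therefore changes the reported mean, which is why the choice is worth stating next to the result.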

![plot of chunk unnamed-chunk-2](figure/unnamed-chunk-2.png)

2. Calculate and report the mean and median total number of steps taken per day.
Note: Days with 0 total steps are not removed from this calculation (see the note under item 1).

```r
meansteps <- mean(sumdata$TotalSteps,na.rm=TRUE)
mediansteps <- median(sumdata$TotalSteps,na.rm=TRUE)
```
The mean total number of steps taken per day is **9354.2295**.

The median total number of steps taken per day is **10395**.


## What is the average daily activity pattern?

```r
avgstepsint <- data[,mean(steps,na.rm=TRUE),by=interval]
setnames(avgstepsint,2, "AvgSteps")
with(avgstepsint,plot(interval,AvgSteps,type="l",xlab ="5-min Interval", ylab="Avg Number of Steps"))
```

![plot of chunk unnamed-chunk-4](figure/unnamed-chunk-4.png)

```r
# First column (interval) of the row(s) where AvgSteps reaches its maximum
maxstepsint <- avgstepsint[avgstepsint$AvgSteps == max(avgstepsint$AvgSteps),][[1]]
```
Interval **835**, on average across all days, contains the maximum number of steps.
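
An equivalent lookup can be written with `which.max`, which returns the position of the first maximum. The sketch below uses a made-up table; `toy_avg` and its values are hypothetical and are not taken from `avgstepsint`.

```r
# Hypothetical per-interval averages (values invented for illustration)
toy_avg <- data.frame(
  interval = c(830, 835, 840),
  AvgSteps = c(177.3, 206.2, 195.9)
)

# Position of the largest average, then the interval at that position
peak_interval <- toy_avg$interval[which.max(toy_avg$AvgSteps)]
peak_interval   # 835
```

Note that `which.max` keeps only the first maximum, whereas the equality filter in the original chunk would return every tied interval.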

## Imputing missing values
1. Calculate and report the total number of missing values in the dataset (i.e. the total number of rows with NAs)

```r
NArows <- data[is.na(data$steps),]
totalNArows <- nrow(NArows)
```

The total number of missing values in the dataset is **2304**.

2. Devise a strategy for filling in all of the missing values in the dataset. The strategy does not need to be sophisticated. For example, you could use the mean/median for that day, or the mean for that 5-minute interval, etc.
3. Create a new dataset that is equal to the original dataset but with the missing data filled in.

```r
# Mean steps per 5-minute interval, used as the fill-in value
meanintbydate <- data[,mean(steps,na.rm=TRUE),by=interval]
meanintbydate <- data.frame(meanintbydate)
NArows <- data.frame(NArows)

# Join each NA row to its interval mean (merge uses the shared "interval" column)
results <- merge(meanintbydate,NArows, all=TRUE)

# Drop the all-NA steps column, rename the mean column to "steps", restore column order
results <- results[,-c(3)]
setnames(results,2,"steps")
results <- results[,c(2,3,1)]

#remove those rows with NA from original data set
noNAdata <- data[!is.na(data$steps),]
#combine the rows with the results from above
final <- rbind(results,noNAdata)
```
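
The merge-based chunk above fills each missing value with the mean of its 5-minute interval. The same strategy can be sketched more compactly in base R with `ave`. The `toy` data frame below is made up, and this is an alternative illustration under that assumption, not the code used in the submission.

```r
# Hypothetical sample: three days, two intervals each, two missing values
toy <- data.frame(
  steps    = c(NA, 4, 10, NA, 6, 20),
  date     = as.Date(rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2)),
  interval = rep(c(0, 5), times = 3)
)

# Per-row mean of the row's interval, ignoring NAs within each group
interval_mean <- ave(toy$steps, toy$interval,
                     FUN = function(x) mean(x, na.rm = TRUE))

# Copy the data and overwrite only the missing entries
filled <- toy
missing <- is.na(filled$steps)
filled$steps[missing] <- interval_mean[missing]

filled$steps              # 8 4 10 12 6 20
sum(is.na(filled$steps))  # 0
```

Because `ave` returns a vector aligned with the input rows, no merge, column reordering, or `rbind` is needed, and the row order of the original data is preserved.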


```r
sumdata <- final[,sum(steps,na.rm=TRUE), by=date]
setnames(sumdata,2, "TotalSteps")
hist(sumdata$TotalSteps, col="blue", xlab = "Total Steps Each Day",main="Total Number of Steps Taken Each Day")
```

![plot of chunk unnamed-chunk-7](figure/unnamed-chunk-7.png)

```r
meansteps <- mean(sumdata$TotalSteps,na.rm=TRUE)
mediansteps <- median(sumdata$TotalSteps,na.rm=TRUE)
```
The mean total number of steps taken per day is **1.0766 &times; 10<sup>4</sup>**.

The median total number of steps taken per day is **1.0766 &times; 10<sup>4</sup>**.

## Are there differences in activity patterns between weekdays and weekends?



```r
# Replace each date with its day name, then collapse the names to "weekday" or "weekend"
final$date <- weekdays(final$date)
final[!grepl("Saturday|Sunday",final$date),]$date <- "weekday"
final[!grepl("weekday",final$date),]$date <- "weekend"
final$date <- factor(final$date)
# Average steps per interval within each day type
avgdata <- final[,mean(steps),by=c("date","interval")]
setnames(avgdata ,3,"AvgSteps")

library(lattice)
xyplot(AvgSteps ~ interval | date, type="l", ylab="Number of Steps", data = avgdata, layout = c(1, 2))
```
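
The chunk above overwrites the `date` column with the day type and matches English day names, which depend on the session locale. A variant that keeps the dates and derives the day type from a numeric weekday is sketched below. The `daytype` column, the `toy` data frame, and its step counts are assumptions made for illustration.

```r
# Hypothetical sample covering Friday through Monday
toy <- data.frame(
  date  = as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08")),
  steps = c(100, 50, 70, 120)
)

# Numeric weekday: 0 = Sunday, 6 = Saturday (independent of locale)
wday <- as.POSIXlt(toy$date)$wday

# Separate factor column, so the original dates stay available
toy$daytype <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
                      levels = c("weekday", "weekend"))

as.character(toy$daytype)   # "weekday" "weekend" "weekend" "weekday"

# Mean steps per day type
avg_by_type <- aggregate(steps ~ daytype, data = toy, FUN = mean)
avg_by_type
```

Keeping `date` intact means later steps can still group or filter by calendar date after the weekday/weekend split has been added.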

![plot of chunk unnamed-chunk-8](figure/unnamed-chunk-8.png)
