Commit 8dfb21c

Author: e-lo
Working rdpeng#1 and rdpeng#2…still working on rdpeng#3 and 4
1 parent dc20c7c commit 8dfb21c

File tree

4 files changed

+18050
-1
lines changed

PA1_template.Rmd

Lines changed: 118 additions & 1 deletion
@@ -1,20 +1,137 @@
# Reproducible Research: Peer Assessment 1
Author: Elizabeth Sall
Date: 2014-06-15

## Loading and preprocessing the data
```{r}

data <- read.csv(unzip("activity.zip"))

# R indexes from 1, so head() is clearer than data[0:5,] for showing the first five rows
head(data, 5)

summary(data)

```

## What is mean total number of steps taken per day?

**Instructions**

Ignore missing values in the dataset.

1. Make a histogram of the total number of steps taken each day
1. Calculate and report the mean and median total number of steps taken per day

---

### Histogram of Steps per Day
```{r fig.width=7, fig.height=6}

# ignore NA values for steps
naSteps <- is.na(data$steps)

stepsPerDay <- tapply(data$steps[!naSteps], data$date[!naSteps], sum)

# based on range from 0 to 25,000ish, use breaks of every 1000 steps

hist(stepsPerDay,
     main = "Frequency of Steps per Day",
     breaks = seq(0, 25000, 1000),
     col = "green",
     xlab = "Steps per Day",
     ylab = "Frequency (n)"
)
```
### Mean & median steps per day
```{r}
mean(stepsPerDay, na.rm = TRUE)
median(stepsPerDay, na.rm = TRUE)
```
## What is the average daily activity pattern?

**Instructions**

1. Make a time series plot (i.e. type = "l") of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)
1. Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?

---

### Time series of average steps per 5 minute interval
```{r}
# stepsPerInterval <- tapply(data$steps[!naSteps], data$interval[!naSteps], mean)
# aggdatabyInterval <- aggregate(dt$steps, by=list(Category=dt$interval), FUN=mean, na.rm=TRUE)

# aggregate() returns a data frame with columns Category (interval) and x (mean steps)
stepsPerInterval <- aggregate(data$steps, by=list(Category=data$interval), FUN=mean, na.rm=TRUE)

# plot the mean against the interval label; ts() with deltat=5 is avoided because
# the labels jump from 55 to 100 at each hour and so are not evenly spaced
plot(stepsPerInterval$Category, stepsPerInterval$x,
     type = "l",
     main = "Average Steps per 5 Minute Interval from Midnight",
     ylab = "Mean Steps per 5-Min Time Interval",
     xlab = "5 minute interval"
)

```
### Most Active 5-Minute Interval
```{r}
# which.max() needs a numeric vector, so pass the mean column and return the matching row
stepsPerInterval[which.max(stepsPerInterval$x), ]
```

## Imputing missing values

**Instructions**
Note that there are a number of days/intervals where there are missing values (coded as NA). The presence of missing days may introduce bias into some calculations or summaries of the data.

1. Calculate and report the total number of missing values in the dataset (i.e. the total number of rows with NAs)
1. Devise a strategy for filling in all of the missing values in the dataset. The strategy does not need to be sophisticated. For example, you could use the mean/median for that day, or the mean for that 5-minute interval, etc.
1. Create a new dataset that is equal to the original dataset but with the missing data filled in.
1. Make a histogram of the total number of steps taken each day and calculate and report the mean and median total number of steps taken per day. Do these values differ from the estimates from the first part of the assignment? What is the impact of imputing missing data on the estimates of the total daily number of steps?

### Calculate # of Missing Values
```{r}
### Number of Missing Values
sum(naSteps)

### Of Total Observations
length(naSteps)
```

### Devise a strategy for filling in all of the missing values in the dataset.
```{r}

# add another column to the data frame with imputed steps

isteps <- data.frame(isteps = numeric(nrow(data)))
datai <- cbind(data, isteps)
head(datai, 5)

# vectorised replacement: the earlier for loop iterated over the columns of
# isteps rather than the rows of datai, so the NA test never saw the step counts
datai$isteps <- ifelse(is.na(datai$steps),
                       2000,          # placeholder, replace with better estimate
                       datai$steps)
head(datai, 5)
```

## Are there differences in activity patterns between weekdays and weekends?

**Instructions**
For this part the weekdays() function may be of some help. Use the dataset with the filled-in missing values for this part.

1. Create a new factor variable in the dataset with two levels – “weekday” and “weekend” indicating whether a given date is a weekday or weekend day.
1. Make a panel plot containing a time series plot (i.e. type = "l") of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all weekday days or weekend days (y-axis).

### Create weekend weekday factor variable

```{r}

```

### Make panel plot showing difference between weekend and weekday

```{r}

```

PA1_template.html

Lines changed: 271 additions & 0 deletions
Large diffs are not rendered by default.

PA1_template.md

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
# Reproducible Research: Peer Assessment 1
Author: Elizabeth Sall
Date: 2014-06-15

## Loading and preprocessing the data

```r
data <- read.csv(unzip("activity.zip"))

data[0:5,]
```

```
##   steps       date interval
## 1    NA 2012-10-01        0
## 2    NA 2012-10-01        5
## 3    NA 2012-10-01       10
## 4    NA 2012-10-01       15
## 5    NA 2012-10-01       20
```

```r
summary(data)
```

```
##      steps               date          interval
##  Min.   :  0.0   2012-10-01:  288   Min.   :   0
##  1st Qu.:  0.0   2012-10-02:  288   1st Qu.: 589
##  Median :  0.0   2012-10-03:  288   Median :1178
##  Mean   : 37.4   2012-10-04:  288   Mean   :1178
##  3rd Qu.: 12.0   2012-10-05:  288   3rd Qu.:1766
##  Max.   :806.0   2012-10-06:  288   Max.   :2355
##  NA's   :2304    (Other)   :15840
```

## What is mean total number of steps taken per day?

**Instructions**

Ignore missing values in dataset
1. Make a histogram of the total number of steps taken each day
1. Calculate and report the mean and median total number of steps taken per day

---

### Histogram of Steps per Day

```r
#ignore NA values for steps
naSteps <- is.na(data$steps)

stepsPerDay <- tapply(data$steps[!naSteps], data$date[!naSteps], sum)

# based on range from 0 to 25,000ish, use breaks of every 1000 steps

hist(stepsPerDay,
     main="Frequency of Steps per Day",
     breaks = seq(0, 25000, 1000),
     col = "green",
     xlab = "Steps per Day",
     ylab = "Frequency (n)",
)
```

![plot of chunk unnamed-chunk-2](figure/unnamed-chunk-2.png)
### Mean & median steps per day

```r
mean(stepsPerDay,na.rm = TRUE)
```

```
## [1] 10766
```

```r
median(stepsPerDay, na.rm=TRUE)
```

```
## [1] 10765
```
## What is the average daily activity pattern?



## Imputing missing values



## Are there differences in activity patterns between weekdays and weekends?
