Commit 1579220

committed
first commit
draft, still working on problem rdpeng#3
1 parent 80edf39 commit 1579220

64 files changed: 10,173 additions and 1 deletion

PA1_template.Rmd

Lines changed: 67 additions & 1 deletion
@@ -7,19 +7,85 @@ output:
## Loading and preprocessing the data
```{r, echo=TRUE}
#library(dplyr)
library(lattice)
activity <- read.csv("activity.csv")
activity$date <- as.Date(activity$date)
clean_activity <- activity[!is.na(activity$steps), ]
```

## What is mean total number of steps taken per day?
```{r, echo=TRUE}
part1 <- aggregate(steps ~ date, clean_activity, sum)
hist(part1$steps,
     main = "Total number of steps per day",
     xlab = "Steps",
     breaks = 15,
     col = "blue")
mean(part1$steps)
median(part1$steps)
```
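
One detail worth keeping in mind for the per-day totals: the formula interface of `aggregate()` omits rows containing `NA` by default (`na.action = na.omit`), so a day whose readings are all missing is absent from the result rather than counted as zero. The sketch below illustrates this on a made-up data frame; `toy` and its values are hypothetical and are not taken from `activity.csv`.

```r
# Hypothetical toy data: the second day has only missing readings
toy <- data.frame(
  date  = as.Date(c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02",
                    "2012-10-03", "2012-10-03")),
  steps = c(10, 20, NA, NA, 5, NA)
)

# Formula interface: rows with NA steps are dropped before summing,
# so 2012-10-02 does not appear in the result at all
daily <- aggregate(steps ~ date, toy, sum)
daily
```

Under that default, the mean and median reported above are taken over days that have at least one recorded value, which is presumably why filtering into `clean_activity` first gives the same totals.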

## What is the average daily activity pattern?

```{r, echo=TRUE}
interval_steps <- aggregate(steps ~ interval, clean_activity, mean)

plot(interval_steps$interval,
     interval_steps$steps,
     type = "l",
     xlab = "Time interval",
     ylab = "Number of steps")

# Use which.max() to find the interval with the largest mean number of steps,
# then return that interval's identifier
interval_steps[which.max(interval_steps$steps), 1]
```
So the interval with the highest mean number of steps is 835.

## Imputing missing values
```{r}
# count the number of NA values
nrow(activity[is.na(activity$steps),])
```
1. The total number of missing values in the dataset is 2304.

2. The strategy is to replace `NA` values with the average steps taken for
   that time interval across all days.

```{r, echo=TRUE}
fixed_activity <- activity

# strategy: fill NA with average for that time interval
```
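
The chunk above is still a draft and stops at the copy. One possible way to carry out the stated strategy is sketched below on a small made-up data frame. The names `toy`, `interval_means`, and `fixed_toy` are hypothetical, and this is only one of several reasonable approaches, not necessarily the one the author will commit.

```r
# Hypothetical toy data standing in for activity.csv (same column names)
toy <- data.frame(
  steps    = c(NA, 4, 2, NA, 6, 8),
  date     = as.Date(rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2)),
  interval = rep(c(0, 5), times = 3)
)

# Mean steps per interval across all days, ignoring missing values
interval_means <- tapply(toy$steps, toy$interval, mean, na.rm = TRUE)

# Copy the data, then fill each NA with the mean of its own interval;
# the means are looked up by interval label, hence as.character()
fixed_toy <- toy
missing <- is.na(fixed_toy$steps)
fixed_toy$steps[missing] <-
  interval_means[as.character(fixed_toy$interval[missing])]

# No missing values should remain
sum(is.na(fixed_toy$steps))
```

Applied to the real data, the same pattern would use `activity` in place of `toy`. Note that an interval with no recorded values on any day would yield `NaN` from `mean(..., na.rm = TRUE)`, so it is worth checking the result for remaining gaps.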

## Are there differences in activity patterns between weekdays and weekends?
```{r, echo=TRUE}
# use weekdays() function to label each day
clean_activity$weekend <- ifelse(weekdays(clean_activity$date) %in%
                                   c("Saturday", "Sunday"),
                                 "weekend", "weekday")
clean_activity$weekend <- as.factor(clean_activity$weekend)

# same strategy as before but add the weekend factor variable
avg_steps_w <- aggregate(steps ~ interval + weekend, clean_activity, mean)

# lattice plot separated by weekend and weekday
xyplot(avg_steps_w$steps ~ avg_steps_w$interval | avg_steps_w$weekend,
       type = "l",
       layout = c(1, 2),
       main = "Comparison between activity on weekdays and weekends",
       xlab = "Time intervals",
       ylab = "Average steps")
```
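
A caveat on the labelling step: `weekdays()` returns day names in the current locale, so comparing against `"Saturday"` and `"Sunday"` only works when R runs in an English locale. A locale-independent variant can use the numeric weekday from `as.POSIXlt()` instead. The sketch below uses made-up dates, and `dates`, `is_weekend`, and `day_type` are hypothetical names.

```r
# Hypothetical dates: 2012-10-05 is a Friday, followed by a weekend and a Monday
dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# as.POSIXlt()$wday counts days since Sunday (0 = Sunday, 6 = Saturday),
# independent of the locale's day names
is_weekend <- as.POSIXlt(dates)$wday %in% c(0, 6)

day_type <- factor(ifelse(is_weekend, "weekend", "weekday"),
                   levels = c("weekday", "weekend"))
day_type
```

The resulting factor could replace the `weekend` column built with `weekdays()` without changing the rest of the chunk.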

PA1_template.html

Lines changed: 154 additions & 0 deletions
Large diffs are not rendered by default.

PA1_template.md

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
# Reproducible Research: Peer Assessment 1


## Loading and preprocessing the data

```r
#library(dplyr)
library(lattice)
activity <- read.csv("activity.csv")
activity$date <- as.Date(activity$date)
clean_activity <- activity[!is.na(activity$steps), ]
```


## What is mean total number of steps taken per day?

```r
part1 <- aggregate(steps ~ date, clean_activity, sum)
hist(part1$steps,
     main = "Total number of steps per day",
     xlab = "Steps",
     breaks = 15,
     col = "blue")
```

![](./PA1_template_files/figure-html/unnamed-chunk-2-1.png)

```r
mean(part1$steps)
```

```
## [1] 10766.19
```

```r
median(part1$steps)
```

```
## [1] 10765
```

## What is the average daily activity pattern?


```r
interval_steps <- aggregate(steps ~ interval, clean_activity, mean)

plot(interval_steps$interval,
     interval_steps$steps,
     type = "l",
     xlab = "Time interval",
     ylab = "Number of steps")
```

![](./PA1_template_files/figure-html/unnamed-chunk-3-1.png)

```r
# Use which.max() to find the interval with the largest mean number of steps,
# then return that interval's identifier

interval_steps[which.max(interval_steps$steps), 1]
```

```
## [1] 835
```
So the interval with the highest mean number of steps is 835.

## Imputing missing values

```r
# count the number of NA values
nrow(activity[is.na(activity$steps),])
```

```
## [1] 2304
```
1. The total number of missing values in the dataset is 2304.

2. The strategy is to replace `NA` values with the average steps taken for
   that time interval across all days.


```r
fixed_activity <- activity

# strategy: fill NA with average for that time interval
```

## Are there differences in activity patterns between weekdays and weekends?

```r
# use weekdays() function to label each day
clean_activity$weekend <- ifelse(weekdays(clean_activity$date) %in%
                                   c("Saturday", "Sunday"),
                                 "weekend", "weekday")
clean_activity$weekend <- as.factor(clean_activity$weekend)

# same strategy as before but add the weekend factor variable
avg_steps_w <- aggregate(steps ~ interval + weekend, clean_activity, mean)

# lattice plot separated by weekend and weekday
xyplot(avg_steps_w$steps ~ avg_steps_w$interval | avg_steps_w$weekend,
       type = "l",
       layout = c(1, 2),
       main = "Comparison between activity on weekdays and weekends",
       xlab = "Time intervals",
       ylab = "Average steps")
```

![](./PA1_template_files/figure-html/unnamed-chunk-6-1.png)
