---
title: "Reproducible Research: Peer Assessment 1"
output: 
  html_document:
    keep_md: true
---


```r
knitr::opts_chunk$set(echo = TRUE)
```
## Loading and preprocessing the data

```r
library(data.table)
#setwd("./repdata-data-activity")
# Read the raw activity data from the working directory
data <- read.csv("activity.csv", header = TRUE, sep = ",")
# Convert the date column from text to Date (the format belongs to as.Date())
data$date <- as.Date(as.character(data$date), format = "%Y-%m-%d")
# Convert to a data.table for the grouped summaries below
data <- data.table(data)
```

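As an alternative, the column types can be declared when the file is read, which makes the separate date conversion unnecessary. The sketch below assumes the columns of `activity.csv` are, in order, `steps`, `date` and `interval`. The three-row inline CSV is hypothetical and stands in for the real file.

```r
# Hypothetical three-row excerpt standing in for activity.csv
csv_text <- "steps,date,interval
NA,2012-10-01,0
0,2012-10-02,5
117,2012-10-02,835"

# colClasses parses each column at read time; "Date" columns are converted
# with as.Date(), which accepts the ISO "%Y-%m-%d" layout by default
activity <- read.csv(text = csv_text, header = TRUE,
                     colClasses = c("integer", "Date", "integer"))
str(activity)
```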
## What is mean total number of steps taken per day?
1. Find the total number of steps per day and plot the histogram.

Note: days whose source data are NA for every interval end up with a total of 0 steps, because `sum(..., na.rm = TRUE)` returns 0 once every value has been dropped. These days are not representative of the true behaviour and arguably should be excluded. I have not removed them, however: the filtering line in the code below is left commented out, so they are included in the histogram and in the mean and median.

```r
# Total steps per day; a day with only NAs sums to 0
sumdata <- data[, sum(steps, na.rm = TRUE), by = date]
setnames(sumdata, 2, "TotalSteps")
# Uncomment to drop days whose total is 0
#sumdata <- sumdata[sumdata$TotalSteps>0,]
hist(sumdata$TotalSteps, col = "blue", xlab = "Total Steps Each Day", main = "Total Number of Steps Taken Each Day")
```

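The zero-step days come from the way the totals are computed. The base-R sketch below uses hypothetical toy values to reproduce this. It also shows one way to separate a genuinely missing day from a day with observed zeros: count the non-missing intervals per day.

```r
# Hypothetical toy data: day "d1" is NA for every interval
toy <- data.frame(
  date  = c("d1", "d1", "d2", "d2", "d3", "d3"),
  steps = c(NA,   NA,   10,   20,   0,    5)
)

# Total per day, ignoring NAs: an all-NA day sums to 0
totals <- tapply(toy$steps, toy$date, sum, na.rm = TRUE)

# Number of non-missing intervals per day
observed <- tapply(!is.na(toy$steps), toy$date, sum)

# Keep only days with at least one observed interval
totals_observed <- totals[observed > 0]
print(totals)
print(totals_observed)
```

Filtering on `observed > 0` rather than on `TotalSteps > 0` would keep a real day on which the device recorded only zeros.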
2. Calculate and report the mean and median total number of steps taken per day.

Note: as explained above, the days with 0 total steps are not removed from this calculation.

```r
meansteps <- mean(sumdata$TotalSteps, na.rm = TRUE)
mediansteps <- median(sumdata$TotalSteps, na.rm = TRUE)
```
The mean total number of steps taken per day is **9354.2295**.

The median total number of steps taken per day is **10395**.

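The zero-step days pull the mean down more than the median. This is consistent with the mean (9354.2295) sitting well below the median (10395). A toy illustration with hypothetical daily totals:

```r
# Hypothetical daily totals for five ordinary days
days <- c(9000, 10000, 11000, 12000, 13000)
# The same days plus two fully missing days that were summed to 0
with_zeros <- c(days, 0, 0)

mean(days); median(days)              # 11000 and 11000
mean(with_zeros); median(with_zeros)  # mean falls to about 7857, median to 10000
```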
## What is the average daily activity pattern?

```r
# Mean steps in each 5-minute interval, averaged over all days
avgstepsint <- data[, mean(steps, na.rm = TRUE), by = interval]
setnames(avgstepsint, 2, "AvgSteps")
with(avgstepsint, plot(interval, AvgSteps, type = "l", xlab = "5-min Interval", ylab = "Avg Number of Steps"))
```

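A caveat on the x-axis: the `interval` identifier appears to encode clock time as HHMM, so interval 835 would be 08:35. Under that encoding the numeric axis jumps from 55 to 100 at every hour. Treat the encoding as an assumption. If it holds, the hypothetical helpers below convert identifiers to minutes since midnight (for an evenly spaced axis) or to a clock label.

```r
# Hypothetical helper, assuming interval identifiers are HHMM clock times
interval_to_minutes <- function(interval) {
  (interval %/% 100) * 60 + interval %% 100
}

# Hypothetical helper: format an HHMM identifier as "HH:MM"
interval_to_clock <- function(interval) {
  sprintf("%02d:%02d", interval %/% 100, interval %% 100)
}

interval_to_minutes(c(0, 55, 100, 835))  # 0 55 60 515
interval_to_clock(835)                   # "08:35"

# Possible use with the averages computed above:
# plot(interval_to_minutes(avgstepsint$interval), avgstepsint$AvgSteps,
#      type = "l", xlab = "Minutes since midnight", ylab = "Avg Number of Steps")
```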
```r
# Interval (first column) of the row whose average is the largest
maxstepsint <- avgstepsint[avgstepsint$AvgSteps == max(avgstepsint$AvgSteps), ][[1]]
```
Interval **835**, on average across all days, contains the maximum number of steps.

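The same lookup can be written with `which.max()`, which returns the position of the first maximum. It therefore yields a single interval even when two intervals tie, whereas the equality test above returns every tied row. A base-R sketch on hypothetical averages:

```r
# Hypothetical interval averages; 835 and 840 tie for the maximum
avg <- data.frame(interval = c(830, 835, 840, 845),
                  AvgSteps = c(150, 200, 200, 180))

# Equality test: every interval that attains the maximum
avg$interval[avg$AvgSteps == max(avg$AvgSteps)]  # 835 840

# which.max(): the first maximum only
avg$interval[which.max(avg$AvgSteps)]            # 835
```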
## Imputing missing values
1. Calculate and report the total number of missing values in the dataset (i.e. the total number of rows with NAs).

```r
# Rows whose steps value is missing
NArows <- data[is.na(data$steps), ]
totalNArows <- nrow(NArows)
```

The total number of missing values in the dataset is **2304**.

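The count above looks only at `steps`. That matches the task as long as `date` and `interval` contain no missing values. To count rows with an NA in any column, as the task literally asks, `complete.cases()` or a per-column tally could be used. A sketch on a hypothetical data frame:

```r
# Hypothetical data frame with gaps in more than one column
df <- data.frame(steps    = c(NA, 0, 12, NA),
                 date     = as.Date(c("2012-10-01", NA, "2012-10-02", "2012-10-02")),
                 interval = c(0, 5, 10, 15))

# Missing values per column
colSums(is.na(df))         # steps 2, date 1, interval 0

# Rows with at least one NA in any column
sum(!complete.cases(df))   # 3
```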
2. Devise a strategy for filling in all of the missing values in the dataset. The strategy does not need to be sophisticated. For example, you could use the mean/median for that day, or the mean for that 5-minute interval, etc.
3. Create a new dataset that is equal to the original dataset but with the missing data filled in.

Strategy used: each missing `steps` value is replaced with the mean number of steps for the same 5-minute interval, averaged over all days with data.

```r
# Mean steps for each 5-minute interval, averaged over all days
meanint <- data.frame(data[, mean(steps, na.rm = TRUE), by = interval])
NArows <- data.frame(NArows)

# Attach the interval mean to each row with missing steps (joined on "interval")
results <- merge(meanint, NArows, by = "interval")

results <- results[, -3]                 # drop the all-NA steps column
setnames(results, 2, "steps")            # the interval mean becomes the steps value
results <- results[, c("steps", "date", "interval")]

# Remove the rows with NA from the original data set
noNAdata <- data[!is.na(data$steps), ]
# Combine the imputed rows with the observed ones; converting results
# to a data.table first makes rbind() return a data.table
final <- rbind(data.table(results), noNAdata)
```

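The merge-and-stack approach above can also be written in one step with base R's `ave()`. For every row, `ave()` returns the mean of that row's interval, and it keeps the original row order. Below is a minimal sketch of the same interval-mean strategy. It uses a hypothetical helper `impute_interval_mean()` and toy data.

```r
# Hypothetical helper: replace NA steps with the mean of the same interval
impute_interval_mean <- function(df) {
  # ave() returns, for each row, the mean steps of that row's interval
  # (an interval that is NA on every day would yield NaN)
  interval_mean <- ave(df$steps, df$interval,
                       FUN = function(x) mean(x, na.rm = TRUE))
  df$steps <- ifelse(is.na(df$steps), interval_mean, df$steps)
  df
}

toy <- data.frame(date     = rep(c("d1", "d2", "d3"), each = 2),
                  interval = rep(c(0, 5), times = 3),
                  steps    = c(NA, NA, 4, 10, 8, 20))

imputed <- impute_interval_mean(toy)
imputed$steps  # 6 15 4 10 8 20
```

A day that was entirely missing (`d1` here) receives the sum of the interval means as its total (6 + 15 = 21).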
```r
# Daily totals after imputation
sumdata <- final[, sum(steps, na.rm = TRUE), by = date]
setnames(sumdata, 2, "TotalSteps")
hist(sumdata$TotalSteps, col = "blue", xlab = "Total Steps Each Day", main = "Total Number of Steps Taken Each Day (Missing Values Imputed)")
```

```r
meansteps <- mean(sumdata$TotalSteps, na.rm = TRUE)
mediansteps <- median(sumdata$TotalSteps, na.rm = TRUE)
```
The mean total number of steps taken per day is **10766**.

The median total number of steps taken per day is **10766**.

Both values are higher than the estimates without imputation (mean 9354.2295, median 10395). The days that previously counted as 0 steps now receive the sum of the interval means, and the mean and median now coincide.

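knitr may render large inline numbers in scientific notation (for example 1.0766 × 10<sup>4</sup>). Assuming the values are inserted with inline R expressions, one option is to format them as character strings first, since character values are inlined as-is. `format()` and `formatC()` are base R. `value` below is a hypothetical stand-in for `meansteps`.

```r
# Hypothetical value standing in for meansteps
value <- 12345.678

# Fixed two-decimal strings, no scientific notation
format(round(value, 2), nsmall = 2)                       # "12345.68"
formatC(value, format = "f", digits = 2, big.mark = ",")  # "12,345.68"
```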
## Are there differences in activity patterns between weekdays and weekends?

```r
# Label each date as weekday or weekend
# (weekdays() returns locale-dependent day names; this assumes an English locale)
final[, daytype := factor(ifelse(weekdays(date) %in% c("Saturday", "Sunday"),
                                 "weekend", "weekday"))]
# Mean steps per interval within each day type
avgdata <- final[, .(AvgSteps = mean(steps)), by = .(daytype, interval)]

library(lattice)
xyplot(AvgSteps ~ interval | daytype, type = "l", ylab = "Number of Steps",
       data = avgdata, layout = c(1, 2))
```

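Because `weekdays()` depends on the locale, matching English day names can silently fail elsewhere. A locale-independent sketch instead uses the `wday` field of `as.POSIXlt()`, where 0 is Sunday and 6 is Saturday. It then computes the per-panel averages with base R's `aggregate()`. The helper `day_type()` and the toy data are hypothetical.

```r
# Hypothetical helper: classify dates as weekday/weekend without relying on locale
day_type <- function(dates) {
  wday <- as.POSIXlt(dates)$wday  # 0 = Sunday, ..., 6 = Saturday
  factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
         levels = c("weekday", "weekend"))
}

# 2012-10-06 and 2012-10-07 fell on a Saturday and a Sunday
toy <- data.frame(date     = as.Date(c("2012-10-05", "2012-10-06",
                                       "2012-10-07", "2012-10-08")),
                  interval = c(0, 0, 0, 0),
                  steps    = c(10, 40, 60, 20))
toy$daytype <- day_type(toy$date)

# Mean steps per interval within each day type (the data behind the panel plot)
avg_by_type <- aggregate(steps ~ daytype + interval, data = toy, FUN = mean)
print(avg_by_type)
```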