Document on the Activity Recognition project

Goal

With the large amount of personal activity data collected by wearable devices, it is interesting not only to quantify how much of a particular activity a person does, but also how well they do it. In this task we are given a large amount of labelled data from accelerometers on the belt, forearm, arm, and dumbbell of six participants, together with whether they performed barbell lifts correctly or incorrectly (and, if incorrectly, which mistake they made). The goal is to train a system that, given new data, automatically tells us how a person is performing the exercise.

Data Partitioning and Model Building

In 2013, Velloso et al. identified 17 factors that are most critical in determining how a participant performs a barbell lift and which mistake is made in an incorrect performance. These 17 factors, however, do not all come directly from the raw data (e.g., the range of an accelerometer signal) and require manual feature engineering.

The data in "pml-training" are partitioned by stratified sampling on the variable "classe": 70% of the data are used for training the model and the remaining 30% for cross-validation (see Fig. 1).

Instead of directly using only these 17 factors, I tried a more straightforward model: training a random forest on all "meaningful" columns (by "meaningful" I mean that columns like "user name" or "date and time", as well as columns with little valid data, are eliminated). Although the training process can take a long time (it eventually took about one hour to train the model on the 70% of the input data that forms my training set), it is intuitively the model to choose, since every performance mistake corresponds to some combination of angles and velocities. The plan was that, if cross-validation showed the model did not perform well (most likely overfitting, visible as a large gap between the training and cross-validation errors), I would inspect the variable-importance ranking for predicting "classe", remove several of the less important variables, and rebuild the random forest. However, the straightforward random forest paid off: on the training data it reached an accuracy of 100%, and on the cross-validation data an accuracy of more than 99% (see Fig. 2). (The model also paid off on the 20 test cases, where an accuracy of 100% was observed.)
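A minimal sketch of this workflow in R is shown below, assuming the standard caret/randomForest setup. The column-name pattern used to drop bookkeeping columns, the 95% missing-value threshold, and the 5-fold cross-validation settings are my assumptions for illustration, not details taken from the original analysis.

```r
library(caret)          # createDataPartition, train, confusionMatrix
library(randomForest)   # backend used by method = "rf"

set.seed(12345)

# Read the labelled data; blank fields and "#DIV/0!" entries are treated as NA.
pml <- read.csv("pml-training.csv", na.strings = c("NA", "", "#DIV/0!"))
pml$classe <- factor(pml$classe)

# Keep only the "meaningful" columns: drop bookkeeping columns (row index,
# user name, timestamps, window markers) and columns that are almost all NA.
# The column-name pattern and the 95% threshold are assumptions in this sketch.
bookkeeping <- grepl("^X$|user_name|timestamp|window", names(pml))
mostly_na   <- colMeans(is.na(pml)) > 0.95
pml <- pml[, !(bookkeeping | mostly_na)]

# 70/30 split, stratified on "classe".
in_train   <- createDataPartition(pml$classe, p = 0.7, list = FALSE)
training   <- pml[in_train, ]
validation <- pml[-in_train, ]

# Random forest on all remaining predictors; 5-fold CV for tuning mtry.
fit <- train(classe ~ ., data = training, method = "rf",
             trControl = trainControl(method = "cv", number = 5))

# Accuracy on the training set and on the held-out 30%.
confusionMatrix(predict(fit, training), training$classe)
confusionMatrix(predict(fit, validation), validation$classe)
```

With trainControl(method = "cv"), caret also reports resampled accuracy during tuning, which gives a cheap first check on generalization before scoring the held-out 30%.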

About

Project for the course Practical Machine Learning on Coursera.
