With the large amount of data about personal activity collected by devices, it is interesting not only to quantify how much of a particular activity a person does, but also how well they do it. In this task, we are given a large amount of labelled data describing accelerometer readings from sensors on the belt, forearm, arm and dumbbell of six participants, together with whether each participant is performing a barbell lift correctly or incorrectly (and, if incorrectly, which mistake they have made). The goal is to train a model that, given new data, automatically tells us how a person is performing the exercise.
In 2013, Velloso et al. identified 17 factors in their work that are most critical for deciding how well a participant performs a barbell lift and which mistake is made in an incorrect performance. These 17 factors, however, are not always available in the raw data (e.g. the range of an accelerometer reading) and require manual feature engineering.
The data in "pml-training" are partitioned based on the values of the variable "classe": 70% of the data are used for training the model and the remaining 30% for cross-validation (see Fig. 1).

Instead of directly using only these 17 factors, I would like to try a more straightforward model: training a random forest on all "meaningful" columns (by "meaningful" I mean that columns such as "user name" or "date and time", and columns with rarely valid data, are eliminated). Although the training process might take long (it eventually took about one hour to train the model on the 70% of the input data that form my training set), this is intuitively the model to choose -- all performance mistakes are related to some combination of angles and velocities. If cross-validation showed that the model did not perform well (probably due to overfitting, which would appear as a big gap between the error on the training data and on the cross-validation data), I would look at the order of variable importance with respect to "classe", remove several of the less important variables, and rebuild the random forest model.

However, the straightforward random forest model pays off: on the training data an accuracy of 100% was observed, and on the cross-validation data an accuracy of more than 99% was observed (see Fig. 2). (This model also turned out to pay off on the 20 test cases, where an accuracy of 100% was observed.)
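The workflow described above can be sketched in R roughly as follows. The file name, the exact bookkeeping column names, the NA encodings, and the 95% missing-value threshold are assumptions not stated in the text; caret's createDataPartition performs the stratified 70/30 split on "classe", and train(..., method = "rf") fits the random forest on all remaining columns.

```r
library(caret)
library(randomForest)

set.seed(1234)

# Load the labelled data; the strings treated as NA are an assumption about
# how missing values are encoded in this data set.
training_raw <- read.csv("pml-training.csv",
                         na.strings = c("NA", "", "#DIV/0!"))
training_raw$classe <- factor(training_raw$classe)

# Drop bookkeeping columns (row id, user name, timestamps, window markers);
# the exact column names here are assumptions.
bookkeeping <- c("X", "user_name", "raw_timestamp_part_1",
                 "raw_timestamp_part_2", "cvtd_timestamp",
                 "new_window", "num_window")
training_raw <- training_raw[, !(names(training_raw) %in% bookkeeping)]

# Drop columns that are almost entirely missing ("rarely valid data").
mostly_na <- sapply(training_raw, function(col) mean(is.na(col)) > 0.95)
training_raw <- training_raw[, !mostly_na]

# 70/30 split stratified on "classe", as described in the text.
in_train  <- createDataPartition(training_raw$classe, p = 0.7, list = FALSE)
train_set <- training_raw[in_train, ]
valid_set <- training_raw[-in_train, ]

# Random forest on all remaining predictors; with caret's default resampling
# this is the step that takes on the order of an hour.
rf_fit <- train(classe ~ ., data = train_set, method = "rf")

# Accuracy on the training set and on the held-out 30%.
confusionMatrix(predict(rf_fit, train_set), train_set$classe)
confusionMatrix(predict(rf_fit, valid_set), valid_set$classe)

# If the held-out accuracy had been poor, variable importance would guide
# which predictors to drop before refitting.
varImp(rf_fit)
```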