This project classifies how a weight-lifting exercise was performed by 6 healthy participants aged 20 to 28. The classes correspond to a correct execution (class A) or to one of four common mistakes (classes B, C, D and E). Three models were fitted using 36 features taken from a dataset of measurements recorded along the x, y and z axes by 4 wearable sensors per subject during the 5 classes of exercise. The accuracy of each model was calculated, and the best model was chosen to predict the class of exercise for the 20 cases in the testing dataset.
The training and testing data were obtained by Velloso et al. (2013) and were downloaded from:
training <- read.csv(url("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv"))
testing <- read.csv(url("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv"))
Variables such as user_name, as well as those related to window information and recording times, were excluded (1)
training = training[, -c(1:7)]
testing = testing[, -c(1:7)]
Because the test dataset contains no more than one recording window per case, summary variables such as kurtosis, skewness, maximum, minimum, amplitude, variance, average and standard deviation were not useful here and were removed.
indextrain = grep("kurtosis|skewness|avg|max|min|amplitude|var|stddev", colnames(training))
training = training[,-indextrain]
indextest = grep("kurtosis|skewness|avg|max|min|amplitude|var|stddev", colnames(testing))
testing = testing[,-indextest]
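The pattern-based removal above can be tried on a small made-up data frame. This is only an illustrative sketch: the column names and the helper `drop_matching` are hypothetical, and the guard for an empty match is an addition that is not part of the original analysis (without it, `df[, -integer(0)]` would drop every column when nothing matches).

```r
# Toy data frame with hypothetical column names (not the real dataset)
toy <- data.frame(
  accel_belt_x       = c(1, 2, 3),
  kurtosis_roll_belt = c(NA, NA, 0.5),
  max_roll_belt      = c(NA, NA, 2.1),
  gyros_arm_y        = c(0.1, 0.2, 0.3)
)

# Hypothetical helper: drop columns whose names match a regular expression
drop_matching <- function(df, pattern) {
  idx <- grep(pattern, colnames(df))
  if (length(idx) == 0) return(df)   # nothing matched: keep all columns
  df[, -idx, drop = FALSE]
}

pattern <- "kurtosis|skewness|avg|max|min|amplitude|var|stddev"
cleaned <- drop_matching(toy, pattern)
colnames(cleaned)
```

Only the two raw-measurement columns should remain in `cleaned`.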
Finally, the data were checked for missing values (NA):
any(is.na(training))
## [1] FALSE
Each wearable device has 3 components: a gyroscope, a magnetometer and an accelerometer. Each component provides measurements along the three axes x, y and z. Total acceleration and the Euler angles (roll, pitch and yaw) are derived from these measurements, so they were excluded.
index = grep("pitch|yaw|roll|total_accel", colnames(training))
training = training[,-index]
index = grep("pitch|yaw|roll|total_accel", colnames(testing))
testing = testing[,-index]
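The count of 36 features follows from 4 sensor locations times 3 components times 3 axes. The sketch below enumerates these combinations. The naming scheme used here (`gyros_belt_x` and so on) and the four location labels are assumptions about how the columns are named, and should be checked against `colnames(training)`.

```r
# Assumed naming scheme: <component>_<location>_<axis>
grid <- expand.grid(
  axis      = c("x", "y", "z"),
  component = c("gyros", "accel", "magnet"),
  location  = c("belt", "arm", "dumbbell", "forearm"),
  stringsAsFactors = FALSE
)

# One name per combination of component, location and axis
expected <- paste(grid$component, grid$location, grid$axis, sep = "_")
length(expected)   # 4 locations * 3 components * 3 axes
```

If the assumption holds, `setdiff(colnames(training), expected)` should contain only the outcome column after the cleaning steps.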
library(caret)
## Warning: package 'caret' was built under R version 3.2.3
## Loading required package: lattice
## Loading required package: ggplot2
10-fold cross-validation was used for each of the 3 models. For reproducibility, the same seed was set before fitting each model, so that the models are compared on the same resampling folds.
set.seed(123)
Tcontrol = trainControl(method = "cv", number = 10)
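To illustrate why the seed is reset before each model, the base-R sketch below builds 10 folds by hand. It is not how caret constructs its folds internally (caret stratifies the folds by outcome class). `make_folds` is a hypothetical helper that only shows that the same seed yields the same partition.

```r
# Hypothetical helper: assign each of n rows to one of k folds at random
make_folds <- function(n, k = 10, seed = 123) {
  set.seed(seed)
  sample(rep(seq_len(k), length.out = n))
}

folds_a <- make_folds(100)
folds_b <- make_folds(100)

identical(folds_a, folds_b)  # same seed, same partition
table(folds_a)               # number of rows assigned to each fold
```

Because every model sees the same partition, differences in the resampled accuracy reflect the models rather than the luck of the split.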
Three different models were fitted: a boosted tree (modgbm), a linear discriminant analysis (modlda) and a multinomial logistic regression (modmultinom).
Boosted tree (modgbm)
set.seed(123)
modgbm = train(classe~., data = training, method = "gbm", trControl = Tcontrol)
Linear Discriminant Analysis (modlda)
set.seed(123)
modlda = train(classe~., data = training, method = "lda", trControl = Tcontrol)
Multinomial logistic regression (modmultinom)
set.seed(123)
modmultinom = train(classe~., data = training, method = "multinom", trControl = Tcontrol)
resamps <- resamples(list(GBM = modgbm,LDA = modlda,MUL = modmultinom))
summary(resamps)
##
## Call:
## summary.resamples(object = resamps)
##
## Models: GBM, LDA, MUL
## Number of resamples: 10
##
## Accuracy
##       Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
## GBM 0.9021  0.9065 0.9088 0.9094  0.9129 0.9153    0
## LDA 0.6123  0.6300 0.6324 0.6336  0.6407 0.6509    0
## MUL 0.5632  0.5789 0.5895 0.5886  0.5981 0.6131    0
##
## Kappa
##       Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
## GBM 0.8760  0.8816 0.8845 0.8853  0.8897 0.8927    0
## LDA 0.5088  0.5314 0.5334 0.5354  0.5443 0.5572    0
## MUL 0.4504  0.4690 0.4814 0.4805  0.4924 0.5097    0
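The two metrics reported above can both be computed from a confusion matrix. The sketch below uses a small made-up two-class matrix (not results from this analysis) and the standard definition of Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement).

```r
# Made-up confusion matrix: rows = predicted, columns = actual
cm <- matrix(c(40, 10,
                5, 45),
             nrow = 2, byrow = TRUE,
             dimnames = list(predicted = c("A", "B"), actual = c("A", "B")))

n           <- sum(cm)
accuracy    <- sum(diag(cm)) / n                      # observed agreement
chance      <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
cohen_kappa <- (accuracy - chance) / (1 - chance)

c(accuracy = accuracy, kappa = cohen_kappa)
```

Kappa is lower than accuracy because it discounts the agreement that would occur by chance, which is why the Kappa rows above sit below the corresponding Accuracy rows.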
bwplot(resamps, layout = c(3, 1))
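The boosted tree model had the highest cross-validated accuracy (mean 0.909, versus 0.634 for LDA and 0.589 for multinomial logistic regression), so it was chosen to predict the test cases.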
predicted = predict(modgbm, testing)
predicted
## [1] A A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E
Using the raw measurements from the wearable sensors, a boosted tree model can predict the class of exercise performed by the 6 participants with a cross-validated accuracy of about 91%.