---
title: "Project Assignment Writeup Machine Learning Course"
author: "CEA"
date: "Sunday, April 03, 2016"
output: html_document
---
# Summary
This project classifies how a weight-lifting exercise was performed by 6 healthy participants aged 20 to 28. The classes distinguish correct execution (class A) from four common mistakes (classes B, C, D and E). The class was predicted by fitting three models on 36 features: the raw x, y and z measurements recorded by 4 wearable accelerometer devices per subject during the 5 classes of exercise. The accuracy of each model was calculated, and the best model was used to predict the class of exercise for the 20 cases in the testing dataset.
# Data
The training and testing data were collected by Velloso et al. (2013) and downloaded from:
```{r}
training <- read.csv(url("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv"))
testing <- read.csv(url("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv"))
```
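Raw CSV exports sometimes contain blank cells or spreadsheet error strings in place of numbers. Whether that applies to these files is an assumption, not something stated here. As a minimal sketch on a made-up inline CSV (`toy_csv` is hypothetical), `na.strings` tells `read.csv` which tokens to convert to `NA` on import:

```{r}
# Made-up two-row CSV: one blank cell and one spreadsheet error string
toy_text <- "a,b\n1,#DIV/0!\n,3\n"

# Treat "NA", empty fields and "#DIV/0!" as missing values
toy_csv <- read.csv(text = toy_text, na.strings = c("NA", "", "#DIV/0!"))

# Both columns are read as numeric, with NA where the tokens appeared
str(toy_csv)
```

The same `na.strings` argument could be passed to the two `read.csv` calls above if such tokens turn out to be present.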
# Data Cleaning
Identifier variables such as user_name, together with those describing the recording windows and recording times, were excluded (1). These are the first seven columns.
```{r}
training = training[, -c(1:7)]
testing = testing[, -c(1:7)]
```
Because the test dataset contains no more than one recording window, the per-window summary variables (kurtosis, skewness, maximum, minimum, amplitude, variance, average and standard deviation) were not useful in our case and were removed.
```{r}
# Column-name pattern for the per-window summary statistics
summary_pattern = "kurtosis|skewness|avg|max|min|amplitude|var|stddev"
indextrain = grep(summary_pattern, colnames(training))
training = training[,-indextrain]
indextest = grep(summary_pattern, colnames(testing))
testing = testing[,-indextest]
```
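One caveat with negative indexing on a `grep` result: if the pattern matches nothing, `grep` returns an empty vector and `df[, -integer(0)]` selects no columns at all. The sketch below shows a logical-index alternative that leaves the data untouched in that case. The helper `drop_matching` and the data frame `toy_cols` are hypothetical names used only for illustration.

```{r}
# Hypothetical helper: drop columns whose names match a pattern,
# keeping every column when nothing matches
drop_matching <- function(df, pattern) {
  hits <- grepl(pattern, colnames(df))
  df[, !hits, drop = FALSE]
}

# Made-up data frame with one summary-statistic column
toy_cols <- data.frame(accel_belt_x = 1:2,
                       max_roll_belt = 3:4,
                       classe = c("A", "B"))

colnames(drop_matching(toy_cols, "max|min"))   # summary column removed
colnames(drop_matching(toy_cols, "kurtosis"))  # no match, all columns kept
```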
Finally, the data were checked for missing values (NA):
```{r}
any(is.na(training))
```
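If the check above were to return `TRUE`, a per-column count would show where the missing values are. A minimal sketch on a made-up data frame (`toy_na` is hypothetical):

```{r}
# Made-up data frame with missing values in two of three columns
toy_na <- data.frame(a = c(1, NA, 3),
                     b = c("x", "y", NA),
                     c = 1:3)

# Number of NA entries in each column
na_per_column <- colSums(is.na(toy_na))
na_per_column

# Names of the columns that contain at least one NA
names(na_per_column)[na_per_column > 0]
```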
# Feature Selection
Each accelerometer device has 3 components: a gyroscope, a magnetometer and an accelerometer. Each component provides one measurement on each of the three axes x, y and z. Total acceleration and the Euler angles (roll, pitch and yaw) are derived from these measurements, so they were excluded.
```{r}
index = grep("pitch|yaw|roll|total_accel", colnames(training))
training = training[,-index]
index = grep("pitch|yaw|roll|total_accel", colnames(testing))
testing = testing[,-index]
```
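The feature count can be checked by enumeration: 4 devices, 3 components and 3 axes give 36 features. The sketch below builds the expected names, assuming they follow a `component_sensor_axis` pattern such as `gyros_belt_x`. The naming pattern and the four sensor placements are assumptions for illustration, not taken from the text above.

```{r}
components <- c("gyros", "accel", "magnet")
sensors <- c("belt", "arm", "forearm", "dumbbell")  # assumed sensor placements
axes <- c("x", "y", "z")

# Every combination of component, sensor and axis
feature_grid <- expand.grid(axis = axes, sensor = sensors,
                            component = components,
                            stringsAsFactors = FALSE)
expected_features <- paste(feature_grid$component, feature_grid$sensor,
                           feature_grid$axis, sep = "_")

length(expected_features)  # 3 components x 4 sensors x 3 axes
```

If the naming assumption holds, `setdiff(expected_features, colnames(training))` would be empty after the feature selection step.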
# Packages
```{r}
library(caret)
```
# Cross-validation
10-fold cross-validation was used for all 3 models. For reproducibility, the same seed value was set before fitting each of the 3 models.
```{r}
# set.seed() returns NULL, so its result is not stored
set.seed(123)
Tcontrol = trainControl(method = "cv", number = 10)
```
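To illustrate what 10-fold partitioning means, the base R sketch below assigns 50 made-up observations to 10 folds of equal size. Each fold is held out once while the other nine are used for fitting. This shows only the general idea and is not necessarily how caret builds its folds; `n_obs`, `k` and `fold_id` are hypothetical names.

```{r}
set.seed(123)
n_obs <- 50  # made-up number of observations
k <- 10      # number of folds

# Shuffle fold labels 1..k so each observation lands in exactly one fold
fold_id <- sample(rep(seq_len(k), length.out = n_obs))

# Size of each held-out fold
held_out_sizes <- table(fold_id)
held_out_sizes

# Row indices used for fitting when fold 1 is held out
train_rows_fold1 <- which(fold_id != 1)
length(train_rows_fold1)
```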
# Models
Three different models were fitted: a boosted tree (modgbm), a linear discriminant analysis (modlda) and a multinomial logistic regression (modmultinom).
Boosted tree (modgbm)
```{r cache=TRUE, echo = TRUE, results = 'hide'}
set.seed(123)
modgbm = train(classe~., data = training, method = "gbm", trControl = Tcontrol)
```
Linear Discriminant Analysis (modlda)
```{r cache =TRUE, echo = TRUE, results = 'hide'}
set.seed(123)
modlda = train(classe~., data = training, method = "lda", trControl = Tcontrol)
```
Multinomial logistic regression (modmultinom)
```{r cache = TRUE, echo = TRUE, results = 'hide'}
set.seed(123)
modmultinom = train(classe~., data = training, method = "multinom", trControl = Tcontrol)
```
# Choosing the best model
```{r}
resamps <- resamples(list(GBM = modgbm,LDA = modlda,MUL = modmultinom))
summary(resamps)
bwplot(resamps, layout = c(3, 1))
```
The boosted tree model (modgbm) was chosen because it had the best cross-validated accuracy of the three.
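The selection criterion, accuracy, is the share of cases whose predicted class equals the true class. The sketch below computes it for two sets of made-up predictions and picks the higher one. All labels and model names are invented for illustration and are not results from this analysis.

```{r}
# Made-up true classes and predictions from two hypothetical models
truth <- c("A", "B", "A", "C", "B", "A")
toy_preds <- list(
  model_one = c("A", "B", "A", "C", "B", "B"),
  model_two = c("A", "A", "A", "B", "B", "B")
)

# Accuracy: proportion of predictions that match the true class
toy_accuracy <- sapply(toy_preds, function(p) mean(p == truth))
toy_accuracy

# Name of the model with the highest accuracy
best_model <- names(which.max(toy_accuracy))
best_model
```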
# Prediction
```{r}
predicted = predict(modgbm, testing)
predicted
```
# Conclusion
Using the raw x, y and z measurements from the accelerometer devices, a boosted tree model can predict the class of exercise performed by the 6 participants.
# References
Velloso, E.; Bulling, A.; Gellersen, H.; Ugulino, W.; Fuks, H. Qualitative Activity Recognition of Weight Lifting Exercises. Proceedings of 4th International Conference in Cooperation with SIGCHI (Augmented Human '13). Stuttgart, Germany: ACM SIGCHI, 2013.