Project Assigment Prediction WriteUp.Rmd

---
title: "Project Assigment Writeup Machine Learning Course"
author: "CEA"
date: "Sunday, April 03, 2016"
output: html_document
---
# Summary 

This project classifies the way of weight-lifting exercise in 6 healthy participants between 20-28 years old. The classification is based on the detection of common mistakes during weight-lifting (classes B, C, D, E) or not (class A). The classification was predicted by fitting three model using 36 features from a dataset obtained from recording acceleration in axis x, y and z in 4 wearables accelerometers per subject during the 5 classes of exercise. The accuracy was of each model calculated and the best was chosen to predict the classes of exercise of 20 participants of a testing dataset. 

# Data

The training and testing data were obtained by Velloso et.al (2013) and download from:
```{r}

training <- read.csv(url("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv"))
testing <- read.csv(url("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv"))

```

# Data Cleaning

Variables as user_name as those related with information of the windows and recordig times were excluded (1)  

```{r}
training = training[, -c(1:7)]
testing = testing[, -c(1:7)]

```

Given the nature of the test dataset have not more than one window of recording, variables as kurtosis, skewness, maximun, minimum, amplitude, variance, average and standard deviation were not useful in our case.

```{r}
indextrain = grep("kurtosis|skewness|avg|max|min|amplitude|var|avg|stddev", colnames(training))
training = training[,-indextrain]

indextest = grep("kurtosis|skewness|avg|max|min|amplitude|var|avg|stddev", colnames(testing))
testing = testing[,-indextest]

```

Finally the data was checked for NA cases

```{r}
any(is.na(training))
```


# Feature Selection

The accelerometer devices has 3 components: gyroscope, magnetometer and accelerometer. From each one, 3 measurements in the three axis x, y and z, were obtained. Total acceleration and Euler angles (roll, pitch and yaw) are obtained from the previous variables, so they were excluded.  

```{r}
index = grep("pitch|yaw|roll|total_accel", colnames(training))
training = training[,-index]

index = grep("pitch|yaw|roll|total_accel", colnames(testing))
testing = testing[,-index]
```

# Packages

```{r}
library(caret)
```
# Cross-validation

10-fold crossvalidation was used for 3 different models used. For reproducibility purposes a seed value was set up of the 3 parallel models.

```{r}

seed = set.seed(123)
Tcontrol = trainControl(method = "cv", number = 10)

```

# Models

 Three different models were produced: a boosted tree (modgbm)

Boosted tree(modgbm)
```{r cache=TRUE, echo = TRUE, results = 'hide'}
set.seed(123)
modgbm = train(classe~., data = training, method = "gbm", trControl = Tcontrol)
```

Linear Discriminant Analysis (modlda)

```{r cache =TRUE, echo = TRUE, results = 'hide'}
set.seed(123)
modlda = train(classe~., data = training, method = "lda", trControl = Tcontrol)
```

Multinomial logistic Regression

```{r cache = TRUE, echo = TRUE, results = 'hide'}
set.seed(123)
modmultinom = train(classe~., data = training, method = "multinom", trControl = Tcontrol)
```

# Choosing the best model

```{r}
resamps <- resamples(list(GBM = modgbm,LDA = modlda,MUL = modmultinom))
summary(resamps)
bwplot(resamps, layout = c(3, 1))
```

The chosen model is Boosted trees.

# Prediction

```{r}

predicted = predict(modgbm, testing)
predicted
```

# Conclusion

Using the original data from the accelerometer, it is possible to predict the class of exercise on the 6 participants with a Boosted Trees Model.

# References

Velloso, E.; Bulling, A.; Gellersen, H.; Ugulino, W.; Fuks, H. Qualitative Activity Recognition of Weight Lifting Exercises. Proceedings of 4th International Conference in Cooperation with SIGCHI (Augmented Human '13) . Stuttgart, Germany: ACM SIGCHI, 2013.