==================================================================
##Getting and Cleaning Data Course Project
###Version 1.0
####Origin of the data: http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
####Course website: https://class.coursera.org/getdata-012
####Data before manipulation: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
###Instructions for use:
- Download and extract the zip file from the "Data before manipulation" link above
- Place run_analysis.R inside of the extracted folder "UCI HAR Dataset"
- Source run_analysis.R
- Execute the code at the R prompt: > tidy_data <- run_analysis()
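The steps above can be sketched as a short R session. This is only a sketch: it assumes the zip was extracted into the current working directory and that the script reads its input files relative to the "UCI HAR Dataset" folder, neither of which is stated explicitly here.

```r
# Assumption: the zip was extracted into the current working directory.
data_dir <- "UCI HAR Dataset"
script <- file.path(data_dir, "run_analysis.R")

if (file.exists(script)) {
  # Assumption: the script reads its input files relative to this folder.
  old_wd <- setwd(data_dir)
  source("run_analysis.R")     # defines run_analysis()
  tidy_data <- run_analysis()  # builds the tidy data set
  setwd(old_wd)                # restore the previous working directory
  str(tidy_data)               # quick look at the result
} else {
  message("Script not found at: ", script)
}
```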
==================================================================
###Description of the script
The script uses the following files from the extracted zip folder:
- 'features.txt': List of all features.
- 'activity_labels.txt': Links the class labels with their activity name.
- 'train/X_train.txt': Training set.
- 'train/y_train.txt': Training labels.
- 'test/X_test.txt': Test set.
- 'test/y_test.txt': Test labels.
- 'train/subject_train.txt': Each row identifies the subject who performed the activity for each window sample. Its range is from 1 to 30.
- 'test/subject_test.txt': Each row identifies the subject who performed the activity for each window sample. Its range is from 1 to 30.
- After reading in the files, the script converts the data into appropriate data types.
- It then combines the Subject, Activity, and measurement data into one data frame for the training set and one for the test set
- Next, it combines the two data frames to get one large data frame of 10299 rows by 563 columns
- Next, it searches the features array for any measurement whose name contains a mean or a standard deviation
- Taking those indices, it subsets the main data frame to reduce it to 10299x81
- Using the same index search, it applies unique names to the columns.
- Next, it loops through the data to take the average of each measurement for each activity performed by each subject
- To do that, it first subsets the data by the subject, then subsets by the activity.
- The average of that resulting data frame is calculated and appended to the result of the previous iteration of the loop
- In the end, the result of the loop is a new data frame of size 180x81. It has 180 rows because there are 30 subjects who each perform 6 activities (30 x 6 = 180).
- The last step is looping through the final data set and matching the activity number to the English description of the activity.
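The processing steps above can be illustrated on a small, made-up data set. This is a sketch and not the contents of run_analysis.R: the toy feature names, the `make_set` helper, and the random values are all hypothetical, and it uses `aggregate()` in place of the explicit loops described above. On the real data the same steps would take the data frame from 10299x563 to 10299x81 and finally to 180x81.

```r
# Toy stand-ins for features.txt and activity_labels.txt (hypothetical values)
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-energy()-X")
activity_labels <- data.frame(id = 1:2, name = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)

# Hypothetical helper: fake Subject, Activity and measurement columns
make_set <- function(subjects, seed) {
  set.seed(seed)
  n <- length(subjects) * 2 * 3  # 2 activities x 3 windows per subject
  x <- as.data.frame(matrix(rnorm(n * length(features)), nrow = n))
  data.frame(Subject = rep(subjects, each = 6),
             Activity = rep(rep(1:2, each = 3), times = length(subjects)),
             x)
}
train <- make_set(c(1, 2), seed = 1)
test  <- make_set(3, seed = 2)

# Combine the training and test data frames into one large data frame
all_data <- rbind(train, test)

# Find the features whose name contains "mean" or "std"
keep <- grep("mean|std", features)

# Keep Subject, Activity and the matching measurements
# (measurement columns are offset by the two leading columns)
subset_data <- all_data[, c(1, 2, keep + 2)]
names(subset_data) <- c("Subject", "Activity", features[keep])

# Average every measurement for each subject/activity pair
# (aggregate() replaces the nested subset-and-append loops)
tidy <- aggregate(subset_data[, -(1:2)],
                  by = list(Subject = subset_data$Subject,
                            Activity = subset_data$Activity),
                  FUN = mean)

# Replace the activity number with its English description
tidy$Activity <- activity_labels$name[match(tidy$Activity, activity_labels$id)]

dim(tidy)  # 3 subjects x 2 activities = 6 rows; 2 id columns + 2 measurements
```

The row count of the result is always the number of subjects times the number of activities, which is where the 180 rows (30 x 6) of the real tidy data set come from.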