This README explains the analysis performed by the "run_analysis.R" script.
-
The appropriate libraries are loaded
-
The data files are read in. These include:
- features.txt
- activity_labels.txt
- X_train.txt
- Y_train.txt
- subject_train.txt
- X_test.txt
- Y_test.txt
- subject_test.txt
-
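
A minimal sketch of the read step, using base R `read.table`. The helper `read_har`, the `data_dir` location, the column names, and the toy file contents are assumptions made for illustration; they are not taken from run_analysis.R. Only the training files are shown, and the test files would be read the same way. Note that `read.table` converts characters such as "-" and "(" in the supplied column names to ".".

```r
# Hypothetical helper: read one whitespace-delimited file from data_dir.
read_har <- function(data_dir, file, ...) {
  read.table(file.path(data_dir, file), header = FALSE,
             stringsAsFactors = FALSE, ...)
}

# Toy stand-in files so the sketch runs without the real data set.
data_dir <- tempfile("har_")
dir.create(data_dir)
writeLines(c("1 tBodyAcc-mean()-X", "2 tBodyAcc-std()-X", "3 tBodyAcc-max()-X"),
           file.path(data_dir, "features.txt"))
writeLines(c("1 WALKING", "2 SITTING"),
           file.path(data_dir, "activity_labels.txt"))
writeLines(c("0.1 0.2 0.3", "0.4 0.5 0.6"), file.path(data_dir, "X_train.txt"))
writeLines(c("1", "2"), file.path(data_dir, "Y_train.txt"))
writeLines(c("1", "3"), file.path(data_dir, "subject_train.txt"))

# Column names below are assumed, chosen to match the steps in this README.
features <- read_har(data_dir, "features.txt",
                     col.names = c("feature.id", "feature.name"))
activity_labels <- read_har(data_dir, "activity_labels.txt",
                            col.names = c("activity.id", "activity.name"))
x_train <- read_har(data_dir, "X_train.txt", col.names = features$feature.name)
y_train <- read_har(data_dir, "Y_train.txt", col.names = "activity.id")
subject_train <- read_har(data_dir, "subject_train.txt", col.names = "subject")
```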
A "group" column is created for Train vs Test
-
A "subject" column is created from the subject files
-
An "activity.id" column is created from the Y_train and Y_test files
-
A "set" data frame is created by combining the train and test data frames
-
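
The four steps above (group, subject, activity.id, and the combined "set") could be sketched as follows. The helper `add_ids`, the group labels "Train" and "Test", and the toy data frames are assumptions for illustration; the actual script may build these columns differently.

```r
# Toy stand-ins for the data frames read from the train and test files.
x_train <- data.frame(V1 = c(0.1, 0.4), V2 = c(0.2, 0.5))
y_train <- data.frame(V1 = c(1L, 2L))
subject_train <- data.frame(V1 = c(1L, 3L))
x_test <- data.frame(V1 = c(0.7, 0.9), V2 = c(0.8, 1.0))
y_test <- data.frame(V1 = c(2L, 1L))
subject_test <- data.frame(V1 = c(2L, 2L))

# Hypothetical helper: prepend the group, subject and activity.id columns.
# The files share row order, so binding column-wise keeps each measurement
# row aligned with its subject and activity id.
add_ids <- function(x, y, subject, group) {
  data.frame(group = group,
             subject = subject[[1]],
             activity.id = y[[1]],
             x,
             stringsAsFactors = FALSE)
}

train <- add_ids(x_train, y_train, subject_train, "Train")
test <- add_ids(x_test, y_test, subject_test, "Test")

# Stack the two parts row-wise into the combined "set" data frame.
set <- rbind(train, test)
```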
The "set" data frame is filtered down to the columns with "mean" or "std" in the name. The instructions did not make clear whether to include columns with "mean" or "std" anywhere in the name or only at the end; I chose to include them anywhere in the name.
-
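
One way to express this filter is a regular expression over the column names, sketched below with base R `grepl`. The toy columns and the choice to keep the identifier columns alongside the matches are assumptions. The match here is case-sensitive; whether run_analysis.R also matches "Mean" is not stated in this README.

```r
# Toy "set" with identifier columns and four measurement columns.
set <- data.frame(c("Train", "Test"), c(1L, 2L), c(1L, 2L),
                  c(0.1, 0.4), c(0.2, 0.5), c(0.3, 0.6), c(0.7, 0.8),
                  stringsAsFactors = FALSE)
names(set) <- c("group", "subject", "activity.id",
                "tBodyAcc.mean...X", "tBodyAcc.std...X",
                "tBodyAcc.max...X", "fBodyAcc.meanFreq...X")

id_cols <- c("group", "subject", "activity.id")

# TRUE for any column with "mean" or "std" anywhere in its name,
# so "meanFreq" columns are kept as well.
is_metric <- grepl("mean|std", names(set))

set_filtered <- set[, names(set) %in% id_cols | is_metric, drop = FALSE]
```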
Activity names are added from the activity_labels file by joining on the activity id
-
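
A sketch of this lookup using base R `merge` as a stand-in for the join function used in the script. The column names "activity.id" and "activity.name" and the toy rows are assumptions. Unlike some join implementations, `merge` sorts the result by the key column by default.

```r
# Toy data: three observations and a two-row label table.
set <- data.frame(subject = c(1L, 1L, 2L),
                  activity.id = c(2L, 1L, 2L),
                  value = c(0.5, 0.1, 0.9))
activity_labels <- data.frame(activity.id = c(1L, 2L),
                              activity.name = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)

# Left join: keep every row of "set" and attach the matching activity name.
set_named <- merge(set, activity_labels, by = "activity.id", all.x = TRUE)
```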
The column names are cleaned up by replacing "." with " ", collapsing multiple spaces into one, and trimming leading and trailing spaces
-
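
The three cleanup operations can be sketched as a small function. The name `clean_names` and the sample column names are hypothetical, and the script may implement the same steps differently.

```r
# Hypothetical helper applying the three cleanup steps in order.
clean_names <- function(x) {
  x <- gsub(".", " ", x, fixed = TRUE)  # literal "." becomes a space
  x <- gsub(" +", " ", x)               # collapse runs of spaces
  trimws(x)                             # drop leading/trailing spaces
}

cleaned <- clean_names(c("tBodyAcc.mean...X", "fBodyAccMag.std..", "subject"))
# cleaned is c("tBodyAcc mean X", "fBodyAccMag std", "subject")
```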
All metrics are aggregated using mean (FUN=mean), grouping by activity name and subject. There was some debate on the forums about whether to aggregate all variables or only the mean/std variables. My interpretation was to aggregate the final set, so only the mean/std variables.
-
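
A sketch of the aggregation with base R `aggregate` and `FUN = mean`. The toy data and the final grouping names "activity" and "subject" are assumptions. When the grouping list is unnamed, `aggregate` labels the grouping columns "Group.1" and "Group.2", which is why they are renamed afterwards.

```r
# Toy data: two rows for (WALKING, subject 1), one row for each other pair.
set <- data.frame(activity.name = c("WALKING", "WALKING", "SITTING", "SITTING"),
                  subject = c(1L, 1L, 1L, 2L),
                  "tBodyAcc mean X" = c(1, 3, 5, 7),
                  "tBodyAcc std X" = c(2, 4, 6, 8),
                  check.names = FALSE,
                  stringsAsFactors = FALSE)

metric_cols <- setdiff(names(set), c("activity.name", "subject"))

# Average every metric column within each activity/subject pair.
tidy <- aggregate(set[metric_cols],
                  by = list(set$activity.name, set$subject),
                  FUN = mean)

# Replace the default "Group.1" / "Group.2" labels.
names(tidy)[1:2] <- c("activity", "subject")
```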
The names of the grouping variables are set
-
The tidy data set is written to tidyDataSet.txt
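
A sketch of the write step using base R `write.table`. The arguments shown (such as `row.names = FALSE`) and the toy data are assumptions, since this README does not state which options the script uses. The sketch writes to a temporary directory so it can be run anywhere.

```r
# Toy tidy data set standing in for the aggregated result.
tidy <- data.frame(activity = c("SITTING", "WALKING"),
                   subject = c(1L, 1L),
                   mean_x = c(5, 2),
                   stringsAsFactors = FALSE)

# Assumed output location for this sketch; only the file name is from the README.
out_file <- file.path(tempdir(), "tidyDataSet.txt")
write.table(tidy, out_file, row.names = FALSE)

# Reading the file back is a quick check that it round-trips.
check <- read.table(out_file, header = TRUE, stringsAsFactors = FALSE)
```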