GitHub - scottbedwell/gettingcleaningdata

This README explains the analysis performed by the "run_analysis.R" script.

The appropriate libraries are loaded
The various data files are read in. These include:
- features.txt
- activity_labels.txt
- X_train.txt
- Y_train.txt
- subject_train.txt
- X_test.txt
- Y_test.txt
- subject_test.txt
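The reading step could be sketched roughly as follows. This is a minimal illustration and not the actual contents of run_analysis.R: the helper `read_split`, the `train`/`test` subdirectory layout, and the tiny demo files written to a temporary directory are all assumptions made so the example runs on its own.

```r
# Hypothetical helper: read the three files belonging to one split ("train" or "test").
# The layout <data_dir>/<split>/<stem>_<split>.txt is an assumption for this sketch.
read_split <- function(data_dir, split) {
  path <- function(stem) file.path(data_dir, split, paste0(stem, "_", split, ".txt"))
  list(
    x       = read.table(path("X")),
    y       = read.table(path("Y"), col.names = "activity.id"),
    subject = read.table(path("subject"), col.names = "subject")
  )
}

# Tiny made-up files so the sketch is self-contained (not real measurements)
demo_dir <- file.path(tempdir(), "har_demo")
dir.create(file.path(demo_dir, "train"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("0.1 0.2", "0.3 0.4"), file.path(demo_dir, "train", "X_train.txt"))
writeLines(c("1", "2"), file.path(demo_dir, "train", "Y_train.txt"))
writeLines(c("7", "7"), file.path(demo_dir, "train", "subject_train.txt"))
writeLines(c("1 tBodyAcc-mean()-X", "2 tBodyAcc-std()-X"),
           file.path(demo_dir, "features.txt"))

# features.txt holds one "id name" pair per line
features <- read.table(file.path(demo_dir, "features.txt"),
                       col.names = c("feature.id", "feature.name"),
                       stringsAsFactors = FALSE)
train <- read_split(demo_dir, "train")
```

On a case-sensitive file system the file names passed to `read.table` must match the names on disk exactly, including the case of the leading `X`/`Y`.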
A "group" column is created for Train vs Test
A "subject" column is created from the subject files
An "activity.id" column is created from the Y_train and Y_test files
A "set" dataframe is created from combining the train and test dataframes
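The four steps above (adding "group", "subject" and "activity.id", then combining) might look something like this. It is a sketch on made-up toy data: the helper `build_split` and the use of `make.names` for the measurement columns are assumptions, not necessarily what run_analysis.R does.

```r
# Toy stand-ins for the real files; all values are made up for illustration
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X")
x_train <- data.frame(V1 = c(0.1, 0.2), V2 = c(0.3, 0.4))
x_test  <- data.frame(V1 = c(0.5, 0.6), V2 = c(0.7, 0.8))

# Hypothetical helper: attach the group, subject and activity.id columns to one split
build_split <- function(x, y, subject, group, feature_names) {
  # make.names() replaces "-", "(" and ")" with "." to give valid column names
  names(x) <- make.names(feature_names, unique = TRUE)
  cbind(group = group, subject = subject, activity.id = y, x)
}

train <- build_split(x_train, y = c(1, 2), subject = c(1, 1),
                     group = "Train", feature_names = feature_names)
test  <- build_split(x_test, y = c(2, 3), subject = c(2, 2),
                     group = "Test", feature_names = feature_names)

# Stack the two splits; rbind() needs the same column names in both
set <- rbind(train, test)
```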
The "set" dataframe is filtered down to the columns with "mean" or "std" in the name. The instructions were unclear as to whether to include columns with "mean" or "std" anywhere in the name, or only at the end. I chose to include columns with either term anywhere in the name.
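To make the difference between the two readings concrete, here is a small sketch using `grepl`. The feature names are examples in the style of features.txt, and the stricter pattern is only one possible reading of "at the end"; neither pattern is confirmed to be the one used in run_analysis.R.

```r
# Example feature names in the style of features.txt
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                   "fBodyAcc-meanFreq()-X", "tBodyAcc-max()-X")

# Reading chosen in this analysis: "mean" or "std" anywhere in the name
anywhere <- grepl("mean|std", feature_names)

# Stricter alternative (an assumption): only names containing "mean()" or "std()"
strict <- grepl("(mean|std)\\(\\)", feature_names)

feature_names[anywhere]  # also keeps the meanFreq() variable
feature_names[strict]    # drops the meanFreq() variable

# When filtering the combined data frame, the identifier columns have to be kept too
id_cols   <- c("group", "subject", "activity.id")
col_names <- c(id_cols, make.names(feature_names))
keep      <- col_names %in% id_cols | grepl("mean|std", col_names)
# set <- set[, keep] would then retain the ids plus the mean/std measurements
```

Note that `grepl` is case-sensitive by default, so a pattern of `"mean|std"` does not match names that contain "Mean" with a capital M.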
Activity names are added from the activity_labels file using the join function based on the activity id
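A lookup of this kind can be sketched with base R's `merge`, which is swapped in here for the join function mentioned above so that the example needs no extra packages. The column name `activity.name` and the toy rows are assumptions for illustration.

```r
# Lookup table in the shape of activity_labels.txt (first three activities only)
activity_labels <- data.frame(
  activity.id   = 1:3,
  activity.name = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS"),
  stringsAsFactors = FALSE
)

# Made-up measurements keyed by activity id
set <- data.frame(subject     = c(1, 1, 2),
                  activity.id = c(3, 1, 3),
                  value       = c(0.3, 0.1, 0.5))

# Left join on activity.id; all.x = TRUE keeps rows even if an id has no label
labelled <- merge(set, activity_labels, by = "activity.id", all.x = TRUE)
```

Unlike some join functions, `merge` sorts the result by the key column, so row order should not be relied on afterwards.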
The column names are cleaned up by replacing "." with " ", removing multiple spaces, and trimming
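The clean-up described above can be expressed as three small string operations. The function name `clean_names` is hypothetical, and the input names assume the columns look like the output of `make.names`.

```r
# Hypothetical helper implementing the three clean-up steps
clean_names <- function(x) {
  x <- gsub(".", " ", x, fixed = TRUE)  # replace every literal "." with a space
  x <- gsub(" +", " ", x)               # collapse runs of spaces into one
  trimws(x)                             # drop leading and trailing spaces
}

clean_names(c("tBodyAcc.mean...X", "tBodyAcc.std.."))
# returns "tBodyAcc mean X" "tBodyAcc std"
```

The `fixed = TRUE` argument matters: without it, "." is a regular expression that matches any character.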
All metrics are aggregated using mean (FUN=mean), grouping by activity name and subject. There was some debate on the forums about whether to aggregate all variables or only the mean/std variables. My interpretation was to aggregate the final set, so only the mean/std variables.
The names of the grouping variables are set
The tidy data set is written to tidyDataSet.txt
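The last three steps (aggregating, naming the grouping variables, and writing the file) could be sketched as below. The data are made up, the output goes to a temporary directory so the example has no side effects, and `row.names = FALSE` is an assumption about how the file is written.

```r
# Made-up filtered data with cleaned column names
set <- data.frame(
  activity.name     = c("WALKING", "WALKING", "SITTING", "SITTING"),
  subject           = c(1, 1, 1, 2),
  `tBodyAcc mean X` = c(0.2, 0.4, 0.6, 0.8),
  `tBodyAcc std X`  = c(1.0, 3.0, 5.0, 7.0),
  check.names = FALSE
)

# Average every measurement column within each (activity, subject) pair
metric_cols <- setdiff(names(set), c("activity.name", "subject"))
tidy <- aggregate(set[metric_cols],
                  by  = list(set$activity.name, set$subject),
                  FUN = mean)

# aggregate() labels unnamed grouping columns Group.1, Group.2, so rename them
names(tidy)[1:2] <- c("activity name", "subject")

# Write the result as a plain text table
out_file <- file.path(tempdir(), "tidyDataSet.txt")
write.table(tidy, out_file, row.names = FALSE)
```

The result has one row per activity and subject combination that actually occurs in the data, three rows in this toy example.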

