This repository contains my submission for the Getting and Cleaning Data Course Project (May 2015).
This ReadMe file includes the following:
- A description of the project environment
- Package dependencies
- Some assumptions
- A description of the runAnalysis.R script
A separate Codebook.txt file describes the output dataset.

Project environment:
- Intel Core i5 @ 1.6GHz; 4GB RAM; 256GB SSD
- Windows 7 Home Premium SP1
- R version 3.1.3 (2015-03-09)
- RStudio Version 0.98.1103

Package dependencies:
- plyr
- data.table

Assumptions:
- The script can be run offline, that is:
  - the raw input dataset has been downloaded and unzipped into the working directory (where the script is located). The script does not read input data directly from the downloaded zip file. If the input folder is not found, the script reports the error and stops.
  - all required packages are already installed
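
The two assumptions above could be checked at the top of a script along these lines. This is a minimal sketch, not the code in runAnalysis.R: the helper names `check_input_dir` and `check_packages` are hypothetical, and since this README does not name the input folder, the demonstration uses R's temporary directory and two base packages as stand-ins.

```r
# Hypothetical helpers; not taken from runAnalysis.R.

# Stop with an error if the input folder is missing.
# file.exists()/file.info() are used so the check also works on R 3.1.x
# (dir.exists() was only added in R 3.2.0).
check_input_dir <- function(dir_path) {
  is_dir <- file.exists(dir_path) && isTRUE(file.info(dir_path)$isdir)
  if (!is_dir) {
    stop("Input data folder not found: ", dir_path, call. = FALSE)
  }
  invisible(TRUE)
}

# Stop with an error if any required package is not installed.
check_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (!all(installed)) {
    stop("Missing packages: ", paste(pkgs[!installed], collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Demonstration with stand-ins that exist in any R session.
check_input_dir(tempdir())
check_packages(c("stats", "utils"))
```

In the project itself the calls would presumably be `check_packages(c("plyr", "data.table"))` and `check_input_dir()` with the name of the unzipped dataset folder.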

Description of the runAnalysis.R script:
- Loads libraries and checks for the data input folder. Exits with an error message if the folder is not found.
- Loads raw test and train data files into R data frames:
  - experimental observations
  - activities
  - subjects
- Loads supporting data into R data frames:
  - activity labels
  - feature names (variables)
- Combines all test and train datasets:
  - combine train_ and test_ data (the experimental observations)
  - combine train_ and test_ labels and assign a meaningful variable name
  - combine train_ and test_ subjects and assign a meaningful variable name
- Trims all_data down to the required variables: means and standard deviations only.
- Assigns meaningful variable names to the all_data columns, using the names provided in features.txt.
- Column binds subjects and labels onto all_data
- Merges activity labels onto all_data and drops the activity_id column
- Creates the final tidy dataset and writes it out:
  - Sort the data into activity / subject sequence
  - Reshape into summarised data
  - Output data is in the wide form.
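
The steps above can be sketched end to end on a tiny in-memory example. This is an illustration under stated assumptions, not the code in runAnalysis.R: it uses base R only (the script itself depends on plyr and data.table), all values are made up, the feature names are only illustrative, the `mean()`/`std()` name pattern is a guess at how the required variables are selected, summarising by the mean is assumed, and the output goes to a temporary file because this README does not state the output file name.

```r
# Synthetic stand-ins for the raw inputs (all values are made up).
features <- data.frame(
  id = 1:4,
  name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
           "tBodyAcc-max()-X", "tBodyAcc-mean()-Y"),
  stringsAsFactors = FALSE
)
activity_labels <- data.frame(
  activity_id = 1:2,
  activity = c("WALKING", "SITTING"),
  stringsAsFactors = FALSE
)
train_data <- data.frame(V1 = c(1, 3, 5, 7), V2 = c(10, 30, 50, 70),
                         V3 = c(0, 0, 0, 0), V4 = c(2, 4, 6, 8))
test_data  <- data.frame(V1 = c(2, 4), V2 = c(20, 40),
                         V3 = c(0, 0), V4 = c(1, 3))
train_labels   <- data.frame(V1 = c(1, 1, 2, 2))
test_labels    <- data.frame(V1 = c(2, 2))
train_subjects <- data.frame(V1 = c(1, 1, 1, 1))
test_subjects  <- data.frame(V1 = c(2, 2))

# Combine train and test sets and name the single-column frames.
all_data     <- rbind(train_data, test_data)
all_labels   <- rbind(train_labels, test_labels)
all_subjects <- rbind(train_subjects, test_subjects)
names(all_labels)   <- "activity_id"
names(all_subjects) <- "subject"

# Keep only mean and standard deviation variables (assumed pattern),
# then name the remaining columns from the feature list.
keep <- grepl("mean\\(\\)|std\\(\\)", features$name)
all_data <- all_data[, keep, drop = FALSE]
names(all_data) <- features$name[keep]

# Column-bind subjects and labels, merge in the activity names,
# and drop the numeric activity_id.
all_data <- cbind(all_subjects, all_labels, all_data)
all_data <- merge(all_data, activity_labels, by = "activity_id")
all_data$activity_id <- NULL

# Summarise each variable per activity and subject (wide form),
# then sort into activity / subject sequence.
measure_cols <- setdiff(names(all_data), c("subject", "activity"))
tidy <- aggregate(all_data[measure_cols],
                  by = list(activity = all_data$activity,
                            subject = all_data$subject),
                  FUN = mean)
tidy <- tidy[order(tidy$activity, tidy$subject), ]
row.names(tidy) <- NULL

# Write the result out (temporary file used as a placeholder path).
out_file <- tempfile(fileext = ".txt")
write.table(tidy, file = out_file, row.names = FALSE)
```

The result has one row per activity and subject combination and one column per retained variable, which is the wide form mentioned above.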