rainerthiel/getdata-014.project

getdata-014.project

This repository contains my submission for the Getting and Cleaning Data Course Project May 2015.

This ReadMe file includes the following:

  • A description of the project environment
  • Package dependencies
  • Some assumptions
  • A description of the runAnalysis.R script

There is a separate Codebook.txt file that describes the output dataset.

Environment

  • Intel Core i5 @ 1.6GHz; 4GB RAM; 256GB SSD
  • Windows 7 Home Premium SP1
  • R version 3.1.3 (2015-03-09)
  • RStudio Version 0.98.1103

Package Dependencies

  • plyr
  • data.table
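Below is a minimal base-R sketch of how these dependencies could be checked before the script does any work. The helper names `missing_packages()` and `check_packages()` are hypothetical and are not taken from runAnalysis.R. `system.file(package = ...)` returns an empty string for a package that is not installed, so the check itself loads nothing.

```r
# Hypothetical helpers (not from runAnalysis.R) for checking the package
# dependencies listed above before any data is processed.
required_packages <- c("plyr", "data.table")

# Return the subset of pkgs that is not installed.
# system.file(package = p) returns "" when p is not installed, so this
# check does not load or attach anything.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!installed]
}

# Stop with a readable message if any required package is missing.
check_packages <- function(pkgs = required_packages) {
  missing <- missing_packages(pkgs)
  if (length(missing) > 0) {
    stop("Missing required packages: ", paste(missing, collapse = ", "),
         ". Install them with install.packages() first.", call. = FALSE)
  }
  invisible(TRUE)
}
```

Under these assumptions, calling `check_packages()` at the top of the script, followed by `library(plyr)` and `library(data.table)`, would stop early with a clear message instead of failing partway through.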

Assumptions

  • The script can be run offline, that is:
      ◦ the raw input dataset has been downloaded and unzipped into the working directory (where the script is located). The script does not read input data directly from the downloaded zip file. If the input data folder is not found, the script reports an error and stops.
      ◦ all required packages are already installed.
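The missing-folder check described above might look like the following sketch. The default folder name `UCI HAR Dataset` is an assumption, because this README does not name the unzipped folder. `check_input_dir()` is a hypothetical helper, not the function used in runAnalysis.R.

```r
# Hypothetical helper for the "check for data input folder" step.
# The default folder name is an assumption; pass the real name if it differs.
check_input_dir <- function(data_dir = "UCI HAR Dataset") {
  # file_test("-d", x) is TRUE only when x exists and is a directory
  if (!utils::file_test("-d", data_dir)) {
    stop("Input folder '", data_dir, "' not found in working directory '",
         getwd(), "'. Download and unzip the raw dataset there first.",
         call. = FALSE)
  }
  # Return the full path invisibly so callers can reuse it
  invisible(normalizePath(data_dir))
}
```

Because the function returns the folder's full path invisibly, it could be used as `data_dir <- check_input_dir()` before any files are read.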

Script Synopsis

  1. Loads the required libraries and checks for the data input folder. Exits with an error message if it is not found.
  2. Loads the raw test and train data files into R data frames:
      ◦ experimental observations
      ◦ activities
      ◦ subjects
  3. Loads the supporting data into R data frames:
      ◦ activity labels
      ◦ feature names (variables)
  4. Combines the test and train datasets:
      ◦ combines the train_ and test_ data (the experimental observations)
      ◦ combines the train_ and test_ labels and assigns a meaningful variable name
      ◦ combines the train_ and test_ subjects and assigns a meaningful variable name
  5. Trims all_data down to the required variables: means and standard deviations only.
  6. Assigns meaningful variable names to the all_data columns, using the variable names provided in features.txt.
  7. Column-binds the subjects and labels onto all_data.
  8. Merges the activity labels onto all_data and drops the activity_id column.
  9. Creates the final tidy dataset and writes it out:
      ◦ sorts the data into activity / subject sequence
      ◦ reshapes it into summarised data
      ◦ the output data is in wide form.
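The synopsis above can be illustrated end to end on a few rows of made-up data. This is a base-R sketch under stated assumptions, not the author's implementation. runAnalysis.R depends on plyr and data.table, whereas this example uses only `rbind`, `merge`, `aggregate` and `order`. The object names, the toy values, the `mean()`/`std()` matching pattern and the output file name `tidy_data.txt` are all hypothetical.

```r
# Toy stand-ins for features.txt and activity_labels.txt (hypothetical values).
features <- data.frame(id = 1:3,
                       name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                                "tBodyAcc-max()-X"),
                       stringsAsFactors = FALSE)
activity_labels <- data.frame(activity_id = c(1, 2),
                              activity = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)

# Toy train_ and test_ observations (one column per feature), activities, subjects.
train_x <- data.frame(matrix(c(1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0), nrow = 4))
train_y <- c(1, 1, 2, 2)
train_subject <- c(1, 1, 1, 1)
test_x <- data.frame(matrix(c(5, 6, 50, 60, 0, 0), nrow = 2))
test_y <- c(1, 2)
test_subject <- c(2, 2)

# Combine train_ and test_ observations, labels and subjects (train rows first).
all_data <- rbind(train_x, test_x)
activity_id <- c(train_y, test_y)
subject <- c(train_subject, test_subject)

# Keep only the mean() and std() variables and name them from features.txt.
keep <- grepl("mean\\(\\)|std\\(\\)", features$name)
all_data <- all_data[, keep, drop = FALSE]
names(all_data) <- features$name[keep]

# Column-bind subjects and activity ids; check.names = FALSE keeps names
# such as "tBodyAcc-mean()-X" unchanged.
all_data <- data.frame(subject = subject, activity_id = activity_id, all_data,
                       check.names = FALSE)

# Merge the descriptive activity labels onto the data, then drop activity_id.
tidy <- merge(activity_labels, all_data, by = "activity_id")
tidy$activity_id <- NULL

# Summarise: mean of every retained variable for each activity/subject pair.
measure_cols <- setdiff(names(tidy), c("activity", "subject"))
summary_data <- aggregate(tidy[measure_cols],
                          by = list(activity = tidy$activity, subject = tidy$subject),
                          FUN = mean)

# Sort into activity / subject sequence and write the wide result out.
summary_data <- summary_data[order(summary_data$activity, summary_data$subject), ]
rownames(summary_data) <- NULL
out_path <- file.path(tempdir(), "tidy_data.txt")  # hypothetical output file
write.table(summary_data, out_path, row.names = FALSE)
```

The summary has one row per activity and subject pair and one column per retained mean/std variable, which is the wide form described in the synopsis. In this sketch the rows are sorted by activity name and then by subject.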
