Data Cleaning Project for the Coursera "Getting and Cleaning Data" course

The dataset used is the UCI HAR Dataset

Notes on what the run_analysis.R script performs

Cleans up the environment (so be careful to save your environment before you run the script!)
Loads libraries and obtains the working directory. The script assumes that the UCI HAR Dataset directory is present in the working directory
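The directory assumption can be checked up front. The sketch below is illustrative only: `check_data_dir` is a hypothetical helper, not a function taken from run_analysis.R.

```r
# Hypothetical helper (not from run_analysis.R): fail early when the
# dataset folder is not inside the working directory.
check_data_dir <- function(base_dir = getwd(), folder = "UCI HAR Dataset") {
  data_dir <- file.path(base_dir, folder)
  if (!dir.exists(data_dir)) {
    stop("Dataset folder not found: ", data_dir)
  }
  data_dir
}
```

Calling `check_data_dir()` before any file is read gives a clear error message instead of a failed `read.table()` call later on.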
Reads the data from the appropriate directories/files
- Reference data
- Test data
- Training data
Names/renames the columns (aka variables) using the Reference data
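A minimal sketch of reading and naming, assuming the reference data is a two-column table (index, feature name) and the measurement files are whitespace-separated numbers. The inline `text =` values are tiny made-up stand-ins for the real files, and the object names are illustrative.

```r
# Stand-in for the reference file: one row per measurement column.
features <- read.table(text = "1 tBodyAcc-mean()-X
2 tBodyAcc-std()-X
3 tBodyAcc-mad()-X",
                       col.names = c("index", "feature"),
                       stringsAsFactors = FALSE)

# Stand-in for a measurement file: one row per observation.
x_test <- read.table(text = "0.25 -0.93 -0.91
0.28 -0.98 -0.97")

# Name the measurement columns after the features, in file order.
names(x_test) <- features$feature
```

Assigning with `names<-` after reading keeps the original feature names intact, including the parentheses and hyphens.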

NOTE: Merging the test and train datasets is deferred until reduced datasets containing only the variables of interest have been created
Selects the variables of interest (all computed mean() and std() measurements) from the test and train datasets (which are still separate at this point)
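One way to make this selection is a literal match on the feature names. This is a sketch, not necessarily how run_analysis.R does it. In particular, excluding names such as meanFreq() is an assumption made here.

```r
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                   "tBodyAcc-mad()-X", "fBodyAcc-meanFreq()-X")

# fixed = TRUE matches the literal text, so "meanFreq()" is not selected.
keep <- grepl("mean()", feature_names, fixed = TRUE) |
  grepl("std()", feature_names, fixed = TRUE)

selected <- feature_names[keep]
```

The same logical vector `keep` can be used to subset the columns of the measurement data frame, for example `x_test[, keep]`.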
Enhances the datasets obtained above with the subject and activity information from the y_ and subject_ files
- for this step, the activity numbers from the 'y_' files are translated into activity names using the activity_labels.txt file (a new column is added to the 'y' data)
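The translation from activity numbers to names can be sketched with `match()`. The data frames below are small made-up samples and the column names are illustrative assumptions, not the ones used in run_analysis.R.

```r
# Made-up samples standing in for the label, y_, subject_ and x_ data.
activity_labels <- data.frame(
  id = 1:3,
  label = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS"),
  stringsAsFactors = FALSE
)
y_test <- data.frame(activity_id = c(3, 1, 1, 2))
subject_test <- data.frame(subject = c(2, 2, 4, 4))
x_test <- data.frame(`tBodyAcc-mean()-X` = c(0.1, 0.2, 0.3, 0.4),
                     check.names = FALSE)

# match() preserves row order, which merge() does not guarantee.
y_test$activity <- activity_labels$label[match(y_test$activity_id,
                                               activity_labels$id)]

# Put subject and activity in front of the measurements.
test_data <- cbind(subject_test, activity = y_test$activity, x_test)
```

Row order matters here because the subject, activity and measurement files are aligned only by row position.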
Merges the test and train datasets. The result has as many rows as x_test and x_train combined. Most of its column names come from the reference.txt file
The variables of the dataset are given descriptive names derived from the column names obtained from the reference.txt file
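Because both reduced datasets share the same columns, the merge amounts to stacking rows. A sketch with made-up values and illustrative column names:

```r
test_data <- data.frame(subject = c(2, 4),
                        activity = c("WALKING", "LAYING"),
                        mean_x = c(0.1, 0.2))
train_data <- data.frame(subject = c(1, 1, 3),
                         activity = c("SITTING", "WALKING", "LAYING"),
                         mean_x = c(0.3, 0.4, 0.5))

# rbind() stacks rows; both data frames must have the same column names.
merged <- rbind(test_data, train_data)

# The row count is the sum of the two inputs.
rows_add_up <- nrow(merged) == nrow(test_data) + nrow(train_data)
```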
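Descriptive names can be derived with a chain of substitutions. The replacement rules below ("t" to "Time", "Acc" to "Accelerometer", and so on) are assumptions chosen for illustration. The names actually produced by run_analysis.R may differ.

```r
raw_names <- c("tBodyAcc-mean()-X", "fBodyGyro-std()-Z")

descriptive <- raw_names
descriptive <- sub("^t", "Time", descriptive)        # leading t: time domain
descriptive <- sub("^f", "Frequency", descriptive)   # leading f: frequency domain
descriptive <- gsub("Acc", "Accelerometer", descriptive, fixed = TRUE)
descriptive <- gsub("Gyro", "Gyroscope", descriptive, fixed = TRUE)
descriptive <- gsub("-mean()", "Mean", descriptive, fixed = TRUE)
descriptive <- gsub("-std()", "StdDev", descriptive, fixed = TRUE)
descriptive <- gsub("-", "", descriptive, fixed = TRUE)  # drop remaining hyphens
```

With these rules, `tBodyAcc-mean()-X` becomes `TimeBodyAccelerometerMeanX`.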
A new dataset is created by computing the average of every variable, grouped by
- the subject (which remains a numeric identifier)
- then by the activity label (6 types, obtained from the activity_labels.txt file)
NOTE: This is the final tidy dataset
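The grouped average can be sketched in base R with `aggregate()`. The script may use a different function or package. The data and column names below are made up.

```r
merged <- data.frame(
  subject  = c(1, 1, 1, 2, 2),
  activity = c("WALKING", "WALKING", "LAYING", "WALKING", "WALKING"),
  mean_x   = c(0.2, 0.4, 0.1, 0.5, 0.7),
  std_x    = c(1.0, 3.0, 2.0, 4.0, 6.0)
)

# The dot stands for every column that is not used for grouping.
tidy <- aggregate(. ~ subject + activity, data = merged, FUN = mean)

# Sort by subject, then by activity, for a predictable row order.
tidy <- tidy[order(tidy$subject, tidy$activity), ]
```

The result has one row per subject and activity combination that occurs in the data, and one averaged column per variable.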
The data frame (containing the final data set) is written into the UCI_HAR.txt file
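A sketch of the write step, using a temporary directory so that the example does not touch the working directory. The `row.names = FALSE` argument is an assumption. The README does not state which options the script passes.

```r
tidy <- data.frame(subject = c(1, 2),
                   activity = c("LAYING", "WALKING"),
                   mean_x = c(0.1, 0.6))

# Written to a temp directory here; the script writes UCI_HAR.txt itself.
out_file <- file.path(tempdir(), "UCI_HAR.txt")
write.table(tidy, file = out_file, row.names = FALSE)

# Read the file back to confirm the round trip.
check <- read.table(out_file, header = TRUE)
```

Reading the file back with `header = TRUE` is also how a reader of the repository can load UCI_HAR.txt for inspection.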
The environment is tidied up after the script has finished
