Name		Name	Last commit message	Last commit date
parent directory ..
Codebook.md		Codebook.md
README.md		README.md
avg_per_activity_and_subject.txt		avg_per_activity_and_subject.txt
run_analysis.R		run_analysis.R

README.md

README for UIC HAR Data Analysis script

nram, Apr 24, 2015

Pl view the Raw version for readability

The script run_analysis.R does the following #----------------------------------------------------------------------------

Download and unzip file using download.file() and unzip() respectively

#---------------------------------------------------------------------------- 2) Read the unzipped data files using read.table(). It checks the data directory and a file (features.txt) exists as a sanity check before reading the file

#---------------------------------------------------------------------------- 3) Cleansup the header using a series of gsub(). Instead of manually assigning the headers, the header names are converted to CamelShell format, getting rid of () and - It also addes additional columns that are generated with in the script.

This results in header like this:

sample(header,10) [1] "TimeGravityAccMadY" "TimeBodyGyroMeanZ"
[3] "TimeGravityAccArCoeffY_2" "FreqBodyAccKurtosisX"
[5] "TimeBodyAccJerkEnergyZ" "TimeBodyGyroJerkMeanZ"
[7] "TimeBodyAccIqrZ" "TimeBodyGyroJerkIqrY"
[9] "TimeBodyAccJerkMagEnergy" "TimeGravityAccMagMad"

#---------------------------------------------------------------------------- 4) Merge data files. activity_labels and y_test (activity_index) are merged with join_all this gives a wide table in the form

head(l_test) V1 V2 1 5 STANDING 2 5 STANDING 3 5 STANDING 4 5 STANDING 5 5 STANDING 6 5 STANDING

Then this is meraged with x_test (dataset) using cbind with additional columns for activity name and subject A column type is added and assigned value "test" to identify this

The same is repeated for training data with resulting x_train dataset with additional columns for activity name and subject A column type is added and assigned value "train" to identify this

Both x_test and x_train are assitned common header (so they can be combined)

At the end, x_test and x_train are combined with rbind creating a huge dataset with both test and training data

dim(comb_data) [1] 10299 564

This combined data has the activty labels for each measurement.

table(mean_and_std_of_comb_data$Activity)

        LAYING            SITTING           STANDING            WALKING WALKING_DOWNSTAIRS   WALKING_UPSTAIRS 
          1944               1777               1906               1722               1406               1544

#---------------------------------------------------------------------------- 5) Extract only select colums (that have Std or Mean) that have mean and standard deviation from this combined data called mean_and_std_of_comb_data

dim(mean_and_std_of_comb_data) [1] 10299 89

#---------------------------------------------------------------------------- 6) The mean_and_std_of_comb_data is flattenned to create a narrow data with melt, with Activity and Subject as id and all other fields as variables.

The resulting data (narrow_data) is a long list of measurements, one per line for each of Activity and Subject combination

dim(narrow_data) [1] 339867 4 head(narrow_data) Activity Subject variable value 1 STANDING 2 TimeBodyAccStdX -0.938 2 STANDING 2 TimeBodyAccStdX -0.975 3 STANDING 2 TimeBodyAccStdX -0.994 4 STANDING 2 TimeBodyAccStdX -0.995 5 STANDING 2 TimeBodyAccStdX -0.994 6 STANDING 2 TimeBodyAccStdX -0.994

From this using dcast, aggregate of Activity/Subjet for each measure 
is calculated with mean function to get the average of each measure 
per Activity/Subject.

head(avg_per_activity_and_subject[,1:3]) Activity Subject TimeBodyAccMeanX 1 LAYING 1 0.222 2 LAYING 2 0.281 3 LAYING 3 0.276 4 LAYING 4 0.264 5 LAYING 5 0.278 6 LAYING 6 0.249

#---------------------------------------------------------------------------- 7) The resultting average value is writen to file (for uploading to github as required for the assignment)

#---------------------------------------------------------------------------- 8) All large datasets created in the file are removed before exiting.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UIC_HAR_Data_Analysis

UIC_HAR_Data_Analysis

README.md

README for UIC HAR Data Analysis script

nram, Apr 24, 2015

Pl view the Raw version for readability

Files

UIC_HAR_Data_Analysis

Directory actions

More options

Directory actions

More options

Latest commit

History

UIC_HAR_Data_Analysis

Folders and files

parent directory

README.md

README for UIC HAR Data Analysis script

nram, Apr 24, 2015

Pl view the Raw version for readability