GitHub - AcousticMonster/CourseraGettingAndCleaningData: My files for the Coursera Getting and Cleaning Data project.

Getting and Cleaning Data: Course Project Introduction

This repository contains my files created for use with the Coursera project "Getting and Cleaning Data".

About the raw data

The data is an extract of cell phone readings of the movements of various subjects (sitting, standing, walking, laying, etc.). The dataset was extracted from the following text files.

The Movement Description Text

activity_labels.txt

Training data files

X_train.txt
Y_train.txt
subject_train.txt

Testing data files

X_test.txt
Y_test.txt
subject_test.txt

About the script and the tidy dataset

The script was created to extract the columns pertaining to the three axis points (X, Y, Z) for both mean and standard deviation. Eighty-eight (88) columns out of five hundred and sixty-one (561) were used in this dataset. The eighty-eight columns were then grouped by subject and activity, and a rollup mean was calculated for each measurement. A text file named tidy.txt is then exported for use by the Coursera instructors.
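
The grouping and rollup mean described above can be sketched on a few made-up rows. This is a minimal illustration, not the code from run_analysis.R: the `readings` data frame and its values are invented, the column names only imitate the cleaned feature names, and `across()` assumes dplyr 1.0 or later.

```r
library(dplyr)

# Made-up readings: two subjects, two activities, two measurement columns.
readings <- data.frame(
  subject = c(1, 1, 1, 2, 2, 2),
  activity = c("WALKING", "WALKING", "SITTING", "WALKING", "SITTING", "SITTING"),
  tBodyAccmeanX = c(0.2, 0.4, 0.1, 0.3, 0.5, 0.7),
  tBodyAccstdX = c(-0.9, -0.7, -0.5, -0.6, -0.4, -0.2)
)

# One row per subject/activity pair, each measurement replaced by its mean.
tidy <- readings %>%
  group_by(subject, activity) %>%
  summarise(across(everything(), mean), .groups = "drop")

print(tidy)

# Written to a temporary folder here so the sketch leaves no files behind.
write.table(tidy, file.path(tempdir(), "tidy.txt"), row.names = FALSE)
```

With the six toy rows above, the result has four rows, one for each subject and activity combination that occurs in the data.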

The run_analysis.R file processes the data in the following steps:

Loads library dplyr (this package must be installed beforehand).
Reads the various text dataset files using read.csv into three datasets (activityLabels, training, and testing).
- example: read.csv("UCI HAR Dataset/activity_labels.txt", sep="", header=FALSE)
- note: the UCI HAR Dataset folder must be located in your R working directory
- the data url is here: [Link to Data Zip File](https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip)
The training and testing datasets are combined into one called "combinedReadings".
Create a column name lookup dataset called "features".
Next, clean up the features column names by removing unneeded symbols "-", "()", ",".
Add column names to the combinedReadings dataset based on the features dataset text variables.
Subset the combinedReadings dataset to only include columns with Mean and Standard Deviation in the column names.
Loop through and assign the "activityLabels" text descriptions to the combinedReadings "activity" entries.
Use dplyr to summarize each column as means, grouped by "activity" and "subject".
Finally, write the data to a text file called tidy.txt for submission and grading.
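
The name cleanup, column subsetting, and activity labelling steps above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the actual contents of run_analysis.R: the miniature `features`, `combinedReadings`, and `activityLabels` data frames are toy stand-ins with invented values, the regular expression removes every parenthesis (which may differ slightly from the original cleanup), and `match()` is used here as a vectorized alternative to the loop the script describes.

```r
# Toy stand-in for the features lookup: column index and raw feature name.
features <- data.frame(
  index = 1:4,
  name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
           "tBodyAcc-max()-X", "angle(X,gravityMean)"),
  stringsAsFactors = FALSE
)

# Strip "-", "(", ")" and "," from the feature names.
cleanNames <- gsub("[-(),]", "", features$name)

# Toy stand-in for the combined training and testing readings.
combinedReadings <- data.frame(
  subject = c(1, 2),
  activity = c(1, 2),
  V1 = c(0.2, 0.3), V2 = c(-0.9, -0.6),
  V3 = c(0.8, 0.9), V4 = c(0.1, 0.2)
)
names(combinedReadings)[3:6] <- cleanNames

# Keep subject, activity, and any column whose name mentions mean or std
# (case-insensitive, so names containing "Mean" are kept as well).
keep <- grepl("mean|std", names(combinedReadings), ignore.case = TRUE)
keep[1:2] <- TRUE
combinedReadings <- combinedReadings[, keep]

# Toy stand-in for activity_labels.txt: numeric id and text description.
activityLabels <- data.frame(
  id = 1:2,
  label = c("WALKING", "WALKING_UPSTAIRS"),
  stringsAsFactors = FALSE
)

# Replace the numeric activity codes with their text descriptions.
combinedReadings$activity <-
  activityLabels$label[match(combinedReadings$activity, activityLabels$id)]

print(combinedReadings)
```

Matching on the id column avoids an explicit loop and leaves `NA` for any activity code that has no label, which makes missing labels easy to spot.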

About the Code Book

The CodeBook.md file displays the various variable/column names, as well as the activity label text.
