Your first major assignment is a set of exercises based around a single dataset called rail_trail, which will provide you with practice in creating visualizations using R and ggplot2.
Due: Month Day, Year @ Time
Use the R Markdown file `homework_1.Rmd` to do your work and write-up when completing the questions below. Remember to fill in your name at the top of the R Markdown document and be sure to save, commit, and push (upload) frequently to GitHub so that you have incremental snapshots of your work. When you’re done, follow the How to submit section below to set up a Pull Request, which will be used for feedback.
- Remember that the point of using R Markdown documents is to combine code and write-ups! Each block of R code should have some sort of explanation or justification written in full sentences.
- Your grade will take into account your code, your explanations, and whether your document looks nice when “knitted” to PDF.
For this homework assignment, you will work through a set of visualization problems based on the `rail_trail` dataset. The `rail_trail` dataset was collected by the Pioneer Valley Planning Commission (PVPC) and counts the number of people who passed a sensor on a rail trail. A rail trail is a retired or abandoned railway that was converted into a walking trail. The data was collected for ninety days between April 5, 2005 and November 15, 2005 using a laser sensor placed at a location north of Chestnut Street in Florence, MA.
The dataset contains the following variables:
Variable | Description |
---|---|
`hightemp` | daily high temperature (in degrees Fahrenheit) |
`lowtemp` | daily low temperature (in degrees Fahrenheit) |
`avgtemp` | average of daily low and daily high temperature (in degrees Fahrenheit) |
`season` | indicates whether the season was Spring, Summer, or Fall |
`cloudcover` | measure of cloud cover (in oktas) |
`precip` | measure of precipitation (in inches) |
`volume` | estimated number of trail users that day (number of breaks recorded) |
`weekday` | indicator of whether the day was a non-holiday weekday |
When describing the contents of a visualization, follow the ideas discussed in these resources:
1. In the `rail_trail` dataset, how many rows are there? How many columns? Which variables in the dataset are continuous/numerical and which are categorical?
2. Create a histogram of the variable `volume` using the following code: `ggplot(data = rail_trail) + geom_histogram(mapping = aes(x = volume))`

   Describe the shape and center of the distribution. Afterward, try adjusting the size of the histogram bins by adding the `binwidth` input. To start with, use `binwidth = 21`. If you need help with where to place `binwidth`, read the documentation by running `?geom_histogram` in your Console window. Then, find a bin width that’s too narrow and another one that’s too wide to produce a meaningful histogram.
3. Choosing a proper bin width for a histogram can be tricky, and for that reason it’s preferable to use visualizations that avoid using bin widths when possible. An easy-to-use alternative to the histogram is `geom_density`, which creates a density plot. Use `geom_density` to create a density plot of the variable `volume`.
4. Create a density plot for each of the remaining numerical variables, and describe the shape and center of each distribution. Are there any distributions that are similar in shape to each other?
5. Use `geom_point()` to create a scatterplot that plots `weekday` versus `season`. Why is this plot not useful?
6. Create a `geom_count()` plot (an alternative to a mosaic plot) using the same variables you considered in question 5: `ggplot(data = rail_trail) + geom_count(mapping = aes(x = season, y = weekday))`

   Which circle in the plot takes up the most area? Explain the meaning of the different circle sizes in the plot and what information the plot contains that is missing from the previous scatter plot.
7. Run `?geom_bar` in the Console window and read the documentation for `geom_bar()`, and then look at the entry for it on the ggplot2 cheatsheet. Use `geom_bar()` to reproduce the following bar chart:

   After reproducing the plot, explain what the height of each bar means.
8. Starting from the code snippet you deduced in question 7, create two more bar charts:
   - Create a bar chart by supplying the input `position = "dodge"` to `geom_bar()`.
   - Create a bar chart by supplying the input `position = "fill"` to `geom_bar()`.

   After creating the visualizations, describe the feature that `position` controls.
9. Create a bar chart whose aesthetic mapping in `aes()` is `precip > 0`. Interpret what this bar chart means.
10. Create a scatter plot of `volume` versus `hightemp` using `geom_point()`. Describe any trends that you see.
11. Take the code snippet you wrote for question 10 and map the `weekday` variable to `color`. Then create a second plot where, instead of mapping `weekday` to `color`, you facet over `weekday` using either `facet_wrap()` or `facet_grid()`. Discuss the advantages and disadvantages of faceting instead of mapping to the `color` aesthetic. How might the balance change if you had a larger dataset?
12. Take the code snippet from question 11 that faceted over `weekday` and create a model for each facet panel using `geom_smooth()`. Discuss the trends in the number of rail trail users that `geom_smooth()` picks up.
13. Copy the code snippet you deduced in question 12 and use the input `se = FALSE` for `geom_smooth()`. What does the `se` input option for `geom_smooth()` control?
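As a reference for the plotting questions above, here is a minimal sketch of the general ggplot2 pattern. It uses the `mpg` dataset that ships with ggplot2 as a stand-in, which is an assumption made purely for illustration and is not part of this assignment. The object names `p_color` and `p_facet` are likewise only illustrative. The sketch draws the same scatter plot twice: once with a categorical variable mapped to `color`, and once faceted over that variable.

```r
library(ggplot2)

# `mpg` ships with ggplot2 and stands in for rail_trail here (an assumption
# made for illustration; substitute your own data and variables).

# Version 1: map a categorical variable to the color aesthetic inside aes().
p_color <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = drv))

# Version 2: the same scatter plot, split into one panel per category.
p_facet <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ drv)

# Printing a plot object draws it.
print(p_color)
print(p_facet)
```

Inside an R Markdown code chunk, printing a plot object (or simply ending the chunk with the plot expression) is what makes the figure appear in the knitted document.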
When you are ready to submit, be sure to save, commit, and push your final result so that everything is synchronized to GitHub. Then, navigate to your copy of the GitHub repository you used for this assignment. You should see your repository, along with the updated files that you just pushed. Confirm that your files are up-to-date, and then do the following steps:
1. Click the Pull Requests tab near the top of the page.
2. Click the green button that says “New pull request”.
3. Click the dropdown menu button labeled “base:”, and select the option `grading`.
4. Confirm that the dropdown menu button labeled “compare:” is set to `master`.
5. Click the green button that says “Create pull request”.
6. Give the pull request the following title: `Submission: Homework 1, FirstName LastName`, replacing `FirstName` and `LastName` with your actual first and last name.
7. In the message box, write: `My homework submission is ready for grading @instructor_username`.
8. Click “Create pull request” to lock in your submission.
You are encouraged to review and keep the following cheatsheets handy while working on this assignment: