Commit 44417bb

Fixed RSS feed.
1 parent 234d3b3 commit 44417bb

10 files changed (+138, -128 lines)

_config.yml

Lines changed: 1 addition & 1 deletion

@@ -68,7 +68,7 @@ people:
 url: https://www.linkedin.com/pub/courtney-rockenbach/1a/8a8/794

 - name: Megan Wilson
-pic: default
+pic: megan
 position: Graduate Student (rotation)

 - name: Maggie Wisniewska

_includes/rss_footer.html

Lines changed: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 <footer class="bg-darkest-gray">
 <div class="container">
-<span style="color: white">Subscribe <a href="{{ "/feed.xml" | prepend: site.baseurl }}">via RSS</a></span>
+<span style="color: white">Subscribe <a href="{{ "/feed.xml" | prepend: site.url }}">via RSS</a></span>
 </div>
 </footer>
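The one change in this file swaps `site.baseurl` for `site.url` in the feed link. In Jekyll, `site.url` normally holds the scheme and host, while `site.baseurl` holds an optional subpath, so the old link rendered as a root-relative path and the new one renders as an absolute URL. The Python sketch below mimics Liquid's `prepend` filter with hypothetical config values (`https://example.org` and an empty baseurl; the real `_config.yml` values are not part of this diff). It then uses `urllib.parse.urljoin` to show that a root-relative link resolves against whatever page contains it, while the absolute link does not depend on that context. The reader URL is also made up.

```python
from urllib.parse import urljoin

# Hypothetical _config.yml values; the real ones are not shown in this commit.
SITE = {"url": "https://example.org", "baseurl": ""}


def prepend(value: str, prefix: str) -> str:
    """Mimic Liquid's `prepend` filter: put prefix in front of value."""
    return prefix + value


old_href = prepend("/feed.xml", SITE["baseurl"])  # before the commit
new_href = prepend("/feed.xml", SITE["url"])      # after the commit

# A root-relative href is resolved against the document that contains it,
# e.g. a post shown by a feed reader on another host (hypothetical URL).
reader_page = "https://reader.example.net/item/42"

print(old_href)                        # /feed.xml
print(urljoin(reader_page, old_href))  # https://reader.example.net/feed.xml
print(urljoin(reader_page, new_href))  # https://example.org/feed.xml
```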

_posts/2014-01-05-r-vs-python-round-1.md

Lines changed: 27 additions & 27 deletions

@@ -4,9 +4,10 @@ date: 2014-01-05
 author: Simon Garnier
 layout: post
 type: post
-category:
+category:
 - blog
 - rvspython
+- r
 published: true

 ---
@@ -31,11 +32,11 @@ ___

 ##### 1 - Introduction #####

-For this first challenge, we will use data collected by Randy for his recent post on the ["Top 25 most violence packed
-films" in the history of the movie industry](www.randalolson.com/2013/12/31/most-violence-packed-films/). For his post,
-Randy generated a simple horizontal barchart showing the top 25 more violent films ordered by number of on screen deaths
-per minute. In the rest of this document, we will show you how to reproduce this graph using Python and how to achieve a
-similar result with R. We will detail the different steps of the process and provide for each step the corresponding
+For this first challenge, we will use data collected by Randy for his recent post on the ["Top 25 most violence packed
+films" in the history of the movie industry](www.randalolson.com/2013/12/31/most-violence-packed-films/). For his post,
+Randy generated a simple horizontal barchart showing the top 25 more violent films ordered by number of on screen deaths
+per minute. In the rest of this document, we will show you how to reproduce this graph using Python and how to achieve a
+similar result with R. We will detail the different steps of the process and provide for each step the corresponding
 code. You will also find the entire codes at the end of this document.

 And now without further ado, let's get started!
@@ -47,7 +48,7 @@ First thing first, let's set up our working environment by loading some necessar
 {% highlight r %}
 # Load libraries
 library(lattice) # Very versatile graphics package
-library(latticeExtra) # Addition to lattice that makes layering graphs a breathe
+library(latticeExtra) # Addition to lattice that makes layering graphs a breathe
 {% endhighlight %}

 {% highlight python %}
@@ -58,8 +59,8 @@ from pandas import *
 {% endhighlight %}


-Now let's load the data for today's job. The raw data were scraped by Randy (using Python) from
-[www.MovieBodyCounts.com](http://www.MovieBodyCounts.com) and he generously provided the result of his hard work on
+Now let's load the data for today's job. The raw data were scraped by Randy (using Python) from
+[www.MovieBodyCounts.com](http://www.MovieBodyCounts.com) and he generously provided the result of his hard work on
 FigShare at this address: [http://dx.doi.org/10.6084/m9.figshare.889719](http://dx.doi.org/10.6084/m9.figshare.889719).

 {% highlight r %}
@@ -73,13 +74,13 @@ body_count_data = read_csv("http://files.figshare.com/1332945/film_death_counts.
 {% endhighlight %}


-For each movie, the data frame contains a column for the total number of on screen deaths ("Body_Count") and a column for
-the duration ("Length_Minutes"). We will now create an extra column for the number of on screen deaths per minute of each
+For each movie, the data frame contains a column for the total number of on screen deaths ("Body_Count") and a column for
+the duration ("Length_Minutes"). We will now create an extra column for the number of on screen deaths per minute of each
 movie ("Deaths_Per_Minute")

 {% highlight r %}
-# Compute on screen deaths per minute for each movie.
-body.count.data <- within(body.count.data, {
+# Compute on screen deaths per minute for each movie.
+body.count.data <- within(body.count.data, {
 Deaths_Per_Minute <- Body_Count / Length_Minutes
 ord <- order(Deaths_Per_Minute, decreasing = TRUE) # useful later
 })
@@ -92,7 +93,7 @@ body_count_data["Deaths_Per_Minute"] = (body_count_data["Body_Count"].apply(floa
 {% endhighlight %}


-Now we will reorder the data frame by (descending) number of on screen deaths per minute, and select the top 25 most
+Now we will reorder the data frame by (descending) number of on screen deaths per minute, and select the top 25 most
 violent movies according to this criterion.

 {% highlight r %}
@@ -112,7 +113,7 @@ body_count_data = body_count_data.sort("Deaths_Per_Minute", ascending=True)
 {% endhighlight %}


-In Randy's graph, the "y" axis shows the film title with the release date. We will now generate the full title for each
+In Randy's graph, the "y" axis shows the film title with the release date. We will now generate the full title for each
 movie following a "Movie name (year)" format, and append it to the data frame.

 {% highlight r %}
@@ -138,7 +139,7 @@ ax.xaxis.tick_bottom()data["Full_Title"] = array(full_title)
 {% endhighlight %}


-Now we are ready to generate the barchart. We're going to start with the default options and then we will make this thing
+Now we are ready to generate the barchart. We're going to start with the default options and then we will make this thing
 look pretty.

 {% highlight r %}
@@ -167,12 +168,12 @@ yticks(range(len(body_count_data["Full_Title"])), body_count_data["Full_Title"].

 ![Base Python graph](/img/posts/2014-01-05-r-vs-python-round-1/Py/basePy.png){: .full }

-Ok, now let's make this pretty.
+Ok, now let's make this pretty.

 {% highlight r %}
 # Create theme
 my.bloody.theme <- within(trellis.par.get(), { # Initialize theme with default value
-axis.line$col <- NA # Remove axes
+axis.line$col <- NA # Remove axes
 plot.polygon <- within(plot.polygon, {
 col <- "#8A0606" # Set bar colors to a nice bloody red
 border <- NA # Remove bars' outline
@@ -187,7 +188,7 @@ my.bloody.theme <- within(trellis.par.get(), { # Initialize theme with defaul

 # Update figure with new theme + other improvements (like a title for instance)
 graph <- update(
-graph,
+graph,
 main='25 most violence packed films by deaths per minute', # Title of the barchart
 par.settings = my.bloody.theme, # Use custom theme
 xlab = NULL, # Remove label of x axis
@@ -222,7 +223,7 @@ ax.xaxis.grid(color="white", linestyle="-")

 ![Pretty Python graph](/img/posts/2014-01-05-r-vs-python-round-1/Py/prettyPy.png){: .full }

-Finally, the last thing we want to add to our graph is the number of deaths per minute and the duration of each movie on
+Finally, the last thing we want to add to our graph is the number of deaths per minute and the duration of each movie on
 the right of the graph.

 {% highlight r %}
@@ -231,8 +232,8 @@ body.count.data <- within(body.count.data, {
 Deaths_Per_Minute_With_Length = paste0(round(body.count.data$Deaths_Per_Minute, digits=2), " (", body.count.data$Length_Minutes, " mins)")
 })

-# Add number of on screen deaths per minute and duration of movies at the end of each bar
-graph <- graph + layer(with(body.count.data,
+# Add number of on screen deaths per minute and duration of movies at the end of each bar
+graph <- graph + layer(with(body.count.data,
 panel.text(
 Deaths_Per_Minute, # x position of the text
 25:1, # y position of the text
@@ -279,7 +280,7 @@ library(grid) # Graphics library with better image plotting capabilities

 # Download a pretty background image; mode is set to "wb" because it seems that
 # Windows needs it. I don't use Windows, I can't confirm
-download.file(url = "http://www.theswarmlab.com/wp-content/uploads/2014/01/bloody_gun.jpg",
+download.file(url = "http://www.theswarmlab.com/wp-content/uploads/2014/01/bloody_gun.jpg",
 destfile = "bloody_gun.jpg", quiet = TRUE, mode = "wb")

 # Load gun image using "readJPEG" from the "jpeg" package
@@ -306,11 +307,10 @@ ___

 R and Python source codes are available [here](https://github.com/morpionZ/R-vs-Python/tree/master/Deadliest%20movies/code).

-For F# fan, [Terje Tyldum](http://terjetyl.ghost.io/) has written his version of the code in F#
+For F# fan, [Terje Tyldum](http://terjetyl.ghost.io/) has written his version of the code in F#
 [here](http://terjetyl.ghost.io/f-charting-challenge/).

-Randy and I also recommend that you check out
+Randy and I also recommend that you check out
 [this post](http://nbviewer.ipython.org/github/yaph/ipython-notebooks/blob/master/Exploring%20Movie%20Body%20Counts.ipynb)
-by [Ramiro Gómez](http://ramiro.org/) ([@yaph](https://twitter.com/yaph)) where he does a more in-depth analysis of the
+by [Ramiro Gómez](http://ramiro.org/) ([@yaph](https://twitter.com/yaph)) where he does a more in-depth analysis of the
 data set we used for today’s challenge.
-
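The post's pipeline computes `Deaths_Per_Minute = Body_Count / Length_Minutes`, keeps the 25 highest values, and labels each bar with a "Movie name (year)" title and a "deaths per minute (length)" annotation. The sketch below restates those steps in plain Python with a hypothetical `Movie` dataclass and made-up rows, so it runs without the FigShare CSV or pandas (the `DataFrame.sort` method used in the post's Python code has since been removed from pandas in favor of `sort_values`). It illustrates the arithmetic under those assumptions and is not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    film: str
    year: int
    body_count: int
    length_minutes: int

    @property
    def deaths_per_minute(self) -> float:
        # Same ratio as Body_Count / Length_Minutes in the post
        return self.body_count / self.length_minutes

    @property
    def full_title(self) -> str:
        # "Movie name (year)" label used on the y axis
        return f"{self.film} ({self.year})"

    @property
    def bar_label(self) -> str:
        # Similar to the R paste0(round(x, digits=2), " (", minutes, " mins)") text
        return f"{round(self.deaths_per_minute, 2)} ({self.length_minutes} mins)"


def top_n(movies: list[Movie], n: int = 25) -> list[Movie]:
    """Return the n movies with the most on-screen deaths per minute, highest first."""
    return sorted(movies, key=lambda m: m.deaths_per_minute, reverse=True)[:n]


# Made-up rows for illustration only; the real data come from the FigShare CSV.
movies = [
    Movie("Film A", 2001, 120, 100),
    Movie("Film B", 1999, 300, 150),
    Movie("Film C", 2010, 45, 90),
]

for m in top_n(movies, n=2):
    print(m.full_title, m.bar_label)
# Film B (1999) 2.0 (150 mins)
# Film A (2001) 1.2 (100 mins)
```

Sorting in descending order and slicing plays the role of the R `order(Deaths_Per_Minute, decreasing = TRUE)` index. Python's `round` prints `2.0` where R prints `2`, so the labels differ slightly in form.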

_posts/2014-01-12-r-vs-python-round-2-1.md

Lines changed: 17 additions & 16 deletions

@@ -4,9 +4,10 @@ date: 2014-01-12
 author: Simon Garnier
 layout: post
 type: post
-category:
+category:
 - blog
 - rvspython
+- r
 published: true

 ---
@@ -48,7 +49,7 @@ library(RCurl) # Everything necessary to grab webpages on the Web
 library(XML) # Everything necessary to parse XML and HTML code
 library(pbapply) # Progress bars!!! Just because why not :-)

-# Create curl handle which can be used for multiple HHTP requests.
+# Create curl handle which can be used for multiple HHTP requests.
 # followlocation = TRUE in case one of the URLs we want to grab is a redirection
 # link.
 curl <- getCurlHandle(useragent = "R", followlocation = TRUE)
@@ -77,7 +78,7 @@ With all this information in mind, our first task is to create a list of all the
 # Prepare URLs of the movie lists alphabetically ordered by first letter of
 # movie title (capital A to Z, except for v and y) + "numbers" list (for movies
 # which title starts with a number)
-urls.by.letter <- paste0("http://www.moviebodycounts.com/movies-",
+urls.by.letter <- paste0("http://www.moviebodycounts.com/movies-",
 c("numbers", LETTERS[1:21], "v", "W" , "x", "Y", "Z"), ".htm")

 {% endhighlight %}
@@ -100,15 +101,15 @@ For each movie list, we will...
 {% highlight r %}
 # For each movie list... For loops are frowned upon in R, let's use the classier
 # apply functions instead. Here I use the pblapply from the pbapply package.
-# It's equivalent to the regular lapply function, but it provides a neat
-# progress bar. Unlist to get a vector.
+# It's equivalent to the regular lapply function, but it provides a neat
+# progress bar. Unlist to get a vector.
 urls.by.movie <- unlist(pblapply(urls.by.letter, FUN = function(URL) {
 {% endhighlight %}


 {% highlight python %}
 list_of_films = []
-
+
 # Go through each movie list page and gather all of the movie web page URLs
 for letter in letters:
 try:
@@ -149,9 +150,9 @@ for letter in letters:


 {% highlight r %}
-# Extract desired links from HTML content using XPath.
+# Extract desired links from HTML content using XPath.
 # The desired links are all the URLs ("a/@href") directly following
-# ("/following::") the image which source file is called "graphic-movies.jpg"
+# ("/following::") the image which source file is called "graphic-movies.jpg"
 # ("//img[@src='graphic-movies.jpg']").
 links <- as.vector(xpathSApply(parsed.html, "//img[@src='graphic-movies.jpg']/following::a/@href"))

@@ -181,7 +182,7 @@ urls.by.movie <- urls.by.movie[-ix]
 # The URL is in between parentheses (), so we can simply split the string on those
 # Some URLs are full URLs, e.g. www.moviebodycounts.com/movie_name.html, so splitting on the / gives us only the page name
 list_of_films.append(line.split("(")[-1].strip(")").split("/")[-1])
-
+
 # If the movie list page doesn't exist, keep going
 except:
 print "\nerror with " + letter + "\n"
@@ -194,7 +195,7 @@ For each movie, we will...


 {% highlight r %}
-# For each movie...
+# For each movie...
 # do.call(rbind, ...) to reorganize the results in a nice data frame
 data <- do.call(rbind, pblapply(urls.by.movie, FUN = function(URL) {
 {% endhighlight %}
@@ -205,7 +206,7 @@ data <- do.call(rbind, pblapply(urls.by.movie, FUN = function(URL) {
 # extract the movie name, kill counts, etc.
 out_file = open("film-death-counts.csv", "wb")
 out_file.write("Film,Year,Kill_Count,IMDB_url\n")
-
+
 for film_page in list_of_films:
 try:
 # The information we're looking for on the page:
@@ -320,15 +321,15 @@ for film_page in list_of_films:
 # Using gsub, remove everything in parenthesis and all non number characters
 Body_Count <- gsub("\\(.*?\\)", " ", Body_Count)
 Body_Count <- gsub("[^0-9]+", " ", Body_Count)
-
+
 # In case the total count has been split, we want to separate these numbers
 # from each other so that we can add them up later. Using strsplit, split the
 # character string at spaces
 Body_Count <- unlist(strsplit(Body_Count, " "))
-
+
 # For now, we have extracted characters. Transform them into numbers.
 Body_Count <- as.numeric(Body_Count)
-
+
 # Sum up the numbers (in case they have been split into separate categories.
 Body_Count <- sum(Body_Count, na.rm = TRUE)
 {% endhighlight %}
@@ -380,6 +381,6 @@ ___

 #### 4 - Bonus for the braves ####

-Today's challenge was code and text heavy. No pretty pictures to please the eye. So, for all the brave people who made it to the end, here is a cat picture :-)
+Today's challenge was code and text heavy. No pretty pictures to please the eye. So, for all the brave people who made it to the end, here is a cat picture :-)

-![Programming cat](/img/posts/2014-01-12-r-vs-python-round-2-1/programming_cat.jpg){: .full }
+![Programming cat](/img/posts/2014-01-12-r-vs-python-round-2-1/programming_cat.jpg){: .full }
