Commit

fixed bug in search url
Pold87 committed Feb 17, 2016
1 parent d226102 commit cf892b3
Showing 6 changed files with 39 additions and 1 deletion.
18 changes: 18 additions & 0 deletions README.md~
@@ -0,0 +1,18 @@
# Academic word frequency extractor

This script extracts how often a search term occurs in academic papers. For each year, it writes the number of papers
containing the term to a CSV file:

| year | results |
|------+---------|
| 2011 | 6320 |
| 2012 | 7250 |
| 2013 | 8170 |
| 2014 | 8260 |
| 2015 | 8150 |


The script excludes patents and citations.



5 changes: 5 additions & 0 deletions extract_num_results.py~
@@ -0,0 +1,5 @@
from bs4 import BeautifulSoup
import urllib
r = urllib.urlopen('https://scholar.google.nl/scholar?hl=en&as_sdt=1,5&q=self-disclosure').read()
soup = BeautifulSoup(r)
print type(soup)
2 changes: 1 addition & 1 deletion extract_occurrences.py
@@ -14,7 +14,7 @@ def get_num_results(search_term, start_date, end_date):

# Open website and read html
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.109 Safari/537.36'
-url = "https://scholar.google.nl/scholar?as_vis=1&q=self-disclosure&hl=en&as_sdt=1,5&as_ylo={0}&as_yhi={1}".format(start_date, end_date)
+url = "https://scholar.google.nl/scholar?as_vis=1&q={0}&hl=en&as_sdt=1,5&as_ylo={1}&as_yhi={2}".format(search_term, start_date, end_date)
opener = build_opener()
request = Request(url=url, headers={'User-Agent': user_agent})
handler = opener.open(request)
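
Note on the URL fix: the new format string inserts `search_term` into the query unescaped, so a term containing spaces or `&` could still produce a malformed URL. A minimal sketch of one way to guard against that, assuming Python 3 and the standard `urllib.parse.urlencode`; the helper name `build_search_url` and the `BASE_URL` constant are hypothetical and not part of this commit:

```python
from urllib.parse import urlencode

# Hypothetical constant; the host and path are taken from the URL in the diff.
BASE_URL = "https://scholar.google.nl/scholar"


def build_search_url(search_term, start_date, end_date):
    """Return a search URL for search_term restricted to the given year range."""
    params = [
        ("as_vis", 1),
        ("q", search_term),
        ("hl", "en"),
        ("as_sdt", "1,5"),
        ("as_ylo", start_date),
        ("as_yhi", end_date),
    ]
    # safe="," keeps the comma in as_sdt=1,5 unescaped, matching the URL in the diff.
    return BASE_URL + "?" + urlencode(params, safe=",")


print(build_search_url("self-disclosure", 2014, 2014))
```

With this approach a term such as `machine learning` is encoded as `q=machine+learning` instead of being pasted into the URL verbatim.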
2 changes: 2 additions & 0 deletions out.csv
@@ -0,0 +1,2 @@
year,results
2014,8260
7 changes: 7 additions & 0 deletions table.org
@@ -0,0 +1,7 @@
| year | results |
|------+---------|
| 2011 | 6320 |
| 2012 | 7250 |
| 2013 | 8170 |
| 2014 | 8260 |
| 2015 | 8150 |
6 changes: 6 additions & 0 deletions table.org~
@@ -0,0 +1,6 @@
year,results
2011,6320
2012,7250
2013,8170
2014,8260
2015,8150
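
The `year,results` layout above could be written with Python's standard `csv` module. This is only a sketch under assumptions: the part of `extract_occurrences.py` that writes the file is not shown in this commit, and the helper name `write_results` is hypothetical. Python 3 is assumed.

```python
import csv
import io


def write_results(rows, out_file):
    """Write (year, results) pairs to out_file with a year,results header."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["year", "results"])
    writer.writerows(rows)


# Usage: write to an in-memory buffer; an opened file object works the same way.
buffer = io.StringIO()
write_results([(2014, 8260)], buffer)
print(buffer.getvalue(), end="")
```

Passing an open file object rather than a file name keeps the helper easy to test without touching the disk.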
