Unfortunately, Google Scholar does not support exporting results... I needed the most cited papers for a research project, and after trying an imperfect script I decided to write my own.
Important note: The spiders send no more than 2 requests per second to Google Scholar. We would rather not solve CAPTCHAs, so it's better to wait a little and act like a human. Changing your IP address from time to time is also a good idea... 😩
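To make the 2-requests-per-second limit concrete, here is a minimal sketch of a throttle that sleeps between calls. It is an illustration only, not the code this project uses: the `Throttle` class and its `rate`, `clock` and `sleep` parameters are hypothetical names chosen for the example.

```python
import time


class Throttle:
    """Space out calls so that at most `rate` of them happen per second.

    Hypothetical helper for illustration; not part of this project.
    """

    def __init__(self, rate=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate  # 0.5 s between calls for rate=2
        self._clock = clock             # injectable so the class can be tested
        self._sleep = sleep
        self._last = None               # time of the previous call, if any

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` right before each request keeps consecutive requests at least half a second apart when `rate=2.0`. The first call never sleeps.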
- Supports multiple languages
- Customizable date range
- Sorts by number of citations
- Sorts by year
- Searches for articles
- Searches for case law
- Searches in a profile by ID
- Graphical interface
Install the dependencies:
pip install -r requirements.txt
Run the scraper by passing a keyword:
python core.py "cryptography"
Customize the date range:
python core.py "metaverse" -s 1997 -e 2018
Limit the results to one or more languages:
python core.py "medical" -l en es zh-tw fr
Set the output file path:
python core.py "machine learning" -s 2002 -o exports/most_cited_ml_articles_since_2002.csv
Sort the output by year:
python core.py "oceanography" -y
Search for case law:
python core.py "privacy" -c
Get the articles of a specific profile by its user ID:
python core.py "nms69lqaaaaj" -p -o jeff_dean_articles.csv
Make the program quiet:
python core.py "philosophy" -e 1234 -q
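For readers who want to see how the flags above fit together, the following is a rough sketch of an `argparse` parser that accepts the same short options. Only the short flags (`-s`, `-e`, `-l`, `-o`, `-y`, `-c`, `-p`, `-q`) come from the commands shown here; the function name `build_parser` and every `dest` name are assumptions made for the example, and the real `core.py` may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the short flags used in the examples above.

    The dest names are hypothetical; only the flags themselves are documented.
    """
    parser = argparse.ArgumentParser(prog="core.py")
    parser.add_argument("keyword", help="search keyword, or a user ID with -p")
    parser.add_argument("-s", dest="start_year", type=int, help="first year")
    parser.add_argument("-e", dest="end_year", type=int, help="last year")
    parser.add_argument("-l", dest="languages", nargs="+", help="language codes")
    parser.add_argument("-o", dest="output", help="output file path")
    parser.add_argument("-y", dest="sort_by_year", action="store_true")
    parser.add_argument("-c", dest="case_law", action="store_true")
    parser.add_argument("-p", dest="profile", action="store_true")
    parser.add_argument("-q", dest="quiet", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Using `nargs="+"` for `-l` is what lets several language codes follow a single flag, as in `-l en es zh-tw fr`.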
Here are some example exports so you can check whether the scraper meets your needs!
This project is licensed under the MIT license found in the LICENSE file in the root directory of this repository.