A scraper which generates a Google Sheet with the top 10 posts of each month for a given subreddit.
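As a rough illustration of the "top 10 posts of each month" idea, the sketch below groups post objects by calendar month (UTC) and keeps the highest-scoring ones. It is not taken from this project's source: the name `topPostsByMonth` is hypothetical, and it assumes each post carries `created_utc` (epoch seconds) and `score` fields, as Reddit post objects do.

```javascript
// Hypothetical helper: group posts by UTC month and keep the n best by score.
// Assumes each post has `created_utc` (epoch seconds) and `score`.
function topPostsByMonth(posts, n = 10) {
  const byMonth = new Map();
  for (const post of posts) {
    const d = new Date(post.created_utc * 1000);
    // Build a key such as "2020-03"
    const month = String(d.getUTCMonth() + 1).padStart(2, '0');
    const key = `${d.getUTCFullYear()}-${month}`;
    if (!byMonth.has(key)) byMonth.set(key, []);
    byMonth.get(key).push(post);
  }

  const result = {};
  for (const [key, monthPosts] of byMonth) {
    result[key] = monthPosts
      .slice() // copy so the caller's array is not reordered
      .sort((a, b) => b.score - a.score) // highest score first
      .slice(0, n);
  }
  return result;
}
```

Each key of the returned object would then correspond to one month's block of rows in the output sheet.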
- Install Node.js
- Clone the repository locally (or a fork of it):
  git clone git@github.com:Trekiros/BestOfReddit.git
- In your local repository, install code dependencies:
  npm i
- Create a copy of conf-template.yml, named conf.yml, modifying conf.yml for your specific use case.
- Create a Google Sheet, and note its id (from its URI) in conf.yml, under the spreadsheetId field.
- Enable the Google Sheets API for your Google account, and download a credentials.json file following the instructions found here. Make sure you use the same Google Account which was used for the creation of the spreadsheet.
- Write down the client_id, project_id and client_secret fields in conf.yml, using the values found in credentials.json. Never share or commit these credentials. They could be used to access and modify all of your Google Spreadsheets.
- Create a Reddit script app here.
- Use this new Reddit app to complete the appId and appSecret fields of the reddit category in conf.yml. Never share or commit these credentials. They could be used to access your Reddit account.
- Complete the username and password fields in the reddit category of conf.yml, using the credentials of the Reddit account which created the Reddit app. If you do not wish to use your personal Reddit account for this, the project can just as easily be run using a new Reddit account.
- Run the project for the first time:
  npm start or node .
- On the first run, the project will ask you to follow a link to grant it authority over your Google Sheets file. Make sure you use the same Google Account which was used for the creation of the spreadsheet.
- It will then create a file named token.json which lets it bypass the last step on subsequent runs. Never share or commit this file. It could be used to access and modify all of your Google Spreadsheets.
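For the step that asks for the sheet's id from its URI, a small helper can save some copy-paste errors. This is only a sketch: `extractSpreadsheetId` is a hypothetical name that is not part of this project, and it assumes the usual Google Sheets URL shape, where the id follows `/spreadsheets/d/`.

```javascript
// Hypothetical helper: pull the spreadsheet id out of a Google Sheets URL.
// Assumes URLs of the form https://docs.google.com/spreadsheets/d/<id>/edit...
function extractSpreadsheetId(url) {
  const match = /\/spreadsheets\/d\/([a-zA-Z0-9_-]+)/.exec(url);
  // Return null rather than throwing when the URL does not look like a sheet
  return match ? match[1] : null;
}
```

The returned string is the value that would go in the spreadsheetId field of conf.yml.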
To contribute, fork this project, and make a pull request with your changes. I will then review the pull request, notably to ensure no changes are made which could compromise users' credentials.
This project could be deployed on any number of platforms. Heroku was chosen as the example because the project is designed to be run as a monthly cron job, and Heroku provides free options for this use case.
- Fork this project, and run it locally using the instructions found above. This ensures you have a conf.yml and a token.json file, which will be needed to configure Heroku.
- Create a Heroku account here
- Create a new app
- Create a production pipeline for your app
- In the Resources tab, add the Heroku Scheduler and Logentries add-ons to your pipeline
- Configure Heroku Scheduler to run the project periodically (command: npm run start). Ideally, this project is to be run monthly, but the longest interval Heroku Scheduler offers is daily. The project can be run more often without issue (no duplicate months in the output spreadsheet, or errors), but this would be a waste of computing power.
- Start following logs in real time in Logentries, to ensure that things are working properly
- In the Settings tab, click Reveal Config Vars, and add the following environment variables:
  - conf: copy the contents of your local conf.yml file
  - googleToken: copy the contents of your local token.json file
- In the Deployment Method tab, link the pipeline to your fork of the project. This should launch the project for the first time and automatically detect that it is a Node.js project. You can then re-run it manually from this same tab, but the Heroku Scheduler add-on will run it periodically without your input.
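The conf and googleToken config vars described above stand in for the local conf.yml and token.json files. How the project itself reads them is not shown here; the sketch below is one plausible approach, with `readSetting` and `loadGoogleToken` as hypothetical names. It prefers the environment variable when present and falls back to the local file otherwise.

```javascript
const fs = require('fs');

// Hypothetical helper: return the raw text of a setting, preferring an
// environment variable (as configured on Heroku) over a local file.
function readSetting(envName, localPath, env = process.env) {
  if (env[envName]) {
    return env[envName];
  }
  return fs.readFileSync(localPath, 'utf8');
}

// The Google token is JSON, so its text can be parsed directly.
// (conf is YAML and would need a YAML parser, which is left out here.)
function loadGoogleToken(env = process.env) {
  return JSON.parse(readSetting('googleToken', 'token.json', env));
}
```

With this shape, the same code path works locally (files on disk) and on Heroku (config vars only), so neither file has to be committed to the repository.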
- Use the Reddit API directly rather than pushshift