Skip to content

krishi-vb/mylot-article-scraper

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Mylot -> PDF Scraper

Inspired by this reddit post I wrote a scraper to save all of his mum's mylot posts to PDFs, including images and comments.

Apologies in advance for the messy code, I wrote it in about half an hour. Feel free to PR fix.

example

Running

Run in like so, supplying the username of the person your wish to archive.

node ./index.js ridingbet 

How it works

First is paginates all the articles to get a complete list. Once that's done, using 5 instances of puppeteer (headless browser) it creates PDF copies of each article. There's a little bit of trickery to get it looking good (hiding elements/etc).

The PDFs are named with the ISO timestamp first, so they sort nicely in the folder from oldest to newest

Licence

MIT, do you want want :)

Releases

No releases published

Packages

No packages published

Languages

  • JavaScript 100.0%