This is a set of scripts for creating a nice preview page (see here: http://cs.stanford.edu/~karpathy/nipspreview/ ) for all papers published at NIPS. I hope these scripts can be useful to others who want to create similar pages for other conferences. They show how one can manipulate PDFs, extract image thumbnails, analyze word frequencies, etc.
-
Clone this repository to $FOLDER
git clone https://github.com/karpathy/nipspreview.git
-
Download nips25offline from
http://books.nips.cc/nips25.html
and move it into $FOLDER. -
Install ImageMagick:
sudo apt-get install imagemagick
-
Run
pdftowordcloud.py
(to generate the top words for each paper. Output is saved in topwords.p as a pickle) -
Run
pdftothumbs.py
(to generate tiny thumbnails for all papers. Outputs are saved in the thumbs/ folder) -
Run
scrape.py
(to generate the paperid, title, and authors list by scraping the NIPS .html page) -
Run
makecorpus.py
(to create the allpapers.txt file, which has all papers, one per row) -
Run
python lda.py -f allpapers.txt -k 7 --alpha=0.5 --beta=0.5 -i 100
This will generate a pickle file called ldaphi.p that contains the LDA word distribution matrix. Thanks to this nice LDA code by shuyo! It requires the nltk library and numpy. In this example we are using 7 categories. You would need to change the nipsnice_template.html file a bit if you wanted to try a different number of categories. -
Finally, run
generatenicelda.py
(to create the nipsnice.html page)
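For anyone adapting these scripts to another conference, the word-frequency step can be sketched roughly as below. This is only an illustration of the general idea, not the code in pdftowordcloud.py: the function names, the stopword list, the minimum word length and the layout of the pickled dictionary are all assumptions made for this example, and it expects the paper text to have already been extracted from the PDF.

```python
import pickle
import re
from collections import Counter

# Hypothetical stopword list for illustration only; the real script may
# filter words differently.
STOPWORDS = {"the", "and", "of", "to", "in", "a", "is", "for", "we",
             "that", "this", "with"}


def top_words(text, n=5, min_len=3):
    """Return the n most frequent (word, count) pairs in text."""
    # Lowercase the text and keep alphabetic runs only.
    words = re.findall(r"[a-z]+", text.lower())
    # Drop very short words and stopwords before counting.
    kept = [w for w in words if len(w) >= min_len and w not in STOPWORDS]
    return Counter(kept).most_common(n)


def save_top_words(texts_by_id, path):
    """Pickle a {paper_id: [(word, count), ...]} dict (assumed layout)."""
    result = {pid: top_words(text) for pid, text in texts_by_id.items()}
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

Calling save_top_words({"0001": some_text}, "topwords_example.p") would write a pickle that can be read back with pickle.load. The file name here is deliberately different from topwords.p so that the sketch does not overwrite the real output.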
WTFPL licence