I wrote a blog post about this work. It was also discussed on Hacker News.
This repo contains the artefacts from my solution to the Smoky Mountains Data Challenge 2018, which won first prize. Below, I describe the approach, the method, and some interesting tidbits.
SMC Data Challenge 4: Scientific Publications Mining
To run the awk code:
awk -f prob2.awk stop_words.txt data_dir/*.txt
To compile the Swift code:
stc runprob2.swift # will generate the tic file
To run the Swift code:
turbine -n 340 runprob2.tic