#Wikipedia Challenge - Find the shortest path between two Wikipedia pages
This application is inspired by Ran Levi's Wikipedia challenge in his famous podcast, Curious Minds, in which Ran asks the listeners to find the shortest path between two Wikipedia pages.
- Node JS 6.x (or higher).
- PostgreSQL DB v9.2 (or higher).
- Rabbit Message Queue (RMQ) v3.6.x (or higher).
- After successful installation, make sure Postgres and RMQ are running.
Once you have completed the previous step, you need to run the following commands from the source's root directory:
npm install
- installs all NPM packages specified inpackage.json
.npm run migrate up
- creates the database tables.
- DATABASE_URL - the URL to PostgreSQL DB. Default value:
postgres://postgres:@localhost/postgres
- AMQP_URL - the URL to RMQ. Default value:
amqp://guest:guest@localhost:5672
- LOG_LEVEL - log level. Values:
error
/warn
/info
/verbose
/debug
/silly
. Default value:info
Web crawler that downloads all Wikipedia pages and stores them on the disk for future processing.
From the source root directory, run:
npm run downloader
You may run several downloaders at the same time to speed up the download process.
Tool that processes the downloaded pages. The downloaded pages are stored on the local drive while a process job is pushed into the RMQ. The processor takes an item from RMQ and extracts all links from the page, then it will push more download jobs into RMQ to be processed by the downloaders.
From the source root directory, run:
npm run processor
You should NOT run multiple processors at the same time.
Once the downloaders
and the processor
are done (may take a few hours), the finder will use the indexed pages to find the shortest path from any given two Wikipedia pages using Breadth-first search (BFS) algorithm.
From the source root directory, run:
npm run finder <LINK-START> <LINK-END>
Where:
- <LINK-START> is the URL to the "start" Wikipedia page.
- <LINK-END> is the URL to the "finish" Wikipedia page.
Example:
npm run finder https://he.wikipedia.org/wiki/%D7%95/%D7%90%D7%95 https://he.wikipedia.org/wiki/Can%27t_Buy_Me_Love