
Attempt to generate thesaurus results based on Wikipedia as a dataset


AjaniBilby/wiki-thesaurus


Wiki Thesaurus

This CLI tool attempts to find words similar to a given input, based on the graph structure of links between Wikipedia pages.
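To illustrate the general idea of ranking by link structure, here is a minimal sketch. It is not this repository's implementation. It assumes a tiny in-memory link graph (the hypothetical `LINKS` dictionary) and a hypothetical `similar_pages` function that scores pages by how many outgoing links they share with the query page. The real tool reads a database built by ingest.py, and its algorithms may score results differently.

```python
from collections import Counter

# Hypothetical toy link graph: page title -> set of page titles it links to.
# The real tool builds its graph from a Wikipedia dump; this data is made up.
LINKS = {
    "happy": {"emotion", "joy", "smile"},
    "joyful": {"emotion", "joy", "smile"},
    "glad": {"emotion", "joy"},
    "sad": {"emotion", "tears"},
    "rock": {"stone", "geology"},
}


def similar_pages(query, links, limit=10):
    """Rank other pages by how many outgoing links they share with `query`."""
    targets = links.get(query, set())
    scores = Counter()
    for page, outgoing in links.items():
        if page == query:
            continue
        shared = len(targets & outgoing)
        if shared:
            scores[page] = shared
    # Sort by shared-link count (descending), then by title for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page for page, _ in ranked[:limit]]


print(similar_pages("happy", LINKS))  # prints ['joyful', 'glad', 'sad']
```

Pages that share no links with the query (here, "rock") are dropped rather than listed with a score of zero.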

Installation

Using pre-built

Clone this repository, then extract the contents of simplewiki.rar into the ./data/ folder.

Building from source

Download and unpack the dataset of all English Wikipedia articles

python download.py

Generate the article graph as a database

python ingest.py

Use

Start-up

python search.py

Enter any text to perform a search. To run a command instead, start the line with a . (full stop).

Command      Action
.next        Shows the next batch of results from the previous search
.algo xxx    Changes the search algorithm, where xxx is the name of any algorithm in the ./algorithm/ folder (e.g. .algo intersection)
.limit xxx   Sets the number of results shown per search to xxx (e.g. .limit 20)
.exit        Ends the program
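The command handling described above can be sketched roughly as follows. This is an illustrative sketch, not the code in search.py. The `handle_line` function, the `state` dictionary, and the default values (`intersection` as the starting algorithm, a limit of 10) are all assumptions made for the example.

```python
def handle_line(line, state):
    """Interpret one line of input and return a (kind, value) tuple.

    `state` is a hypothetical dict holding the current algorithm name,
    the result limit, and the offset into the previous search's results.
    """
    line = line.strip()
    if not line.startswith("."):
        # Plain text starts a new search, so paging restarts from the top.
        state["offset"] = 0
        return ("search", line)

    command, _, argument = line[1:].partition(" ")
    argument = argument.strip()

    if command == "next":
        state["offset"] += state["limit"]
        return ("next", state["offset"])
    if command == "algo":
        state["algo"] = argument
        return ("algo", argument)
    if command == "limit":
        try:
            value = int(argument)
        except ValueError:
            return ("error", "limit must be a whole number")
        if value <= 0:
            return ("error", "limit must be positive")
        state["limit"] = value
        return ("limit", value)
    if command == "exit":
        return ("exit", None)
    return ("error", "unknown command: " + command)


# Assumed defaults, for illustration only.
state = {"algo": "intersection", "limit": 10, "offset": 0}
for text in ["happy", ".limit 20", ".next", ".algo intersection", ".exit"]:
    print(handle_line(text, state))
```

Keeping the parsing separate from the search itself means the dot commands can be checked without loading the article database.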
