lucwl/wikitrace

Wikitrace

NLP applied to Wikipedia. DurHack 2025 submission.

Visualisation of the "Find Links" program

Find Links

Connects any two Wikipedia articles using pretrained vector embeddings.

Euclidean distance and cosine similarity between article embeddings are used as heuristics for A* search.
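The idea can be illustrated with a minimal sketch. It is not the project's actual implementation: the function names, the unit cost per link, and the toy graph and vectors are all assumptions made for illustration. Embedding distances are not guaranteed to be admissible heuristics, so a search like this may return a path that is not the shortest one.

```python
import heapq
import math

def euclidean(a, b):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity, so that smaller means 'closer'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def find_links(graph, vectors, start, goal, heuristic=euclidean):
    """A* over article links. Each followed link costs 1 (an assumption);
    the heuristic estimates how far an article is from the goal."""
    frontier = [(heuristic(vectors[start], vectors[goal]), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nxt in graph.get(node, ()):
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, math.inf):
                best_cost[nxt] = new_cost
                priority = new_cost + heuristic(vectors[nxt], vectors[goal])
                heapq.heappush(frontier, (priority, new_cost, nxt, path + [nxt]))
    return None  # no chain of links connects the two articles

# Hypothetical toy data: outgoing links and 2-d "embeddings".
TOY_GRAPH = {
    "Cat": ["Mammal", "Internet"],
    "Mammal": ["Dog"],
    "Internet": ["Computer"],
    "Computer": [],
}
TOY_VECTORS = {
    "Cat": (1.0, 0.0),
    "Mammal": (0.9, 0.1),
    "Dog": (0.8, 0.2),
    "Internet": (0.0, 1.0),
    "Computer": (0.1, 1.0),
}

path = find_links(TOY_GRAPH, TOY_VECTORS, "Cat", "Dog")
```

Passing `heuristic=cosine_distance` switches the search to the cosine-based estimate without changing anything else.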

WikiCraft

Interface for adding and subtracting article embeddings in order to find related articles.
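As a rough sketch of what embedding arithmetic means here, the snippet below adds and subtracts vectors and then ranks the remaining articles by cosine similarity to the result. The helper names and the 3-d toy vectors are hypothetical; the real tool works on Wikipedia2Vec embeddings and may rank results differently.

```python
import math

def combine(vectors, add=(), subtract=()):
    """Sum the vectors of `add` titles and subtract those of `subtract` titles."""
    dim = len(next(iter(vectors.values())))
    result = [0.0] * dim
    for title in add:
        result = [r + v for r, v in zip(result, vectors[title])]
    for title in subtract:
        result = [r - v for r, v in zip(result, vectors[title])]
    return result

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(vectors, query, exclude=(), count=3):
    """Titles most similar to `query`, skipping the input articles."""
    scored = [
        (cosine_similarity(query, vec), title)
        for title, vec in vectors.items()
        if title not in exclude
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:count]]

# Hypothetical toy embeddings: (royalty, male, female).
TOY = {
    "King": (1.0, 1.0, 0.0),
    "Queen": (1.0, 0.0, 1.0),
    "Man": (0.0, 1.0, 0.0),
    "Woman": (0.0, 0.0, 1.0),
    "Apple": (0.1, 0.1, 0.1),
}

query = combine(TOY, add=["King", "Woman"], subtract=["Man"])
related = nearest(TOY, query, exclude={"King", "Woman", "Man"}, count=1)
```

Excluding the input articles from the ranking keeps the result from simply echoing one of the terms that went into the sum.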

Reference Similarity

Tool that checks whether an article's references match the content of its in-text citations, using retrieval-augmented generation (RAG) and Gemini.
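The retrieval half of such a pipeline might look like the sketch below: pick the reference passages most similar to the cited claim, then build a prompt for the language model. Everything here is an assumption for illustration. Word-count cosine similarity stands in for whatever retrieval the project actually uses, the function names are hypothetical, and the Gemini call itself is left out.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word and number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(claim, passages, count=2):
    """Return the `count` passages most similar to the claim."""
    query = Counter(tokenize(claim))
    ranked = sorted(
        passages,
        key=lambda p: cosine(query, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:count]

def build_prompt(claim, passages):
    """Assemble the text that would be sent to the model (call not shown)."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        f"Claim: {claim}\n"
        f"Reference excerpts:\n{context}\n"
        "Does the reference support the claim?"
    )

# Hypothetical example data.
claim = "The bridge opened in 1932."
passages = [
    "The bridge opened to traffic in 1932.",
    "Ticket prices rose last year.",
    "The architect was born in Leeds.",
]
prompt = build_prompt(claim, retrieve(claim, passages, count=1))
```

Sending only the top-ranked excerpts keeps the prompt short and focused on the part of the reference that is relevant to the citation.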

Running the backend

Make a .env file in the wikitrace directory and fill out the following fields:

wikitrace/.env

GEMINI_API_KEY=[insert key]
USER_AGENT=[insert user agent]
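A missing or empty field is easy to overlook, so a quick check of the file contents can help. The snippet below is a hypothetical helper, not part of the repository; it assumes the simple `KEY=value` format shown above and only knows about the two fields listed there.

```python
REQUIRED_KEYS = ("GEMINI_API_KEY", "USER_AGENT")

def parse_env(text):
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]

# Example with placeholder values; read the real file with
# open("wikitrace/.env").read() instead.
example = "GEMINI_API_KEY=abc123\nUSER_AGENT=\n"
problems = missing_keys(example)
```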

Then, either download a pretrained model from the Wikipedia2Vec website or train one yourself.

English 100d (bin) is recommended.

Copy the model file to wikitrace/model.pkl.

Dependencies are managed with uv on the backend:

backend/

$ uv sync

To run the backend:

backend/

$ fastapi dev main.py

Running the frontend

Make a .env file in the frontend directory and fill out the following fields:

frontend/.env

PUBLIC_GEMINI_API_KEY=[insert key]

Dependencies are managed with npm on the frontend:

frontend/

$ npm install

To run the frontend:

frontend/

$ npm run dev

About

Building paths between Wikipedia articles with NLP
