Information retrieval system based on the word embedding technique (word2vec)

zakaria-aabbou/NLP_based_information_retrieval_system

NLP_based_information_retrieval_system

Information retrieval methods find the documents most relevant to a given query. The words contained in documents such as web pages can be modeled with different approaches, including Boolean models, vector space models, and probabilistic models. This project uses a vector space model, specifically the Doc2Vec (or word2vec) embedding technique.

This project aims to develop an information retrieval system based on the word embedding technique "Doc2Vec (or word2vec)". The documents and the query are represented by embedding vectors, and the similarity between the query vector and each document vector is computed with the cosine similarity measure. To measure the effectiveness of the system, we used the TREC test collection (dataset), available at https://trec.nist.gov/data.html
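The ranking step described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the vectors are hand-written toy values standing in for embeddings that a trained Doc2Vec or word2vec model would produce, and the names `cosine_similarity`, `rank_documents`, `doc_vecs`, and `query_vec` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector has no direction, so treat its similarity as 0.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scores = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors (assumed values, not real embeddings).
doc_vecs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.2],
    "doc_c": [0.5, 0.5, 0.5],
}
query_vec = [1.0, 0.0, 0.0]

ranking = rank_documents(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In a real system, the toy vectors would be replaced by the embeddings inferred for each TREC document and for the query, and the top of the ranked list would be compared against the collection's relevance judgments.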
