Information retrieval methods are used to find the documents most relevant to a given query. The words contained in web pages can be modeled with different approaches, such as Boolean, vector space, and probabilistic models. In this project, we use the vector space approach, specifically the Doc2Vec (or word2vec) technique.
This project aims to develop an information retrieval system based on the word embedding technique Doc2Vec (or word2vec). The documents and the query are each represented by an embedding vector, and the similarity between the query vector and each document vector is computed with the cosine similarity measure. To measure the effectiveness of the system, we used the TREC test collection (dataset), available at https://trec.nist.gov/data.html
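
The ranking step described above can be illustrated with a minimal sketch. It is not the project's actual code: the function names `cosine_similarity` and `rank_documents`, the document identifiers, and the three-dimensional toy vectors are all assumptions made for illustration. In a real pipeline the vectors would presumably come from a trained Doc2Vec model (for example via gensim's `infer_vector`) rather than being written by hand.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted by decreasing cosine similarity."""
    scores = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings standing in for Doc2Vec output.
doc_vecs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [1.0, 1.0, 0.0],
}
query_vec = [2.0, 0.0, 0.0]

ranking = rank_documents(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{doc_id}\t{score:.4f}")
```

Because cosine similarity depends only on the angle between vectors, the query `[2.0, 0.0, 0.0]` scores the same against `doc_a` as a unit-length query in that direction would. Vector length does not affect the ranking.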