Git repository for CS7IS3 (Information Retrieval and Web Search) at TCD. This project involves a team effort to design and implement a Java search engine, using the Apache Lucene library, over a large corpus of documents provided by the lecturer.
The data set contains files from:
- The Financial Times Limited (1991, 1992, 1993, 1994)
- The Federal Register (1994)
- The Foreign Broadcast Information Service (1996)
- The Los Angeles Times (1989, 1990)

To build and run the project:
- Install Java 1.8, Git, Maven, Lucene and trec_eval
- Clone the project
- Compile the project with: mvn clean install source:jar
- Edit the run.sh script to select similarity and analyser models.
- Execute run.sh to generate results for the search engine. Results are stored in: DataSet/queryResults
- Execute run_trec_eval.sh to generate MAP, Recall and other such metrics for the search engine. Awaiting the qrels file for project evaluation.
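
The MAP and Recall metrics named in the last step can be illustrated with a minimal Java sketch. It is not part of this repository and is not a replacement for trec_eval: the class name EvalSketch, its method names, the topic ids and the document ids are all hypothetical, and the qrels are assumed to be already loaded into a map from topic id to the set of relevant document ids. The sketch averages only over topics that have at least one relevant document, which may differ from how trec_eval is configured.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of this repository.
class EvalSketch {

    // Average precision for one topic: the sum of precision@k at every rank k
    // holding a relevant document, divided by the number of relevant documents.
    // Assumes the ranked list contains no duplicate document ids.
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / relevant.size();
    }

    // Recall for one topic: fraction of the relevant documents that were retrieved.
    static double recall(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        Set<String> retrieved = new HashSet<>(ranked);
        retrieved.retainAll(relevant);
        return (double) retrieved.size() / relevant.size();
    }

    // MAP: mean of the per-topic average precision over all topics in the
    // qrels that have at least one relevant document. A topic with no
    // results in the run contributes an average precision of 0.
    static double meanAveragePrecision(Map<String, List<String>> run,
                                       Map<String, Set<String>> qrels) {
        double total = 0.0;
        int topics = 0;
        for (Map.Entry<String, Set<String>> entry : qrels.entrySet()) {
            if (entry.getValue().isEmpty()) {
                continue;
            }
            List<String> ranked = run.get(entry.getKey());
            if (ranked == null) {
                ranked = Collections.emptyList();
            }
            total += averagePrecision(ranked, entry.getValue());
            topics++;
        }
        return topics == 0 ? 0.0 : total / topics;
    }

    public static void main(String[] args) {
        // Made-up topics and document ids, purely for demonstration.
        Map<String, List<String>> run = new LinkedHashMap<>();
        run.put("t1", Arrays.asList("d1", "d2", "d3", "d4"));
        run.put("t2", Arrays.asList("d5", "d6"));

        Map<String, Set<String>> qrels = new LinkedHashMap<>();
        qrels.put("t1", new HashSet<>(Arrays.asList("d1", "d3")));
        qrels.put("t2", new HashSet<>(Arrays.asList("d6", "d7")));

        System.out.println("MAP = " + meanAveragePrecision(run, qrels));
        System.out.println("Recall(t2) = " + recall(run.get("t2"), qrels.get("t2")));
    }
}
```

For real evaluation, the figures reported by run_trec_eval.sh should be treated as authoritative; the sketch only shows what the metric names mean.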