Information-Retrieval

An information retrieval system built on a dataset of 800,000 news files. Finding information in a collection of this size is very tedious if one has to go through every word of every file, and an efficient information retrieval system solves that problem. The data was first pre-processed and cleaned for use: stop words and punctuation were removed, the text was converted to lower case, and words were stemmed.
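The cleaning steps can be sketched roughly as below. This is only an illustration: the stop-word list is a tiny hypothetical one, and `naive_stem` is a crude stand-in for whatever stemmer the notebook actually uses (for example a Porter stemmer).

```python
import string

# Hypothetical, tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "was"}

def naive_stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, strip punctuation, drop stop words, then stem each token."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The markets are rising, and traders reacted."))
```

The later steps operate on token lists like the one this function returns.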

A posting index was created on this data. Each word is a key that maps to a list: the first element is the count of the word, and the second is a dictionary whose keys are the files the word occurs in and whose values are the word's positions in each file.
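A minimal sketch of this layout, assuming the input is a dictionary from file name to a list of already cleaned tokens and that "count" means the total number of occurrences across all files. The function name `build_index` is hypothetical.

```python
def build_index(docs):
    """Build word -> [count, {file: [positions]}] from file -> token list."""
    index = {}
    for name, tokens in docs.items():
        for pos, tok in enumerate(tokens):
            # Create the [count, postings] entry on first sight of the word.
            entry = index.setdefault(tok, [0, {}])
            entry[0] += 1                               # total occurrences (assumed meaning)
            entry[1].setdefault(name, []).append(pos)   # position within this file
    return index

docs = {
    "a.txt": ["market", "rise", "market"],
    "b.txt": ["market", "fall"],
}
index = build_index(docs)
print(index["market"])
```

Positions here are offsets in the cleaned token list, so they shift if stop words are removed before indexing.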

Boolean retrieval was performed on the data using this posting index.
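Boolean retrieval over such an index reduces to set operations on the files listed for each term. The sketch below uses a small hand-made index in the word -> [count, {file: [positions]}] shape; the helper names are hypothetical.

```python
# Hypothetical toy index: word -> [count, {file: [positions]}]
INDEX = {
    "stock": [2, {"a.txt": [0], "b.txt": [3]}],
    "market": [2, {"a.txt": [1], "c.txt": [0]}],
    "crash": [1, {"c.txt": [1]}],
}
ALL_DOCS = {"a.txt", "b.txt", "c.txt"}

def docs_for(index, term):
    """Set of files containing the term (empty if the term is unknown)."""
    entry = index.get(term)
    return set(entry[1]) if entry else set()

def boolean_and(index, a, b):
    return docs_for(index, a) & docs_for(index, b)

def boolean_or(index, a, b):
    return docs_for(index, a) | docs_for(index, b)

def boolean_not(index, term, all_docs):
    # NOT needs the full set of files to complement against.
    return set(all_docs) - docs_for(index, term)

print(boolean_and(INDEX, "stock", "market"))
print(boolean_not(INDEX, "stock", ALL_DOCS))
```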

Positional retrieval was performed.
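Because the index stores word positions, a phrase can be matched by checking that its terms appear at consecutive positions in the same file. The following is a sketch under that assumption; `phrase_query` is a hypothetical name, and the notebook may support other proximity queries as well.

```python
def phrase_query(index, terms):
    """Files where `terms` occur at consecutive positions, in order."""
    if not terms or any(t not in index for t in terms):
        return set()
    # Only files containing every term can match.
    common = set(index[terms[0]][1])
    for t in terms[1:]:
        common &= set(index[t][1])
    hits = set()
    for doc in common:
        later = [set(index[t][1][doc]) for t in terms[1:]]
        for first_pos in index[terms[0]][1][doc]:
            # The i-th following term must sit exactly i positions later.
            if all(first_pos + i in positions
                   for i, positions in enumerate(later, start=1)):
                hits.add(doc)
                break
    return hits

# Hypothetical toy index: word -> [count, {file: [positions]}]
index = {
    "stock": [2, {"a.txt": [0], "b.txt": [3]}],
    "market": [2, {"a.txt": [1], "b.txt": [0]}],
}
print(phrase_query(index, ["stock", "market"]))
```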

Wildcard query.
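One simple way to answer a wildcard query is to match the pattern against every word in the vocabulary and merge the files of the matching words. The sketch below does that with the standard library's `fnmatch` module. This is an assumption about the approach; the notebook may instead use a permuterm or k-gram index.

```python
from fnmatch import fnmatchcase

def wildcard_query(index, pattern):
    """Return (matching terms, files containing any of them)."""
    # Linear scan over the vocabulary; fine for a sketch, slow at scale.
    terms = sorted(t for t in index if fnmatchcase(t, pattern))
    docs = set()
    for t in terms:
        docs |= set(index[t][1])
    return terms, docs

# Hypothetical toy index: word -> [count, {file: [positions]}]
index = {
    "mark": [1, {"a.txt": [0]}],
    "market": [1, {"b.txt": [2]}],
    "marketing": [1, {"c.txt": [5]}],
    "trade": [1, {"a.txt": [1]}],
}
print(wildcard_query(index, "mark*"))
```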

A bi-word index was built from the posting index created earlier and used for bi-word query retrieval.
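Since the posting index keeps positions, each file's token sequence can be recovered from it and adjacent tokens paired up. The sketch below assumes the bi-word index maps a two-word string to the set of files containing that pair; the function name and the output layout are hypothetical.

```python
def build_biword_index(index):
    """Derive 'w1 w2' -> {files} from word -> [count, {file: [positions]}]."""
    # Rebuild position -> word for every file from the stored positions.
    by_doc = {}
    for term, (_, postings) in index.items():
        for doc, positions in postings.items():
            for pos in positions:
                by_doc.setdefault(doc, {})[pos] = term
    biwords = {}
    for doc, pos_to_term in by_doc.items():
        for pos, term in pos_to_term.items():
            nxt = pos_to_term.get(pos + 1)   # word at the next position, if any
            if nxt is not None:
                biwords.setdefault(f"{term} {nxt}", set()).add(doc)
    return biwords

# Hypothetical toy index: a.txt = "stock market crash", b.txt = "market stock"
index = {
    "stock": [2, {"a.txt": [0], "b.txt": [1]}],
    "market": [2, {"a.txt": [1], "b.txt": [0]}],
    "crash": [1, {"a.txt": [2]}],
}
biwords = build_biword_index(index)
print(biwords.get("stock market", set()))
```

A bi-word query is then a single dictionary lookup on the joined pair.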

Retrieval using Similarity Index with Vector Space Model
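In the vector space model, the query and each file are treated as term vectors and files are ranked by their similarity to the query. The sketch below uses cosine similarity over raw term counts, which is a common choice but an assumption here; the notebook may use different weights or a different similarity measure.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query_tokens, docs):
    """Rank files (name -> token list) by similarity to the query, best first."""
    q = Counter(query_tokens)
    scores = {name: cosine(q, Counter(tokens)) for name, tokens in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {
    "a.txt": ["stock", "market"],
    "b.txt": ["weather", "report"],
}
print(rank(["stock"], docs))
```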

Likelihood Model using Bayes' theorem
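One common reading of this step is a query likelihood model: by Bayes' theorem P(d | q) is proportional to P(q | d) P(d), so with a uniform prior over files they can be ranked by P(q | d) alone. The sketch below assumes that reading, a unigram model, and add-one (Laplace) smoothing; none of these details are confirmed by the notebook.

```python
import math
from collections import Counter

def query_log_likelihood(query_tokens, doc_tokens, vocab_size):
    """log P(q | d) under a unigram model with add-one smoothing."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return sum(
        math.log((counts[t] + 1) / (total + vocab_size))
        for t in query_tokens
    )

def rank_by_likelihood(query_tokens, docs):
    """Rank files (name -> token list) by log P(q | d), best first."""
    vocab = {t for tokens in docs.values() for t in tokens}
    scores = {
        name: query_log_likelihood(query_tokens, tokens, len(vocab))
        for name, tokens in docs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {
    "a.txt": ["stock", "market", "stock"],
    "b.txt": ["weather", "report", "rain"],
}
print(rank_by_likelihood(["stock"], docs))
```

Working in log space avoids underflow when the query has many terms.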

Assigned tf-idf scores based on the input.
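A minimal sketch of tf-idf weighting, assuming raw term frequency multiplied by log(N / df), where N is the number of files and df is the number of files containing the term. Other tf-idf variants exist, and the one used in the notebook is not stated.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return file -> {term: tf * log(N / df)} for docs (file -> token list)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))      # count each term once per file
    weights = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        weights[name] = {
            t: count * math.log(n_docs / df[t]) for t, count in tf.items()
        }
    return weights

docs = {
    "a.txt": ["stock", "market", "stock"],
    "b.txt": ["market", "report"],
}
print(tf_idf(docs))
```

A term that appears in every file, like "market" here, gets weight zero because log(N / df) is zero.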

Obtained a precision of 0.9735667696532784 (about 97.36%).
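Precision is the fraction of retrieved files that are relevant. The sketch below shows the calculation on made-up data; how relevance was judged in the notebook is not stated here, so the inputs are purely illustrative.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant (0.0 if nothing retrieved)."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)

# Hypothetical example: 3 of the 4 retrieved files are relevant.
print(precision(["a.txt", "b.txt", "c.txt", "d.txt"], {"a.txt", "b.txt", "c.txt"}))
```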

Relevance Feedback and reranking of results.
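A widely used technique for relevance feedback in the vector space model is the Rocchio update, which moves the query vector toward files marked relevant and away from those marked non-relevant. The sketch below shows that technique as an illustration only; it is not confirmed to be the method used in the notebook, and the `alpha`, `beta`, `gamma` defaults are conventional values rather than the author's.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated sparse query vector (dict of term -> weight)."""
    terms = set(query)
    for vec in relevant + non_relevant:
        terms |= set(vec)
    new_query = {}
    for t in terms:
        # Mean weight of the term among relevant / non-relevant vectors.
        pos = sum(v.get(t, 0.0) for v in relevant) / len(relevant) if relevant else 0.0
        neg = sum(v.get(t, 0.0) for v in non_relevant) / len(non_relevant) if non_relevant else 0.0
        weight = alpha * query.get(t, 0.0) + beta * pos - gamma * neg
        if weight > 0:              # negative weights are dropped
            new_query[t] = weight
    return new_query

updated = rocchio(
    {"stock": 1.0},
    relevant=[{"stock": 1.0, "market": 1.0}],
    non_relevant=[{"weather": 1.0}],
)
print(updated)
```

Results can then be reranked by scoring every file against the updated query vector.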

UtkarshBagaria/Information-Retrieval