This repository contains assignments implementing the Information Retrieval (IR) models taught by Dr. Khaldoon at UET in the Information Retrieval course during Fall 2024. The code is organized by assignment, each focusing on specific IR concepts and techniques.
This file implements a basic search engine that performs the following tasks:
- Indexing Documents: Reads and indexes documents from a specified folder. It cleans, tokenizes, and extracts nouns from the text.
- Search by Word: Lets users search for a word in the indexed documents and displays its frequency and positions in each document.
- Search by Document Title: Allows users to search for a document by its title.
- Interactive Menu: Provides an interactive menu for users to choose different search options and re-index documents if needed.
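The indexing and word-search steps above can be sketched as a positional inverted index. This is a minimal illustration, not the repository's code: the function names `tokenize`, `build_index`, and `search_word` and the sample documents are hypothetical, and the noun-extraction step is left out for brevity.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to {document title: [token positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for title, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            index[word][title].append(position)
    return index


def search_word(index, word):
    """Return {document title: (frequency, positions)} for one word."""
    postings = index.get(word.lower(), {})
    return {title: (len(positions), positions) for title, positions in postings.items()}


# Hypothetical in-memory documents standing in for files read from a folder.
documents = {
    "doc1.txt": "Search engines index documents. Engines rank documents.",
    "doc2.txt": "An index maps words to documents.",
}
index = build_index(documents)
result = search_word(index, "documents")
for title, (frequency, positions) in result.items():
    print(f"{title}: frequency={frequency}, positions={positions}")
```

Storing positions alongside counts lets one structure answer both the frequency and the position question without rescanning the documents.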
This file implements the following functionalities:
- Tokenize Nouns: Extract nouns from documents using heuristics optimized for technology-related text.
- TF-IDF Calculation: Calculates Term Frequency (TF), Inverse Document Frequency (IDF), and TF-IDF scores for search queries.
- Cosine Similarity: Computes cosine similarity scores to rank documents based on their relevance to the search query.
- Interactive Search: Provides an interactive search interface to query documents and display results in a well-formatted table.
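As a rough sketch of the TF-IDF and cosine-similarity steps listed above, the following assumes documents are already tokenized, uses length-normalized term frequency, and uses the unsmoothed `log(N / df)` form of IDF. The repository may weight terms differently; the names `tf_idf_vectors`, `cosine`, and `rank` and the sample data are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(tokenized_docs):
    """Return one sparse {term: weight} vector per document, plus the IDF table."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in doc_freq.items()}
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_tokens, tokenized_docs):
    """Return (document index, score) pairs sorted by descending similarity."""
    vectors, idf = tf_idf_vectors(tokenized_docs)
    # Query terms that never occur in the collection carry no weight.
    counts = Counter(t for t in query_tokens if t in idf)
    total = sum(counts.values())
    query_vec = {t: (c / total) * idf[t] for t, c in counts.items()} if total else {}
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    ["python", "search", "engine"],
    ["python", "web", "framework"],
    ["database", "index", "engine"],
]
ranking = rank(["search", "engine"], docs)
for doc_index, score in ranking:
    print(f"document {doc_index}: {score:.3f}")
```

Because "search" appears in only one document, it gets a higher IDF than "engine", so the document containing both query terms ranks well ahead of the one that shares only "engine".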
Assignment 3/BIM.py, Assignment 3/pnm.py, Assignment 3/non_overlaped.py
These files implement different aspects of the Binary Independence Model (BIM) and document indexing:
- Noun Extraction and Tokenization: Extracts meaningful nouns from text content.
- Binary Vector Creation: Creates binary vectors for documents based on the presence of terms.
- Term Probability Calculation: Calculates term probabilities for BIM.
- Document Ranking: Ranks documents using BIM scores computed via the Dice Coefficient.
- Graph Representation: Builds a graph where documents and nouns are nodes, and edges connect documents to their nouns.
- Non-Overlapped List Model: Implements a linked list model for non-overlapping document lists and performs search and retrieval.
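The binary-vector and Dice Coefficient steps above could look roughly like the sketch below. It covers only those two steps: how the repository combines term probabilities with the Dice score is not described here, so that part is omitted. The vocabulary, documents, and function names are hypothetical.

```python
def binary_vector(terms, vocabulary):
    """Return a 0/1 vector marking which vocabulary terms are present."""
    present = set(terms)
    return [1 if term in present else 0 for term in vocabulary]


def dice_coefficient(query_vec, doc_vec):
    """Dice coefficient 2|Q and D| / (|Q| + |D|) for two binary vectors."""
    overlap = sum(q & d for q, d in zip(query_vec, doc_vec))
    total = sum(query_vec) + sum(doc_vec)
    return 2 * overlap / total if total else 0.0


# Hypothetical vocabulary and noun lists, as if already extracted from text.
vocabulary = ["camera", "battery", "screen", "processor"]
docs = {
    "d1": ["camera", "battery"],
    "d2": ["screen", "processor", "battery"],
    "d3": ["camera", "screen", "battery", "processor"],
}
query = ["camera", "battery"]

q_vec = binary_vector(query, vocabulary)
scores = {
    name: dice_coefficient(q_vec, binary_vector(terms, vocabulary))
    for name, terms in docs.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Note that `d3` contains both query terms but still ranks below `d1`: the Dice coefficient penalizes documents for terms they contain that the query does not.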
Assignment 4/sgb.py, Assignment 4/hypertext.py
These files implement graph-based models for organizing and retrieving information:
- Udemy Course Browser: Creates a Tkinter GUI for browsing a Udemy-like course hierarchy with descriptions.
- Hypertext E-book Reader: Implements an e-book reader with hypertext navigation and search functionality using Tkinter.
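Leaving the Tkinter layer aside, the graph underneath a course browser like the one above can be sketched as an adjacency list with a depth-first traversal. The category names, `walk`, and `find_path` are hypothetical and only illustrate the data structure, not the repository's actual code.

```python
# Hypothetical course hierarchy: each node maps to its child nodes.
course_graph = {
    "Courses": ["Development", "Design"],
    "Development": ["Web Development", "Data Science"],
    "Design": ["Graphic Design"],
    "Web Development": [],
    "Data Science": [],
    "Graphic Design": [],
}


def walk(graph, node, depth=0):
    """Return the subtree under `node` as indented lines, depth first."""
    lines = ["  " * depth + node]
    for child in graph.get(node, []):
        lines.extend(walk(graph, child, depth + 1))
    return lines


def find_path(graph, root, target):
    """Return the list of nodes from `root` to `target`, or None if absent."""
    if root == target:
        return [root]
    for child in graph.get(root, []):
        path = find_path(graph, child, target)
        if path is not None:
            return [root] + path
    return None


outline = walk(course_graph, "Courses")
print("\n".join(outline))
print(find_path(course_graph, "Courses", "Data Science"))
```

A GUI tree widget would typically be populated by the same traversal, inserting one row per visited node.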
This file implements an extended Boolean search system for e-commerce products:
- Product Data Loading: Loads product data from a CSV file.
- Boolean Query Processing: Processes Boolean queries that combine field-based conditions with the logical operators AND, OR, and NOT.
- Search Interface: Provides a GUI for entering search queries and displaying results using Tkinter.
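One way to picture the query processing above is as composable predicates over product records. The sketch below skips query-string parsing and the GUI. The helper names (`field`, `all_of`, `any_of`, `negate`), the comparison operators supported, and the product rows are all assumptions for illustration.

```python
import operator

# Comparison operators assumed for field-based conditions.
COMPARATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def field(name, op, value):
    """Build a condition such as price > 1000 for a single product field."""
    compare = COMPARATORS[op]
    return lambda product: compare(product[name], value)


def all_of(*conditions):
    """Logical AND over conditions."""
    return lambda product: all(cond(product) for cond in conditions)


def any_of(*conditions):
    """Logical OR over conditions."""
    return lambda product: any(cond(product) for cond in conditions)


def negate(condition):
    """Logical NOT of a condition."""
    return lambda product: not condition(product)


# Hypothetical rows, as if loaded from the product CSV file.
products = [
    {"name": "Laptop A", "category": "laptop", "price": 900},
    {"name": "Laptop B", "category": "laptop", "price": 1500},
    {"name": "Phone C", "category": "phone", "price": 700},
]

# category = laptop AND NOT price > 1000
query = all_of(field("category", "=", "laptop"), negate(field("price", ">", 1000)))
matches = [p["name"] for p in products if query(p)]
print(matches)
```

A parser for textual queries would only need to translate each clause into one of these calls, which keeps evaluation separate from syntax.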
This file implements a neural network-based document query system:
- Noun Extraction and Tokenization: Extracts nouns from documents and builds a vocabulary.
- Bag-of-Words Vector Creation: Creates Bag-of-Words (BoW) vectors for documents.
- Neural Network: Trains a simple neural network to rank documents based on their relevance to a search query.
- GUI for Query and Results: Provides a GUI for entering search queries and displaying top matching documents using Tkinter.
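The Bag-of-Words and training steps above might be sketched as follows. The network's actual architecture is not described here, so this stands in with the smallest possible case: a single logistic unit trained by gradient descent on the element-wise overlap of query and document vectors. The vocabulary, documents, labelled pairs, and function names are all hypothetical.

```python
import math
from collections import Counter


def bow_vector(tokens, vocabulary):
    """Count how often each vocabulary term occurs in the token list."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def overlap_features(query_vec, doc_vec):
    """Element-wise product: non-zero only for terms shared by query and document."""
    return [q * d for q, d in zip(query_vec, doc_vec)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, n_features, epochs=200, lr=0.5):
    """Fit a single logistic unit with per-example gradient descent."""
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for features, label in examples:
            z = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(z) - label  # gradient of log loss with respect to z
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def score(weights, bias, features):
    """Predicted relevance in (0, 1)."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


vocabulary = ["python", "network", "training", "recipe"]
docs = {
    "ml_notes": ["python", "network", "training", "network"],
    "cookbook": ["recipe", "recipe"],
}
doc_vecs = {name: bow_vector(tokens, vocabulary) for name, tokens in docs.items()}

# Hypothetical labelled (query, document, relevant?) training pairs.
training_pairs = [
    (["python", "network"], "ml_notes", 1),
    (["python", "network"], "cookbook", 0),
    (["recipe"], "cookbook", 1),
    (["recipe"], "ml_notes", 0),
]
examples = [
    (overlap_features(bow_vector(q, vocabulary), doc_vecs[d]), label)
    for q, d, label in training_pairs
]
weights, bias = train(examples, len(vocabulary))

query_vec = bow_vector(["network", "training"], vocabulary)
scores = {
    name: score(weights, bias, overlap_features(query_vec, vec))
    for name, vec in doc_vecs.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A larger network would replace the single unit with hidden layers, but the flow of vocabulary, vectors, training pairs, and ranking by predicted score stays the same.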
This file implements a belief network model for analyzing smartphone camera quality:
- Data Loading: Loads smartphone data from a CSV file.
- Bayesian Inference: Performs Bayesian inference to rank smartphones based on user-defined criteria.
- Belief Network Model: Uses a belief network model to rank smartphones based on their attributes and relevance to the query.
- GUI for Input and Results: Provides a GUI for entering user ratings and displaying ranked smartphones using Tkinter.
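As a simplified illustration of the Bayesian inference step above, the sketch below ranks phones by their posterior probability given the attributes a user cares about. It assumes a uniform prior over phones and conditionally independent attributes, which is a much smaller model than a full belief network. The phones, probabilities, and the `posterior` function are hypothetical.

```python
def posterior(phones, wanted):
    """Return P(phone | wanted attributes) under a uniform prior.

    `phones` maps a phone name to {attribute: P(attribute is good | phone)}.
    Attributes are assumed conditionally independent given the phone.
    """
    prior = 1.0 / len(phones)
    joint = {}
    for name, likelihoods in phones.items():
        probability = prior
        for attribute in wanted:
            probability *= likelihoods[attribute]
        joint[name] = probability
    evidence = sum(joint.values())
    if evidence == 0:
        return {name: 0.0 for name in phones}
    # Bayes' rule: divide each joint probability by the total evidence.
    return {name: p / evidence for name, p in joint.items()}


# Hypothetical likelihoods, as if derived from the smartphone CSV data.
phones = {
    "Phone A": {"camera": 0.9, "battery": 0.5},
    "Phone B": {"camera": 0.6, "battery": 0.8},
    "Phone C": {"camera": 0.3, "battery": 0.9},
}
post = posterior(phones, ["camera", "battery"])
ranked = sorted(post, key=post.get, reverse=True)
for name in ranked:
    print(f"{name}: {post[name]:.3f}")
```

Phone A has the best camera, yet Phone B ranks first because the posterior rewards phones that do reasonably well on every requested attribute.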










