KnowledgeKapture: Information Retrieval System

Made by Masa Aladwan and Mohammad Moataz

Crawling And Search

Overview

KnowledgeKapture is an information retrieval system and search engine that crawls PDF, Word, and TXT files and lets users search them efficiently. It applies Natural Language Processing (NLP) techniques to preprocess both files and queries, which improves search accuracy and the crawling process. Search results are ranked with a similarity ranking method, and a tkinter interface handles query input and result display.

Features

Versatile Search: Supports searching across diverse file formats including PDF, Word, and TXT.
NLP Preprocessing: Utilizes NLTK preprocessing techniques for both files and queries.
Similarity Ranking: Implements a similarity ranking method to deliver accurate and relevant search results.
User Interface: Developed with tkinter for a seamless interaction experience.
File Crawling: Crawls the added files to build an inverted index for efficient searching.
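The preprocessing and indexing features above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `preprocess`, `build_inverted_index`, and `STOP_WORDS` are hypothetical, and a standard-library tokenizer with a tiny stop-word list stands in for the NLTK preprocessing the project uses.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stop-word list. The project uses NLTK,
# whose stop-word corpus would normally supply this set.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}


def preprocess(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_inverted_index(documents):
    """Map each term to a dict of {document id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in preprocess(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


# Hypothetical sample documents standing in for crawled file contents.
documents = {
    "notes.txt": "The crawler reads the text of each file.",
    "report.txt": "Search results are ranked for each query.",
}
index = build_inverted_index(documents)
print(index["each"])
```

Because the same `preprocess` function would be applied to queries, a query term and a document term end up in the same normalized form before lookup.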

Achievements

Designed and implemented a versatile search engine for diverse file formats, enhancing information retrieval efficiency.
Leveraged NLTK preprocessing and similarity ranking methods to deliver accurate and relevant search results.
Developed an intuitive user interface for seamless interaction with the search engine.

Pipeline

The KnowledgeKapture system works through these stages:

Add File: Users can add files in PDF, Word, or TXT formats to the system.
Crawling: The system conducts crawling to gather data from added files.
Inverted Index: It then creates an inverted index from the crawled data for efficient searching.
Query: Users input their query through the user-friendly interface.
Search: The system searches through the inverted index using NLP techniques.
Retrieve Document: Relevant documents matching the query are retrieved and displayed to the user.
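The Query, Search, and Retrieve Document stages can be illustrated with a small ranking sketch. The README does not name the similarity ranking method, so cosine similarity over TF-IDF weights is assumed here purely for illustration; the functions `tokenize` and `rank` are hypothetical names, and for brevity the sketch scores every document directly rather than walking an inverted index.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, documents):
    """Return (doc_id, score) pairs sorted by descending cosine similarity.

    Assumption: TF-IDF weighting with smoothed IDF; the project's actual
    similarity method is not specified in the README.
    """
    doc_counts = {d: Counter(tokenize(t)) for d, t in documents.items()}
    n_docs = len(doc_counts)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for counts in doc_counts.values():
        df.update(counts.keys())
    idf = {t: math.log((1 + n_docs) / (1 + f)) + 1.0 for t, f in df.items()}

    def weigh(counts):
        # Terms never seen in any document have no IDF and are dropped.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0 or norm_b == 0:
            return 0.0
        return dot / (norm_a * norm_b)

    query_vec = weigh(Counter(tokenize(query)))
    scored = [(d, cosine(query_vec, weigh(c))) for d, c in doc_counts.items()]
    # Keep only documents that share at least one term with the query.
    return sorted([p for p in scored if p[1] > 0], key=lambda p: p[1], reverse=True)


# Hypothetical sample documents standing in for crawled file contents.
documents = {
    "a.txt": "inverted index for fast search",
    "b.txt": "tkinter window for query input",
    "c.txt": "cooking recipes",
}
results = rank("search for", documents)
print([doc_id for doc_id, _ in results])
```

Documents with no terms in common with the query get a score of zero and are left out, so the interface would only need to display the returned list in order.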

Technologies Used

Python: Utilizes pandas and NumPy for data manipulation.
Natural Language Processing (NLP): Employs NLTK for preprocessing.
User Interface: tkinter for GUI development.
Crawling: Implements crawling techniques for creating an inverted index.
File Formats: Supports PDF, Word, and TXT.
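As an illustration of how a crawler might pick up only the supported formats, the sketch below walks a folder and keeps files by extension. It is an assumption-laden example rather than the project's code: `find_supported_files` and `SUPPORTED_EXTENSIONS` are hypothetical names, and treating Word files as `.docx` is a guess, since the README does not list exact extensions.

```python
import tempfile
from pathlib import Path

# Assumption: the README lists PDF, Word, and TXT; ".docx" is a guess
# for the Word extension the project actually accepts.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def find_supported_files(root):
    """Recursively collect files under root whose extension is supported."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


# Demo on a throwaway directory with three supported files and one that is skipped.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.PDF", "c.docx", "skip.png"):
        (Path(tmp) / name).write_text("demo")
    names = [p.name for p in find_supported_files(tmp)]

print(names)
```

Extracting text from PDF and Word files would need a third-party parser for each format; that step is left out here because the README does not say which libraries the project relies on.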

Installation

Clone the repository:

git clone https://github.com/MohammadMoataz2/KnowledgeKapture.git

Install dependencies:

pip install -r requirements.txt

Usage

Run the application:

python KnowledgeKapture.py

Input your query in the provided interface.
View the search results displayed.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your changes.
