KnowledgeKapture is an information retrieval system and search engine, written in Python, that crawls PDF, Word, and TXT files so users can search them efficiently.

MohammadMoataz2/KnowledgeKapture

KnowledgeKapture: Information Retrieval System

Made by Masa Aladwan and Mohammad Moataz

Crawling And Search

Overview

KnowledgeKapture is an information retrieval system and search engine that crawls PDF, Word, and TXT files and lets users search them efficiently. It applies Natural Language Processing (NLP) techniques to preprocess both files and queries, which improves crawling and search accuracy. Search results are ranked with a similarity ranking method, and a tkinter interface handles query input and result display.

Features

  • Versatile Search: Supports searching across diverse file formats including PDF, Word, and TXT.
  • NLP Preprocessing: Utilizes NLTK preprocessing techniques for both files and queries.
  • Similarity Ranking: Implements a similarity ranking method to deliver accurate and relevant search results.
  • User Interface: Developed with tkinter for a seamless interaction experience.
  • File Crawling: Crawls the added files to build an inverted index for efficient searching.
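
As a rough illustration of the NLP preprocessing feature, the sketch below lowercases text, tokenizes it, drops stopwords, and strips a few suffixes. It is a dependency-free stand-in, not the project's code: the project uses NLTK, whereas the stopword list, `strip_suffix`, and `preprocess` here are hypothetical names written only for this example, and the suffix stripping is far cruder than a real stemmer.

```python
import re

# Tiny illustrative stopword list (assumption); NLTK's English list is much larger.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "are", "to", "for", "on", "with"}


def strip_suffix(token: str) -> str:
    """Crude stand-in for a stemmer: drop one common English suffix."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words stay intact.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, split into alphanumeric tokens, drop stopwords, strip suffixes."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [strip_suffix(t) for t in tokens if t not in STOPWORDS]


print(preprocess("Searching the indexed files"))
```

Running the same function over both file text and queries means a query such as "file search" is reduced to the same terms that the files were indexed under.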

Achievements

  • Designed and implemented a versatile search engine for diverse file formats, enhancing information retrieval efficiency.
  • Leveraged NLTK preprocessing and similarity ranking methods to deliver accurate and relevant search results.
  • Developed an intuitive user interface for seamless interaction with the search engine.

Pipeline

KnowledgeKapture processes files and queries in the following steps:

  1. Add File: Users can add files in PDF, Word, or TXT formats to the system.
  2. Crawling: The system crawls the added files to gather their text.
  3. Inverted Index: It then builds an inverted index from the crawled data for efficient searching.
  4. Query: Users enter a query through the tkinter interface.
  5. Search: The query is preprocessed with the same NLP techniques as the files and matched against the inverted index.
  6. Retrieve Document: Relevant documents matching the query are retrieved and displayed to the user.
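
The indexing, search, and retrieval steps above can be sketched roughly as follows. This is a minimal example under stated assumptions: the README does not name the similarity measure, so TF-IDF weighting with cosine similarity is assumed here, and `tokenize`, `build_index`, and `search` are hypothetical names rather than functions from the repository.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(query, index, n_docs):
    """Rank documents by cosine similarity of TF-IDF vectors (assumed measure)."""
    def idf(term):
        # +1 keeps terms that occur in every document from getting zero weight.
        return math.log(n_docs / len(index[term])) + 1.0

    # Squared length of every document vector, accumulated over all indexed terms.
    doc_norms = defaultdict(float)
    for term, postings in index.items():
        weight = idf(term)
        for doc_id, tf in postings.items():
            doc_norms[doc_id] += (tf * weight) ** 2

    # Dot product between the query vector and each document sharing a term.
    query_tf = Counter(t for t in tokenize(query) if t in index)
    scores = defaultdict(float)
    query_norm = 0.0
    for term, qtf in query_tf.items():
        weight = idf(term)
        query_norm += (qtf * weight) ** 2
        for doc_id, tf in index[term].items():
            scores[doc_id] += (qtf * weight) * (tf * weight)

    if not scores:
        return []
    query_norm = math.sqrt(query_norm)
    ranked = [
        (doc_id, dot / (query_norm * math.sqrt(doc_norms[doc_id])))
        for doc_id, dot in scores.items()
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


docs = {
    "a.txt": "python search engine",
    "b.txt": "cooking recipes for pasta",
    "c.txt": "search engine ranking with python and nltk",
}
index = build_index(docs)
results = search("python search", index, len(docs))
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Only documents that share at least one term with the query are scored, which is the main saving an inverted index gives over scanning every file. Here the shorter `a.txt` ranks above `c.txt` because cosine similarity normalizes by document length.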

Technologies Used

  • Python: Utilizes pandas and NumPy for data manipulation.
  • Natural Language Processing (NLP): Employs NLTK for preprocessing.
  • User Interface: tkinter for GUI development.
  • Crawling: Implements crawling techniques for building an inverted index.
  • File Formats: Supports PDF, Word, and TXT.
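
To make the crawling step concrete, here is a small sketch that walks a folder and collects files in the supported formats. It is an assumption about how crawling could work, not the repository's implementation: `crawl`, `read_txt`, and the extension set are hypothetical, and only TXT reading is shown because the README does not say which libraries extract text from PDF and Word files.

```python
from pathlib import Path

# Assumed extensions for the three supported formats (PDF, Word, TXT).
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def crawl(root):
    """Return every supported file under root, including subfolders, sorted by path."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def read_txt(path):
    """Read a plain-text file, replacing undecodable bytes instead of failing."""
    return Path(path).read_text(encoding="utf-8", errors="replace")
```

A call such as `crawl("documents")` (a hypothetical folder name) would return the paths to index. PDF and Word files would then need a format-specific extractor before their text can be preprocessed.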

Installation

  1. Clone the repository:

git clone https://github.com/MohammadMoataz2/KnowledgeKapture.git

  2. Install dependencies:

pip install -r requirements.txt

Usage

  1. Run the application:

python KnowledgeKapture.py

  2. Enter your query in the provided interface.
  3. View the displayed search results.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your changes.
