Made by Masa Aladwan and Mohammad Moataz
KnowledgeKapture is an information retrieval system and search engine that lets users crawl and efficiently search PDF, Word, and TXT files. It applies Natural Language Processing (NLP) techniques to preprocess both files and queries, which improves search accuracy and the crawling process. The system ranks search results using a similarity ranking method. A tkinter interface handles query input and result display.
- Versatile Search: Supports searching across diverse file formats including PDF, Word, and TXT.
- NLP Preprocessing: Utilizes NLTK preprocessing techniques for both files and queries.
- Similarity Ranking: Implements a similarity ranking method to deliver accurate and relevant search results.
- User Interface: Developed with tkinter for a seamless interaction experience.
- File Crawling: Crawls added files to build an inverted index for efficient searching.
- Designed and implemented a versatile search engine for diverse file formats, enhancing information retrieval efficiency.
- Leveraged NLTK preprocessing and similarity ranking methods to deliver accurate and relevant search results.
- Developed an intuitive user interface for seamless interaction with the search engine.
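The preprocessing step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: KnowledgeKapture uses NLTK, whereas this sketch swaps in standard-library stand-ins so it runs without extra downloads. The names `STOPWORDS`, `strip_suffix`, and `preprocess` are hypothetical, and the suffix stripper is far cruder than a real stemmer. The key idea is that files and queries go through the same function, so their terms line up.

```python
import re

# Hypothetical, tiny stopword list for illustration only;
# NLTK provides a much larger one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for", "on", "with"}


def strip_suffix(token: str) -> str:
    """Crude stand-in for a stemmer: drop one common English suffix."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize, remove stopwords, and strip suffixes."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [strip_suffix(t) for t in tokens if t not in STOPWORDS]


# The same function is applied to document text and to queries.
doc_terms = preprocess("Searching the indexed files")
query_terms = preprocess("search index")
```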
The KnowledgeKapture system follows this pipeline:
- Add File: Users can add files in PDF, Word, or TXT formats to the system.
- Crawling: The system conducts crawling to gather data from added files.
- Inverted Index: It then creates an inverted index from the crawled data for efficient searching.
- Query: Users input their query through the user-friendly interface.
- Search: The system preprocesses the query with NLP techniques and looks up its terms in the inverted index.
- Retrieve Document: Relevant documents matching the query are retrieved and displayed to the user.
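The indexing, search, and retrieval steps above can be sketched as follows. The README does not say which similarity measure the project uses, so this example assumes cosine similarity over TF-IDF weights, a common choice. The function names (`tokenize`, `build_index`, `rank`) and the sample file names are hypothetical, and tokenization is reduced to a simple regular expression.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def rank(query, index, n_docs):
    """Return (doc_id, score) pairs sorted by cosine similarity (assumed measure)."""

    def idf(term):
        # Smoothed inverse document frequency; never zero.
        return math.log(n_docs / len(index[term])) + 1.0

    # Length of each document's TF-IDF vector.
    norms = defaultdict(float)
    for term, postings in index.items():
        weight = idf(term)
        for doc_id, tf in postings.items():
            norms[doc_id] += (tf * weight) ** 2

    # Query weights, ignoring terms that are not in the index.
    q_weights = {
        term: tf * idf(term)
        for term, tf in Counter(tokenize(query)).items()
        if term in index
    }
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
    if q_norm == 0:
        return []

    # Only documents sharing at least one term with the query are scored.
    dots = defaultdict(float)
    for term, q_weight in q_weights.items():
        weight = idf(term)
        for doc_id, tf in index[term].items():
            dots[doc_id] += q_weight * tf * weight

    scored = [
        (doc_id, dot / (q_norm * math.sqrt(norms[doc_id])))
        for doc_id, dot in dots.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample documents standing in for crawled file contents.
docs = {
    "a.txt": "inverted index search engine",
    "b.txt": "cooking recipes for pasta",
    "c.txt": "search engine ranking with an inverted index and more search",
}
index = build_index(docs)
results = rank("inverted index", index, len(docs))
```

Because the index maps terms to the documents containing them, a query only touches the postings of its own terms rather than scanning every file.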
- Python: Utilizes pandas and NumPy for data manipulation.
- Natural Language Processing (NLP): Employs NLTK for preprocessing.
- User Interface: tkinter for GUI development.
- Crawling: Implements crawling techniques for creating an inverted index.
- File Formats: Supports PDF, Word, and TXT.
- Clone the repository:
git clone https://github.com/MohammadMoataz2/KnowledgeKapture.git
- Install dependencies:
pip install -r requirements.txt
- Run the application:
python KnowledgeKapture.py
- Input your query in the provided interface.
- View the search results displayed.
Contributions are welcome! Please fork the repository and submit a pull request with your changes.