Skip to content

This project integrates RAG techniques with GPT-2 for advanced question-answering using BBC news articles. It employs FAISS for efficient document retrieval and SentenceTransformer for embeddings, providing detailed and contextually accurate answers by combining article content with publication dates.

Notifications You must be signed in to change notification settings

Ayeshaaaaaaaaa/Enhanced-Contextual-Insights-RAG-Based-GPT-Integration-for-Question-Answering-with-BBC-News-Data

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

Enhanced-Contextual-Insights-RAG-Based-GPT-Integration-for-Question-Answering-with-BBC-News-Data

This project combines Retrieval-Augmented Generation (RAG) techniques with GPT-2 to provide advanced question-answering capabilities using a dataset of BBC news articles. By leveraging FAISS for efficient similarity search and SentenceTransformer for embedding generation, the system retrieves the most relevant articles based on user queries. The retrieved context is then used to generate detailed and contextually accurate answers with GPT-2. This approach enhances the quality of responses by integrating article content with publication dates, ensuring informative and relevant answers.

Key Features

  • RAG-Based Retrieval: Utilizes FAISS and SentenceTransformer for effective document retrieval.
  • Dynamic Question Answering: Employs GPT-2 for generating detailed and contextually accurate responses.
  • Comprehensive Context: Includes article content and publication dates for richer answers.
  • Efficient Processing: Optimized for GPU with PyTorch and Hugging Face's transformers.
  • Fine-Tuned Embeddings: Captures nuanced semantic meanings of the text.
  • Contextual Relevance: Ensures responses are highly relevant and informative.
  • Customizable Prompting: Tailors responses based on different query types and contexts.
  • Multi-Beam Search: Uses beam search in GPT-2 for diverse and high-quality text outputs.
  • Real-Time Interaction: Supports real-time question-answering for interactive applications.

Workflow

  1. Data Preparation:

    • Load Data: Read the dataset of BBC news articles into a pandas DataFrame.
    • Combine Text: Merge article titles, texts, and publication dates into a single text string for each article.
  2. Embedding Generation:

    • Generate Embeddings: Use SentenceTransformer to create dense vector embeddings for each article, leveraging GPU if available.
  3. Index Creation:

    • Create FAISS Index: Build an index for efficient similarity search with FAISS. Add embeddings to the index for quick retrieval of similar documents.
  4. Document Storage:

    • Create Document Objects: Store articles as Document objects with metadata, such as publication dates, in a FAISS vector store.
  5. Model Loading:

    • Load GPT-2: Initialize GPT-2 for text generation. Load the pre-trained model and tokenizer, configuring for GPU processing if available.
  6. Answer Generation:

    • Retrieve Relevant Documents: Use FAISS to get the top relevant documents based on a user query.
    • Generate Context: Combine retrieved documents into a structured context, including publication dates and excerpts.
    • Generate Response: Generate a detailed and accurate answer using GPT-2 based on the provided context.
  7. Output:

    • Display Answer: Present the generated answer to the user, incorporating relevant article content and publication dates for comprehensive responses.

About

This project integrates RAG techniques with GPT-2 for advanced question-answering using BBC news articles. It employs FAISS for efficient document retrieval and SentenceTransformer for embeddings, providing detailed and contextually accurate answers by combining article content with publication dates.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published