Can build a docker image? #5

Open
karthik18495 opened this issue Oct 2, 2024 · 0 comments
Labels: help wanted (Extra attention is needed), question (Further information is requested)


@karthik18495
Member

Issue: Package RAG-Based Document Retriever in Docker

Description:
We need to package the entire Retrieval-Augmented Generation (RAG)-based document retriever for the Electron-Ion Collider (EIC) in a Docker container to simplify setup, ensure a consistent environment, and make it easier for new contributors to start working on the project. This work will follow #2.

This issue outlines the steps needed to:

  1. Download all necessary resources (documents, embeddings, etc.).
  2. Set up a vector database (e.g., FAISS or Pinecone).
  3. Build the LangChain app within the container.
  4. Provide instructions for building and running the Docker image.

Objective:
To create a Docker image that packages the RAG-based document retriever system with all dependencies and resources, making it easy for collaborators to set up, run, and contribute to the project.

Tasks:

  1. Download Resources:

    • Write a script (e.g., download_resources.py) that:
      • Downloads any required documents (e.g., PDFs, datasets).
      • Preprocesses the documents for input into the RAG system.
      • Downloads pre-trained models and embeddings (if applicable).
    • Ensure that this script can be run as part of the Docker build process, downloading everything needed for the app to function.
  2. Set Up Vector Database:

    • Choose the vector database technology (e.g., FAISS, Pinecone).
    • Write setup instructions to integrate the vector database within the Docker container.
    • Ensure that the database is pre-configured with the downloaded resources so that new users can immediately run the system after building the Docker image.
    • If using FAISS:
      • Ensure it installs properly in the Docker container.
      • Create a script (setup_faiss.py) that adds the downloaded documents to the FAISS index.
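    • If using Pinecone or another external service:
      • Include clear instructions for configuring the API keys (preferably supplied as environment variables when the container runs, not hard-coded into the image) and setting up the environment.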
  3. Build LangChain App:

    • Write a langchain_app.py file that defines the LangChain-based RAG system.
      • Ensure the app connects to the vector database.
      • Create an endpoint (e.g., using FastAPI or Flask) that users can query to retrieve documents.
    • Ensure all dependencies are properly installed and configured.
  4. Dockerize the Application:

    • Create a Dockerfile that:
      • Sets up a Python environment.
      • Installs Poetry and uses it to install the dependencies declared in pyproject.toml (and pinned in poetry.lock).
      • Runs the download_resources.py and vector database setup scripts as part of the build process.
      • Exposes necessary ports for the LangChain app.
    • Example Dockerfile structure:
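      ```dockerfile
      FROM python:3.9-slim

      # Set the working directory first so Poetry installs and runs from the same project path
      WORKDIR /app

      # Install dependencies into the image's Python (no separate virtualenv inside the container);
      # --no-root skips installing the project itself, whose source is not copied yet
      RUN pip install poetry
      COPY pyproject.toml poetry.lock ./
      RUN poetry config virtualenvs.create false \
          && poetry install --no-root --no-interaction

      # Add application code
      COPY . .

      # Download resources and set up vector DB
      RUN poetry run python download_resources.py
      RUN poetry run python setup_faiss.py  # or setup_pinecone.py

      # Expose the port for the LangChain app (the app must listen on 0.0.0.0:8000)
      EXPOSE 8000

      # Run the LangChain app
      CMD ["poetry", "run", "python", "langchain_app.py"]
      ```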
  5. Write Instructions for Building and Running the Docker Image:

    • In the README.md file, add detailed instructions for:
      • Building the Docker image.
      • Running the container and starting the LangChain app.
      • Making contributions (e.g., adding new documents, extending the vector database, etc.).

    Example:

    ````markdown
    ## Running the RAG-Based Document Retriever

    1. **Build the Docker Image**:
       ```bash
       docker build -t rag-document-retriever .
       ```
    2. **Run the Docker Container**:
       ```bash
       docker run -p 8000:8000 rag-document-retriever
       ```
    3. **Access the LangChain App**:
       Once the container is running, the document retriever is available at http://localhost:8000.
    4. **Contributing**:
       - To add new documents, place them in the `resources/` directory and rebuild the Docker image.
       - To modify the RAG system, edit `langchain_app.py` and open a pull request.
    ````
    
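A minimal sketch of the download step in task 1, assuming the documents are plain-text files reachable over HTTP(S). `DOCUMENT_URLS`, the `resources/` layout, the `chunks.jsonl` output file, and the chunk size and overlap defaults are hypothetical placeholders; PDF parsing and model downloads are left out, and a real `download_resources.py` may well use LangChain's own document loaders and text splitters instead of the hand-rolled `chunk_text` shown here.

```python
"""Hypothetical sketch of download_resources.py: fetch documents and split them into chunks."""
import json
import urllib.request
from pathlib import Path

# Hypothetical list of source documents; replace with the real EIC document URLs.
DOCUMENT_URLS: list[str] = []
RESOURCES_DIR = Path("resources")


def download(url: str, dest_dir: Path) -> Path:
    """Download one document into dest_dir and return its local path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rstrip("/").split("/")[-1]
    with urllib.request.urlopen(url, timeout=60) as response:
        target.write_bytes(response.read())
    return target


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of `size` characters; consecutive windows share `overlap` characters."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def main() -> None:
    records = []
    for url in DOCUMENT_URLS:
        path = download(url, RESOURCES_DIR / "raw")
        text = path.read_text(encoding="utf-8", errors="ignore")
        for n, chunk in enumerate(chunk_text(text)):
            records.append({"source": path.name, "chunk": n, "text": chunk})
    # One JSON record per line, to be consumed by the index-building step.
    RESOURCES_DIR.mkdir(parents=True, exist_ok=True)
    with open(RESOURCES_DIR / "chunks.jsonl", "w", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    main()
```

Overlapping windows reduce the chance that a relevant passage is cut in half at a chunk boundary. For example, `chunk_text("abcdefghij", size=4, overlap=2)` yields `["abcd", "cdef", "efgh", "ghij"]`.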
    
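The vector-database step (task 2) can be illustrated without installing FAISS. The sketch below is a stdlib-only stand-in for a flat inner-product index (conceptually what `faiss.IndexFlatIP` does over normalized vectors): it embeds each chunk, stores the vectors, ranks results by cosine similarity, and persists the index to disk at build time so the container can load it at startup. `hash_embed` is a toy hashing embedding included only so the example runs; a real `setup_faiss.py` would use an actual embedding model, and the JSON storage format is a hypothetical choice.

```python
"""Stdlib-only stand-in for setup_faiss.py: build, persist, and query a flat vector index."""
import hashlib
import json
import math
from pathlib import Path


def hash_embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding (assumption, not a real model): hash tokens into `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class FlatIndex:
    """Brute-force index: every search scores all stored vectors by inner product."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim
        self.vectors: list[list[float]] = []
        self.texts: list[str] = []

    def add(self, texts: list[str]) -> None:
        for text in texts:
            self.vectors.append(hash_embed(text, self.dim))
            self.texts.append(text)

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        # Vectors are unit length, so the inner product equals cosine similarity.
        q = hash_embed(query, self.dim)
        scored = [(sum(a * b for a, b in zip(q, v)), t) for v, t in zip(self.vectors, self.texts)]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

    def save(self, path: Path) -> None:
        payload = {"dim": self.dim, "vectors": self.vectors, "texts": self.texts}
        path.write_text(json.dumps(payload), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "FlatIndex":
        data = json.loads(path.read_text(encoding="utf-8"))
        index = cls(data["dim"])
        index.vectors, index.texts = data["vectors"], data["texts"]
        return index
```

In the build, `setup_faiss.py` would call `add()` on the preprocessed chunks and `save()` the result, and `langchain_app.py` would `load()` it at startup. With FAISS itself, the corresponding calls are `index.add`, `faiss.write_index`, `faiss.read_index`, and `index.search`, operating on NumPy arrays.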

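For the query endpoint in task 3, FastAPI or Flask are the suggested options. To keep the sketch dependency-free, the example below uses only the standard library's `http.server` to show the shape of a `/query?q=...` endpoint; `retrieve` is a hypothetical placeholder where the LangChain retriever or vector index would be called. Note the default bind address `0.0.0.0`: a server listening only on `127.0.0.1` inside the container is not reachable through `docker run -p 8000:8000`.

```python
"""Dependency-free sketch of a /query endpoint (stand-in for a FastAPI/Flask app)."""
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def retrieve(question: str, k: int = 3) -> list[str]:
    """Hypothetical placeholder: call the LangChain retriever / vector index here."""
    return [f"stub result {i + 1} for: {question}" for i in range(k)]


def handle(path: str) -> tuple[int, dict]:
    """Route a request path to (HTTP status, JSON payload); kept pure so it is easy to test."""
    url = urlparse(path)
    if url.path != "/query":
        return 404, {"error": "use /query?q=..."}
    question = parse_qs(url.query).get("q", [""])[0]
    return 200, {"question": question, "results": retrieve(question)}


class QueryHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, payload = handle(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "0.0.0.0", port: int = 8000) -> ThreadingHTTPServer:
    """Create the server; binding to 0.0.0.0 lets the published container port reach it."""
    return ThreadingHTTPServer((host, port), QueryHandler)
```

A real `langchain_app.py` would end with `if __name__ == "__main__": serve().serve_forever()` so that the Dockerfile's `CMD` starts the server. With FastAPI, the equivalent is launching `uvicorn` with `--host 0.0.0.0 --port 8000`.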
Acceptance Criteria:

  • A functional Docker image that contains the entire RAG-based document retriever system.
  • The Docker image successfully sets up the vector database and downloads all necessary resources.
  • Clear and accurate documentation in the README.md for building and running the Docker image.
  • The LangChain app is accessible via a REST API after running the Docker container.
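To make the second acceptance criterion checkable during the build, a small script could fail fast when the downloaded resources or the vector index are missing. A minimal sketch, assuming a hypothetical `check_build.py` and hypothetical artifact names (`resources/` and `index.json`); adding `RUN poetry run python check_build.py` after the setup scripts would make `docker build` stop with a non-zero exit status if either artifact is absent or empty.

```python
"""Hypothetical check_build.py: fail the Docker build if resources or the index are missing."""
import sys
from pathlib import Path

# Hypothetical artifact locations; adjust to whatever the setup scripts actually write.
RESOURCES_DIR = Path("resources")
INDEX_FILE = Path("index.json")


def find_problems(resources_dir: Path, index_file: Path) -> list[str]:
    """Return human-readable problems; an empty list means the build artifacts look complete."""
    problems = []
    if not resources_dir.is_dir() or not any(resources_dir.iterdir()):
        problems.append(f"no downloaded resources in {resources_dir}/")
    if not index_file.is_file() or index_file.stat().st_size == 0:
        problems.append(f"vector index {index_file} is missing or empty")
    return problems


def main() -> int:
    """Print each problem to stderr and return a process exit code (0 = OK, 1 = failed)."""
    problems = find_problems(RESOURCES_DIR, INDEX_FILE)
    for problem in problems:
        print(f"build check failed: {problem}", file=sys.stderr)
    return 1 if problems else 0
```

The script itself would end with `if __name__ == "__main__": sys.exit(main())`, so that a failed check becomes a failed `RUN` step.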

@karthik18495 karthik18495 added help wanted Extra attention is needed question Further information is requested labels Oct 2, 2024
@karthik18495 karthik18495 self-assigned this Oct 2, 2024
Projects
Status: In Progress
Development

No branches or pull requests

1 participant