
RAG LLM Pattern on a Single-Node OpenShift Cluster

Overview

This Validated Pattern deploys a Retrieval-Augmented Generation (RAG) Large Language Model (LLM) infrastructure on a Single Node OpenShift (SNO) cluster. It provides a GPU-accelerated environment for running LLM inference services using vLLM with both IBM Granite 4 Small and GPT-OSS 120B models.

In addition to the LLM inference services, the pattern deploys Qdrant as a vector database, pre-populated with the Validated Patterns documentation. A frontend application is also included, allowing users to select an LLM, configure retrieval settings, and query the complete RAG pipeline.

```mermaid
flowchart LR
    User[User Query] --> Frontend
    Frontend --> Qdrant[(Vector DB)]
    Qdrant --> |Relevant Docs| Frontend
    Frontend --> LLM[LLM Service]
    LLM --> |Response| Frontend
    Frontend --> User
```
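The flow above can be sketched end to end. The snippet below is an illustrative, self-contained sketch only: it stands in for the real pipeline with toy bag-of-words "embeddings" in place of an embedding model and Qdrant, and every name in it is hypothetical, not the pattern's actual code.

```python
# Toy sketch of the RAG flow: embed query -> retrieve similar docs -> build prompt.
# Real deployments embed with a model and query Qdrant; here both are faked
# with word-count vectors so the control flow is visible.
import math
from collections import Counter

DOCS = [
    "Validated Patterns are deployed with GitOps and Argo CD.",
    "vLLM serves large language models over an OpenAI-compatible API.",
    "Qdrant stores document embeddings for similarity search.",
]

def embed(text):
    # Toy "embedding": lowercase word counts. A real pipeline uses a model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=2):
    # Rank all documents by similarity to the query; keep the top k.
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, context):
    # Retrieved documents are prepended as context for the LLM call.
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

context = retrieve("How does Qdrant similarity search work?")
prompt = build_prompt("How does Qdrant similarity search work?", context)
```

In the deployed pattern, `build_prompt`'s output would go to the selected vLLM inference service; the frontend performs the equivalent steps against the live Qdrant instance.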

Applications & Components

LLM Inference Services

  • vLLM inference service serving the IBM Granite 4 Small model
  • vLLM inference service serving the GPT-OSS 120B model

Vector Database

  • Qdrant - Vector database pre-populated with Validated Patterns documentation for retrieval

Frontend

  • RAG Frontend Application - Web interface for selecting an LLM, configuring retrieval settings, and querying the RAG pipeline

Supporting Operators

Prerequisites

  • OpenShift Cluster - Single Node OpenShift (SNO) deployment
  • GPU Hardware - NVIDIA GPU-enabled node with sufficient VRAM for LLM inference (at least 80GB to run the GPT-OSS 120B model)

This pattern was developed and tested on a Lenovo ThinkSystem SR650a V4 with 2 NVIDIA RTX Pro 6000 GPUs. If your hardware does not meet these requirements, you will need to modify this pattern accordingly.

Installation

Standard Installation

  1. Clone this repository:

    git clone https://github.com/validatedpatterns-sandbox/rag-llm-sno.git
    cd rag-llm-sno
  2. Log into your OpenShift cluster:

    export KUBECONFIG=/path/to/your/kubeconfig

    Or:

    oc login --token=<your-token> --server=<your-cluster-api>
  3. Install the pattern:

    ./pattern.sh make install

Custom Installation

If your hardware differs from the tested configuration or you need to modify the pattern:

  1. Fork this repository and clone your fork:

    git clone https://github.com/<your-username>/rag-llm-sno.git
    cd rag-llm-sno
  2. Create a branch for your changes:

    git checkout -b my-customizations
  3. Make your modifications (e.g., adjust model configurations, resource limits)

  4. Commit and push your changes:

    git add .
    git commit -m "Customize pattern for my environment"
    git push -u origin my-customizations
  5. Log into your OpenShift cluster:

    export KUBECONFIG=/path/to/your/kubeconfig

    Or:

    oc login --token=<your-token> --server=<your-cluster-api>
  6. Install the pattern:

    ./pattern.sh make install

Usage

After installation, access the pattern components from the OpenShift console's application menu (bento box):

OpenShift Application Menu

From here you can:

  • Cluster Argo CD / Prod ArgoCD - View the GitOps installation and sync status of the pattern
  • RAG LLM Demo UI - Launch the frontend application
  • Red Hat OpenShift AI - Access the RHOAI dashboard

Using the Frontend

The RAG LLM Demo UI provides an interface to query the RAG pipeline:

RAG Frontend

  1. Select an LLM - Choose between the available models (IBM Granite 4 Small or GPT-OSS 120B)
  2. Configure Retrieval Settings - Adjust search type (similarity, similarity_score_threshold, or mmr) and parameters like number of documents to retrieve
  3. Submit your query - Enter a question and view the response along with retrieved documents
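The three search types differ in how they pick the retrieved documents. The sketch below is an illustrative, stdlib-only approximation of those behaviors on toy vectors; it is not the frontend's implementation, and the function names are hypothetical.

```python
# similarity: plain top-k by cosine score.
# similarity_score_threshold: top-k, minus anything below a cutoff.
# mmr (Maximal Marginal Relevance): trade relevance against redundancy.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def similarity(query, docs, k):
    # Indices of the k most similar documents.
    return sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)[:k]

def similarity_score_threshold(query, docs, k, threshold):
    # Same ranking, but drop results scoring below the threshold.
    return [i for i in similarity(query, docs, k) if cosine(query, docs[i]) >= threshold]

def mmr(query, docs, k, lam=0.5):
    # Greedily pick documents that are relevant to the query but
    # dissimilar to documents already selected.
    selected, candidates = [], list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            rel = cosine(query, docs[i])
            red = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

docs = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0)]  # docs 0 and 1 are near-duplicates
query = (1.0, 0.2)
# similarity returns both near-duplicates; mmr swaps one out for diversity
```

With the toy data above, plain similarity retrieves the two near-duplicate documents, while MMR keeps the best match and replaces its duplicate with the more distinct third document; the threshold variant simply filters low-scoring hits.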
