This Validated Pattern deploys a Retrieval-Augmented Generation (RAG) Large Language Model (LLM) infrastructure on a Single Node OpenShift (SNO) cluster. It provides a GPU-accelerated environment for running LLM inference services using vLLM with both IBM Granite 4 Small and GPT-OSS 120B models.
In addition to the LLM inference services, the pattern deploys Qdrant as a vector database, pre-populated with the Validated Patterns documentation. A frontend application is also included, allowing users to select an LLM, configure retrieval settings, and query the complete RAG pipeline.
The diagram below shows the flow of a query through the pattern:

```mermaid
flowchart LR
    User[User Query] --> Frontend
    Frontend --> Qdrant[(Vector DB)]
    Qdrant --> |Relevant Docs| Frontend
    Frontend --> LLM[LLM Service]
    LLM --> |Response| Frontend
    Frontend --> User
```
The pattern provides the following workloads:

- IBM Granite 4 Small - Served via vLLM with GPU acceleration
- GPT-OSS 120B - Served via vLLM with GPU acceleration
- Qdrant - Vector database pre-populated with Validated Patterns documentation for retrieval
- RAG Frontend Application - Web interface for selecting an LLM, configuring retrieval settings, and querying the RAG pipeline
These workloads are backed by the following operators and services:

- Red Hat OpenShift AI (RHOAI) - AI/ML platform for model serving and management
- NVIDIA GPU Operator - Provides GPU support for the inference services
- Node Feature Discovery (NFD) - Identifies node hardware capabilities
- Local Volume Management Service (LVMS) - Manages local storage volumes
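Once the pattern is installed, a quick way to confirm these operators came up is to check their ClusterServiceVersions. A minimal sketch; the name fragments in the `grep` are heuristics and the actual CSV names may differ by release:

```sh
# Check that the operators' ClusterServiceVersions report Succeeded;
# the name fragments below are heuristics and may vary between releases
oc get csv -A | grep -iE 'gpu-operator|nfd|lvms|rhods'
```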
Deploying the pattern requires:

- OpenShift Cluster - Single Node OpenShift (SNO) deployment
- GPU Hardware - NVIDIA GPU-enabled node with sufficient VRAM for LLM inference (at least 80GB to run the GPT-OSS 120B model)
This pattern was developed and tested on a Lenovo ThinkSystem SR650a V4 with 2 NVIDIA RTX Pro 6000 GPUs. If your hardware does not meet these requirements, you will need to modify this pattern accordingly.
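Once the pattern has installed NFD and the GPU Operator, you can sanity-check that the node's GPUs are visible to Kubernetes. A minimal sketch; the `nvidia.com/gpu.present` label is applied by GPU Feature Discovery:

```sh
# Nodes labelled by GPU Feature Discovery as having NVIDIA GPUs
oc get nodes -l nvidia.com/gpu.present=true

# Allocatable GPU count reported by the NVIDIA device plugin
oc get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'
```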
To install the pattern:

- Clone this repository:

  ```sh
  git clone https://github.com/validatedpatterns-sandbox/rag-llm-sno.git
  cd rag-llm-sno
  ```

- Log into your OpenShift cluster:

  ```sh
  export KUBECONFIG=/path/to/your/kubeconfig
  ```

  Or:

  ```sh
  oc login --token=<your-token> --server=<your-cluster-api>
  ```

- Install the pattern:

  ```sh
  ./pattern.sh make install
  ```
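The install command bootstraps OpenShift GitOps and hands the rest of the deployment to Argo CD, so it returns before everything is ready. One way to watch the rollout converge; the exact application names depend on the pattern's values files:

```sh
# Confirm you are logged into the right cluster, then watch the
# Argo CD applications created by the pattern reach Synced/Healthy
oc whoami --show-server
oc get applications.argoproj.io -A -w
```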
If your hardware differs from the tested configuration or you need to modify the pattern:
- Fork this repository and clone your fork:

  ```sh
  git clone https://github.com/<your-username>/rag-llm-sno.git
  cd rag-llm-sno
  ```

- Create a branch for your changes:

  ```sh
  git checkout -b my-customizations
  ```

- Make your modifications (e.g., adjust model configurations or resource limits); see the sketch after these steps for an example

- Commit and push your changes:

  ```sh
  git add .
  git commit -m "Customize pattern for my environment"
  git push -u origin my-customizations
  ```

- Log into your OpenShift cluster:

  ```sh
  export KUBECONFIG=/path/to/your/kubeconfig
  ```

  Or:

  ```sh
  oc login --token=<your-token> --server=<your-cluster-api>
  ```

- Install the pattern:

  ```sh
  ./pattern.sh make install
  ```
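The Validated Patterns framework typically derives the target repository and branch from your local git checkout, so running the install from your fork and branch deploys your customizations. As an illustration of the kind of edit the modification step refers to, here is a hypothetical values override; the file path and key names are assumptions (consult the repository's actual values files), with only `nvidia.com/gpu` being a standard Kubernetes resource name:

```yaml
# values-hub.yaml (hypothetical path and keys -- check the repo's real values files)
llmServing:
  granite:
    resources:
      limits:
        nvidia.com/gpu: "1"   # pin the smaller model to a single GPU
        memory: 64Gi
      requests:
        cpu: "8"
        memory: 48Gi
```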
After installation, access the pattern components from the application launcher (the bento box icon) in the OpenShift console:
- Cluster Argo CD / Prod ArgoCD - View the GitOps installation and sync status of the pattern
- RAG LLM Demo UI - Launch the frontend application
- Red Hat OpenShift AI - Access the RHOAI dashboard
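If the launcher entries have not appeared yet (they are created as the applications sync), the same endpoints can be found directly from the cluster's routes:

```sh
# List all exposed routes; the frontend, Argo CD, and RHOAI consoles appear here
oc get routes -A
```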
The RAG LLM Demo UI provides an interface to query the RAG pipeline:
- Select an LLM - Choose between the available models (IBM Granite 4 Small or GPT-OSS 120B)
- Configure Retrieval Settings - Adjust the search type (similarity, similarity_score_threshold, or mmr, i.e. maximal marginal relevance) and parameters such as the number of documents to retrieve
- Submit your query - Enter a question and view the response along with retrieved documents
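The models can also be exercised directly, since vLLM exposes an OpenAI-compatible API. A minimal sketch, assuming you have first located the inference route and model ID, both of which are deployment-specific placeholders here:

```sh
# Discover the served model IDs, then send a chat completion request.
# <vllm-route> stands in for the route host of one of the model services.
curl -ks https://<vllm-route>/v1/models

curl -ks https://<vllm-route>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "<model-id-from-/v1/models>",
        "messages": [{"role": "user", "content": "What is a Validated Pattern?"}]
      }'
```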

