87 commits
d36c51f
create README and project structure
alexsifman Aug 4, 2025
4c8be73
add agentic files and update for audio rag
alexsifman Aug 5, 2025
402b84a
add states
alexsifman Aug 6, 2025
92778f0
updates to caching
alexsifman Aug 6, 2025
fc5f72f
update utils functions
alexsifman Aug 6, 2025
7ff66f5
Merge branch 'main' of https://github.com/HPInc/AI-Blueprints into fe…
alexsifman Aug 6, 2025
79a8cc1
create configs file
alexsifman Aug 6, 2025
c05ddd4
create workflow notebook
alexsifman Aug 6, 2025
7d64b71
create requirements file
alexsifman Aug 6, 2025
5fae52e
add more libraries
alexsifman Aug 6, 2025
9c439c1
update and add util functions
alexsifman Aug 6, 2025
71e6168
updates to README
alexsifman Aug 7, 2025
9f6f88e
include new function in utils
alexsifman Aug 7, 2025
5e80537
include audio model download from HF
alexsifman Aug 7, 2025
b24894c
update libraries
alexsifman Aug 9, 2025
51f069a
include more helper func
alexsifman Aug 9, 2025
e239057
create model selection class
alexsifman Aug 9, 2025
d923ba3
include audio model download
alexsifman Aug 9, 2025
929f86f
Merge branch 'main' of https://github.com/HPInc/AI-Blueprints into fe…
alexsifman Aug 9, 2025
8f3c212
create register model notebook
Alejandro-Sifuentes-Manjarrez Aug 11, 2025
3b6d4b5
update libraries
alexsifman Aug 12, 2025
3251c3b
include a temp qwen model notebook
alexsifman Aug 13, 2025
47e9f28
updates
alexsifman Aug 13, 2025
aaf71a1
library updates
alexsifman Aug 13, 2025
ab85a63
update resampling
alexsifman Aug 13, 2025
49a422d
update workflow nb
alexsifman Aug 14, 2025
1c240e1
update nodes and caching
alexsifman Aug 14, 2025
7d561f7
modify output display
alexsifman Aug 15, 2025
26e3cb9
Update readme
alexsifman Aug 15, 2025
f77b348
Merge branch 'main' of https://github.com/HPInc/AI-Blueprints into fe…
alexsifman Aug 15, 2025
04ef5a2
update workflow model
alexsifman Aug 15, 2025
4b739ea
Merge branch 'feat/agentic-audio-rag' of https://github.com/HPInc/AI-…
alexsifman Aug 15, 2025
8957d1e
update prompt
alexsifman Aug 15, 2025
315a981
updated langgraph nodes
alexsifman Aug 16, 2025
da34868
update caching step
alexsifman Aug 17, 2025
9aaea42
modify cached output
alexsifman Aug 17, 2025
e65e64c
update input files and prompts
alexsifman Aug 17, 2025
61991c6
remove unsued library
alexsifman Aug 17, 2025
bc1364a
updates to register notebook
alexsifman Aug 18, 2025
0bff56e
fix model output
alexsifman Aug 18, 2025
0d18394
update model prompt
alexsifman Aug 18, 2025
56b69e8
fix error with thresholds
alexsifman Aug 18, 2025
07257eb
revert changes
alexsifman Aug 18, 2025
64e43aa
revert changes
alexsifman Aug 18, 2025
60f6675
test cell output
alexsifman Aug 18, 2025
dd2b704
register notebook updates
alexsifman Aug 19, 2025
f3b2233
register notebook updates
alexsifman Aug 19, 2025
9bbc623
reduce memory use
alexsifman Aug 19, 2025
fe78ce7
update model devices
alexsifman Aug 19, 2025
b6fa78a
refactor kvmemory impl
alexsifman Aug 20, 2025
0de47da
refactor audio generation
alexsifman Aug 20, 2025
fe2d914
support video media files
alexsifman Aug 20, 2025
51380b5
refactor model loading
alexsifman Aug 20, 2025
a0f1e7c
refactor audio segments
alexsifman Aug 20, 2025
8d5922a
fix missing parameters
alexsifman Aug 21, 2025
ff445ad
remove extra logging
alexsifman Aug 21, 2025
045a372
update workflow perf
alexsifman Aug 21, 2025
8e3f684
update model output
alexsifman Aug 22, 2025
98ee129
update register notebook
alexsifman Aug 22, 2025
7875bfd
update register notebook
alexsifman Aug 22, 2025
b6c86b4
update graph
alexsifman Aug 22, 2025
50d503f
include audio class
alexsifman Aug 22, 2025
08a4387
update notebooks to remove echoes
alexsifman Aug 22, 2025
406e40c
trim outputs
alexsifman Aug 22, 2025
ce4a1d9
cleaner model output
alexsifman Aug 22, 2025
08ffdcc
Merge branch 'main' into feat/agentic-audio-rag
ata-turhan Aug 22, 2025
54d38d5
executed workflow notebook
alexsifman Aug 22, 2025
914be58
Merge branch 'feat/agentic-audio-rag' of https://github.com/HPInc/AI-…
alexsifman Aug 22, 2025
ebefcef
update register notebook
alexsifman Aug 22, 2025
a99e113
streamlit ui skeleton
alexsifman Aug 22, 2025
6458b56
fix errors
alexsifman Aug 24, 2025
fdc6211
upgrade mlflow ver
alexsifman Aug 24, 2025
1ba2624
update register model nb
alexsifman Aug 25, 2025
de79272
fix serialization error
alexsifman Aug 26, 2025
070042a
Merge branch 'main' into feat/agentic-audio-rag
ata-turhan Aug 26, 2025
99bd177
Merge branch 'main' into feat/agentic-audio-rag
ata-turhan Aug 26, 2025
dcd3c2b
update mlflow pathing
alexsifman Aug 28, 2025
e5b2831
Merge branch 'feat/agentic-audio-rag' of https://github.com/HPInc/AI-…
alexsifman Aug 28, 2025
058e480
update register notebook
alexsifman Aug 28, 2025
3deb4f3
update data schema
alexsifman Aug 28, 2025
ade2149
remove unused libraries
alexsifman Aug 28, 2025
7824c8a
Merge branch 'main' into feat/agentic-audio-rag
ata-turhan Aug 28, 2025
4772e51
remove unused code blocks, cells and comments
alexsifman Sep 9, 2025
26e7d71
remove unused functions and files
alexsifman Sep 12, 2025
0f5b56f
update directory structure
alexsifman Sep 13, 2025
9e571bc
Merge branch 'main' into feat/agentic-audio-rag
ata-turhan Sep 13, 2025
a2647c9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 25, 2025
205 changes: 205 additions & 0 deletions generative-ai/agentic-audio-rag-with-langgraph/README.md
@@ -0,0 +1,205 @@
# 🤖 Agentic Audio RAG with LangGraph

<div align="center">

![Python](https://img.shields.io/badge/Python-3.11+-blue.svg?logo=python)
![MLflow](https://img.shields.io/badge/MLflow-Model_Deployment-orange.svg?logo=mlflow)
![Streamlit](https://img.shields.io/badge/Streamlit-Frontend_App-ff4b4b.svg?logo=streamlit)
![LangGraph](https://img.shields.io/badge/LangGraph-Agentic_Workflow-blue.svg?logo=langchain)
![LangChain](https://img.shields.io/badge/LangChain-LLM_Orchestration-lightgreen.svg?logo=langchain)

</div>

---

## 📚 Contents

* [🧠 Overview](#🧠-overview)
* [📁 Project Structure](#📁-project-structure)
* [Configuration](#configuration)
* [⚙️ Setup](#⚙️-setup)
* [🚀 Usage](#🚀-usage)
* [📞 Contact & Support](#📞-contact--support)

---

## 🧠 Overview

The **Agentic Audio RAG** blueprint turns speech in audio/video files into **searchable knowledge** and lets you ask questions directly about the **actual audio** (not just text). A LangGraph-driven agent retrieves the most relevant **timestamped audio segments**, and an audio-native LLM (Qwen Omni) “listens” to those clips to produce precise answers.

It delivers:

* 🎧 **Audio-native LLM QA** — the model consumes selected audio windows directly for reasoning (supports MP3, WAV, OGG, FLAC, and audio tracks from MP4, MOV, MKV, AVI, …).
* 🔊 **Audio embedding with CLAP** — builds a segment-level vector index over audio; retrieve by embedding the user’s text query into the **same audio↔text space**.
* 🧪 **Agentic RAG orchestration via LangGraph** — retrieval → (optional rerank) → generation → memory, all modular and node-based.
* 🦙 **Llama.cpp** for fast, local text LLM utilities (e.g., lightweight reranking/scoring or text-only reasoning when needed).
* 📚 **Audio-aware vector database (FAISS)** — stores CLAP embeddings for efficient semantic search over timestamped segments.
* 🧬 **Reranking stage** to sharpen selection (MMR diversification and/or lightweight LLM scoring).
* 🕒 **Evidence with timestamps** — answers highlight the exact audio spans (start/end seconds) used to support the response.
* 💾 **Disk-backed memory cache** — stores recent Q&A pairs to accelerate repeat queries.
* 📦 **MLflow integration** — experiment tracking and model packaging aligned with the agentic-feedback-analyzer blueprint.
* 🌐 **Streamlit UI** — upload media, run queries, and inspect highlighted evidence.
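
The retrieval step described above ranks audio segments by similarity between a CLAP text embedding of the query and CLAP audio embeddings of the segments. The real blueprint stores these vectors in FAISS; the sketch below only illustrates the ranking logic with NumPy, using toy vectors in place of actual CLAP embeddings.

```python
import numpy as np

def top_k_segments(index: np.ndarray, query_vec: np.ndarray, k: int = 3):
    """Return (segment_id, score) pairs ranked by cosine similarity.

    `index` holds one L2-normalized embedding per audio segment (rows),
    mirroring what a FAISS IndexFlatIP would store.
    """
    index = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_vec = query_vec / np.linalg.norm(query_vec)
    scores = index @ query_vec          # inner product == cosine after normalization
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

# Toy example: four "segments" in a 3-d embedding space.
segments = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.7, 0.7, 0.0],
                     [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.1, 0.0])
print(top_k_segments(segments, query, k=2))
```

Because both modalities live in the same joint space, the same ranking works whether the query embedding comes from text or audio.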

---

## 📁 Project Structure

```bash
agentic-audio-rag-with-langgraph/
├── configs/                     # Configuration files
│   └── config.yaml              # Blueprint configuration (UI mode, ports, service settings)
├── data/                        # Sample media files (input directory)
│   └── inputs/                  # *.mp3 / *.wav / *.mp4 …
├── demo/                        # UI frontend code (Streamlit)
│   └── streamlit/
├── docs/                        # UI documentation & screenshots
│   ├── Streamlit UI Page - Agentic Audio RAG.pdf
│   └── streamlit-ui-ss-agentic-audio-rag.png
├── notebooks/                   # Workflow and MLflow notebooks
│   ├── register-model.ipynb
│   └── run-workflow.ipynb
├── src/                         # Core LangGraph modules
│   ├── __init__.py
│   ├── audio_rag_model.py       # MLflow PyFunc model class
│   ├── audio_rag_nodes.py       # LangGraph nodes
│   ├── agentic_state.py         # Shared state schema
│   ├── agentic_workflow.py      # LangGraph DAG construction
│   ├── simple_kv_memory.py      # Disk-based memory module
│   └── utils.py                 # Helper functions
├── requirements.txt             # All required packages
└── README.md                    # Project documentation
```
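
The `simple_kv_memory.py` module listed above provides the disk-backed Q&A cache. Its actual API may differ; the following is a minimal sketch of the idea, assuming a flat JSON file as the backing store.

```python
import json
from pathlib import Path

class SimpleKVMemory:
    """Minimal disk-backed key-value cache for question -> answer pairs.

    A sketch of the idea behind src/simple_kv_memory.py; the real
    module's interface may differ.
    """

    def __init__(self, path: str = "kv_memory.json"):
        self.path = Path(path)
        # Load any previously persisted entries so repeat queries survive restarts.
        self._data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def get(self, key: str):
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self.path.write_text(json.dumps(self._data))
```

A repeated query can then be answered from `get()` without re-running the LangGraph workflow.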

---

## Configuration

The blueprint uses a centralized configuration system through `configs/config.yaml`:

```yaml
# model_source can be: local, hugging-face-cloud, or hugging-face-local
model_source: "hugging-face-cloud"

ui:
  mode: "streamlit"            # UI mode: gradio, streamlit, or static

ports:
  external: 5000               # External port exposed by the Envoy proxy
  internal:
    streamlit: 8501            # Internal container port for Streamlit

service:
  mlflow_timeout: 600          # MLflow model server timeout (seconds)
  health_check_timeout: 600    # Health check timeout for startup (seconds)
  health_check_retries: 5      # Number of health check retries
```
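
Once parsed (e.g. with `yaml.safe_load`), the configuration is a nested mapping. A small helper like the one below — an illustrative sketch, not part of the blueprint's `src/` code — makes it easy to read dotted keys such as `ports.internal.streamlit`.

```python
def cfg_get(config: dict, dotted_key: str, default=None):
    """Fetch a nested value like 'ports.internal.streamlit' from a
    parsed config mapping (e.g. the result of yaml.safe_load)."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node

# The dict below mirrors configs/config.yaml after parsing.
config = {
    "model_source": "hugging-face-cloud",
    "ui": {"mode": "streamlit"},
    "ports": {"external": 5000,
              "internal": {"gradio": 7860, "streamlit": 8501, "static": 5001}},
    "service": {"mlflow_timeout": 600,
                "health_check_timeout": 600,
                "health_check_retries": 5},
}
print(cfg_get(config, "ports.internal.streamlit"))  # -> 8501
```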

---

## ⚙️ Setup

### Step 0: Minimum Hardware Requirements

* ✅ **GPU**: NVIDIA GPU with 12 GB+ VRAM (recommended for LLM acceleration)
* ✅ **RAM**: 32–64 GB system memory
* ✅ **Disk**: ≥ 10 GB free space

### Step 1: Create an AI Studio Project

1. Go to [HP AI Studio](https://hp.com/ai-studio) and create a new project.
2. Use the base image: `Local GenAI`

### Step 2: Clone the Repository

1. Clone the GitHub repository:

```bash
git clone https://github.com/HPInc/AI-Blueprints.git
```

2. Ensure all files are available after workspace creation.

### Step 3: Configure Secrets

- **Configure Secrets in YAML file (Freemium users):**
- Create a `secrets.yaml` file in the `configs` folder and list your API keys there:
- `HUGGINGFACE_API_KEY`: Required to use Hugging Face-hosted models instead of a local LLaMA model.

- **Configure Secrets in Secrets Manager (Premium users):**
- Add your API keys to the project's Secrets Manager vault, located in the `Project Setup` tab -> `Setup` -> `Project Secrets`:
- `HUGGINGFACE_API_KEY`: Required to use Hugging Face-hosted models instead of a local LLaMA model.
  - In the `Secrets Name` field, add: `HUGGINGFACE_API_KEY`
  - In the `Secret Value` field, paste the corresponding key generated by Hugging Face.

<br>

**Note: If both options (YAML option and Secrets Manager) are used, the Secrets Manager option will override the YAML option.**

### Step 4: Setup Configuration

- Edit `config.yaml` with relevant configuration details:
- `model_source`: Choose between `local`, `hugging-face-cloud`, or `hugging-face-local`
- `ui.mode`: Set UI mode to `streamlit` or `static`
- `ports`: Configure external and internal port mappings
- `service`: Adjust MLflow timeout and health check settings
- `proxy`: Set proxy settings if needed for restricted networks

---

## 🚀 Usage

### 🧪 Step 1: Run LangGraph Workflow

Use the provided notebook to run the end-to-end pipeline:

```bash
notebooks/run-workflow.ipynb
```

This notebook will:

* Scan `data/inputs/` for audio/video files, normalize the audio, and segment it into timestamped windows
* Build a true audio embedding index over the segments using CLAP (audio↔text joint space)
* Run the agentic retrieve-and-rerank workflow, sending the top audio windows to the model to listen to and answer directly
* Display the generated answers together with the highlighted transcript segments and timestamps
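
The segmentation step can be pictured as slicing each track into fixed-length, overlapping windows. Below is a hedged sketch; the window and hop lengths are illustrative, not the blueprint's actual parameters.

```python
def segment_windows(duration_s: float, window_s: float = 10.0, hop_s: float = 5.0):
    """Yield (start, end) timestamps covering duration_s seconds of audio
    with overlapping fixed-length windows. Parameters are illustrative."""
    t = 0.0
    while t < duration_s:
        # Clamp the final window to the end of the track.
        yield (t, min(t + window_s, duration_s))
        t += hop_s

# A 22-second clip yields five overlapping windows:
print(list(segment_windows(22.0)))
# -> [(0.0, 10.0), (5.0, 15.0), (10.0, 20.0), (15.0, 22.0), (20.0, 22.0)]
```

Each window's `(start, end)` pair is what later surfaces as timestamped evidence in the answer.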

### 🧠 Step 2: Register Model with MLflow

Log and serve the full pipeline as an MLflow `pyfunc` model:

```bash
notebooks/register-model.ipynb
```

This notebook will:

* Package the complete **Agentic Audio RAG** workflow (vector store, reranker, LangGraph DAG, memory module) as a single MLflow artifact
* Register the model with MLflow so it can be queried over HTTP
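
The registered model wraps the workflow behind MLflow's PyFunc interface (`src/audio_rag_model.py` in the structure above). The plain-Python stand-in below sketches only the predict contract — the real class subclasses `mlflow.pyfunc.PythonModel`, and all field names here are illustrative assumptions.

```python
class AudioRAGModelSketch:
    """Stand-in for the PyFunc wrapper in src/audio_rag_model.py.

    The real class subclasses mlflow.pyfunc.PythonModel and invokes the
    compiled LangGraph DAG; input/output field names are illustrative.
    """

    def __init__(self, answer_fn):
        self.answer_fn = answer_fn  # the compiled workflow, injected

    def predict(self, model_input: dict) -> dict:
        question = model_input["question"]
        answer, evidence = self.answer_fn(question)
        # Evidence carries the timestamped audio spans backing the answer.
        return {"answer": answer,
                "evidence": [{"start": s, "end": e} for s, e in evidence]}

model = AudioRAGModelSketch(lambda q: ("A quarterly review.", [(3.0, 9.5)]))
print(model.predict({"question": "What is discussed?"}))
```

Packaging everything behind one `predict()` is what lets AI Studio serve the whole pipeline as a single HTTP endpoint.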

### 📦 Step 3: Deploy the Service

- Go to **Deployments > New Service** in AI Studio.
- Name the service and select the registered model.
- Choose a model version and enable **GPU acceleration**.
- Start the deployment.
- Once deployed, access the **Streamlit UI** via the Service URL.
- The service will automatically use the configuration logged as an artifact during model registration.

### 🌐 Step 4: Launch Streamlit UI

The Streamlit web UI lets you:

* Upload one or more audio / video files (or pick the samples in `data/inputs/`)
* Ask questions about their content
* See the **highlighted transcript segments** (with timestamps) that the model used to craft each answer
* Benefit from the built-in memory: repeated queries return quickly after the first run
* Connect to a local MLflow model endpoint
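
Under the hood, the UI posts each question to the deployed model's MLflow `/invocations` endpoint. The helper below sketches one common request shape; the exact schema depends on the signature logged during registration, and the `question` field name is an assumption.

```python
import json

def build_invocation_payload(question: str) -> str:
    """JSON body for an MLflow /invocations endpoint.

    A dataframe_records payload with a 'question' column is a common
    shape; the real schema is set by the registered model signature.
    """
    return json.dumps({"dataframe_records": [{"question": question}]})

payload = build_invocation_payload("Summarize the meeting audio")
# POST `payload` with Content-Type: application/json to
# https://<service-url>/invocations (HTTPS; the port changes per deployment).
print(payload)
```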

---

## 📞 Contact & Support

- **Troubleshooting:** Refer to the [**Troubleshooting**](https://github.com/HPInc/AI-Blueprints/tree/main?tab=readme-ov-file#troubleshooting) section of the main README in our public AI-Blueprints GitHub repo for solutions to common issues.

- **Issues & Bugs:** Open a new issue in our [**AI-Blueprints GitHub repo**](https://github.com/HPInc/AI-Blueprints).

- **Docs:** [**AI Studio Documentation**](https://zdocs.datascience.hp.com/docs/aistudio/overview).

- **Community:** Join the [**HP AI Creator Community**](https://community.datascience.hp.com/) for questions and help.

---

> Built with ❤️ using [**HP AI Studio**](https://hp.com/ai-studio)
37 changes: 37 additions & 0 deletions generative-ai/agentic-audio-rag-with-langgraph/configs/config.yaml
@@ -0,0 +1,37 @@
# Blueprint Configuration
# This file configures the UI mode and ports for the model service

# model_source can be one of the following: local, hugging-face-cloud, or hugging-face-local
model_source: "hugging-face-cloud"

# Proxy is used to set the HTTPS_PROXY environment variable when necessary.
# For example, if you need to access external services from a restricted network,
# you should specify the proxy in this config.yaml file.
# proxy: "http://web-proxy.austin.hp.com:8080"

# UI Configuration
ui:
  # UI mode: gradio, streamlit, or static
  mode: "streamlit"

# Port Configuration
ports:
  # External port exposed by Envoy proxy
  external: 5000

  # Internal port mappings for different UI types
  internal:
    gradio: 7860
    streamlit: 8501
    static: 5001

# Service Configuration
service:
  # MLflow model server timeout (seconds)
  mlflow_timeout: 600

  # Health check timeout for service startup (seconds)
  health_check_timeout: 600

  # Number of health check retries
  health_check_retries: 5
@@ -0,0 +1,24 @@
# How to Successfully Use the Streamlit Web App

## 1. Install Required Versions
Ensure that the following are installed on your machine:
- **Python** version **≥ 3.11** (https://www.python.org/downloads/)
- **Poetry** version **≥ 2.0.0 and < 3.0.0** (https://python-poetry.org/docs/)

## 2. Set Up the Virtual Environment and Install Dependencies
Navigate to the project's root directory and run the following command to set up a virtual environment using Poetry and install all required packages:
```bash
python -m poetry install
```

## 3. Launch the Streamlit Web App
Still in the project's root directory, start the Streamlit app by running:
```bash
python -m poetry run streamlit run "main.py"
```

## 4. Select the Correct API Endpoint When Using the App
When interacting with the app:
- **Choose the exact and correct API URL** to connect to your deployed model.
- **Important:** The MLflow endpoint **must** use **HTTPS** (not HTTP).
- **Note:** In **Z by HP AI Studio**, the **port number** for your MLflow API **changes with each deployment**, so always verify the correct URL and port before starting a session.
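
Since the endpoint must use HTTPS and its port changes per deployment, a small validation helper can catch mistyped URLs before a session starts. This is an illustrative sketch; it assumes the standard MLflow `/invocations` path.

```python
from urllib.parse import urlparse

def is_valid_mlflow_endpoint(url: str) -> bool:
    """Check that the endpoint uses HTTPS and targets /invocations,
    matching the requirements above. Illustrative helper only."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.path.endswith("/invocations")

print(is_valid_mlflow_endpoint("https://localhost:52431/invocations"))  # True
print(is_valid_mlflow_endpoint("http://localhost:52431/invocations"))   # False
```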
@@ -0,0 +1,45 @@
/* Main background with a clean blue gradient */
body {
  background: linear-gradient(135deg, #e0f2ff, #a1c4fd, #70a1ff, #4a69bd);
  background-attachment: fixed;
  background-size: 400% 400%;
  animation: gradientShift 20s ease infinite;
  color: #000000;
  font-family: 'Segoe UI', sans-serif;
  font-size: 16px;
}

/* Animate the gradient background */
@keyframes gradientShift {
  0%   { background-position: 0% 50%; }
  50%  { background-position: 100% 50%; }
  100% { background-position: 0% 50%; }
}

.main {
  font-size: 16px;
}

.gradient-header {
  background: linear-gradient(90deg, #005AA7, #FFFDE4);
  color: black;
  padding: 1rem;
  text-align: center;
  border-radius: 0.5rem;
  margin-bottom: 2rem;
}

.result-box {
  background-color: #f4f4f4;
  border-left: 5px solid #005AA7;
  padding: 1rem;
  margin-top: 1rem;
  border-radius: 0.5rem;
}

.logo-bar {
  display: flex;
  justify-content: space-evenly;
  align-items: center;
  margin-bottom: 2rem;
}

.logo-bar img {
  max-height: 60px;
}