RAG Framework with Bi-Directional PII Masking
This project is a Secure AI RAG Framework designed to accelerate technical onboarding for complex software platforms. It provides a blueprint for deploying RAG-based (Retrieval-Augmented Generation) assistants in high-compliance environments, where protecting intellectual property and preventing PII leakage is crucial.
This gateway is built with a Security-First architecture, ensuring that sensitive enterprise data (API keys, secrets, PII) is neutralized before it ever reaches an inference engine.
To align with the absolute requirement for data privacy in enterprise environments, the system utilizes:
- Local Embeddings: Leverages the `sentence-transformers/all-MiniLM-L6-v2` model to vectorize documentation fragments locally. This ensures that a customer's proprietary documentation is never transmitted to a third-party embedding provider.
- Distributed Inference: The framework is configured to communicate with a remote inference node (hosted via LM Studio) over a private LAN, mimicking a secure VPC or air-gapped environment.
Inspired by the logic of high-discretion administrative systems, the PII Privacy Gateway acts as a bi-directional shield:
- Inbound Shielding: Uses hardened regex patterns to intercept user queries. If a developer accidentally includes an `API_KEY` or `PASSWORD` in their prompt, the gateway masks it in real time.
- Outbound Shielding: A final output parser scrubs the LLM's response. This prevents "echo leaks" (where the model repeats a sensitive value it saw in the prompt history), ensuring the final delivery to the user is compliant.
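A minimal sketch of the two shields working together (the pattern and function names here are illustrative, not the project's exact production regexes):

```python
import re

# Illustrative secret pattern: key name, separator, then the secret value.
SECRET_PATTERN = re.compile(r"(API_KEY|PASSWORD)[\s=:]+(\S+)", re.IGNORECASE)

def mask_inbound(query: str) -> tuple[str, set[str]]:
    """Inbound shield: mask secrets and remember them for the outbound pass."""
    seen = {m.group(2) for m in SECRET_PATTERN.finditer(query)}
    masked = SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=MASKED_BY_GATEWAY", query)
    return masked, seen

def scrub_outbound(response: str, seen: set[str]) -> str:
    """Outbound shield: scrub echo leaks of any value seen on the way in."""
    for secret in seen:
        response = response.replace(secret, "MASKED_BY_GATEWAY")
    return response

masked, seen = mask_inbound("Log a dataset. My API_KEY=12345-SECRET")
echoed = scrub_outbound("Sure! Using key 12345-SECRET ...", seen)
```

Tracking the secrets seen inbound is what makes the outbound pass catch echo leaks even when the model rephrases the surrounding text.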
The orchestration is built using LCEL (LangChain Expression Language). This modular approach allows for:
- Rapid Prototyping: The pipe operator (`|`) allows for quick insertion of additional guardrails or document loaders.
- Observability: Each step of the chain can be logged (e.g., via W&B Prompts) to monitor for hallucinations or retrieval friction.
During the development of this asset, a critical edge case was identified: Pattern-Match Bypassing. Initial regex patterns failed to catch secrets when they were formatted without standard spacing (e.g., `API_KEY=12345`). I iterated on the masking engine to use more aggressive character classes (`[\s=:]+`), ensuring the gateway remains robust against "messy" real-world developer inputs. This proactive approach to Inference DLP (Data Loss Prevention) is essential for building trust with enterprise customers who are hesitant to adopt LLM workflows due to security concerns.
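The bypass and its fix can be reproduced in isolation (patterns simplified for illustration):

```python
import re

# Naive pattern: assumes whitespace between the key name and the value,
# so the compact form "API_KEY=12345" slips straight through.
naive = re.compile(r"API_KEY\s+\S+")

# Hardened pattern: [\s=:]+ also accepts '=' and ':' as separators.
hardened = re.compile(r"API_KEY[\s=:]+\S+")

sample = "here is my API_KEY=12345"
naive_hit = naive.search(sample)        # no match: the secret is bypassed
hardened_hit = hardened.search(sample)  # matches "API_KEY=12345"
```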
- Secure Ingestion: Automated scraping of Weights & Biases documentation into a local FAISS vector store.
- Environment Isolation: Utilizes `.env` management to separate infrastructure configuration (remote IPs) from the application logic.
- Developer Acceleration: Provides executable Python code snippets for W&B Artifacts and Run logging.
- Orchestration: LangChain, LCEL
- Language: Python 3.x
- Embeddings: HuggingFace (Local)
- Vector Store: FAISS
- Inference: Local LLM (OpenAI-Compatible API)
This project is designed to run in a controlled virtual environment to ensure dependency stability.
```bash
git clone https://github.com/5tev3G/secure-ai-rag-framework.git
cd secure-ai-rag-framework
```
Create a fresh virtual environment and install only the necessary top-level dependencies:
```bash
# macOS/Linux
python3 -m venv venv && source venv/bin/activate

# Windows
python -m venv venv
.\venv\Scripts\activate

# Install requirements
pip install --upgrade pip
pip install python-dotenv langchain langchain-community langchain-core \
    langchain-openai langchain-text-splitters langchain-huggingface \
    sentence-transformers faiss-cpu beautifulsoup4 tiktoken
```
Create a `.env` file in the root directory to manage your infrastructure variables:

```bash
LLM_STUDIO_IP=192.168.x.xxx   # IP address of your local inference node
OPENAI_API_KEY=not-needed     # Required by LangChain but unused by the local LLM
```
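At runtime the script can resolve these variables into the OpenAI-compatible base URL (a standard-library sketch; port 1234 is LM Studio's default, and the fallbacks mirror the sample values above):

```python
import os

# Read the infrastructure settings from the environment (python-dotenv
# populates these from .env in the real script).
LLM_STUDIO_IP = os.environ.get("LLM_STUDIO_IP", "192.168.x.xxx")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "not-needed")

# LM Studio serves an OpenAI-compatible API on port 1234 by default.
base_url = f"http://{LLM_STUDIO_IP}:1234/v1"
# base_url would then be handed to the LangChain chat model, e.g.
# ChatOpenAI(base_url=base_url, api_key=OPENAI_API_KEY).
```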
The gateway automatically scrapes documentation, generates local embeddings, and prepares the vector store upon execution. Ensure your local inference server (e.g., LM Studio) is running and accessible via the IP defined in your `.env`.
Execute the main script to test the bi-directional masking and retrieval:
```bash
python secure_enablement_rag.py
```
The framework will demonstrate the Security Proxy in action. When a query contains a sensitive string, the gateway intercepts it before transmission:
```text
Ingesting technical documentation...
Initializing local embedding model (sentence-transformers)...
Connected to secure inference node at: http://192.168.8.237:1234/v1

User Query: Give me a Python code snippet to log a dataset as an artifact in W&B. My API_KEY=12345-SECRET
Processing through Secure Gateway...
[DEBUG] Outbound Question to API: Give me a Python code snippet to log a dataset as an artifact in W&B. My API_KEY=MASKED_BY_GATEWAY
```
** Response **
Here's the Python code snippet to log a dataset as an artifact in W&B:
```python
import wandb

# Initialize a W&B run (reads your API key from the environment or netrc)
run = wandb.init(project="your-project-name", job_type="your-job-type")

# Create a dataset artifact object
artifact = wandb.Artifact(name="my-dataset-artifact", type="dataset")

# Add a file to the artifact (e.g., a CSV file)
artifact.add_file(local_path="path/to/dataset.csv", name="dataset.csv")

# Log the artifact to the run
run.log_artifact(artifact)

# Mark the run as finished
run.finish()
```

In this code snippet:

- We initialize a W&B run with `wandb.init()`, which returns a `run` handle.
- We create a dataset artifact object with `wandb.Artifact()`, specifying the name and type of the artifact.
- We add a file to the artifact with `artifact.add_file()`. In this example, we add a CSV file named "dataset.csv".
- Finally, we log the artifact to the run with `run.log_artifact(artifact)` and finish the run.
