
Commit 0438d1b

Devang Jhabakh authored and waleed committed
Adding guardrails block
1 parent b62747e commit 0438d1b

23 files changed (+2165, -7 lines changed)
Lines changed: 233 additions & 0 deletions
@@ -0,0 +1,233 @@
1+
---
2+
title: Guardrails
3+
---
4+
5+
import { Callout } from 'fumadocs-ui/components/callout'
6+
import { Step, Steps } from 'fumadocs-ui/components/steps'
7+
import { Tab, Tabs } from 'fumadocs-ui/components/tabs'
8+
import { Image } from '@/components/ui/image'
9+
import { Video } from '@/components/ui/video'
10+
11+
The Guardrails block validates and protects your AI workflows by checking content against one of four validation types: JSON validity, regex match, hallucination check, or PII detection. Use it to ensure data quality, catch hallucinations, detect PII, and enforce format requirements before content moves through your workflow.
12+
13+
## Overview
14+
15+
The Guardrails block enables you to:
16+
17+
<Steps>
18+
<Step>
19+
<strong>Validate JSON Structure</strong>: Ensure LLM outputs are valid JSON before parsing
20+
</Step>
21+
<Step>
22+
<strong>Match Regex Patterns</strong>: Verify content matches specific formats (emails, phone numbers, URLs, etc.)
23+
</Step>
24+
<Step>
25+
<strong>Detect Hallucinations</strong>: Use RAG + LLM scoring to validate AI outputs against knowledge base content
26+
</Step>
27+
<Step>
28+
<strong>Detect PII</strong>: Identify and optionally mask personally identifiable information across 40+ entity types
29+
</Step>
30+
</Steps>
31+
32+
## Validation Types
33+
34+
### JSON Validation
35+
36+
Validates that content is properly formatted JSON. Use it to confirm that structured LLM output can be parsed safely.
37+
38+
**Use Cases:**
39+
- Validate JSON responses from Agent blocks before parsing
40+
- Ensure API payloads are properly formatted
41+
- Check structured data integrity
42+
43+
**Output:**
44+
- `passed`: `true` if valid JSON, `false` otherwise
45+
- `error`: Error message if validation fails (e.g., "Invalid JSON: Unexpected token...")
46+
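A minimal Python sketch of the check described above, assuming the block simply attempts to parse the content and reports the parser's message on failure. The function name `validate_json` and the exact error wording are illustrative, not the block's actual implementation.

```python
import json


def validate_json(content: str) -> dict:
    """Return a Guardrails-style result for a JSON validity check (illustrative)."""
    try:
        json.loads(content)
    except json.JSONDecodeError as exc:
        # Mirror the documented "Invalid JSON: ..." error style.
        return {"passed": False, "error": f"Invalid JSON: {exc.msg}"}
    return {"passed": True, "error": None}


print(validate_json('{"name": "Ada"}'))  # well-formed object
print(validate_json('{"name": "Ada"'))   # missing closing brace
```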
47+
### Regex Validation
48+
49+
Checks if content matches a specified regular expression pattern.
50+
51+
**Use Cases:**
52+
- Validate email addresses
53+
- Check phone number formats
54+
- Verify URLs or custom identifiers
55+
- Enforce specific text patterns
56+
57+
**Configuration:**
58+
- **Regex Pattern**: The regular expression to match against (e.g., `^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$` for emails)
59+
60+
**Output:**
61+
- `passed`: `true` if content matches pattern, `false` otherwise
62+
- `error`: Error message if validation fails
63+
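The sketch below shows one way such a check could behave, using the email pattern from the configuration example. It assumes a search-style match (anchors in the pattern decide whether the whole string must match) and an invented error wording; `validate_regex` is a hypothetical name.

```python
import re

# Email pattern taken from the configuration example above.
EMAIL_PATTERN = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"


def validate_regex(content: str, pattern: str) -> dict:
    """Return a Guardrails-style result for a regex match check (illustrative)."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # An unparseable pattern is reported as a failure, not raised.
        return {"passed": False, "error": f"Invalid regex pattern: {exc}"}
    if compiled.search(content):
        return {"passed": True, "error": None}
    return {"passed": False, "error": "Content does not match pattern"}


print(validate_regex("ada@example.com", EMAIL_PATTERN))
print(validate_regex("not-an-email", EMAIL_PATTERN))
```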
64+
### Hallucination Detection
65+
66+
Uses Retrieval-Augmented Generation (RAG) with LLM scoring to detect when AI-generated content contradicts or isn't grounded in your knowledge base.
67+
68+
**How It Works:**
69+
1. Queries your knowledge base for relevant context
70+
2. Sends both the AI output and retrieved context to an LLM
71+
3. LLM assigns a confidence score (0-10 scale)
72+
   - **0** = Full hallucination (completely ungrounded)
73+
   - **10** = Fully grounded (completely supported by knowledge base)
74+
4. Validation passes if score ≥ threshold (default: 3)
75+
76+
**Configuration:**
77+
- **Knowledge Base**: Select from your existing knowledge bases
78+
- **Model**: Choose the LLM used for scoring. Scoring requires strong reasoning, so GPT-4o or Claude 3.7 Sonnet is recommended
79+
- **API Key**: Authentication for selected LLM provider (auto-hidden for hosted/Ollama models)
80+
- **Confidence Threshold**: Minimum score to pass (0-10, default: 3)
81+
- **Top K** (Advanced): Number of knowledge base chunks to retrieve (default: 10)
82+
83+
**Output:**
84+
- `passed`: `true` if confidence score ≥ threshold
85+
- `score`: Confidence score (0-10)
86+
- `reasoning`: LLM's explanation for the score
87+
- `error`: Error message if validation fails
88+
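The threshold logic can be sketched as follows. This is a simplified illustration: the knowledge base retrieval is skipped (context is passed in directly), and `overlap_scorer` is a hypothetical stand-in for the LLM call that only measures word overlap. A real scorer would be a model prompted to rate grounding on the 0-10 scale.

```python
from typing import Callable, Tuple

# A scorer takes (ai_output, retrieved_context) and returns (score, reasoning).
Scorer = Callable[[str, str], Tuple[int, str]]


def overlap_scorer(output: str, context: str) -> Tuple[int, str]:
    """Hypothetical stand-in for the LLM: score by crude word overlap."""
    out_words = set(output.lower().split())
    ctx_words = set(context.lower().split())
    if not out_words:
        return 0, "empty output"
    ratio = len(out_words & ctx_words) / len(out_words)
    return round(ratio * 10), f"{ratio:.0%} of output words appear in the context"


def check_hallucination(output: str, context: str, scorer: Scorer, threshold: int = 3) -> dict:
    """Pass when the confidence score is at or above the threshold."""
    score, reasoning = scorer(output, context)
    score = max(0, min(10, score))  # keep the score on the 0-10 scale
    return {"passed": score >= threshold, "score": score, "reasoning": reasoning}


context = "refunds are available within 30 days"
print(check_hallucination("refunds are available within 30 days", context, overlap_scorer))
print(check_hallucination("we ship to mars daily", context, overlap_scorer))
```

Raising `threshold` makes the check stricter, since more grounding is needed to pass.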
89+
**Use Cases:**
90+
- Validate Agent responses against documentation
91+
- Ensure customer support answers are factually accurate
92+
- Verify generated content matches source material
93+
- Quality control for RAG applications
94+
95+
### PII Detection
96+
97+
Detects personally identifiable information using Microsoft Presidio. Supports 40+ entity types across multiple countries and languages.
98+
99+
**How It Works:**
100+
1. Scans content for PII entities using pattern matching and NLP
101+
2. Returns detected entities with locations and confidence scores
102+
3. Optionally masks detected PII in the output
103+
104+
**Configuration:**
105+
- **PII Types to Detect**: Select from grouped categories via modal selector
106+
  - **Common**: Person name, Email, Phone, Credit card, IP address, etc.
107+
  - **USA**: SSN, Driver's license, Passport, etc.
108+
  - **UK**: NHS number, National insurance number
109+
  - **Spain**: NIF, NIE, CIF
110+
  - **Italy**: Fiscal code, Driver's license, VAT code
111+
  - **Poland**: PESEL, NIP, REGON
112+
  - **Singapore**: NRIC/FIN, UEN
113+
  - **Australia**: ABN, ACN, TFN, Medicare
114+
  - **India**: Aadhaar, PAN, Passport, Voter number
115+
- **Mode**:
116+
  - **Detect**: Only identify PII (default)
117+
  - **Mask**: Replace detected PII with masked values
118+
- **Language**: Detection language (default: English)
119+
120+
**Output:**
121+
- `passed`: `false` if any selected PII type is detected, `true` otherwise
122+
- `detectedEntities`: Array of detected PII with type, location, and confidence
123+
- `maskedText`: Content with PII masked (only if mode = "Mask")
124+
- `error`: Error message if validation fails
125+
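To make the detect and mask modes concrete, here is a regex-only sketch. It does not use Microsoft Presidio: the two patterns are deliberately simplified stand-ins, the fixed `score` of 1.0 is invented, and the `<ENTITY_TYPE>` placeholder style for masked text is an assumption about the output format.

```python
import re

# Hypothetical, simplified patterns; real recognizers are far more thorough.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii(text: str, entity_types: list, mode: str = "detect") -> dict:
    """Find the selected entity types; optionally mask them (illustrative)."""
    entities = []
    for entity_type in entity_types:
        for match in PATTERNS[entity_type].finditer(text):
            entities.append(
                {"type": entity_type, "start": match.start(), "end": match.end(), "score": 1.0}
            )
    entities.sort(key=lambda e: e["start"])

    # Validation fails as soon as any selected type is present.
    result = {"passed": not entities, "detectedEntities": entities}
    if mode == "mask":
        masked = text
        # Replace from the end so earlier offsets stay valid.
        for entity in reversed(entities):
            masked = masked[: entity["start"]] + f"<{entity['type']}>" + masked[entity["end"]:]
        result["maskedText"] = masked
    return result


sample = "Contact ada@example.com, SSN 123-45-6789."
print(detect_pii(sample, ["EMAIL_ADDRESS", "US_SSN"], mode="mask"))
```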
126+
**Use Cases:**
127+
- Block content containing sensitive personal information
128+
- Mask PII before logging or storing data
129+
- Support compliance with GDPR, HIPAA, and other privacy regulations
130+
- Sanitize user inputs before processing
131+
132+
## Configuration
133+
134+
### Content to Validate
135+
136+
The input content to validate. This typically comes from:
137+
- Agent block outputs: `<agent.content>`
138+
- Function block results: `<function.output>`
139+
- API responses: `<api.output>`
140+
- Any other block output
141+
142+
### Validation Type
143+
144+
Choose from four validation types:
145+
- **Valid JSON**: Check if content is properly formatted JSON
146+
- **Regex Match**: Verify content matches a regex pattern
147+
- **Hallucination Check**: Validate against knowledge base with LLM scoring
148+
- **PII Detection**: Detect and optionally mask personally identifiable information
149+
150+
## Outputs
151+
152+
All validation types return:
153+
154+
- **`<guardrails.passed>`**: Boolean indicating if validation passed
155+
- **`<guardrails.validationType>`**: The type of validation performed
156+
- **`<guardrails.input>`**: The original input that was validated
157+
- **`<guardrails.error>`**: Error message if validation failed (optional)
158+
159+
Additional outputs by type:
160+
161+
**Hallucination Check:**
162+
- **`<guardrails.score>`**: Confidence score (0-10)
163+
- **`<guardrails.reasoning>`**: LLM's explanation
164+
165+
**PII Detection:**
166+
- **`<guardrails.detectedEntities>`**: Array of detected PII entities
167+
- **`<guardrails.maskedText>`**: Content with PII masked (if mode = "Mask")
168+
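For readers who consume these outputs in code, the fields listed above can be modeled roughly as below. The field names come from this page; the class name `GuardrailsResult`, the `next_step` helper, and the `"json"` and `"pii"` values for `validationType` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GuardrailsResult:
    # Returned by every validation type.
    passed: bool
    validationType: str
    input: str
    error: Optional[str] = None
    # Hallucination check only.
    score: Optional[int] = None
    reasoning: Optional[str] = None
    # PII detection only.
    detectedEntities: Optional[list] = None
    maskedText: Optional[str] = None


def next_step(result: GuardrailsResult) -> str:
    """Decide what a downstream branch might do with a result (illustrative)."""
    if result.passed:
        return "continue"
    if result.maskedText is not None:
        # PII was found but a masked copy is available.
        return "continue_with_masked_text"
    return "handle_error"


ok = GuardrailsResult(passed=True, validationType="json", input='{"a": 1}')
print(next_step(ok))
```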
169+
## Example Use Cases
170+
171+
### Validate JSON Before Parsing
172+
173+
<div className="mb-4 rounded-md border p-4">
174+
<h4 className="font-medium">Scenario: Ensure Agent output is valid JSON</h4>
175+
<ol className="list-decimal pl-5 text-sm">
176+
<li>Agent generates structured JSON response</li>
177+
<li>Guardrails validates JSON format</li>
178+
<li>Condition block checks `<guardrails.passed>`</li>
179+
<li>If passed → parse and use the data; if failed → retry or handle the error</li>
180+
</ol>
181+
</div>
182+
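Outside the workflow editor, the same validate-then-retry pattern might look like this in Python. The `generate` callable is a hypothetical placeholder for the Agent call, and the three-attempt limit is an arbitrary choice for the sketch.

```python
import json
from typing import Any, Callable, Dict


def parse_with_retry(generate: Callable[[], str], max_attempts: int = 3) -> Dict[str, Any]:
    """Call generate() until it returns valid JSON or attempts run out."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        raw = generate()
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"Invalid JSON: {exc.msg}"
            continue  # validation failed, so retry
        return {"passed": True, "data": data, "attempts": attempt}
    return {"passed": False, "error": last_error, "attempts": max_attempts}


# Simulated agent: the first response is truncated, the second is valid.
responses = iter(['{"city": "Paris"', '{"city": "Paris"}'])
result = parse_with_retry(lambda: next(responses))
print(result)
```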
183+
### Prevent Hallucinations
184+
185+
<div className="mb-4 rounded-md border p-4">
186+
<h4 className="font-medium">Scenario: Validate customer support responses</h4>
187+
<ol className="list-decimal pl-5 text-sm">
188+
<li>Agent generates response to customer question</li>
189+
<li>Guardrails checks against support documentation knowledge base</li>
190+
<li>If confidence score ≥ 3 → Send response</li>
191+
<li>If confidence score \< 3 → Flag for human review</li>
192+
</ol>
193+
</div>
194+
195+
### Block PII in User Inputs
196+
197+
<div className="mb-4 rounded-md border p-4">
198+
<h4 className="font-medium">Scenario: Sanitize user-submitted content</h4>
199+
<ol className="list-decimal pl-5 text-sm">
200+
<li>User submits form with text content</li>
201+
<li>Guardrails detects PII (emails, phone numbers, SSN, etc.)</li>
202+
<li>If PII detected → Reject submission or mask sensitive data</li>
203+
<li>If no PII → Process normally</li>
204+
</ol>
205+
</div>
206+
207+
### Validate Email Format
208+
209+
<div className="mb-4 rounded-md border p-4">
210+
<h4 className="font-medium">Scenario: Check email address format</h4>
211+
<ol className="list-decimal pl-5 text-sm">
212+
<li>Agent extracts email from text</li>
213+
<li>Guardrails validates with regex pattern</li>
214+
<li>If valid → Use email for notification</li>
215+
<li>If invalid → Request correction</li>
216+
</ol>
217+
</div>
218+
219+
## Best Practices
220+
221+
- **Chain with Condition blocks**: Use `<guardrails.passed>` to branch workflow logic based on validation results
222+
- **Use JSON validation before parsing**: Always validate JSON structure before attempting to parse LLM outputs
223+
- **Choose appropriate PII types**: Only select the PII entity types relevant to your use case for better performance
224+
- **Set reasonable confidence thresholds**: For hallucination detection, adjust threshold based on your accuracy requirements (higher = stricter)
225+
- **Use strong models for hallucination detection**: GPT-4o or Claude 3.7 Sonnet provide more accurate confidence scoring
226+
- **Mask PII for logging**: Use "Mask" mode when you need to log or store content that may contain PII
227+
- **Test regex patterns**: Validate your regex patterns thoroughly before deploying to production
228+
- **Monitor validation failures**: Track `<guardrails.error>` messages to identify common validation issues
229+
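One lightweight way to test a pattern before deploying it is a table of inputs with expected outcomes. The sketch below checks the email pattern shown earlier on this page; the sample cases and the `failing_cases` helper are illustrative.

```python
import re

EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

# (input, should_match) pairs covering typical and edge inputs.
CASES = [
    ("ada@example.com", True),
    ("first.last+tag@sub.example.org", True),
    ("no-at-sign.example.com", False),
    ("ada@example", False),        # no top-level domain
    ("ada@example.c", False),      # top-level domain too short
    (" ada@example.com", False),   # leading whitespace
]


def failing_cases(pattern, cases):
    """Return the cases where the pattern disagrees with the expectation."""
    return [
        (text, expected)
        for text, expected in cases
        if bool(pattern.search(text)) != expected
    ]


print(failing_cases(EMAIL_PATTERN, CASES))  # an empty list means every case behaved as expected
```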
230+
<Callout type="info">
231+
Guardrails validation happens synchronously in your workflow. For hallucination detection, choose faster models (like GPT-4o-mini) if latency is critical.
232+
</Callout>
233+
