regex chapter

lbeurerkellner · lbeurerkellner · commit 394a51566a7c · 2025-04-15T18:02:07.000+02:00
diff --git a/docs/assets/invariant.css b/docs/assets/invariant.css
@@ -513,7 +513,7 @@ span.parser-badge::before {
 }
 
 .builtin-badge:hover::after {
-    content: 'BUILTIN DESCRIPTION';
+    content: 'Built-in functions are pre-defined functions that are available for use in your code without requiring any additional imports.';
 }
 
 .parser-badge:hover::after,
@@ -859,6 +859,10 @@ ul.md-nav__list {
     font-size: 12pt;
 }
 
+.admonition ul li, .admonition ol li {
+    font-size: 12pt !important;
+}
+
 .admonition p {
     font-size: 12pt !important;
 }
@@ -1140,19 +1144,20 @@ strong .twemoji {
     padding: 10pt;
     background-color: #f0f0f0;
     border-radius: 10pt;
-    padding-bottom: 20pt;
+    padding-bottom: 40pt;
     position: relative;
 }
 
-.format-explainer figcaption {
+.md-typeset .format-explainer figcaption {
     position: absolute;
-    bottom: 0pt;
-    left: 50%;
-    transform: translateX(-50%);
+    bottom: 5pt;
+    left: 0pt;
     font-size: 10pt;
     color: #666;
     z-index: 10;
+    display: block;
     text-align: center;
+    max-width: 100%;
     width: 100%;
 }
 
diff --git a/docs/guardrails/regex-filters.md b/docs/guardrails/regex-filters.md
@@ -0,0 +1,144 @@
+# Regex Filters
+
+<div class='subtitle'>Use regular expressions to filter messages</div>
+
+One simple, yet effective method to constrain your agent is to apply regular expressions to match undesired content and substrings.
+
+Regex filters are particularly useful against plain-text risks, for example to keep certain URLs, names, or other patterns out of the agent's context.
+
+
+!!! danger "Plain Text Content Risks"
+    Agents that operate on plain text content are susceptible to generating harmful or misleading content, which you as the operator may be liable for. An insecure agent could:
+
+    - Generate phishing URLs that are advertised under your brand authority
+    - Reference competitors or their websites in responses and internal reasoning
+    - Produce content in unsupported output formats, leading to visual defects in your application
+    - Use URL smuggling to bypass security measures (e.g. to leak information via URLs) 
+
+    
+    
+
+## match <span class="builtin-badge"/>
+```python
+def match(
+    pattern: str, 
+    content: str
+) -> bool
+```
+Detector to match a regular expression pattern in a message.
+
+**Parameters**
+
+| Name        | Type   | Description                            |
+|-------------|--------|----------------------------------------|
+| `pattern`    | `str`  | The regular expression pattern to match. |
+| `content`    | `str`  | The content to match the pattern against. |
+
+**Returns**
+
+Returns `True` if the pattern matches the content, `False` otherwise.
+
+Wraps `re.match` from Python's standard library. 
+
+By default, `match` only matches at the beginning of the string. To match anywhere in the string, prefix the pattern with `.*`.
+
+### Examples
+
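+**Example:** Checking if a message starts with a URL. Since `match` is anchored at the beginning of the content, a URL later in the message is only detected if the pattern starts with `.*`.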
+
+```guardrail
+raise "Must not link to example.com" if:
+    (msg: Message)
+    match("https?://[^\s]+", msg.content)
+```
+```example-trace
+[
+  {
+    "role": "user",
+    "content": "Respond with http://example.com"
+  },
+  {
+    "role": "assistant",
+    "content": "http://example.com"
+  }
+]
+```
+
+**Example:** Checking if a message contains a competitor's name.
+
+```guardrail
+raise "Must not mention competitor" if:
+    (msg: Message)
+    match(".*[Cc]ompetitor.*", msg.content)
+```
+```example-trace
+[
+  {
+    "role": "user",
+    "content": "What do you think about competitor?"
+  },
+  {
+    "role": "assistant",
+    "content": "I don't know what you are talking about"
+  }
+]
+```
+
+
+## find <span class="builtin-badge"/>
+```python
+def find(
+    pattern: str, 
+    content: str
+) -> List[str]
+```
+
+Detector to find all occurrences of a regular expression pattern in a message.
+
+**Parameters**
+
+| Name         | Type   | Description                            |
+|--------------|--------|----------------------------------------|
+| `pattern`    | `str`  | The regular expression pattern to find.|
+| `content`    | `str`  | The content to find the pattern in.    |
+
+**Returns**
+
+The list of all occurrences of the pattern in the content.
+
+### Examples
+
+**Example:** Iterating over all capitalized words and checking if they are in a list of names.
+
+```guardrail
+raise "Must not mention Peter, Alice or John" if:
+    (msg: Message)
+    (name: str) in find("[A-Z][a-z]*", msg.content)
+    name in ["Peter", "Alice", "John"]
+```
+```example-trace
+[
+  {
+    "role": "user",
+    "content": "Reply to Peter's message and then Alice's"
+  }
+]
+```
+
+**Example:** Checking all URLs in a message.
+```guardrail
+raise "Must not link to example.com" if:
+    (msg: Message)
+    (url: str) in find("https?://[^\s]+", msg.content)
+    url in ["http://example.com", "https://example.com"]
+```
+```example-trace
+[
+  {
+    "role": "user",
+    "content": "Go to http://example.com and then https://secure-example.com"
+  }
+]
+```
+
+Here, the rule quantifies over all matches returned by `find`: if any match satisfies the additional condition, the guardrail raises.
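
Note on the added page: the anchoring behaviour documented for `match` and the "any match" semantics documented for `find` can be tried out in plain Python. The sketch below is not the library's implementation. The two helper functions are hypothetical stand-ins: `match` follows the page's statement that it wraps `re.match`, while modelling `find` on `re.findall` is an assumption made here for illustration.

```python
import re


def match(pattern: str, content: str) -> bool:
    """Hypothetical stand-in: True if re.match succeeds at the start of content."""
    return re.match(pattern, content) is not None


def find(pattern: str, content: str) -> list[str]:
    """Hypothetical stand-in: assumed to behave like re.findall."""
    return re.findall(pattern, content)


url = r"https?://[^\s]+"

# re.match is anchored at the start of the string.
starts_with_url = match(url, "http://example.com")
url_mid_sentence = match(url, "Go to http://example.com")

# A leading ".*" lets the pattern skip over the text before the URL.
url_anywhere = match(".*" + url, "Go to http://example.com")

# "Quantifying over all matches": the rule fires if ANY match is on the list.
urls = find(url, "Go to http://example.com and then https://secure-example.com")
blocked = any(u in ["http://example.com", "https://example.com"] for u in urls)

print(starts_with_url, url_mid_sentence, url_anywhere)
print(urls, blocked)
```

In plain Python, `.` does not match newline characters unless the `(?s)` flag is set, so a leading `.*` only reaches a URL on the first line of a multi-line string. Whether the guardrail runtime behaves the same way is not stated on the page.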