@tomerqodo tomerqodo commented Jan 30, 2026

Benchmark PR from agentic-review-benchmarks#12

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved handling of special characters (%, _, \) in search and filter operations across the application. These characters are now treated as literal text rather than wildcards, ensuring more accurate search results and preventing unintended matches.
  • Tests

    • Added comprehensive integration tests validating proper handling of special characters in keyword searches and filtering across multiple services.


NeatGuyCoding and others added 4 commits January 25, 2026 12:05
…ralized

utility function, ensuring consistent and secure handling of special characters
across all database queries.

Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
…logic

Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
coderabbitai bot commented Jan 30, 2026

📝 Walkthrough

Walkthrough

Introduces a centralized escape_like_pattern() utility function in libs/helper to sanitize SQL LIKE pattern inputs. The function escapes backslashes, percent signs, and underscores, and is applied across multiple services, controllers, and RAG modules to prevent wildcard injection and ensure special characters are treated as literals in LIKE queries.
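
To make the effect concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the project's actual database or ORM. The escape_like_pattern body follows the replace chain quoted later in this review; the items table, its rows, and the search helper are invented for illustration.

```python
import sqlite3


def escape_like_pattern(pattern: str) -> str:
    # Same replace chain as quoted in the review: backslash first, then % and _.
    return pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def search(conn: sqlite3.Connection, keyword: str, escape: bool) -> list[str]:
    """Return names containing `keyword`, optionally treating it as literal text."""
    if escape:
        # The SQL text sent to SQLite ends in: ESCAPE '\'
        sql = "SELECT name FROM items WHERE name LIKE ? ESCAPE '\\' ORDER BY name"
        pattern = f"%{escape_like_pattern(keyword)}%"
    else:
        sql = "SELECT name FROM items WHERE name LIKE ? ORDER BY name"
        pattern = f"%{keyword}%"
    return [row[0] for row in conn.execute(sql, (pattern,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [("50% discount",), ("500 items",), ("snake_case",), ("snakeXcase",)],
)

# Without escaping, % and _ in the keyword act as wildcards.
unescaped_percent = search(conn, "50%", escape=False)
unescaped_underscore = search(conn, "snake_case", escape=False)

# With escaping plus a matching ESCAPE clause, they match only themselves.
escaped_percent = search(conn, "50%", escape=True)
escaped_underscore = search(conn, "snake_case", escape=True)
```

In this sketch the unescaped searches also return "500 items" and "snakeXcase", while the escaped searches return only the rows that contain the literal keyword.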

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Core Utility | api/libs/helper.py | New public function escape_like_pattern(pattern: str) -> str that escapes backslashes, percent signs, and underscores for safe SQL LIKE pattern use. |
| API Controllers | api/controllers/console/app/conversation.py | Applies escape_like_pattern to keyword filters in conversation listing and detail query paths; uses escaped value with ilike(escape="\\") for safer pattern matching. |
| Service Layer – Keyword Filtering | api/services/conversation_service.py, api/services/tag_service.py, api/services/app_service.py, api/services/annotation_service.py, api/services/external_knowledge_service.py | Imports escape_like_pattern and applies it to keyword/search inputs before LIKE comparisons; adds explicit escape="\\" parameter to prevent wildcard injection. |
| Service Layer – Search & Retrieval | api/services/dataset_service.py | Sanitizes search terms in get_datasets, get_child_chunks, and get_segments using escape_like_pattern with ilike(escape="\\") for literal matching. |
| Service Layer – Workflow | api/services/workflow_app_service.py | Replaces manual escaping with centralized escape_like_pattern; updates multiple LIKE filters (inputs, outputs, session_id) with explicit escape characters when keyword filtering is applied. |
| Data/RAG Layer | api/controllers/console/datasets/datasets_segments.py | Escapes keyword using escape_like_pattern for JSONB/text extraction and string casting LIKE comparisons; applies escape parameter in ilike calls. |
| Vector Database – Search | api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py, api/core/rag/datasource/vdb/iris/iris_vector.py | Imports escape_like_pattern and applies it to query input in fallback full-text search paths; adds ESCAPE clause to LIKE predicates for proper special-character handling. |
| RAG Retrieval | api/core/rag/retrieval/dataset_retrieval.py | Sanitizes metadata filter values in process_metadata_filter_func for contains, not-contains, start-with, and end-with scenarios using escape_like_pattern with backslash escape character. |
| Unit Tests | api/tests/unit_tests/libs/test_helper.py | New test suite TestEscapeLikePattern validates escaping of percent, underscore, and backslash; tests combined special characters, empty/None inputs, normal strings, and escape order. |
| Integration Tests – Service Validation | api/tests/test_containers_integration_tests/services/test_annotation_service.py, api/tests/test_containers_integration_tests/services/test_tag_service.py, api/tests/test_containers_integration_tests/services/test_workflow_app_service.py | New test methods verify literal matching of special characters (%, _, \) in keyword searches; ensure escaping prevents wildcard interpretation and unintended matches. |
| Integration Tests – App Service | api/tests/test_containers_integration_tests/services/test_app_service.py | Introduces delayed imports of AppService within test methods; adds test_get_apps_with_special_characters_in_name to verify escaping behavior in app name searches. |
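
The unit-test row above mentions "escape order". The following sketch shows why the order matters. It is not taken from the PR's test suite; escape_wrong_order is a hypothetical counter-example written only for comparison.

```python
def escape_like_pattern(pattern: str) -> str:
    # Order used by the helper in this PR: backslash first, then % and _.
    return pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def escape_wrong_order(pattern: str) -> str:
    # Hypothetical counter-example: escaping the backslash last also doubles
    # the backslashes that were just inserted in front of % and _.
    return pattern.replace("%", "\\%").replace("_", "\\_").replace("\\", "\\\\")


raw = "100%_\\"  # the six characters: 1 0 0 % _ \

correct = escape_like_pattern(raw)  # 100\%\_\\
wrong = escape_wrong_order(raw)     # 100\\%\\_\\
```

With ESCAPE '\', the sequence \\% in the wrong-order result reads as a literal backslash followed by an unescaped % wildcard, so the input would no longer be matched literally.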

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hops through queries, swift and bright,
Backslashes, percents, now escaped right,
No wildcards wild, no injections sly,
Just literals matched—safe as can be, my!
The LIKE is pure, the searches clean,
Safest pattern matching ever seen! 🛡️

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is minimal and lacks substantial detail. It only references a benchmark PR link without following the required template structure (Summary, Screenshots, Checklist sections). | Add a detailed summary of changes, explain the motivation and context, describe what the centralized utility function does, and complete the PR template sections including the checklist items. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: refactoring SQL LIKE pattern escaping logic into a centralized utility function for consistent and secure handling of special characters across database queries. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 82.14% which is sufficient. The required threshold is 80.00%. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py`:
- Around line 987-991: The LIKE ESCAPE clause in clickzetta_vector.py does not
match the output of escape_like_pattern: change the filter_clauses.append call
that builds the WHERE clause using escaped_query so the ESCAPE character is a
single backslash (i.e. use ESCAPE '\\' instead of ESCAPE '\\\\'); locate the
usage that imports escape_like_pattern, the escaped_query variable, and the
filter_clauses.append(f"{Field.CONTENT_KEY} LIKE '%{escaped_query}%' ESCAPE
'\\\\'") and replace the double-backslash escape literal with a single-backslash
escape literal to align with escape_like_pattern.

In `@api/core/rag/datasource/vdb/iris/iris_vector.py`:
- Around line 290-300: The LIKE escape is inconsistent: escape_like_pattern
currently uses backslash escapes (e.g., "50\%"), but the SQL sets ESCAPE '|' so
backslashes are treated literally; fix by making the escape characters
consistent — either update escape_like_pattern to accept/use '|' (and keep SQL
ESCAPE '|' in iris_vector.py where query_pattern and cursor.execute are used) or
change the SQL string in iris_vector.py to use ESCAPE '\' (so the backslash
escapes from escape_like_pattern are honored). Ensure the chosen escape char is
applied both where escape_like_pattern generates the pattern and in the SQL
WHERE ... LIKE ... ESCAPE clause.

In `@api/core/rag/retrieval/dataset_retrieval.py`:
- Around line 1198-1208: The "not contains" branch is missing the escape arg so
backslashes aren't treated consistently; in the match branch for "not contains"
where escape_like_pattern is used and filters.append(json_field.notlike(...)) is
called, add the escape="\\" parameter to the notlike() invocation (matching
the other branches that pass escape="\\") so literal %, _, and \ are handled the
same way.
🧹 Nitpick comments (4)
api/libs/helper.py (1)

61-65: Type hint mismatch with None handling.

The function signature declares pattern: str but the if not pattern check allows None to pass through and be returned. This creates a type inconsistency. If None is an intentional valid input (as the tests suggest), the signature should be pattern: str | None with return type str | None.

Additionally, escapedPattern uses camelCase instead of Python's conventional snake_case (escaped_pattern).

♻️ Proposed fix
-def escape_like_pattern(pattern: str) -> str:
+def escape_like_pattern(pattern: str | None) -> str | None:
     """
     Escape special characters in a string for safe use in SQL LIKE patterns.
     ...
     """
     if not pattern:
         return pattern
     # Escape backslash first, then percent and underscore
-    escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
-    return escapedPattern
+    escaped_pattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
+    return escaped_pattern
api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py (1)

988-988: Consider moving import to module top-level for consistency.

The import is placed inside the function, while api/controllers/console/datasets/datasets_segments.py imports escape_like_pattern at the module level. Consider moving this import to the top of the file for consistency with other usages in this PR.

api/tests/test_containers_integration_tests/services/test_workflow_app_service.py (1)

424-425: Inconsistent variable naming: workflow_run_4 without workflow_run_3.

The variable naming jumps from workflow_run_2 to workflow_run_4, skipping workflow_run_3. This appears to be a copy-paste artifact. Consider renaming to workflow_run_3 for consistency and readability.

♻️ Suggested fix for variable naming consistency
-        # Test 3: Search with % should NOT match 100% (verifies escaping works correctly)
-        workflow_run_4 = WorkflowRun(
+        # Test 3: Search with % should NOT match 100% (verifies escaping works correctly)
+        workflow_run_3 = WorkflowRun(

And similarly update:

  • workflow_run_4 → workflow_run_3 (lines 440, 447, 467)
  • workflow_app_log_4 → workflow_app_log_3 (lines 443, 452, 453, 454)
api/tests/test_containers_integration_tests/services/test_tag_service.py (1)

390-393: Test 4 is redundant with Test 1.

Lines 390-393 repeat the same "50%" search as lines 375-378. The assertion all("50%" in item.name for item in result) is already implied by result[0].name == "50% discount" from Test 1. Consider removing this redundant test case or adding a distinct scenario.

♻️ Suggested fix to remove redundant test
-        # Test 4 - Search with % should NOT match 100% (verifies escaping works)
-        result = TagService.get_tags("app", tenant.id, keyword="50%")
-        assert len(result) == 1
-        assert all("50%" in item.name for item in result)

Comment on lines +987 to +991
# Escape special characters for LIKE clause to prevent SQL injection
from libs.helper import escape_like_pattern

escaped_query = escape_like_pattern(query).replace("'", "''")
filter_clauses.append(f"{Field.CONTENT_KEY} LIKE '%{escaped_query}%' ESCAPE '\\\\'")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find the escape_like_pattern helper function
fd escape_like_pattern . --type f

Repository: agentic-review-benchmarks/dify-combined-coderabbit

Length of output: 76


🏁 Script executed:

# Also search for escape_like_pattern in the codebase to see where it's imported from
rg -r "escape_like_pattern" --type py

Repository: agentic-review-benchmarks/dify-combined-coderabbit

Length of output: 490


🏁 Script executed:

# Check other ESCAPE usages in ClickZetta code
rg -n "ESCAPE" api/core/rag/datasource/vdb/clickzetta/ -B2 -A2

Repository: agentic-review-benchmarks/dify-combined-coderabbit

Length of output: 630


🏁 Script executed:

# Look at the helper module to find escape_like_pattern implementation
find . -path "*/libs/helper*" -type f

Repository: agentic-review-benchmarks/dify-combined-coderabbit

Length of output: 114


🏁 Script executed:

# Read the escape_like_pattern function from libs/helper.py
cat api/libs/helper.py | head -200

Repository: agentic-review-benchmarks/dify-combined-coderabbit

Length of output: 5601


🏁 Script executed:

# Get the size first to avoid reading entire file if too large
wc -l api/libs/helper.py

Repository: agentic-review-benchmarks/dify-combined-coderabbit

Length of output: 116


🏁 Script executed:

# Search for tests of escape_like_pattern
rg -n "escape_like_pattern" --type py -B2 -A5

Repository: agentic-review-benchmarks/dify-combined-coderabbit

Length of output: 21029


🏁 Script executed:

# Check the context around the problematic line and look for any comments
sed -n '980,1000p' api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py

Repository: agentic-review-benchmarks/dify-combined-coderabbit

Length of output: 1075


🏁 Script executed:

# Search for any ClickZetta documentation or comments about LIKE escaping
rg -n "escape\|LIKE\|ESCAPE" api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py -B2 -A2

Repository: agentic-review-benchmarks/dify-combined-coderabbit

Length of output: 76


🏁 Script executed:

# Verify Python string literal interpretation
python3 <<'EOF'
# Test what Python string literals produce
escape_double = '\\\\'  # Four backslashes in source = two backslashes actual
escape_single = '\\'    # Two backslashes in source = one backslash actual

print(f"Four backslashes in source (\\\\\\\\) = {repr(escape_double)} = {escape_double}")
print(f"Two backslashes in source (\\\\) = {repr(escape_single)} = {escape_single}")
print(f"Length of escape_double: {len(escape_double)}")
print(f"Length of escape_single: {len(escape_single)}")

# What escape_like_pattern produces for a pattern with %
test_input = "50%"
escaped = test_input.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
print(f"\nescape_like_pattern('50%') produces: {repr(escaped)}")
print(f"Actual characters: {escaped}")
print(f"Escape char needed: single backslash (\\)")
EOF

Repository: agentic-review-benchmarks/dify-combined-coderabbit

Length of output: 348


Fix ESCAPE clause to match escape_like_pattern: use single backslash, not double.

The escape_like_pattern function produces patterns with single backslash escaping (\%, \_, \\), as documented and verified by unit tests. However, line 991 uses ESCAPE '\\\\' which evaluates to \\ (double backslash), creating a mismatch.

All other usages in the codebase (dataset_service.py, workflow_app_service.py, tag_service.py, etc.) correctly use escape="\\" (single backslash). Change line 991 to:

filter_clauses.append(f"{Field.CONTENT_KEY} LIKE '%{escaped_query}%' ESCAPE '\\'")

The escape character must be a single backslash to match the output of escape_like_pattern.
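
As a quick check of what each variant actually sends to the database, the sketch below renders both f-strings. content_key stands in for Field.CONTENT_KEY, whose real value is not shown in this review. Whether the engine's own SQL parser additionally treats backslash as an escape inside string literals depends on the dialect and is not verified here.

```python
# Stand-in for Field.CONTENT_KEY; the real value is not shown in the review.
content_key = "content"
escaped_query = "50\\%"  # what escape_like_pattern("50%") returns: 50\%

# As written in the PR: four backslashes in source become two in the SQL text.
sql_pr = f"{content_key} LIKE '%{escaped_query}%' ESCAPE '\\\\'"

# As suggested above: two backslashes in source become one in the SQL text.
sql_fix = f"{content_key} LIKE '%{escaped_query}%' ESCAPE '\\'"

print(sql_pr)   # content LIKE '%50\%%' ESCAPE '\\'
print(sql_fix)  # content LIKE '%50\%%' ESCAPE '\'
```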

🤖 Prompt for AI Agents
In `@api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py` around lines 987
- 991, The LIKE ESCAPE clause in clickzetta_vector.py does not match the output
of escape_like_pattern: change the filter_clauses.append call that builds the
WHERE clause using escaped_query so the ESCAPE character is a single backslash
(i.e. use ESCAPE '\\' instead of ESCAPE '\\\\'); locate the usage that imports
escape_like_pattern, the escaped_query variable, and the
filter_clauses.append(f"{Field.CONTENT_KEY} LIKE '%{escaped_query}%' ESCAPE
'\\\\'") and replace the double-backslash escape literal with a single-backslash
escape literal to align with escape_like_pattern.

Comment on lines +290 to 300
+            # Escape special characters for LIKE clause to prevent SQL injection
+            from libs.helper import escape_like_pattern
+
+            escaped_query = escape_like_pattern(query)
+            query_pattern = f"%{escaped_query}%"
             sql = f"""
                 SELECT TOP {top_k} id, text, meta
                 FROM {self.schema}.{self.table_name}
-                WHERE text LIKE ?
+                WHERE text LIKE ? ESCAPE '|'
             """
             cursor.execute(sql, (query_pattern,))

⚠️ Potential issue | 🔴 Critical

Critical: ESCAPE character mismatch breaks pattern escaping.

The escape_like_pattern helper escapes special characters using backslash (\), producing patterns like 50\% discount. However, the SQL uses ESCAPE '|', which tells the database to interpret the pipe character as the escape character instead.

This means the backslash escapes produced by the helper will be treated as literal backslashes, not escape sequences, completely defeating the purpose of the escaping.
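
The mismatch can be reproduced in isolation. This is a sketch using Python's built-in sqlite3 module, not IRIS, with an invented docs table and body column. It assumes only that the engine follows the standard LIKE ... ESCAPE semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (body TEXT)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [("50% discount",), ("500 items",)],
)

# Pattern as the helper would build it for the query "50%": %50\%%
query_pattern = "%50\\%%"

# ESCAPE '|' : the backslash is an ordinary character, so the pattern only
# matches rows that literally contain the text 50\ .
mismatched = conn.execute(
    "SELECT body FROM docs WHERE body LIKE ? ESCAPE '|'", (query_pattern,)
).fetchall()

# ESCAPE '\' : the backslash escapes the following %, which is then literal.
matched = conn.execute(
    "SELECT body FROM docs WHERE body LIKE ? ESCAPE '\\'", (query_pattern,)
).fetchall()
```

Here the mismatched query returns no rows at all, while the matched query returns only "50% discount".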

🐛 Proposed fix - use consistent escape character
             # Fallback to LIKE search (inefficient for large datasets)
             # Escape special characters for LIKE clause to prevent SQL injection
             from libs.helper import escape_like_pattern

             escaped_query = escape_like_pattern(query)
             query_pattern = f"%{escaped_query}%"
             sql = f"""
                 SELECT TOP {top_k} id, text, meta
                 FROM {self.schema}.{self.table_name}
-                WHERE text LIKE ? ESCAPE '|'
+                WHERE text LIKE ? ESCAPE '\\'
             """
             cursor.execute(sql, (query_pattern,))
🤖 Prompt for AI Agents
In `@api/core/rag/datasource/vdb/iris/iris_vector.py` around lines 290 - 300, The
LIKE escape is inconsistent: escape_like_pattern currently uses backslash
escapes (e.g., "50\%"), but the SQL sets ESCAPE '|' so backslashes are treated
literally; fix by making the escape characters consistent — either update
escape_like_pattern to accept/use '|' (and keep SQL ESCAPE '|' in iris_vector.py
where query_pattern and cursor.execute are used) or change the SQL string in
iris_vector.py to use ESCAPE '\' (so the backslash escapes from
escape_like_pattern are honored). Ensure the chosen escape char is applied both
where escape_like_pattern generates the pattern and in the SQL WHERE ... LIKE
... ESCAPE clause.
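
If the first option is preferred (keeping ESCAPE '|' in the IRIS query), the helper would need a configurable escape character. A minimal sketch of that idea follows. escape_like_pattern_with and its escape_char parameter are hypothetical names that do not exist in the PR.

```python
def escape_like_pattern_with(pattern: str, escape_char: str = "\\") -> str:
    """Hypothetical variant of escape_like_pattern with a configurable escape character."""
    if len(escape_char) != 1 or escape_char in "%_":
        raise ValueError("escape_char must be a single character other than % or _")
    if not pattern:
        return pattern
    # Escape the escape character itself first, then the two LIKE wildcards.
    return (
        pattern.replace(escape_char, escape_char * 2)
        .replace("%", escape_char + "%")
        .replace("_", escape_char + "_")
    )


pipe_escaped = escape_like_pattern_with("50%_a|b", escape_char="|")  # 50|%|_a||b
default_escaped = escape_like_pattern_with("50%_a|b")                # 50\%\_a|b
```

The caller would then pass the same character to the SQL ESCAPE clause, so the two sides cannot drift apart.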

Comment on lines +1198 to 1208
+        from libs.helper import escape_like_pattern
+
         match condition:
             case "contains":
-                filters.append(json_field.like(f"%{value}%"))
+                escaped_value = escape_like_pattern(str(value))
+                filters.append(json_field.like(f"%{escaped_value}%", escape="\\"))

             case "not contains":
-                filters.append(json_field.notlike(f"%{value}%"))
+                escaped_value = escape_like_pattern(str(value))
+                filters.append(json_field.notlike(f"%{escaped_value}%"))

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n api/core/rag/retrieval/dataset_retrieval.py | sed -n '1195,1215p'

Repository: agentic-review-benchmarks/dify-combined-coderabbit

Length of output: 1098


🌐 Web query:

SQLAlchemy notlike escape parameter behavior and documentation

💡 Result:

In SQLAlchemy, notlike() / not_like() accepts an escape parameter, but its behavior is only to render the SQL ESCAPE clause; it does not automatically rewrite/escape your pattern string.

What escape= does for LIKE / NOT LIKE

  • column.like(pattern, escape=ch) renders ... LIKE <pattern> ESCAPE '<ch>' (i.e., it declares which character the database should treat as the escape character). (docs.sqlalchemy.org)
  • column.not_like(pattern, escape=ch) (and the legacy name notlike()) is the same idea for NOT LIKE, and is documented as equivalent to ~column.like(pattern) (so the same escape semantics apply). (docs.sqlalchemy.org)
  • The rename: not_like() is the newer name; notlike() remains for backwards compatibility. (docs.sqlalchemy.org)

What escape= does not do

  • LIKE / NOT LIKE do not have an autoescape feature; SQLAlchemy will not scan your pattern and insert escape characters for % / _ automatically.
  • If you want SQLAlchemy to also transform a literal value by escaping %, _, and the escape character itself, use the convenience string operators like .contains(..., autoescape=True, escape=...), .startswith(..., autoescape=True, escape=...), .endswith(..., autoescape=True, escape=...), etc. (docs.sqlalchemy.org)

Where it’s documented

  • ColumnElement.like(other, escape=None) documents escape as an “optional escape character” that “renders the ESCAPE keyword”. (docs.sqlalchemy.org)
  • ColumnElement.not_like(other, escape=None) / notlike(other, escape=None) are documented under Column Elements and Expressions; not_like() is described as NOT LIKE and equivalent to ~x.like(y), with the rename note. (docs.sqlalchemy.org)


Add escape parameter to not contains to match other branches.

The notlike() call on line 1207 is missing the escape="\\" parameter present in all other filter branches ("contains" on line 1203, "start with" on line 1211, "end with" on line 1215). SQLAlchemy's notlike() accepts the same escape parameter as like(), and without it, the database won't recognize the backslash as an escape character. This breaks consistent handling of literal %, _, and \ characters across filter types.
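
The effect on a "not contains" filter can be sketched with plain SQL through Python's built-in sqlite3 module, using an invented docs table. SQLite has no default LIKE escape character. Some backends, such as PostgreSQL and MySQL, default to backslash, so the visible impact of the missing parameter depends on the database in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (title TEXT)")
conn.executemany(
    "INSERT INTO docs (title) VALUES (?)",
    [("50% discount",), ("500 items",)],
)

pattern = "%50\\%%"  # "%" + escape_like_pattern("50%") + "%"

# No ESCAPE clause: the backslash is literal, nothing matches the pattern,
# so NOT LIKE keeps every row, including the one that contains "50%".
without_escape = sorted(
    row[0]
    for row in conn.execute("SELECT title FROM docs WHERE title NOT LIKE ?", (pattern,))
)

# With ESCAPE '\': the row containing the literal "50%" is excluded as intended.
with_escape = sorted(
    row[0]
    for row in conn.execute(
        "SELECT title FROM docs WHERE title NOT LIKE ? ESCAPE '\\'", (pattern,)
    )
)
```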

✅ Proposed fix
-                filters.append(json_field.notlike(f"%{escaped_value}%"))
+                filters.append(json_field.notlike(f"%{escaped_value}%", escape="\\"))
🤖 Prompt for AI Agents
In `@api/core/rag/retrieval/dataset_retrieval.py` around lines 1198 - 1208, The
"not contains" branch is missing the escape arg so backslashes aren't treated
consistently; in the match branch for "not contains" where escape_like_pattern
is used and filters.append(json_field.notlike(...)) is called, add the
escape="\\" parameter to the notlike() invocation (matching the other branches
that pass escape="\\") so literal %, _, and \ are handled the same way.
