-
Notifications
You must be signed in to change notification settings - Fork 0
fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries. #12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: coderabbit_full_base_fixapi_refactors_the_sql_like_pattern_escaping_logic_to_use_a_centralized__utility_function_ensuring_consistent_and_secure_handling_of_special_characters_across_all_database_queri
Are you sure you want to change the base?
fix(api): refactors the SQL LIKE pattern escaping logic to use a centralized utility function, ensuring consistent and secure handling of special characters across all database queries. #12
Conversation
…ralized utility function, ensuring consistent and secure handling of special characters across all database queries. Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
…logic Signed-off-by: NeatGuyCoding <15627489+NeatGuyCoding@users.noreply.github.com>
📝 WalkthroughWalkthroughIntroduces a centralized Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py`:
- Around line 987-991: The LIKE ESCAPE clause in clickzetta_vector.py does not
match the output of escape_like_pattern: change the filter_clauses.append call
that builds the WHERE clause using escaped_query so the ESCAPE character is a
single backslash (i.e. use ESCAPE '\\' instead of ESCAPE '\\\\'); locate the
usage that imports escape_like_pattern, the escaped_query variable, and the
filter_clauses.append(f"{Field.CONTENT_KEY} LIKE '%{escaped_query}%' ESCAPE
'\\\\'") and replace the double-backslash escape literal with a single-backslash
escape literal to align with escape_like_pattern.
In `@api/core/rag/datasource/vdb/iris/iris_vector.py`:
- Around line 290-300: The LIKE escape is inconsistent: escape_like_pattern
currently uses backslash escapes (e.g., "50\%"), but the SQL sets ESCAPE '|' so
backslashes are treated literally; fix by making the escape characters
consistent — either update escape_like_pattern to accept/use '|' (and keep SQL
ESCAPE '|' in iris_vector.py where query_pattern and cursor.execute are used) or
change the SQL string in iris_vector.py to use ESCAPE '\' (so the backslash
escapes from escape_like_pattern are honored). Ensure the chosen escape char is
applied both where escape_like_pattern generates the pattern and in the SQL
WHERE ... LIKE ... ESCAPE clause.
In `@api/core/rag/retrieval/dataset_retrieval.py`:
- Around line 1198-1208: The "not contains" branch is missing the escape arg so
backslashes aren't treated consistently; in the match branch for "not contains"
where escape_like_pattern is used and filters.append(json_field.notlike(...)) is
called, add the escape="\\\\" parameter to the notlike() invocation (matching
the other branches that pass escape="\\") so literal %, _, and \ are handled the
same way.
🧹 Nitpick comments (4)
api/libs/helper.py (1)
61-65: Type hint mismatch with None handling.The function signature declares
pattern: strbut theif not patterncheck allowsNoneto pass through and be returned. This creates a type inconsistency. IfNoneis an intentional valid input (as the tests suggest), the signature should bepattern: str | Nonewith return typestr | None.Additionally,
escapedPatternuses camelCase instead of Python's conventional snake_case (escaped_pattern).♻️ Proposed fix
-def escape_like_pattern(pattern: str) -> str: +def escape_like_pattern(pattern: str | None) -> str | None: """ Escape special characters in a string for safe use in SQL LIKE patterns. ... """ if not pattern: return pattern # Escape backslash first, then percent and underscore - escapedPattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_") - return escapedPattern + escaped_pattern = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_") + return escaped_patternapi/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py (1)
988-988: Consider moving import to module top-level for consistency.The import is placed inside the function, while
api/controllers/console/datasets/datasets_segments.pyimportsescape_like_patternat the module level. Consider moving this import to the top of the file for consistency with other usages in this PR.api/tests/test_containers_integration_tests/services/test_workflow_app_service.py (1)
424-425: Inconsistent variable naming:workflow_run_4withoutworkflow_run_3.The variable naming jumps from
workflow_run_2toworkflow_run_4, skippingworkflow_run_3. This appears to be a copy-paste artifact. Consider renaming toworkflow_run_3for consistency and readability.♻️ Suggested fix for variable naming consistency
- # Test 3: Search with % should NOT match 100% (verifies escaping works correctly) - workflow_run_4 = WorkflowRun( + # Test 3: Search with % should NOT match 100% (verifies escaping works correctly) + workflow_run_3 = WorkflowRun(And similarly update:
workflow_run_4→workflow_run_3(lines 440, 447, 467)workflow_app_log_4→workflow_app_log_3(lines 443, 452, 453, 454)api/tests/test_containers_integration_tests/services/test_tag_service.py (1)
390-393: Test 4 is redundant with Test 1.Lines 390-393 repeat the same "50%" search as lines 375-378. The assertion
all("50%" in item.name for item in result)is already implied byresult[0].name == "50% discount"from Test 1. Consider removing this redundant test case or adding a distinct scenario.♻️ Suggested fix to remove redundant test
- # Test 4 - Search with % should NOT match 100% (verifies escaping works) - result = TagService.get_tags("app", tenant.id, keyword="50%") - assert len(result) == 1 - assert all("50%" in item.name for item in result)
| # Escape special characters for LIKE clause to prevent SQL injection | ||
| from libs.helper import escape_like_pattern | ||
|
|
||
| escaped_query = escape_like_pattern(query).replace("'", "''") | ||
| filter_clauses.append(f"{Field.CONTENT_KEY} LIKE '%{escaped_query}%' ESCAPE '\\\\'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Find the escape_like_pattern helper function
fd escape_like_pattern . --type fRepository: agentic-review-benchmarks/dify-combined-coderabbit
Length of output: 76
🏁 Script executed:
# Also search for escape_like_pattern in the codebase to see where it's imported from
rg -r "escape_like_pattern" --type pyRepository: agentic-review-benchmarks/dify-combined-coderabbit
Length of output: 490
🏁 Script executed:
# Check other ESCAPE usages in ClickZetta code
rg -n "ESCAPE" api/core/rag/datasource/vdb/clickzetta/ -B2 -A2Repository: agentic-review-benchmarks/dify-combined-coderabbit
Length of output: 630
🏁 Script executed:
# Look at the helper module to find escape_like_pattern implementation
find . -path "*/libs/helper*" -type fRepository: agentic-review-benchmarks/dify-combined-coderabbit
Length of output: 114
🏁 Script executed:
# Read the escape_like_pattern function from libs/helper.py
cat api/libs/helper.py | head -200Repository: agentic-review-benchmarks/dify-combined-coderabbit
Length of output: 5601
🏁 Script executed:
# Get the size first to avoid reading entire file if too large
wc -l api/libs/helper.pyRepository: agentic-review-benchmarks/dify-combined-coderabbit
Length of output: 116
🏁 Script executed:
# Search for tests of escape_like_pattern
rg -n "escape_like_pattern" --type py -B2 -A5Repository: agentic-review-benchmarks/dify-combined-coderabbit
Length of output: 21029
🏁 Script executed:
# Check the context around the problematic line and look for any comments
sed -n '980,1000p' api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.pyRepository: agentic-review-benchmarks/dify-combined-coderabbit
Length of output: 1075
🏁 Script executed:
# Search for any ClickZetta documentation or comments about LIKE escaping
rg -n "escape\|LIKE\|ESCAPE" api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py -B2 -A2Repository: agentic-review-benchmarks/dify-combined-coderabbit
Length of output: 76
🏁 Script executed:
# Verify Python string literal interpretation
python3 <<'EOF'
# Test what Python string literals produce
escape_double = '\\\\' # Four backslashes in source = two backslashes actual
escape_single = '\\' # Two backslashes in source = one backslash actual
print(f"Four backslashes in source (\\\\\\\\) = {repr(escape_double)} = {escape_double}")
print(f"Two backslashes in source (\\\\) = {repr(escape_single)} = {escape_single}")
print(f"Length of escape_double: {len(escape_double)}")
print(f"Length of escape_single: {len(escape_single)}")
# What escape_like_pattern produces for a pattern with %
test_input = "50%"
escaped = test_input.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
print(f"\nescape_like_pattern('50%') produces: {repr(escaped)}")
print(f"Actual characters: {escaped}")
print(f"Escape char needed: single backslash (\\)")
EOFRepository: agentic-review-benchmarks/dify-combined-coderabbit
Length of output: 348
Fix ESCAPE clause to match escape_like_pattern: use single backslash, not double.
The escape_like_pattern function produces patterns with single backslash escaping (\%, \_, \\), as documented and verified by unit tests. However, line 991 uses ESCAPE '\\\\' which evaluates to \\ (double backslash), creating a mismatch.
All other usages in the codebase (dataset_service.py, workflow_app_service.py, tag_service.py, etc.) correctly use escape="\\" (single backslash). Change line 991 to:
filter_clauses.append(f"{Field.CONTENT_KEY} LIKE '%{escaped_query}%' ESCAPE '\\'")
The escape character must be a single backslash to match the output of escape_like_pattern.
🤖 Prompt for AI Agents
In `@api/core/rag/datasource/vdb/clickzetta/clickzetta_vector.py` around lines 987
- 991, The LIKE ESCAPE clause in clickzetta_vector.py does not match the output
of escape_like_pattern: change the filter_clauses.append call that builds the
WHERE clause using escaped_query so the ESCAPE character is a single backslash
(i.e. use ESCAPE '\\' instead of ESCAPE '\\\\'); locate the usage that imports
escape_like_pattern, the escaped_query variable, and the
filter_clauses.append(f"{Field.CONTENT_KEY} LIKE '%{escaped_query}%' ESCAPE
'\\\\'") and replace the double-backslash escape literal with a single-backslash
escape literal to align with escape_like_pattern.
| # Escape special characters for LIKE clause to prevent SQL injection | ||
| from libs.helper import escape_like_pattern | ||
|
|
||
| escaped_query = escape_like_pattern(query) | ||
| query_pattern = f"%{escaped_query}%" | ||
| sql = f""" | ||
| SELECT TOP {top_k} id, text, meta | ||
| FROM {self.schema}.{self.table_name} | ||
| WHERE text LIKE ? | ||
| WHERE text LIKE ? ESCAPE '|' | ||
| """ | ||
| cursor.execute(sql, (query_pattern,)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: ESCAPE character mismatch breaks pattern escaping.
The escape_like_pattern helper escapes special characters using backslash (\), producing patterns like 50\% discount. However, the SQL uses ESCAPE '|', which tells the database to interpret the pipe character as the escape character instead.
This means the backslash escapes produced by the helper will be treated as literal backslashes, not escape sequences, completely defeating the purpose of the escaping.
🐛 Proposed fix - use consistent escape character
# Fallback to LIKE search (inefficient for large datasets)
# Escape special characters for LIKE clause to prevent SQL injection
from libs.helper import escape_like_pattern
escaped_query = escape_like_pattern(query)
query_pattern = f"%{escaped_query}%"
sql = f"""
SELECT TOP {top_k} id, text, meta
FROM {self.schema}.{self.table_name}
- WHERE text LIKE ? ESCAPE '|'
+ WHERE text LIKE ? ESCAPE '\\'
"""
cursor.execute(sql, (query_pattern,))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Escape special characters for LIKE clause to prevent SQL injection | |
| from libs.helper import escape_like_pattern | |
| escaped_query = escape_like_pattern(query) | |
| query_pattern = f"%{escaped_query}%" | |
| sql = f""" | |
| SELECT TOP {top_k} id, text, meta | |
| FROM {self.schema}.{self.table_name} | |
| WHERE text LIKE ? | |
| WHERE text LIKE ? ESCAPE '|' | |
| """ | |
| cursor.execute(sql, (query_pattern,)) | |
| # Escape special characters for LIKE clause to prevent SQL injection | |
| from libs.helper import escape_like_pattern | |
| escaped_query = escape_like_pattern(query) | |
| query_pattern = f"%{escaped_query}%" | |
| sql = f""" | |
| SELECT TOP {top_k} id, text, meta | |
| FROM {self.schema}.{self.table_name} | |
| WHERE text LIKE ? ESCAPE '\\' | |
| """ | |
| cursor.execute(sql, (query_pattern,)) |
🤖 Prompt for AI Agents
In `@api/core/rag/datasource/vdb/iris/iris_vector.py` around lines 290 - 300, The
LIKE escape is inconsistent: escape_like_pattern currently uses backslash
escapes (e.g., "50\%"), but the SQL sets ESCAPE '|' so backslashes are treated
literally; fix by making the escape characters consistent — either update
escape_like_pattern to accept/use '|' (and keep SQL ESCAPE '|' in iris_vector.py
where query_pattern and cursor.execute are used) or change the SQL string in
iris_vector.py to use ESCAPE '\' (so the backslash escapes from
escape_like_pattern are honored). Ensure the chosen escape char is applied both
where escape_like_pattern generates the pattern and in the SQL WHERE ... LIKE
... ESCAPE clause.
| from libs.helper import escape_like_pattern | ||
|
|
||
| match condition: | ||
| case "contains": | ||
| filters.append(json_field.like(f"%{value}%")) | ||
| escaped_value = escape_like_pattern(str(value)) | ||
| filters.append(json_field.like(f"%{escaped_value}%", escape="\\")) | ||
|
|
||
| case "not contains": | ||
| filters.append(json_field.notlike(f"%{value}%")) | ||
| escaped_value = escape_like_pattern(str(value)) | ||
| filters.append(json_field.notlike(f"%{escaped_value}%")) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
cat -n api/core/rag/retrieval/dataset_retrieval.py | sed -n '1195,1215p'Repository: agentic-review-benchmarks/dify-combined-coderabbit
Length of output: 1098
🌐 Web query:
SQLAlchemy notlike escape parameter behavior and documentation
💡 Result:
In SQLAlchemy, notlike() / not_like() accepts an escape parameter, but its behavior is only to render the SQL ESCAPE clause; it does not automatically rewrite/escape your pattern string.
What escape= does for LIKE / NOT LIKE
column.like(pattern, escape=ch)renders... LIKE <pattern> ESCAPE '<ch>'(i.e., it declares which character the database should treat as the escape character). (docs.sqlalchemy.org)column.not_like(pattern, escape=ch)(and the legacy namenotlike()) is the same idea forNOT LIKE, and is documented as equivalent to~column.like(pattern)(so the sameescapesemantics apply). (docs.sqlalchemy.org)- The rename:
not_like()is the newer name;notlike()remains for backwards compatibility. (docs.sqlalchemy.org)
What escape= does not do
LIKE/NOT LIKEdo not have anautoescapefeature; SQLAlchemy will not scan yourpatternand insert escape characters for%/_automatically.- If you want SQLAlchemy to also transform a literal value by escaping
%,_, and the escape character itself, use the convenience string operators like.contains(..., autoescape=True, escape=...),.startswith(..., autoescape=True, escape=...),.endswith(..., autoescape=True, escape=...), etc. (docs.sqlalchemy.org)
Where it’s documented
ColumnElement.like(other, escape=None)documentsescapeas an “optional escape character” that “renders theESCAPEkeyword”. (docs.sqlalchemy.org)ColumnElement.not_like(other, escape=None)/notlike(other, escape=None)are documented under Column Elements and Expressions;not_like()is described asNOT LIKEand equivalent to~x.like(y), with the rename note. (docs.sqlalchemy.org)
Citations:
- 1: https://docs.sqlalchemy.org/20/core/sqlelement.html
- 2: https://docs.sqlalchemy.org/20/core/sqlelement.html
- 3: https://docs.sqlalchemy.org/20/core/sqlelement.html
- 4: https://docs.sqlalchemy.org/20/core/sqlelement.html
- 5: https://docs.sqlalchemy.org/20/core/sqlelement.html
- 6: https://docs.sqlalchemy.org/20/core/sqlelement.html
Add escape parameter to not contains to match other branches.
The notlike() call on line 1207 is missing the escape="\\" parameter present in all other filter branches ("contains" on line 1203, "start with" on line 1211, "end with" on line 1215). SQLAlchemy's notlike() accepts the same escape parameter as like(), and without it, the database won't recognize the backslash as an escape character. This breaks consistent handling of literal %, _, and \ characters across filter types.
✅ Proposed fix
- filters.append(json_field.notlike(f"%{escaped_value}%"))
+ filters.append(json_field.notlike(f"%{escaped_value}%", escape="\\"))🤖 Prompt for AI Agents
In `@api/core/rag/retrieval/dataset_retrieval.py` around lines 1198 - 1208, The
"not contains" branch is missing the escape arg so backslashes aren't treated
consistently; in the match branch for "not contains" where escape_like_pattern
is used and filters.append(json_field.notlike(...)) is called, add the
escape="\\\\" parameter to the notlike() invocation (matching the other branches
that pass escape="\\") so literal %, _, and \ are handled the same way.
Benchmark PR from agentic-review-benchmarks#12
Summary by CodeRabbit
Release Notes
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.