Backport: Fix LIKE with escapes #2

Merged

@findepi findepi commented Nov 8, 2024

Cherry pick apache#6703 with some adjustments for conflict resolution.

@findepi findepi changed the title Fix LIKE with escapes Backport: Fix LIKE with escapes Nov 8, 2024
@github-actions github-actions bot added the arrow label Nov 8, 2024
Fix LIKE processing for patterns containing escapes

- the starts_with / ends_with optimization did not correctly account for
  escapes when checking whether the rest of the pattern is a literal
- the pattern-to-regexp compiler incorrectly processed \ followed by a
  character other than % or _. In PostgreSQL, the pattern '\x' matches a
  single 'x'.
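
To illustrate the two points above, here is a minimal Python sketch of the intended semantics. It is not the code from this change: the names `like_to_regex`, `is_plain_literal` and `like` are hypothetical, and treating a trailing escape as a literal backslash is an assumption of the sketch (PostgreSQL may reject such a pattern instead). The fast paths are deliberately conservative: they apply only when the rest of the pattern contains no wildcard and no escape character at all.

```python
import re

ESCAPE = "\\"


def like_to_regex(pattern: str) -> str:
    """Translate a LIKE pattern into a regular expression string."""
    out = []
    chars = iter(pattern)
    for c in chars:
        if c == ESCAPE:
            # An escape makes the next character literal, whatever it is,
            # so '\x' matches a single 'x' (PostgreSQL behaviour).
            nxt = next(chars, None)
            # Assumption of this sketch: a trailing escape is a literal backslash.
            out.append(re.escape(ESCAPE if nxt is None else nxt))
        elif c == "%":
            out.append(".*")   # any sequence of characters
        elif c == "_":
            out.append(".")    # exactly one character
        else:
            out.append(re.escape(c))
    return "".join(out)


def is_plain_literal(s: str) -> bool:
    """True only if s contains no wildcard and no escape character."""
    return not any(ch in s for ch in ("%", "_", ESCAPE))


def like(value: str, pattern: str) -> bool:
    # Fast paths are taken only when the remainder is a plain literal.
    # Ignoring escapes here would turn r'a\%' into a prefix check for 'a\'.
    if is_plain_literal(pattern):
        return value == pattern
    if pattern.endswith("%") and is_plain_literal(pattern[:-1]):
        return value.startswith(pattern[:-1])
    if pattern.startswith("%") and is_plain_literal(pattern[1:]):
        return value.endswith(pattern[1:])
    # General case: full regular expression match.
    return re.fullmatch(like_to_regex(pattern), value, re.DOTALL) is not None
```

With this sketch, `like("x", r"\x")` is true, while `like("a\\xyz", r"a\%")` is false because the escaped `%` only matches a literal percent sign.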

There are two tests:

- like_escape_many was generated using PostgreSQL with the code attached
  below for verification
- like_escape consists of hand-picked test cases that are more
  interesting. The lower cardinality of the hand-picked test cases allows
  for exercising all scalar/array vs scalar/array combinations.

The script below isn't the simplest possible, because it also attempted
to generate more test cases by adding padding. Hence e.g.
is_like_without_dangling_escape. Since it is attached for reference
only, it is left in that form rather than simplified.

```python
import psycopg2

data = r"""
\
\\
\\\
\\\\
a
\a
\\a
%
\%
\\%
%%
\%%
\\%%
_
\_
\\_
__
\__
\\__
abc
a_c
a\bc
a\_c
%abc
\%abc
a\\_c%
""".split('\n')

data = list(dict.fromkeys(data))

conn = psycopg2.connect(host='localhost', port=5432, user='postgres', password='mysecretpassword')
conn.set_session(autocommit=True)
cursor = conn.cursor()
for r in data:
    try:
        # PostgreSQL verifies dangling escape only sometimes
        cursor.execute("SELECT %s LIKE %s", (r, r))
        is_like, = cursor.fetchone()
        has_dangling_escape = False
        pg_pattern = r
    except Exception as e:
        if 'LIKE pattern must not end with escape character' not in str(e):
            raise
        has_dangling_escape = True
        # Escape the trailing backslash so that PostgreSQL accepts the pattern
        pg_pattern = r + '\\'

    for l in data:
        # print()
        # print('     '.join(str(v) for v in (l, r, has_dangling_escape, pg_pattern)))
        cursor.execute("SELECT %s LIKE %s", (l, pg_pattern))
        is_like, = cursor.fetchone()
        assert type(is_like) is bool

        if not is_like and has_dangling_escape:
            # Drop the escaped trailing backslash (two characters) and retry
            pattern_without_escaped_dangling_escape = pg_pattern[:-2]
            cursor.execute("SELECT %s LIKE %s", (l, pattern_without_escaped_dangling_escape))
            is_like_without_dangling_escape, = cursor.fetchone()
            assert type(is_like_without_dangling_escape) is bool
        else:
            is_like_without_dangling_escape = False
        # The output uses r"..." literals, so values must not contain a double quote
        assert '"' not in l
        assert '"' not in r
        print('(r"%s", r"%s", %s),' % (
            l, r,
            str(is_like).lower(),
            # str(has_dangling_escape).lower(),
            # str(is_like_without_dangling_escape).lower(),
        ))
```
@findepi findepi force-pushed the findepi/sdf/52.2.0/fix-like-with-escapes-b71b43 branch from 03dea80 to 02d401b Compare November 8, 2024 14:50
@findepi findepi merged commit 692d9f8 into sdf/52.2.0 Nov 9, 2024
17 checks passed
@findepi findepi deleted the findepi/sdf/52.2.0/fix-like-with-escapes-b71b43 branch November 9, 2024 22:02