Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FT.SEARCH command gives me error: ERR: Query Syntax Error (this issue does work with Redis) #3258

Closed
wernermorgenstern opened this issue Jul 2, 2024 · 20 comments
Assignees
Labels
bug Something isn't working

Comments

@wernermorgenstern
Copy link

Describe the bug
I create an index (see below), and I am trying to run this query:
ft.search cmd-idx "@tid:{abc} @path:{myCmd} @status:{waiting} @updated:[1719951363 +inf]"

Index is:

        'cmd-idx',
        'ON',
        'HASH',
        'PREFIX',
        1,
        'cmd:',
        'SCHEMA',
        'tid',
        'TAG',
        'code',
        'NUMERIC',
        'msgId',
        'NUMERIC',
        'path',
        'TAG',
        'status',
        'TAG',
        'tags',
        'TAG',
        'created',
        'NUMERIC',
        'SORTABLE',
        'updated',
        'NUMERIC',
        'SORTABLE',
        'sent',
        'NUMERIC',
        'SORTABLE'
      ]

To Reproduce
Steps to reproduce the behavior:

  1. Create index with FT.create
  2. Query records using ft.search cmd-idx "@tid:{abc} @path:{myCmd} @status:{waiting} @updated:[1719951363 +inf]"
  3. Error: ERR Query syntax error

Expected behavior
Supposed to return records (0 or more)

Environment (please complete the following information):

  • OS: [macOs M1, in docker]
  • Kernel: # Command: Darwin local 23.5.0 Darwin Kernel Version 23.5.0: Wed May 1 20:12:58 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6000 arm64
  • Containerized?: [Docker Compose
  • Dragonfly Version: dragonfly v1.19.2-2ff628203925b206c4a1031aa24916523dc5382e
    build time: 2024-07-02 06:22:16

Additional context
This works with Redis

@wernermorgenstern wernermorgenstern added the bug Something isn't working label Jul 2, 2024
@wernermorgenstern
Copy link
Author

So the command I specified will work

However, if my search for the tid contains a dash/hyphen, that is when the error occurs

ft.search cmd-idx "@tid:{abc-1} @path:{myCmd} @status:{waiting} @Updated:[1719951363 +inf]"

@dranikpg
Copy link
Contributor

dranikpg commented Jul 4, 2024

Yes, our current parser isn't that advanced to handle all the edge cases, #3145

Meanwhile, you can use quoted 'text-with-hypens and even --spaces--' - strings with quotes ' ' or " " 🙂

@romange
Copy link
Collaborator

romange commented Jul 5, 2024

@dranikpg your comment is not very clear, seems that the formatting is broken. can you please fix and provide concrete examples on how it should look like?

@dranikpg
Copy link
Contributor

dranikpg commented Jul 5, 2024

it's not broken, I just quoted the text in a weird way 😅 It should just have quotes around the terms, like this

ft.search cmd-idx "@tag{'tag with spaces'}"

@wernermorgenstern
Copy link
Author

@dranikpg @romange , so just wondering, why does it work with Redis (unquoted)? Not dragonfly?
Is that a undocumented feature in Redis?

@BorysTheDev
Copy link
Contributor

@wernermorgenstern It looks like a bug @dranikpg has suggested a workaround for you while I'm fixing this bug

@BorysTheDev
Copy link
Contributor

BorysTheDev commented Jul 6, 2024

@wernermorgenstern I can not reproduce your case on the main branch and 1.19.2 on Linux and got the correct result.

127.0.0.1:6379> FT.CREATE cmd-idx ON HASH PREFIX 1 cmd: SCHEMA tid TAG code NUMERIC msgId NUMERIC path TAG status TAG tags TAG created NUMERIC SORTABLE updated NUMERIC SORTABLE sent NUMERIC SORTABLE
OK
127.0.0.1:6379> ft.search cmd-idx "@tid:{abc} @path:{myCmd} @status:{waiting} @updated:[1719951363 +inf]"
1) (integer) 0

I will ask somebody to check it on MacOs. I would appreciate if you check it one more time, or check that I run exactly the same commands as you do

@wernermorgenstern
Copy link
Author

wernermorgenstern commented Jul 6, 2024 via email

@BorysTheDev
Copy link
Contributor

@wernermorgenstern Thx. I missed your comment. It's reproducible now.

@BorysTheDev
Copy link
Contributor

BorysTheDev commented Jul 9, 2024

@wernermorgenstern I haven't found any correct behavior in Redis for abc-123 search if we don't use quotes (with quotes it works for DIalect >=2). So redis always returns 0 for this query.
Could you clarify what behavior you expect to see, if you want to find the exact tag you can use the next syntax '@tid:{"abc-1"}'. If I miss something please give me the link to documentation or other resource.

@CodeToFreedom
Copy link

@BorysTheDev
I have the same problem which currently also causes our migration tests to fail.

The problem:

This works in redis (but fails in dragonflydb):
"FT.SEARCH" ":Language:index" "(@code:{ZH\\-CN})" "LIMIT" "0" "1"

This works in dragonflydb (but fails in redis):
"FT.SEARCH" ":Language:index" "(@code:{'ZH-CN'})" "LIMIT" "0" "1"

Here is the fix:

To fix that and make dragonflydb behave like the redis tokenizer please escape the following regex matches with two backslashes.
DEFAULT_ESCAPED_CHARS = r"[,.<>{}\[\]\\\"\':;!@#$%^&*()\-+=~\/ ]"

The matching regex groups need to be ESCAPED like that:
f"\\{value}"

References:
Here is a completely working token escaper from a redis client library:
Source: https://github.com/redis/redis-om-python/blob/main/aredis_om/model/token_escaper.py

Many thanks for your efforts.

@BorysTheDev
Copy link
Contributor

Hi @CodeToFreedom thanks for your comment I will check it.

@CodeToFreedom
Copy link

Hi @BorysTheDev,
thanks a lot. I really appreciate your quick response and efforts. In case you have further questions on the tokenizer, please let me know and I try to help.
Btw it's an awesome library and it still feels like a kid in a play store to read all the amazing features in the docs ;).

@BorysTheDev
Copy link
Contributor

BorysTheDev commented Aug 22, 2024

Hi @CodeToFreedom I've played with Redis and I got a lot of strange results:
for example

ft.search i1 @color:{blue\-1} works
ft.search i1 @color:{blue\+1} doesn't work
ft.search i1 @color:{blue,} works but shouldn't
ft.search i1 @color:{blue\,} doesn't work

I have used redis-cli for testing
Do you have any ideas about what I do wrong or how to get the correct behaviour from Redis?

@CodeToFreedom
Copy link

CodeToFreedom commented Aug 23, 2024

Hi @BorysTheDev ,
let's consider this example script. You can run it f.e. in redis-insight workbench and it produces the correct and expected results in redis:
You can copy and run these test commands in redis and dragonflydb and see the difference.

dragonflydb-testscript.txt

dragonflydb-2024-08-23_14.31.57.mp4
redis-2024-08-23_15.19.19.mp4

Creating the index and inserting the data works both in redis and df:

FT.CREATE idx:testindex ON JSON PREFIX 1 testindex: SCHEMA $.pk AS pk TAG $.color AS color TAG

JSON.SET testindex:abc1 . "{\"pk\":\"abc1\",\"color\":\"blue-1\"}" #works in df + redis
JSON.SET testindex:abc2 . "{\"pk\":\"abc2\",\"color\":\"blue+1\"}" #works in df + redis

FT.SEARCH works only in redis with these queries. Dragonflydb results in QuerySyntaxError exceptions.

#all of these queries work as expected in redis:
FT.SEARCH idx:testindex "(@color:{blue\\-1})"  #redis: no error, successful match
FT.SEARCH idx:testindex "(@color:{blue\\+1})"  #redis: no error, successful match 
FT.SEARCH idx:testindex (@color:{blue\-1})     #redis: no error, successful match 
FT.SEARCH idx:testindex (@color:{blue\\-1})    #redis: no error, no match (which is expected as "\\" escaping is used only if quotation mark is used around the filter argument
#Your examples also work in redis as expected (I tested it with redis enterprise):
ft.search idx:testindex @color:{blue\-1} #redis: no error, successful match
ft.search idx:testindex @color:{blue\+1} #redis: no error, successful match
ft.search idx:testindex @color:{blue,} #redis: no error, but zero matches which is correct
ft.search idx:testindex @color:{blue\,} #redis: no error, but zero matches which is correct

Important to note is that usually (at least the client libraries I know of) escape with quotation mark around the argument and then use double backslash for quotation.
So this is the main syntax that needs to be supported by dragonflydb. As it is usually not only one filter the most common use case is having it enclosed in brackets.

Example (correct syntax):
FT.SEARCH idx:testindex "(@color:{blue\\-1})"
FT.SEARCH idx:testindex "@color:{blue\\-1}"

Also correct syntax (but without using the quotation mark, therefore single backslash):
FT.SEARCH idx:testindex (@color:{blue\-1})
FT.SEARCH idx:testindex @color:{blue\-1}

Please let me know whether helps clarifying the problem a bit or what exactly is unclear. I will try to help then.

With the example script above we now have a common test case that will hopefully make it easier to compare the functionality.

@CodeToFreedom
Copy link

CodeToFreedom commented Aug 23, 2024

@BorysTheDev
PS: In addition to what I wrote in my post a couple of hours ago, please also read the following redis docs:
https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/escaping/#the-rules-of-text-field-tokenization

and here:
https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/query_syntax/#tag-filters

Btw have you tried using the exact same regex that I suggested above as tokenizer from the redis client library?

To make the redis tokenizer behave exactly like the redis client library please use the following regex that escapes the following characters with two backslashes.
DEFAULT_ESCAPED_CHARS = r"[,.<>{}\[\]\\\"\':;!@#$%^&*()\-+=~\/ ]"
The matching regex groups need to be ESCAPED (or reverted) like that:
f"\\{value}"

In case the FT.SEARCH query doesn't use quotation marks for the filter argument, using one backslash is also a valid syntax.

two \-signs: valid and used most often by client libraries
f"\\{value}"

single -signs: also valid
f\{value}

So extracting the value from the query to use internally for the matching is the inverse of that.
Step 1 a) If argument is in quotation marks -> replace "\\" signs with "" to get the value
Step 1 b) If argument is not using quotation marks -> replace "\" with "" to get the value

Step 2) In case dragonflydb works with escaping special char strings differently f.e. by using quotation marks around the whole query value (as you state here #3258 (comment)), an easy internal fix would be the following:
-Take the value from step 1 and add quotation marks around it.
df_escaped_value= f"{value}"

Then you can just use the existing code in dragonfly in a redis-syntax compatible way and don't have to touch the internal dragonflydb parser.

@BorysTheDev
Copy link
Contributor

@CodeToFreedom Thank you for the excellent description of the problem, I will return to it on Monday maybe my problem was that I tried to use this functionality via redis-cli application. Anycase I will check all this functionality via redis-insight this time and if everything works OK the patch will be ready in a couple of days.

Thank you for the documentation links.

@BorysTheDev
Copy link
Contributor

@CodeToFreedom I've checked your examples and they work. I don't know the reason, but I have tested with HASH instead of JSON and for HASH it still doesn't work. Anycase I will update our parser to support this functionality even for HASH

@CodeToFreedom
Copy link

@BorysTheDev Great and thanks for the fix. I will directly test it once it's available in the upcoming release.

@CodeToFreedom
Copy link

@BorysTheDev After testing it seems to work as expected now. Many thanks for your efforts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants