Bug with braces and brackets in character ranges in PythonRegex #19
Comments
Thank you for your feedback. There were three different issues:
Thank you again for your help in finding bugs :) I added your examples to the tests, and they should be working now. A new version was pushed to PyPI. |
Thanks for the quick fix! However, I think ranges where the ending character is escaped still do not work correctly. (Also, it looks like you added the test |
Thanks again. Ultimately, I had to work harder to understand what Python is actually doing and how escaping works (https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences). I am not quite sure I mimic exactly the behavior of Python re in all edge cases, but it should be closer now. I discovered weird things. For example, the three following regexes seem to have the same behavior:
r = re.compile("\\x10")
r = re.compile("\x10")
r = re.compile(r"\x10")
To recognize the string \x10 (not hexadecimal equivalent), we have to escape twice:
r = re.compile("\\\\x10")
At this point, I am not sure anybody is writing such regex... Anyway, it should work better now. A new version was pushed. Tell me if you find other hard cases! |
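The escaping behavior described in this comment can be checked directly against Python's standard `re` module. The following is a minimal sketch; the variable names are illustrative and it says nothing about how pyformlang handles these patterns internally.

```python
import re

control_char = "\x10"    # one character, code point 0x10
literal_text = r"\x10"   # four characters: backslash, 'x', '1', '0'

# The three spellings from the comment above.
patterns = ["\\x10", "\x10", r"\x10"]

# Each pattern matches the single control character ...
matches_control = [bool(re.fullmatch(p, control_char)) for p in patterns]
# ... and none of them matches the four-character text.
matches_literal = [bool(re.fullmatch(p, literal_text)) for p in patterns]

# Escaping twice yields a regex for the four-character text instead.
double_escaped = re.compile("\\\\x10")  # same pattern as r"\\x10"
double_matches_literal = bool(double_escaped.fullmatch(literal_text))
double_matches_control = bool(double_escaped.fullmatch(control_char))

print(matches_control, matches_literal)
print(double_matches_literal, double_matches_control)
```

The first two spellings differ only in who resolves the escape: in `"\x10"` the Python string literal does it, in `"\\x10"` and `r"\x10"` the regex engine does.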
(and regarding your last question, |
Thanks! It looks like it works correctly now. |
I have found some new hard cases! It looks like the escaping of a dot with backslashes in front does not behave correctly:
from pyformlang.regular_expression import PythonRegex
PythonRegex(r"\.").accepts(["a"]) # Returns False
PythonRegex(r"\.").accepts(["."]) # Returns True
PythonRegex(r"\\.").accepts(["\\a"]) # Returns False, should be True
PythonRegex(r"\\.").accepts(["\\."]) # Returns False, should be True
PythonRegex(r"\\\.").accepts(["\\a"]) # Returns False
PythonRegex(r"\\\.").accepts(["\\."]) # Returns False, should be True |
Thank you again! I quickly found the bug and corrected it. |
Thanks for the quick response, but it doesn't seem to be fixed? When I install version 1.0.7, I get the same results in my example as before. |
In your examples, you have to remove the square brackets. Otherwise, Pyformlang understands that "\\a" is a single symbol/letter (because Pyformlang is more general in a way than Python re). |
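The difference between the two call styles can be seen without pyformlang, by looking at what each argument yields when iterated. This sketch assumes, as the comment above describes, that `accepts` treats each item of its argument as one symbol; it uses plain Python only.

```python
word_as_string = "\\a"    # two characters: a backslash and 'a'
word_as_list = ["\\a"]    # one element: a single two-character symbol

# Iterating the string gives two one-character symbols.
symbols_from_string = list(word_as_string)
# Iterating the list gives one symbol that is two characters long.
symbols_from_list = list(word_as_list)

print(symbols_from_string, symbols_from_list)
```

Under that assumption, a regex for "backslash followed by any character" sees two symbols in the first case and only one in the second, which is why the bracketed examples were rejected.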
Ah yes, I forgot about that part. Thanks, it works now! Then my examples were also incorrect as the last one did already return |
Character ranges work incorrectly when braces or brackets are used inside them. Starting with braces, the expressions r"[{}]" and r"[\{\}]" work correctly, e.g. returns True. However, using r"[{-}]" raises an error and r"[\{-\}]" works incorrectly as the expression does not accept |.

Using brackets does not really work at all. The expressions r"[\[]", r"[Z-\[]", r"[\[-a]" and r"[\[-\]]" do not accept anything and r"[\]]", r"[Z-\]]" and r"[\]-a]" raise errors.
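For reference, the behavior the report expects can be reproduced with Python's standard `re` module, which PythonRegex aims to mimic. This is a small sketch; the helper `accepted` is hypothetical and introduced here only for illustration.

```python
import re

def accepted(pattern, candidates):
    """Return the candidates that Python's re fully matches against pattern."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

# ASCII order around the braces: '{' (0x7B), '|' (0x7C), '}' (0x7D)
brace_range = accepted(r"[{-}]", ["{", "|", "}", "z"])
escaped_brace_range = accepted(r"[\{-\}]", ["{", "|", "}", "z"])

# ASCII order around the brackets: 'Z', '[', backslash, ']', '^', '_', '`', 'a'
bracket_range = accepted(r"[\[-\]]", ["Z", "[", "\\", "]", "^"])
z_to_open_bracket = accepted(r"[Z-\[]", ["Y", "Z", "[", "\\"])
close_bracket_to_a = accepted(r"[\]-a]", ["\\", "]", "^", "a", "b"])

print(brace_range, escaped_brace_range)
print(bracket_range, z_to_open_bracket, close_bracket_to_a)
```

In `re`, both brace ranges include `|` because it sits between `{` and `}` in ASCII, and the escaped brackets act as ordinary range endpoints.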