Query fails with mismatched input 'X' expecting {<EOF>, '&', '|'} #400
Comments
Hi, sorry for the delay; I was on holiday and could not check my mail. In general, umlauts and other special characters should work fine (e.g. https://korpling.german.hu-berlin.de/annis3/#_q=bGVtbWE9ImRhZsO8ciI&_c=cGNjMg&cl=5&cr=5&s=0&l=10 or even https://korpling.german.hu-berlin.de/annis3/scriptorium#_q=bm9ybT0i4rKb4rKf4rKp4rKn4rKJIg&_c=YWJyYWhhbS5vdXIuZmF0aGVy&cl=5&cr=5&s=0&l=10&_seg=d29yZA ), but on some Tomcat configurations there might be problems. Since we use a web service, and this web service is sometimes behind a proxy web server, there are also cases where the URLs can get mixed up. If you are running the backend web service or the frontend web application behind a proxy, please send me more details about your configuration. Best, Thomas
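For readers who want to see which query the links above actually run: the `_q` and `_c` values in the URL fragment decode cleanly as unpadded, URL-safe Base64 of UTF-8 text. The sketch below is only an illustration built on that observation; the helper names `decode_param` and `decode_annis_link` are made up for this example and are not part of ANNIS.

```python
import base64
from urllib.parse import parse_qs, urlsplit


def decode_param(value: str) -> str:
    """Decode an unpadded, URL-safe Base64 string into UTF-8 text."""
    padded = value + "=" * (-len(value) % 4)  # restore the stripped padding
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def decode_annis_link(url: str) -> dict:
    """Return the decoded query (_q) and corpus (_c) found in the link fragment.

    Assumption: both parameters are Base64-encoded UTF-8, as observed in the
    example links; other parameters (cl, cr, s, l) are left untouched.
    """
    params = parse_qs(urlsplit(url).fragment)
    return {key: decode_param(params[key][0]) for key in ("_q", "_c") if key in params}


link = (
    "https://korpling.german.hu-berlin.de/annis3/"
    "#_q=bGVtbWE9ImRhZsO8ciI&_c=cGNjMg&cl=5&cr=5&s=0&l=10"
)
print(decode_annis_link(link))
```

For the first link this yields the query `lemma="dafür"` on the corpus `pcc2`, i.e. an umlaut in the annotation value rather than in the annotation name.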
Hello Thomas, thanks for the examples, and no problem at all about the delay; I'm already glad you take the time to answer! I noticed that the bug happens when the special character is within the attribute name, but not at the start or at the end. When it is at the start or at the end, the bug is different: the "special" characters are ignored. When it is in the middle, the query fails with "mismatched input 'X' expecting {<EOF>, '&', '|'}". Regards, Lionel
We discussed this internally, and currently we don't plan to introduce support for non-ASCII characters in attribute names (of course we still support them in the values). Allowing all characters in names might introduce some tricky parsing problems, e.g. if a user types the quotation mark ” (U+201D) instead of the proper " (U+0022) in the query. There would be many more corner cases than now, and just renaming the annotation names seems easier than dealing with all of them. I will also make sure that the new version of the ANNIS import format converter in Pepper handles this gracefully. However, I updated the parser, and the error messages should now be consistent: an umlaut before or after the annotation name, for example, is now reported as a lexer error instead of being silently ignored. The error message also explicitly states that the token could not be recognized. https://korpling.german.hu-berlin.de/annis3-snapshot/#_q=w7xsZW1tYcO8PSJkYWbDvHIi&_c=cGNjMg&cl=5&cr=5&s=0&l=10
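As an illustration of the "rename the annotation names" workaround, here is a minimal sketch of how names could be checked and transliterated before import. It is not the Pepper converter's behaviour: the helpers `is_ascii_name` and `to_ascii_name` are hypothetical, and the pattern for a valid name is an assumption, since the exact identifier rule of the AQL grammar is not stated in this thread.

```python
import re
import unicodedata

# Assumed shape of an ASCII-only annotation name; the real AQL grammar may differ.
ASCII_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")

# Conventional German transliterations, applied before generic accent stripping.
GERMAN = str.maketrans(
    {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}
)


def is_ascii_name(name: str) -> bool:
    """Return True if the name matches the assumed ASCII-only pattern."""
    return ASCII_NAME.fullmatch(name) is not None


def to_ascii_name(name: str) -> str:
    """Suggest an ASCII replacement for an annotation name."""
    # Compose first so that "u" + combining diaeresis is treated like "ü".
    text = unicodedata.normalize("NFC", name).translate(GERMAN)
    # Decompose the remaining accented letters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside the assumed alphabet becomes an underscore.
    return re.sub(r"[^A-Za-z0-9_-]", "_", stripped)


for name in ["lemma", "Lösung", "größe"]:
    if not is_ascii_name(name):
        print(f"{name} -> {to_ascii_name(name)}")
```

Transliterating (`ö` to `oe`) rather than simply dropping characters keeps the renamed attributes recognizable to corpus users, at the cost of a small mapping table per language.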
Ciao Thomas, after reading your explanations, I completely agree with your conclusion. Lionel
Hello,
we are using ANNIS for a German learner corpus in which some attribute names contain German characters.
Unfortunately, when there is a "special" character (non-[a-z], I believe) in the attribute name being searched, the query cannot be performed.
Thanks for the great work.
Regards,
Lionel