Query fails with mismatched input 'X' expecting {<EOF>, '&', '|'} #400

lnicolas · 2015-04-22T13:51:11Z

Hello,

we are using Annis for a German Learner corpus with some attributes having German characters.
Unfortunately, when the searched attribute contains a "special" character (non-[a-z], I believe), the query cannot be performed.

Thanks for the great work.
Regards,

Lionel

thomaskrause · 2015-04-27T13:21:38Z

Hi, sorry for the delay; I was on holiday and could not check my mail.

In general, umlauts and other special characters should work fine (e.g. https://korpling.german.hu-berlin.de/annis3/#_q=bGVtbWE9ImRhZsO8ciI&_c=cGNjMg&cl=5&cr=5&s=0&l=10 or even https://korpling.german.hu-berlin.de/annis3/scriptorium#_q=bm9ybT0i4rKb4rKf4rKp4rKn4rKJIg&_c=YWJyYWhhbS5vdXIuZmF0aGVy&cl=5&cr=5&s=0&l=10&_seg=d29yZA ), but on some configurations of Tomcat there might be problems:
http://korpling.github.io/ANNIS/doc/admin-configure-webapp.html#admin-configure-tomcat-utf8

Since we use a web service, and this web service might sometimes sit behind a proxy web server, there are also ways in which the URLs can get mixed up. If you are running the backend web service or the frontend web application behind a proxy, please send me more details about your configuration.

Best,

Thomas

lnicolas · 2015-04-27T13:42:22Z

Hello Thomas,

thanks for the examples, and no problem at all about the delay; I'm already glad you take the time to answer!
A proxy server should not be the problem, as I can reproduce it on publicly available ANNIS instances.

I noticed that the bug happens when the special character is within the attribute name but not at the start or at the end.

Actually, when it is at the start or at the end, the bug is different: the "special" characters are ignored.
=> https://korpling.german.hu-berlin.de/annis3/#_q=w7xsZW1tYcO8PSJkYWbDvHIi&_c=cGNjMg&cl=5&cr=5&s=0&l=10

When it is in the middle, the query fails with "mismatched input 'X' expecting {<EOF>, '&', '|'}".
=> https://korpling.german.hu-berlin.de/annis3/#_q=bGVtw7xtYT0iZGFmw7xyIg&_c=cGNjMg&cl=5&cr=5&s=0&l=10
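For readers who want to see which queries these links contain: the `_q` and `_c` values in the URL fragment appear to be base64url-encoded UTF-8 without padding. The following is a minimal sketch under that assumption; the function name `decode_annis_fragment` is made up for illustration and is not part of ANNIS.

```python
import base64
from urllib.parse import urlsplit, parse_qs


def decode_annis_fragment(url: str) -> dict:
    """Decode the `_q` (query) and `_c` (corpus) values of an ANNIS link.

    Assumption: both values are base64url-encoded UTF-8 with the
    trailing '=' padding stripped.
    """
    params = parse_qs(urlsplit(url).fragment)
    decoded = {}
    for key in ("_q", "_c"):
        if key in params:
            raw = params[key][0]
            # Restore the padding that the link omits.
            padded = raw + "=" * (-len(raw) % 4)
            decoded[key] = base64.urlsafe_b64decode(padded).decode("utf-8")
    return decoded


BASE = "https://korpling.german.hu-berlin.de/annis3/#_q="
TAIL = "&_c=cGNjMg&cl=5&cr=5&s=0&l=10"

# Umlaut at the start and end of the attribute name
edges = decode_annis_fragment(BASE + "w7xsZW1tYcO8PSJkYWbDvHIi" + TAIL)
# Umlaut in the middle of the attribute name
middle = decode_annis_fragment(BASE + "bGVtw7xtYT0iZGFmw7xyIg" + TAIL)

print(edges["_q"])   # ülemmaü="dafür"
print(middle["_q"])  # lemüma="dafür"
print(edges["_c"])   # pcc2
```

So the first link searches for an attribute named `ülemmaü` and the second for one named `lemüma`, both in the corpus `pcc2`.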

Regards

Lionel

thomaskrause · 2015-06-02T07:22:01Z

We discussed this internally, and currently we don't plan to introduce support for non-ASCII characters in attribute names (of course, we still support them in the values).

Allowing all characters in the names might introduce some tricky parsing problems, e.g. if a user types the quotation mark ” (U+201D) instead of the proper " (U+0022) in the query. There would be many more corner cases than now, and simply renaming the annotations seems easier than dealing with all of them. I will also make sure that the new version of the ANNIS import format converter in Pepper handles this gracefully.
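As an illustration of the renaming route, here is a minimal sketch of how annotation names could be mapped to ASCII before import. It is not what Pepper does; the function `ascii_annotation_name`, the transliteration table, and the assumed target pattern (`[A-Za-z_][A-Za-z0-9_-]*`) are all hypothetical choices for this example.

```python
import re
import unicodedata

# Hypothetical transliteration table for German characters.
GERMAN_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def ascii_annotation_name(name: str) -> str:
    """Return an ASCII-only variant of an annotation name.

    Assumption: the target is a name matching [A-Za-z_][A-Za-z0-9_-]*.
    """
    # Compose first so that "u" + combining diaeresis is seen as "ü".
    name = unicodedata.normalize("NFC", name)
    out = "".join(GERMAN_MAP.get(ch, ch) for ch in name)
    # Strip remaining diacritics (é -> e); drop what has no ASCII base.
    out = unicodedata.normalize("NFKD", out)
    out = out.encode("ascii", "ignore").decode("ascii")
    # Replace anything else that is not allowed in the assumed pattern.
    out = re.sub(r"[^A-Za-z0-9_-]", "_", out)
    # The first character must be a letter or an underscore.
    if not re.match(r"[A-Za-z_]", out):
        out = "_" + out
    return out


print(ascii_annotation_name("lemüma"))  # lemuema
print(ascii_annotation_name("Größe"))   # Groesse
```

Two different source names can collapse to the same ASCII name (for example `lemüma` and `lemuema`), so a real conversion would need to check for collisions.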

However, I updated the parser, and the error messages should now be consistent: an umlaut before or after the annotation name, for example, is now reported as a lexer error instead of being silently ignored. The error message also now explicitly states that the token could not be recognized.

https://korpling.german.hu-berlin.de/annis3-snapshot/#_q=w7xsZW1tYcO8PSJkYWbDvHIi&_c=cGNjMg&cl=5&cr=5&s=0&l=10
https://korpling.german.hu-berlin.de/annis3-snapshot/#_q=bGVtw7xtYT0iZGFmw7xyIg&_c=cGNjMg&cl=5&cr=5&s=0&l=10
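The consistent behaviour described above can be sketched as a check that treats an unrecognized character the same way wherever it occurs in the name. This is only an illustration, not the ANNIS lexer: the allowed character set, the function `check_annotation_name`, and the wording of the message are assumptions.

```python
from typing import Optional


def check_annotation_name(name: str) -> Optional[str]:
    """Return None for an acceptable name, else an error message.

    Assumption: names are ASCII letters or '_' first, then ASCII
    letters, digits, '_' or '-'. The message text is made up.
    """
    if not name:
        return "empty annotation name"
    for pos, ch in enumerate(name):
        ok = ch.isascii() and (
            ch.isalpha()
            or ch == "_"
            or (pos > 0 and (ch.isdigit() or ch == "-"))
        )
        if not ok:
            # Same error at the start, in the middle, or at the end.
            return f"unrecognized character {ch!r} at position {pos}"
    return None


for candidate in ("lemma", "ülemmaü", "lemüma", "lemmaü"):
    print(candidate, "->", check_annotation_name(candidate))
```

With this check, `ülemmaü`, `lemüma` and `lemmaü` are all rejected with the same kind of message, and only the reported position differs.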

lnicolas · 2015-06-02T08:13:59Z

Ciao Thomas,

after reading your explanations, I completely agree with your conclusion.
Thanks for having taken the time to consider the question.
Regards,

Lionel

thomaskrause mentioned this issue Jun 2, 2015

Umlaut at beginning or end of annotation name doesn't trigger a parser error #412

Closed

thomaskrause closed this as completed Oct 9, 2015
