Description
While deploying SQLFlow in some real cases, we tried to extend the syntax of several SQL dialects, including MySQL and Hive. In doing so, we learned a lesson:
Try to reuse existing SQL reserved words in the extended syntax.
Why does this matter? Consider the example in #473. Because TRAIN isn't a reserved word, users could name a field TRAIN, and this might confuse our parser. There are two possible solutions: replace TRAIN with a reserved word, but we couldn't find one that expresses the meaning of "train"; or extend TRAIN into TO TRAIN, noting that TO is a reserved word.
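To make the ambiguity concrete, here is a minimal Python sketch of how a front end might locate the boundary between the standard SELECT part (handed to the dialect parser) and the extended part (handed to the SQLFlow parser). The functions `split_on_train` and `split_on_to_train` and the toy tokenizer are hypothetical, not SQLFlow's actual code, and the tokenizer deliberately ignores quoting, comments, and backticks.

```python
import re

# Hypothetical toy tokenizer: words plus a few punctuation tokens.
# Real SQL lexing (string literals, comments, backticks) is ignored here.
TOKEN = re.compile(r"\w+|[,*()]")


def split_on_train(stmt):
    """Naive split: the extension starts at the first TRAIN token.

    Breaks when a field is named train, because TRAIN is not reserved."""
    toks = TOKEN.findall(stmt)
    for i, t in enumerate(toks):
        if t.upper() == "TRAIN":
            return toks[:i], toks[i:]
    return toks, []


def split_on_to_train(stmt):
    """Split at the first TO TRAIN pair.

    TO is a reserved word, so an unquoted field cannot be named TO,
    and the pair cannot appear inside the select list."""
    toks = TOKEN.findall(stmt)
    for i in range(len(toks) - 1):
        if toks[i].upper() == "TO" and toks[i + 1].upper() == "TRAIN":
            return toks[:i], toks[i:]
    return toks, []


if __name__ == "__main__":
    print(split_on_train("SELECT a, train FROM tbl TRAIN DNNClassifier"))
    print(split_on_to_train("SELECT a, train FROM tbl TO TRAIN DNNClassifier"))
```

With a field named train, `split_on_train` cuts the statement right after `SELECT a,`, whereas `split_on_to_train` finds the intended boundary before `TO TRAIN DNNClassifier`.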
It might seem that the clauses following TO TRAIN, TO PREDICT, and TO EXPLAIN can use arbitrary words, because they are parsed by the SQLFlow parser rather than by a SQL (dialect) parser. However, it is not that simple. Suppose a table has a field named label, and this field is indeed the label when we train a model. The SQL statement would look like
SELECT a, b, label FROM tbl
TO TRAIN Model
LABEL label
This would confuse the parser. Our current workaround is this. However, a complete solution seems to be replacing LABEL with OUTPUT. OUTPUT is a SQL keyword, so users are not supposed to name a field OUTPUT. (This might not always be true, but at least the probability of OUTPUT output seems smaller than that of LABEL label.)
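One plausible source of the confusion, assuming the extended grammar treats LABEL as a case-insensitive keyword, is the lexer: it also turns the field name label into the LABEL keyword, so `LABEL label` no longer matches `LABEL <identifier>`. The hypothetical `lex` and `parse_label_column` below model only this one clause and are a sketch, not SQLFlow's actual parser.

```python
import re


def lex(clause, keywords):
    """Hypothetical case-insensitive lexer: a word matching a keyword
    becomes that keyword token; every other word becomes an IDENT."""
    tokens = []
    for word in re.findall(r"\w+", clause):
        if word.upper() in keywords:
            tokens.append((word.upper(), word))
        else:
            tokens.append(("IDENT", word))
    return tokens


def parse_label_column(clause, label_kw):
    """Accept only: TO TRAIN <ident> <label_kw> <ident>.

    Returns the name of the label column, or raises SyntaxError."""
    keywords = {"TO", "TRAIN", label_kw}
    toks = lex(clause, keywords)
    kinds = [kind for kind, _ in toks]
    if kinds != ["TO", "TRAIN", "IDENT", label_kw, "IDENT"]:
        raise SyntaxError(f"unexpected tokens: {kinds}")
    return toks[4][1]


if __name__ == "__main__":
    print(parse_label_column("TO TRAIN Model OUTPUT label", "OUTPUT"))
    try:
        parse_label_column("TO TRAIN Model LABEL label", "LABEL")
    except SyntaxError as err:
        print(err)
```

With LABEL as the keyword, the field label is lexed as a second LABEL token and parsing fails; with OUTPUT as the keyword, label stays an identifier. A field named output would break the OUTPUT grammar in the same way, which is why this choice relies on such field names being rare.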
Please vote for the following syntax changes: