Rethink syntax extension #998

@wangkuiyi

During our efforts to deploy SQLFlow in some real cases, we tried to extend the syntax of several SQL dialects, including MySQL and Hive. From these efforts, we learned a rule of thumb:

Try reusing existing SQL reserved words in the extended syntax.

Why this rule? Let us look at the example in #473. Because TRAIN isn't a reserved word, users could name a field TRAIN, and this might confuse our parser. One solution is to replace TRAIN with a reserved word, but we couldn't find one that expresses the meaning of "train". The other is to extend TRAIN into TO TRAIN, where TO is a reserved word.
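To make the collision concrete, here is a minimal Python sketch. It is not the SQLFlow parser; `split_at_marker` is a hypothetical helper that naively scans for the first case-insensitive occurrence of a marker, and the table with a column named `train` is an assumed example.

```python
from typing import List, Tuple


def split_at_marker(statement: str, marker: List[str]) -> Tuple[str, str]:
    """Split a statement at the first case-insensitive match of the marker words.

    Hypothetical helper for illustration only: it tokenizes on whitespace
    and commas and does not understand quoting or comments.
    """
    tokens = statement.replace(",", " , ").split()
    upper = [t.upper() for t in tokens]
    n = len(marker)
    for i in range(len(tokens) - n + 1):
        if upper[i:i + n] == marker:
            return " ".join(tokens[:i]), " ".join(tokens[i:])
    return " ".join(tokens), ""


# A bare TRAIN marker matches the column named "train" first,
# so the statement is cut in the wrong place.
bad = split_at_marker("SELECT train, b FROM tbl TRAIN Model", ["TRAIN"])

# With TO TRAIN, the column name alone no longer matches the marker.
good = split_at_marker("SELECT train, b FROM tbl TO TRAIN Model", ["TO", "TRAIN"])
```

In this sketch, `bad` splits right after `SELECT`, while `good` splits just before `TO TRAIN`. Because TO is reserved, an unquoted field cannot be named TO, so the two-word marker cannot be produced by field names.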

It might seem that the clauses following TO TRAIN, TO PREDICT, and TO EXPLAIN can use arbitrary words, because they are parsed not by a SQL dialect parser but by the SQLFlow parser. However, it is not that simple. Consider a table that has a field named label, and this field happens to be the label when we train a model. The SQL statement would look like

SELECT a, b, label FROM tbl
TO TRAIN Model
LABEL label

This would confuse the parser. We currently have a workaround for this. However, a complete solution seems to be to use OUTPUT in place of LABEL. OUTPUT is a SQL keyword, so users are not supposed to name a field OUTPUT. (This might not always be true, but at least OUTPUT output seems less likely to occur than LABEL label.)
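The reasoning above can be turned into a small check on candidate clause keywords. The following Python sketch is an illustration under assumptions: `RESERVED` is a hypothetical subset containing only words this discussion treats as reserved, and both function names are made up for the example. A real check would consult the reserved-word list of each target dialect.

```python
from typing import Iterable, List

# Hypothetical subset: only the words this discussion treats as reserved.
RESERVED = {"SELECT", "FROM", "TO", "OUTPUT"}


def colliding_columns(columns: Iterable[str], clause_keyword: str) -> List[str]:
    """Return the columns whose name equals the clause keyword, ignoring case."""
    return [c for c in columns if c.upper() == clause_keyword.upper()]


def is_reserved(clause_keyword: str) -> bool:
    """True if the keyword is in the assumed reserved set, so unquoted
    field names are not expected to use it."""
    return clause_keyword.upper() in RESERVED


columns = ["a", "b", "label"]                        # fields of the example table
label_hits = colliding_columns(columns, "LABEL")     # collides with the field "label"
output_hits = colliding_columns(columns, "OUTPUT")   # no field shares this name
```

Here `LABEL` collides with the field `label` and is not in the assumed reserved set, while `OUTPUT` has no colliding field and is treated as reserved, which is the argument for preferring it.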

Please vote for the following syntax changes:
