Skip to content

Conversation

@Swiddis
Copy link
Collaborator

@Swiddis Swiddis commented Jan 29, 2025

Description

While working on fixing PPL AST bugs (#3273), it stood out to me that our error reporting for syntax errors really isn't that good. This PR cleans up the handling of our errors.

Example query that's currently returning syntax errors:

POST _plugins/_ppl
{
  "query": "SOURCE = test_19a673e2 | WHERE x OR y"
}

Before:

{
  "error": {
    "reason": "Invalid Query",
    "details": "Failed to parse query due to offending symbol [OR] at: 'SOURCE = test_19a673e2 | WHERE x OR' <--- HERE... More details: Expecting tokens in {'SEARCH', 'DESCRIBE', 'SHOW', 'FROM', 'WHERE', 'FIELDS', 'RENAME', 'STATS', 'DEDUP', 'SORT', 'EVAL', 'HEAD', 'TOP', 'RARE', 'PARSE', 'METHOD', 'REGEX', 'PUNCT', 'GROK', 'PATTERN', 'PATTERNS', 'NEW_FIELD', 'KMEANS', 'AD', 'ML', 'FILLNULL', 'TRENDLINE', 'SOURCE', 'INDEX', 'D', 'DESC', 'DATASOURCES', 'SORTBY', 'AUTO', 'STR', 'NUM', 'KEEPEMPTY', 'CONSECUTIVE', 'DEDUP_SPLITVALUES', 'PARTITIONS', 'ALLNUM', 'DELIM', 'CENTROIDS', 'ITERATIONS', 'DISTANCE_TYPE', 'NUMBER_OF_TREES', 'SHINGLE_SIZE', 'SAMPLE_SIZE', 'OUTPUT_AFTER', 'TIME_DECAY', 'ANOMALY_RATE', 'CATEGORY_FIELD', 'TIME_FIELD', 'TIME_ZONE', 'TRAINING_DATA_SIZE', 'ANOMALY_SCORE_THRESHOLD', 'TRUE', 'FALSE', 'CONVERT_TZ', 'DATETIME', 'DAY', 'DAY_HOUR', 'DAY_MICROSECOND', 'DAY_MINUTE', 'DAY_OF_YEAR', 'DAY_SECOND', 'HOUR', 'HOUR_MICROSECOND', 'HOUR_MINUTE', 'HOUR_OF_DAY', 'HOUR_SECOND', 'INTERVAL', 'MICROSECOND', 'MILLISECOND', 'MINUTE', 'MINUTE_MICROSECOND', 'MINUTE_OF_DAY', 'MINUTE_OF_HOUR', 'MINUTE_SECOND', 'MONTH', 'MONTH_OF_YEAR', 'QUARTER', 'SECOND', 'SECOND_MICROSECOND', 'SECOND_OF_MINUTE', 'WEEK', 'WEEK_OF_YEAR', 'YEAR', 'YEAR_MONTH', 'IP', '.', '+', '-', '(', '`', 'AVG', 'COUNT', 'DISTINCT_COUNT', 'ESTDC', 'ESTDC_ERROR', 'MAX', 'MEAN', 'MEDIAN', 'MIN', 'MODE', 'RANGE', 'STDEV', 'STDEVP', 'SUM', 'SUMSQ', 'VAR_SAMP', 'VAR_POP', 'STDDEV_SAMP', 'STDDEV_POP', 'PERCENTILE', 'TAKE', 'FIRST', 'LAST', 'LIST', 'VALUES', 'EARLIEST', 'EARLIEST_TIME', 'LATEST', 'LATEST_TIME', 'PER_DAY', 'PER_HOUR', 'PER_MINUTE', 'PER_SECOND', 'RATE', 'SPARKLINE', 'C', 'DC', 'ABS', 'CBRT', 'CEIL', 'CEILING', 'CONV', 'CRC32', 'E', 'EXP', 'FLOOR', 'LN', 'LOG', 'LOG10', 'LOG2', 'MOD', 'PI', 'POSITION', 'POW', 'POWER', 'RAND', 'ROUND', 'SIGN', 'SQRT', 'TRUNCATE', 'ACOS', 'ASIN', 'ATAN', 'ATAN2', 'COS', 'COT', 'DEGREES', 'RADIANS', 'SIN', 'TAN', 'ADDDATE', 'ADDTIME', 'CURDATE', 'CURRENT_DATE', 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURTIME', 'DATE', 'DATEDIFF', 'DATE_ADD', 'DATE_FORMAT', 'DATE_SUB', 'DAYNAME', 'DAYOFMONTH', 'DAYOFWEEK', 'DAYOFYEAR', 'DAY_OF_MONTH', 'DAY_OF_WEEK', 'EXTRACT', 'FROM_DAYS', 'FROM_UNIXTIME', 'GET_FORMAT', 'LAST_DAY', 'LOCALTIME', 'LOCALTIMESTAMP', 'MAKEDATE', 'MAKETIME', 'MONTHNAME', 'NOW', 'PERIOD_ADD', 'PERIOD_DIFF', 'SEC_TO_TIME', 'STR_TO_DATE', 'SUBDATE', 'SUBTIME', 'SYSDATE', 'TIME', 'TIMEDIFF', 'TIMESTAMP', 'TIMESTAMPADD', 'TIMESTAMPDIFF', 'TIME_FORMAT', 'TIME_TO_SEC', 'TO_DAYS', 'TO_SECONDS', 'UNIX_TIMESTAMP', 'UTC_DATE', 'UTC_TIME', 'UTC_TIMESTAMP', 'WEEKDAY', 'YEARWEEK', 'SUBSTR', 'SUBSTRING', 'LTRIM', 'RTRIM', 'TRIM', 'LOWER', 'UPPER', 'CONCAT', 'CONCAT_WS', 'LENGTH', 'STRCMP', 'RIGHT', 'LEFT', 'ASCII', 'LOCATE', 'REPLACE', 'REVERSE', 'CAST', 'LIKE', 'ISNULL', 'ISNOTNULL', 'CIDRMATCH', 'IFNULL', 'NULLIF', 'IF', 'TYPEOF', 'ALLOW_LEADING_WILDCARD', 'ANALYZE_WILDCARD', 'ANALYZER', 'AUTO_GENERATE_SYNONYMS_PHRASE_QUERY', 'BOOST', 'CUTOFF_FREQUENCY', 'DEFAULT_FIELD', 'DEFAULT_OPERATOR', 'ENABLE_POSITION_INCREMENTS', 'ESCAPE', 'FLAGS', 'FUZZY_MAX_EXPANSIONS', 'FUZZY_PREFIX_LENGTH', 'FUZZY_TRANSPOSITIONS', 'FUZZY_REWRITE', 'FUZZINESS', 'LENIENT', 'LOW_FREQ_OPERATOR', 'MAX_DETERMINIZED_STATES', 'MAX_EXPANSIONS', 'MINIMUM_SHOULD_MATCH', 'OPERATOR', 'PHRASE_SLOP', 'PREFIX_LENGTH', 'QUOTE_ANALYZER', 'QUOTE_FIELD_SUFFIX', 'REWRITE', 'SLOP', 'TIE_BREAKER', 'TYPE', 'ZERO_TERMS_QUERY', 'SPAN', 'MS', 'S', 'M', 'H', 'W', 'Q', 'Y', ID, INTEGER_LITERAL, DECIMAL_LITERAL, DQUOTA_STRING, SQUOTA_STRING, BQUOTA_STRING}",
    "type": "SyntaxCheckException"
  },
  "status": 400
}

After:

{
  "error": {
    "reason": "Invalid Query",
    "details": "[OR] is not a valid term at this part of the query: '..._19a673e2 | WHERE x OR' <-- HERE. Expecting one of 284 possible tokens. Some examples: 'SEARCH', 'DESCRIBE', 'SHOW', 'FROM', 'WHERE', ...",
    "type": "SyntaxCheckException"
  },
  "status": 400
}

Even though this particular query should be valid, I think it's much clearer from this what the parser is mad about, and easier to put in a bug report too.

Related Issues

N/A

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@Swiddis Swiddis force-pushed the feature/clean-up-syntax-errors branch from 31649f1 to a47be17 Compare January 29, 2025 22:47
POST /_plugins/_ppl
{
"query" : "SELECT * FROM sample:data"
"query" : "SOURCE = test_index | where a > 0)"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This SQL error no longer outputs the same error message (new parsing engine?). I couldn't hit the ANTLR exception with a new SQL query, so I updated it to a PPL one.

Copy link
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the improvement of this PR is truncate the long error message? If so, is it possible to simplify the changes especially line 60-80?

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@Swiddis
Copy link
Collaborator Author

Swiddis commented Feb 4, 2025

So the improvement of this PR is truncate the long error message? If so, is it possible to simplify the changes especially line 60-80?

Should be better now, refactored it to only do the mapping for the first few tokens so we can just use String.join everywhere

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@Swiddis Swiddis force-pushed the feature/clean-up-syntax-errors branch from f09a985 to 7fcb50e Compare February 21, 2025 21:30
Copy link
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes!

@Swiddis Swiddis merged commit 3bab06d into opensearch-project:main Feb 25, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants