[BUG] PPL queries fail when field names contain reserved keyword "limit" #4481

@alexey-temnikov

Description

Query Information

PPL Command/Query:

source=kube-state-metrics | fields monitoring.metrics.beat.handles.limit.hard

Expected Result:
The query should successfully return values from the field monitoring.metrics.beat.handles.limit.hard.

Actual Result:

{
  "error": {
    "reason": "Invalid Query",
    "details": "[limit] is not a valid term at this part of the query: '...etrics.beat.handles.limit' <-- HERE. Expecting tokens: ID",
    "type": "SyntaxCheckException"
  },
  "status": 400
}

Dataset Information

Dataset/Schema Type

  • OpenTelemetry (OTEL)
  • Simple Schema for Observability (SS4O)
  • Open Cybersecurity Schema Framework (OCSF)
  • Custom (details below)

Index Mapping

{
  "mappings": {
    "properties": {
      "monitoring": {
        "properties": {
          "metrics": {
            "properties": {
              "beat": {
                "properties": {
                  "handles": {
                    "properties": {
                      "limit": {
                        "properties": {
                          "hard": { "type": "long" },
                          "soft": { "type": "long" }
                        }
                      },
                      "open": { "type": "long" }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Sample Data

{
  "monitoring": {
    "metrics": {
      "beat": {
        "handles": {
          "limit": {
            "hard": 1048576,
            "soft": 1048576
          },
          "open": 4
        }
      }
    }
  }
}

Bug Description

Issue Summary:
PPL queries fail when field names contain the reserved keyword "limit" as part of a dotted path. The parser incorrectly interprets "limit" as the LIMIT command keyword instead of as part of the field identifier, causing a syntax error.

Steps to Reproduce:

  1. Create an index with nested fields containing "limit" in the path (e.g., monitoring.metrics.beat.handles.limit.hard)
  2. Execute a PPL query: source=kube-state-metrics | fields monitoring.metrics.beat.handles.limit.hard
  3. Observe the syntax error
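The steps above can be scripted. The following is a minimal sketch, assuming a local test cluster at http://localhost:9200 with no authentication (both are assumptions, not details from this report) and the plugin's `_plugins/_ppl` endpoint. `build_ppl_request` and `run_ppl` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request

# Assumption: local test cluster with security disabled; adjust as needed.
BASE_URL = "http://localhost:9200"

QUERY = "source=kube-state-metrics | fields monitoring.metrics.beat.handles.limit.hard"


def build_ppl_request(query, base_url=BASE_URL):
    """Build a POST request for the PPL endpoint without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_plugins/_ppl",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_ppl(query, base_url=BASE_URL):
    """Send the query and return (HTTP status, decoded JSON body)."""
    request = build_ppl_request(query, base_url)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # An error response (such as the 400 above) still carries a JSON body.
        return err.code, json.load(err)


# Usage (requires a running cluster with the index from step 1):
#   status, payload = run_ppl(QUERY)
#   print(status, json.dumps(payload, indent=2))
```

On an affected build, the call is expected to return the 400 status and the SyntaxCheckException payload shown under Actual Result.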

Workaround:
Wrap the entire field path in backticks:

source=kube-state-metrics | fields `monitoring.metrics.beat.handles.limit.hard`
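For clients that build queries programmatically, the workaround can be applied uniformly. This is a small sketch under the assumption that field paths never contain a backtick themselves; `quote_field` and `fields_query` are hypothetical helpers, not part of any OpenSearch client.

```python
def quote_field(path):
    """Wrap a whole dotted field path in backticks, as in the workaround."""
    if "`" in path:
        # Escaping rules for embedded backticks are out of scope for this sketch.
        raise ValueError(f"field path already contains a backtick: {path!r}")
    return f"`{path}`"


def fields_query(index, paths):
    """Build a 'source=... | fields ...' query with every path quoted."""
    quoted = ", ".join(quote_field(p) for p in paths)
    return f"source={index} | fields {quoted}"


query = fields_query(
    "kube-state-metrics",
    ["monitoring.metrics.beat.handles.limit.hard"],
)
print(query)
```

Quoting every path, rather than only those known to contain a keyword, avoids having to track the reserved keyword list on the client side.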

Impact:

  • Users with data containing "limit" in field names cannot query these fields without using backticks
  • Error messages are not helpful in identifying that "limit" is a reserved keyword
  • Documentation does not clearly list reserved keywords or explain when backticks are required for PPL queries

Environment Information

OpenSearch Version:
OpenSearch 3.3.0-SNAPSHOT

Additional Details:
Related to issue #1589, which addressed similar problems with other reserved keywords such as "type" and "ip".

Tentative Root Cause

This is a preliminary analysis and requires further investigation.

NOTE: Suggest exploring whether the same applies to other keywords as well.

The root cause is that LIMIT is defined as a reserved keyword in the PPL lexer (ppl/src/main/antlr/OpenSearchPPLLexer.g4 line 132) but is NOT included in the keywordsCanBeId list in the PPL parser (ppl/src/main/antlr/OpenSearchPPLParser.g4 lines 1383-1450).

When a query contains a dotted field path like monitoring.metrics.beat.handles.limit.hard, the lexer tokenizes each segment separately. It recognizes "limit" as the LIMIT keyword token (used for commands like | limit 10 or in timechart arguments) rather than as an ID token. Since LIMIT is not in the keywordsCanBeId list, the parser cannot accept it as an identifier in field name contexts, which results in the syntax error.

The keywordsCanBeId rule is specifically designed to allow certain keywords to also function as identifiers when they appear in contexts where identifiers are expected (like field names), while still preserving their keyword functionality in command contexts.
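The mechanism described above can be illustrated with a toy model. This is not the ANTLR grammar: the keyword set, the `lex` and `parse_field_path` functions, and the error text are simplified stand-ins chosen for illustration only.

```python
import re

# Toy stand-in for the lexer's keyword list; not the real set.
KEYWORDS = {"SOURCE", "FIELDS", "LIMIT", "IN"}


def lex(text):
    """Split text into (token_type, text) pairs; keywords win over ID."""
    tokens = []
    for word in re.findall(r"[A-Za-z_][A-Za-z_0-9]*|\.", text):
        if word == ".":
            tokens.append(("DOT", word))
        elif word.upper() in KEYWORDS:
            tokens.append((word.upper(), word))
        else:
            tokens.append(("ID", word))
    return tokens


def parse_field_path(text, keywords_can_be_id):
    """Accept an ID or an allowed keyword for each dotted segment."""
    segments = []
    expect_segment = True
    for token_type, word in lex(text):
        if expect_segment:
            if token_type != "ID" and token_type not in keywords_can_be_id:
                raise SyntaxError(f"[{word}] is not a valid term; expecting ID")
            segments.append(word)
        elif token_type != "DOT":
            raise SyntaxError(f"expected '.', got {word!r}")
        expect_segment = not expect_segment
    if expect_segment:
        raise SyntaxError("field path ends unexpectedly")
    return segments


path = "monitoring.metrics.beat.handles.limit.hard"
try:
    parse_field_path(path, keywords_can_be_id={"IN"})
except SyntaxError as err:
    print("without LIMIT in the allow-list:", err)
print("with LIMIT in the allow-list:", parse_field_path(path, {"IN", "LIMIT"}))
```

In this model the lexer output is identical in both runs; only the parser's allow-list changes, which mirrors why the suspected cause points at keywordsCanBeId rather than at the lexer.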

Tentative Proposed Fix

This is a preliminary analysis and requires further investigation.

NOTE: Suggest exploring whether the same applies to other keywords as well.

Code Fix

Add LIMIT to the keywordsCanBeId list in ppl/src/main/antlr/OpenSearchPPLParser.g4:

keywordsCanBeId
   : searchableKeyWord
   | IN
   | LIMIT  // Add this line
   ;

This approach follows the pattern established in PR #1319, which added multiple command keywords (SEARCH, DESCRIBE, SHOW, FROM, WHERE, FIELDS, etc.) to the keywordsCanBeId list to allow them to be used as identifiers.

Caution: This change requires thorough testing to ensure it doesn't break existing LIMIT command functionality, particularly in:

  • Timechart arguments (timechartArg: LIMIT EQUAL integerLiteral)
  • Any other contexts where LIMIT is used as a command keyword

Documentation Improvements

  1. Update /docs/user/ppl/general/identifiers.rst to include:

    • A complete list of PPL reserved keywords (or reference to the lexer file)
    • Clear examples showing when backticks are required
    • Specific guidance for nested field paths containing reserved keywords
  2. Add a new section titled "Reserved Keywords" with content like:

    Reserved Keywords
    =================
    
    Description
    -----------
    
    PPL has reserved keywords that are used for commands and operations. When these keywords appear in field names, they must be enclosed in backticks.
    
    Common reserved keywords include: LIMIT, WHERE, FIELDS, STATS, SORT, EVAL, HEAD, TOP, RARE, PARSE, SEARCH, and others.
    
    For a complete list, see the OpenSearchPPLLexer.g4 file.
    
    Examples
    --------
    
    Field name containing reserved keyword::
    
        os> source=metrics | fields `monitoring.metrics.beat.handles.limit.hard`;
    
    Multiple fields with reserved keywords::
    
        os> source=logs | fields `type`, `source`, timestamp;
  3. Improve error messages to detect when a reserved keyword is encountered in a field name context and suggest using backticks.
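
Item 3 could take roughly the following shape. This is a hypothetical sketch, not the plugin's error-handling code: `explain_syntax_error`, its parameters, and the `RESERVED` subset are assumptions made for illustration, and a real implementation would derive the keyword list from the lexer.

```python
# Illustrative subset only; the real list would come from the PPL lexer.
RESERVED = {"LIMIT", "WHERE", "FIELDS", "STATS", "SORT"}


def explain_syntax_error(offending_token, field_path=None):
    """Return an error message, adding a backtick hint for reserved keywords."""
    message = f"[{offending_token}] is not a valid term at this part of the query."
    if offending_token.upper() in RESERVED:
        message += f" '{offending_token}' is a reserved keyword in PPL."
        if field_path:
            message += (
                " If it is part of a field name, quote the field:"
                f" `{field_path}`"
            )
    return message


print(explain_syntax_error("limit", "monitoring.metrics.beat.handles.limit.hard"))
```

The hint is only appended when the offending token is a known keyword, so messages for unrelated syntax errors stay unchanged.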

Metadata

Assignees

Labels

PPL (Piped processing language), bug (Something isn't working)

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
