RFC: OpenSearch PPL fields Command Enhancement #3888

@aalva500-prog

Command Name: fields - Advanced Field Selection Features

1. Overview

Description

This RFC proposes enhancing the existing fields command in PPL with advanced field selection capabilities: wildcard support, flexible delimiter syntax, automatic field deduplication, and a table command alias. Together these give users more flexibility and improve usability.

Proposed Features

This RFC proposes implementing five key features for the fields command:

Feature 1: Space-delimited field syntax

  • Add support for space-separated fields in addition to comma-separated
  • Example: source=accounts | fields firstname lastname age

Feature 2: Wildcard pattern matching

  • Add support for prefix, suffix, and contains wildcards
  • Prefix: source=accounts | fields account* (matches account_number)
  • Suffix: source=accounts | fields *name (matches firstname, lastname)
  • Contains: source=accounts | fields *a* (matches all fields containing 'a')

Feature 3: Field deduplication

  • Automatically remove duplicate fields when wildcards expand to already specified fields
  • Each field appears once in the result set, which avoids processing the same field twice

Feature 4: Mixed delimiter support

  • Support both space and comma delimiters in the same command
  • Example: source=accounts | fields firstname lastname, balance
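
The delimiter handling in Features 1 and 4 can be illustrated with a small tokenizer. This is a minimal Python sketch, not the plugin's grammar: the helper name split_field_list is hypothetical, and it assumes field names contain neither commas nor whitespace (backtick-quoted names are not handled).

```python
import re

def split_field_list(field_list: str) -> list[str]:
    """Split a field list on commas, whitespace, or any mix of the two."""
    # One or more commas/whitespace characters act as a single separator;
    # empty tokens (e.g. from an empty input) are dropped.
    return [tok for tok in re.split(r"[,\s]+", field_list.strip()) if tok]

# Space-delimited, comma-delimited, and mixed lists yield the same kind of token list.
print(split_field_list("firstname lastname age"))
print(split_field_list("firstname, lastname, age"))
print(split_field_list("firstname lastname, balance"))
```

Treating a run of commas and spaces as one separator is a simplification; a real grammar might instead reject inputs such as a doubled comma.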

Feature 5: Table command alias

  • Create a table command that functions identically to the fields command
  • Provides alternative syntax: source=accounts | table firstname, lastname
  • All features (1-4) work identically with both fields and table syntax

Use Cases

  • Select multiple related fields using pattern matching to reduce query complexity
  • Simplify queries by avoiding lengthy field lists when working with related fields
  • Dynamically include fields that follow a naming pattern for flexible data exploration
  • Efficiently exclude groups of fields that match a pattern using exclusion syntax
  • Flexible syntax support allowing users to choose their preferred delimiter style
  • Alternative command syntax for users familiar with table-based field selection

2. Final Syntax

fields [+|-] <field-list>
table [+|-] <field-list>

Parameters

  • [+|-]: Optional prefix that specifies whether to keep (+) or remove (-) the fields. Default is + (keep).
  • <field-list>: List of fields to keep or remove, delimited by commas, spaces, or a mix of both. With this enhancement, entries may contain wildcards (e.g., value*, error*, user.*).
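
The keep/remove semantics of the optional prefix can be sketched as follows. This is an illustrative Python model under stated assumptions, not the actual implementation: select_fields is a hypothetical helper, only explicit names are handled (no wildcards), and names that do not exist in the index are silently skipped rather than reported as errors.

```python
import re

def select_fields(args: str, available: list[str]) -> list[str]:
    """Apply 'fields [+|-] <field-list>' to a list of available field names."""
    args = args.strip()
    remove = args.startswith("-")          # '-' means remove; '+' or no prefix means keep
    if args.startswith(("+", "-")):
        args = args[1:]
    names = [tok for tok in re.split(r"[,\s]+", args.strip()) if tok]
    if remove:
        # Exclusion: everything except the listed fields, in index order.
        return [f for f in available if f not in names]
    # Inclusion: the listed fields, in the order they were written.
    return [n for n in names if n in available]

accounts = ["account_number", "firstname", "lastname", "balance"]
print(select_fields("firstname, balance", accounts))   # keep (default)
print(select_fields("+ firstname", accounts))          # keep (explicit)
print(select_fields("- balance", accounts))            # remove
```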

Wildcard Support

  • Asterisk (*) will match zero or more characters in field names
  • Wildcards can be used at the beginning, middle, or end of field names
  • Examples of wildcard patterns:
    • user.*: Matches all fields that start with "user."
    • *name: Matches all fields that end with "name"
    • *id*: Matches all fields that contain "id"
    • a*b: Matches all fields that start with "a" and end with "b"
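
The wildcard rules above can be modeled by translating each pattern into an anchored regular expression. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that matching is case-sensitive and that every character other than * (including the dot) is matched literally.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a field pattern with '*' wildcards into a compiled regex."""
    # Escape the literal pieces so '.' in 'user.*' is not a regex metacharacter,
    # then join them with '.*' (zero or more characters).
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts))

def matches(pattern: str, field: str) -> bool:
    """True if the whole field name matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(field) is not None

print(matches("user.*", "user.id"))    # prefix pattern with a literal dot
print(matches("user.*", "username"))   # no literal dot in the field name
print(matches("*name", "firstname"))   # suffix pattern
print(matches("a*b", "ab"))            # '*' may match zero characters
```

Using fullmatch rather than search is what makes *name a suffix pattern and account* a prefix pattern; with an unanchored search, both would behave like contains patterns.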

3. Usage Examples

# Feature 1: Space-delimited syntax
source=accounts | fields firstname lastname age

# Feature 2: Wildcard pattern matching
# Prefix wildcards
source=accounts | fields account*

# Suffix wildcards
source=accounts | fields *name

# Contains wildcards
source=accounts | fields *a*

# Feature 3: Field deduplication (automatic)
source=accounts | fields firstname, *name  # firstname not duplicated

# Feature 4: Mixed delimiter support
source=accounts | fields firstname lastname, balance

# Feature 5: Table command alias - identical functionality
source=accounts | fields account_number, firstname, lastname
source=accounts | table account_number, firstname, lastname

# All features work with both commands
source=accounts | fields firstname lastname, account*
source=accounts | table firstname lastname, account*

Best Practices

  • Use wildcards to simplify queries but be mindful of pattern specificity to avoid selecting unintended fields
  • Prefer more specific patterns (e.g., user.* instead of u*) to improve performance
  • Consider using explicit field names for critical fields and wildcards for supplementary fields
  • When using multiple wildcards, order them from most to least specific for better readability

Implementation Notes

  • Wildcard patterns are converted to regex for matching
  • Pattern matching supports prefix, suffix, and contains wildcards
  • Field results are sorted alphabetically for consistent ordering
  • Duplicate fields are automatically removed
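
The notes above can be combined into one resolution step. The following Python sketch is one possible reading, not the actual implementation: resolve is a hypothetical helper, and it assumes that the matches of each wildcard are sorted alphabetically while explicitly named fields keep the position in which they were written. The RFC does not say whether sorting applies per wildcard or to the whole result.

```python
import re

def resolve(patterns: list[str], available: list[str]) -> list[str]:
    """Expand wildcard patterns against available fields, without duplicates."""
    selected: list[str] = []
    seen: set[str] = set()
    for pattern in patterns:
        if "*" in pattern:
            # Convert the wildcard to an anchored regex and sort its matches.
            regex = re.compile(".*".join(re.escape(p) for p in pattern.split("*")))
            candidates = sorted(f for f in available if regex.fullmatch(f))
        else:
            candidates = [pattern] if pattern in available else []
        for field in candidates:
            if field not in seen:       # skip fields that were already selected
                seen.add(field)
                selected.append(field)
    return selected

accounts = ["account_number", "firstname", "lastname", "age", "balance"]
print(resolve(["firstname", "*name"], accounts))  # firstname is listed only once
print(resolve(["*a*"], accounts))                 # wildcard matches in sorted order
```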

4. Implementation Details

Technical Approach

The table command alias will be implemented at the parser level to generate identical AST structures:

  1. Add table command to the PPL parser grammar
  2. Both fields and table commands generate the same Project AST node
  3. No separate table-specific AST node or logic needed
  4. Maintain all existing fields command functionality including wildcards

Implementation Strategy

The table command will be implemented as a parser-level alias:

  1. Parser recognizes both fields and table keywords
  2. Both commands generate identical Project AST structures
  3. Same Analyzer.visitProject() method handles both commands
  4. All existing features (inclusion, exclusion, etc.) work identically
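
The parser-level alias can be pictured with a toy model in which both keywords produce the same node. This Python sketch is purely illustrative: the Project dataclass and parse_projection function are hypothetical stand-ins for the real grammar rule and AST class, and the field list is tokenized naively on commas and whitespace.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Project:
    """Toy stand-in for the Project AST node shared by both commands."""
    fields: tuple[str, ...]
    exclude: bool = False

# Both keywords are accepted by the same rule, so no table-specific node exists.
PROJECT_KEYWORDS = {"fields", "table"}

def parse_projection(command: str) -> Project:
    keyword, _, rest = command.strip().partition(" ")
    if keyword not in PROJECT_KEYWORDS:
        raise ValueError(f"not a projection command: {keyword!r}")
    rest = rest.strip()
    exclude = rest.startswith("-")
    if rest.startswith(("+", "-")):
        rest = rest[1:]
    fields = tuple(tok for tok in re.split(r"[,\s]+", rest.strip()) if tok)
    return Project(fields, exclude)

a = parse_projection("fields firstname lastname, account*")
b = parse_projection("table firstname lastname, account*")
print(a == b)  # the keyword leaves no trace in the resulting node
```

Because the keyword is discarded during parsing, everything downstream of the parser (analysis, wildcard expansion, deduplication) needs no knowledge of which spelling the user chose.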

Dependencies

  • Existing fields command implementation
  • PPL parser and grammar
  • AST node structures

Testing Strategy

Unit Testing:

  • Wildcard pattern matching logic (prefix, suffix, contains, complex patterns)
  • Field deduplication algorithms
  • Syntax parsing for space-delimited and mixed delimiters
  • Cross-engine compatibility validation

Integration Testing:

  • End-to-end functionality for both fields and table commands
  • Pipeline integration with other PPL commands
  • Performance benchmarks comparing wildcard vs. explicit field selection
  • Edge case testing with special characters and unusual field names

Test Coverage Requirements:

  • All five features tested for both commands
  • Cross-engine compatibility (Calcite and non-Calcite)
  • Regression testing to ensure existing functionality remains intact
  • Performance validation to ensure no significant degradation

5. Expected Benefits

User Experience Improvements

  • Simplified Query Writing: Wildcard patterns reduce the need for lengthy field lists
  • Flexible Syntax Options: Users can choose between comma, space, or mixed delimiters
  • Consistent Command Behavior: The fields and table commands work identically
  • Reduced Query Complexity: Pattern matching eliminates repetitive field specifications

Performance Benefits

  • Optimized Field Resolution: Efficient pattern matching algorithms
  • Automatic Deduplication: Prevents unnecessary duplicate field processing
  • Maintained Field Ordering: Predictable output structure

Developer Benefits

  • Code Reusability: Shared implementation between fields and table commands
  • Maintainability: Consistent codebase with unified field resolution logic
  • Extensibility: Framework supports future enhancements and additional patterns

Backward Compatibility

  • All existing queries will continue to work without modification
  • New features are additive and do not break existing functionality
  • Graceful handling of edge cases and invalid patterns

Metadata

Labels

PPL (Piped processing language), calcite (calcite migration related), enhancement (New feature or request), v3.3.0
