Closed
Labels
PPL (Piped processing language), calcite (calcite migration related), enhancement (New feature or request), v3.3.0
Description
Command Name: fields - Advanced Field Selection Features
1. Overview
Description
This RFC proposes enhancing the existing fields command in PPL with advanced field selection capabilities. The enhancement adds wildcard support, flexible delimiter syntax, and a table command alias, giving users more flexibility when selecting fields.
Proposed Features
This RFC proposes implementing five key features for the fields command:
Feature 1: Space-delimited field syntax
- Add support for space-separated fields in addition to comma-separated
- Example:
source=accounts | fields firstname lastname age
Feature 2: Wildcard pattern matching
- Add support for prefix, suffix, and contains wildcards
- Prefix: source=accounts | fields account* (matches account_number)
- Suffix: source=accounts | fields *name (matches firstname, lastname)
- Contains: source=accounts | fields *a* (matches all fields containing 'a')
Feature 3: Field deduplication
- Automatically remove duplicate fields when wildcards expand to already specified fields
- Maintains clean result sets and optimal performance
Feature 4: Mixed delimiter support
- Support both space and comma delimiters in the same command
- Example:
source=accounts | fields firstname lastname, balance
Feature 5: Table command alias
- Create a table command that functions identically to the fields command
- Provides alternative syntax: source=accounts | table firstname, lastname
- All features (1-4) work identically with both fields and table syntax
Use Cases
- Select multiple related fields using pattern matching to reduce query complexity
- Simplify queries by avoiding lengthy field lists when working with related fields
- Dynamically include fields that follow a naming pattern for flexible data exploration
- Efficiently exclude groups of fields that match a pattern using exclusion syntax
- Flexible syntax support allowing users to choose their preferred delimiter style
- Alternative command syntax for users familiar with table-based field selection
2. Final Syntax
fields [+|-] <field-list>
table [+|-] <field-list>
Parameters
- [+|-]: Optional prefix that specifies whether to keep (+) or remove (-) the fields. Default is + (keep).
- <field-list>: List of fields to keep or remove, delimited by commas, spaces, or a mix of both. With this enhancement, field names may contain wildcards (e.g., value*, error*, user.*).
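The syntax above can be illustrated with a small tokenizer. This is a minimal sketch in Python, not the plugin's actual grammar: the function name parse_field_args is hypothetical, and it assumes field names contain no commas or whitespace (quoted or backtick-escaped names are not handled).

```python
import re
from typing import List, Tuple


def parse_field_args(args: str) -> Tuple[str, List[str]]:
    """Split the argument string of a fields/table command.

    Returns (mode, fields), where mode is "+" (keep) or "-" (remove).
    Hypothetical helper for illustration only.
    """
    text = args.strip()
    mode = "+"  # default when no prefix is given: keep the listed fields
    if text[:1] in ("+", "-"):
        mode, text = text[0], text[1:]
    # Commas and whitespace are both treated as delimiters,
    # so comma-only, space-only, and mixed lists all tokenize the same way.
    fields = [tok for tok in re.split(r"[,\s]+", text) if tok]
    return mode, fields
```

For example, parse_field_args("firstname lastname, balance") yields the keep mode with three field tokens, and a leading "-" switches the mode to remove.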
Wildcard Support
- Asterisk (*) will match zero or more characters in field names
- Wildcards can be used at the beginning, middle, or end of field names
- Examples of wildcard patterns:
- user.*: Matches all fields that start with "user."
- *name: Matches all fields that end with "name"
- *id*: Matches all fields that contain "id"
- a*b: Matches all fields that start with "a" and end with "b"
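The wildcard semantics described here (an asterisk matches zero or more characters, anywhere in the name) can be sketched as a translation to a regular expression. The helper names wildcard_to_regex and matches are hypothetical; the sketch assumes that every character other than * is matched literally, so the dot in user.* is not a regex metacharacter.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a field wildcard pattern into a compiled regex.

    Each literal segment is escaped, and each '*' becomes '.*'
    (zero or more characters). Hypothetical helper for illustration.
    """
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts))


def matches(pattern: str, field_name: str) -> bool:
    """Return True when the whole field name matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(field_name) is not None
```

Using fullmatch rather than search is what makes *name a suffix pattern and user.* a prefix pattern: the regex has to account for the entire field name.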
3. Usage Examples
# Feature 1: Space-delimited syntax
source=accounts | fields firstname lastname age
# Feature 2: Wildcard pattern matching
# Prefix wildcards
source=accounts | fields account*
# Suffix wildcards
source=accounts | fields *name
# Contains wildcards
source=accounts | fields *a*
# Feature 3: Field deduplication (automatic)
source=accounts | fields firstname, *name # firstname not duplicated
# Feature 4: Mixed delimiter support
source=accounts | fields firstname lastname, balance
# Feature 5: Table command alias - identical functionality
source=accounts | fields account_number, firstname, lastname
source=accounts | table account_number, firstname, lastname
# All features work with both commands
source=accounts | fields firstname lastname, account*
source=accounts | table firstname lastname, account*
Best Practices
- Use wildcards to simplify queries but be mindful of pattern specificity to avoid selecting unintended fields
- Prefer more specific patterns (e.g., user.* instead of u*) to improve performance
- Consider using explicit field names for critical fields and wildcards for supplementary fields
- When using multiple wildcards, order them from most to least specific for better readability
Implementation Notes
- Wildcard patterns are converted to regex for matching
- Pattern matching supports prefix, suffix, and contains wildcards
- Field results are sorted alphabetically for consistent ordering
- Duplicate fields are automatically removed
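The notes above can be combined into one resolution step. The following is a rough sketch under stated assumptions, not the actual implementation: resolve_fields and _matches are hypothetical names, "sorted alphabetically" is interpreted as sorting the fields that a single wildcard expands to, and explicit names are passed through without checking that they exist.

```python
import re
from typing import Iterable, List


def _matches(pattern: str, name: str) -> bool:
    # '*' matches zero or more characters; everything else is literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def resolve_fields(specs: Iterable[str], available: List[str],
                   exclude: bool = False) -> List[str]:
    """Expand wildcards against the available fields and drop duplicates.

    Hypothetical helper. With exclude=True the matched fields are removed
    from the available list instead of being selected.
    """
    selected: List[str] = []
    seen = set()
    for spec in specs:
        if "*" in spec:
            # Assumption: wildcard expansions are emitted in alphabetical order.
            candidates = sorted(n for n in available if _matches(spec, n))
        else:
            # Assumption: explicit names are not validated in this sketch.
            candidates = [spec]
        for name in candidates:
            if name not in seen:  # deduplicate across all specs
                seen.add(name)
                selected.append(name)
    if exclude:
        return [n for n in available if n not in seen]
    return selected
```

With the accounts fields from the examples, the spec list firstname, *name resolves to firstname followed by lastname, because firstname is already selected when the wildcard expands.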
4. Implementation Details
Technical Approach
The table command alias will be implemented at the parser level to generate identical AST structures:
- Add the table command to the PPL parser grammar
- Both fields and table commands generate the same Project AST node
- No separate table-specific AST node or logic needed
- Maintain all existing fields command functionality including wildcards
Implementation Strategy
The table command will be implemented as a parser-level alias:
- Parser recognizes both fields and table keywords
- Both commands generate identical Project AST structures
- Same Analyzer.visitProject() method handles both commands
- All existing features (inclusion, exclusion, etc.) work identically
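The parser-level alias can be pictured with a toy model. This is only an illustrative Python sketch of the idea, not the plugin's parser or AST: the Project dataclass and parse_projection function below are hypothetical stand-ins. The point is that the keyword is consumed during parsing and leaves no trace in the resulting node, so downstream code cannot tell which command was used.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Project:
    """Toy stand-in for a projection AST node (hypothetical)."""
    fields: Tuple[str, ...]
    exclude: bool = False


# Both keywords map to the same node type; nothing records which was used.
_PROJECTION_KEYWORDS = {"fields", "table"}


def parse_projection(command: str) -> Project:
    """Parse 'fields ...' or 'table ...' into the same Project node."""
    keyword, _, rest = command.strip().partition(" ")
    if keyword not in _PROJECTION_KEYWORDS:
        raise ValueError(f"not a projection command: {keyword!r}")
    rest = rest.strip()
    exclude = rest.startswith("-")
    if rest[:1] in ("+", "-"):
        rest = rest[1:]
    names = tuple(tok for tok in re.split(r"[,\s]+", rest) if tok)
    return Project(fields=names, exclude=exclude)
```

Because the two commands produce equal nodes, any later stage written against the node type handles both without table-specific logic.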
Dependencies
- Existing fields command implementation
- PPL parser and grammar
- AST node structures
Testing Strategy
Unit Testing:
- Wildcard pattern matching logic (prefix, suffix, contains, complex patterns)
- Field deduplication algorithms
- Syntax parsing for space-delimited and mixed delimiters
- Cross-engine compatibility validation
Integration Testing:
- End-to-end functionality for both fields and table commands
- Pipeline integration with other PPL commands
- Performance benchmarks comparing wildcard vs. explicit field selection
- Edge case testing with special characters and unusual field names
Test Coverage Requirements:
- All five features tested for both commands
- Cross-engine compatibility (Calcite and non-Calcite)
- Regression testing to ensure existing functionality remains intact
- Performance validation to ensure no significant degradation
5. Expected Benefits
User Experience Improvements
- Simplified Query Writing: Wildcard patterns reduce the need for lengthy field lists
- Flexible Syntax Options: Users can choose between comma, space, or mixed delimiters
- Consistent Command Behavior: Fields and table commands work identically
- Reduced Query Complexity: Pattern matching eliminates repetitive field specifications
Performance Benefits
- Optimized Field Resolution: Efficient pattern matching algorithms
- Automatic Deduplication: Prevents unnecessary duplicate field processing
- Maintained Field Ordering: Predictable output structure
Developer Benefits
- Code Reusability: Shared implementation between fields and table commands
- Maintainability: Consistent codebase with unified field resolution logic
- Extensibility: Framework supports future enhancements and additional patterns
Backward Compatibility
- All existing queries will continue to work without modification
- New features are additive and do not break existing functionality
- Graceful handling of edge cases and invalid patterns