Description
Problem Statement
OpenSearch PPL currently has no concept of a default field, which limits the implementation of text processing and analysis commands. Several PPL commands under development need to operate on a default field when no field is specified explicitly, but the current implementation offers no way to do so.
This limitation affects multiple commands and makes PPL less intuitive than it could be: operating on the primary event data without naming a field is a common pattern in log analysis workflows.
Current State
Affected Commands
- rex/regex commands: need to extract patterns from raw events without an explicit field specification
- Other text processing commands in development
Current Behavior
# This doesn't work - no default field to operate on
source=logs | rex "(?<error>ERROR: .*)"
# Must explicitly specify field every time
source=logs | rex field=message "(?<error>ERROR: .*)"
User Impact
- Verbose query syntax requiring repeated field specifications
- Unable to process raw log data directly
Proposed Solution
1. Introduce Default Field Concept
Implement a configurable default field (e.g., _source or message) that commands can use when no field is explicitly specified.
2. Command Support
Update commands to check for and use the default field when no field parameter is provided:
# When no field specified, use default field
source=logs | rex "(?<level>ERROR|WARN|INFO)"
# Explicit field still works
source=logs | rex field=custom_field "(?<level>ERROR|WARN|INFO)"
3. Configuration Options
- Cluster-level / Index-level configuration for default field name
- Fallback chain (e.g., try _raw, then message, then the first text field)
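The fallback chain could be sketched roughly as follows. This is a Python illustration only, not plugin code: the function name pick_default_field, the candidate tuple, and the simplified mapping shape (field name to type string) are all assumptions made for the example.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical candidate order, taken from the example above: _raw, then message.
DEFAULT_CANDIDATES = ("_raw", "message")


def pick_default_field(
    mapping: Mapping[str, str],
    candidates: Sequence[str] = DEFAULT_CANDIDATES,
) -> Optional[str]:
    """Return the first usable default field, or None if nothing qualifies.

    `mapping` is an assumed, simplified view of an index mapping:
    field name -> field type, e.g. {"message": "text"}.
    """
    # 1. Try each configured candidate in order of precedence.
    for name in candidates:
        if name in mapping:
            return name
    # 2. Otherwise fall back to the first field of type "text",
    #    in the iteration order of the mapping.
    for name, field_type in mapping.items():
        if field_type == "text":
            return name
    # 3. Nothing suitable was found; the caller decides how to report this.
    return None
```

Returning None rather than raising keeps the chain itself free of error policy, so each command can decide how to report a missing default.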
Use Cases
Log Analysis
# Extract log level from raw events
source=application_logs | rex "(?<level>\\w+):\\s+(?<msg>.*)"
# Parse structured logs without field specification
source=apache_logs | rex "(?<ip>\\d+\\.\\d+\\.\\d+\\.\\d+).*\\[(?<timestamp>[^\\]]+)\\]"
Security Analysis
# Extract security events from raw logs
source=security_logs | regex "failed.*authentication" | rex "user\\s+(?<user>\\w+)"
Technical Considerations
1. Field Resolution Strategy
- Check if field parameter is provided
- If not, look for configured default field
- If default field doesn't exist, return appropriate error
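The three resolution steps above might look like the sketch below. It is written in Python purely for illustration; resolve_field, DefaultFieldError, and the parameter names are hypothetical, and raising an error when no default is configured at all is an assumption that the steps above do not spell out.

```python
from typing import Optional, Set


class DefaultFieldError(ValueError):
    """Hypothetical error raised when no field can be resolved."""


def resolve_field(
    explicit: Optional[str],
    configured_default: Optional[str],
    index_fields: Set[str],
) -> str:
    """Resolve the field a command should operate on."""
    # 1. An explicitly provided field parameter always wins.
    if explicit is not None:
        return explicit
    # 2. Otherwise look for the configured default field.
    #    (Assumption: a missing configuration is reported as an error.)
    if configured_default is None:
        raise DefaultFieldError(
            "no field specified and no default field configured"
        )
    # 3. The configured default must actually exist in the index.
    if configured_default not in index_fields:
        raise DefaultFieldError(
            f"default field '{configured_default}' does not exist in the index"
        )
    return configured_default
```

Because the explicit field is returned before any default is consulted, queries that already pass field=... behave exactly as before in this sketch.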
2. Backward Compatibility
- Existing queries with explicit fields must continue working
- Default behavior should not break current implementations
Benefits
- Improved Usability: Simpler, more intuitive query syntax
- Reduced Verbosity: Cleaner queries for common use cases
- Consistency: Uniform behavior across text processing commands
Risks and Mitigation
Risk 1: Ambiguous Field Resolution
Mitigation: Clear precedence rules and error messages
Risk 2: Breaking Changes
Mitigation: Optional feature with explicit opt-in
Risk 3: Performance Overhead
Mitigation: Resolve the default field at query compile time, so no per-row runtime cost is added
Success Criteria
- Default field configuration available in PPL settings
- Rex/regex commands work without explicit field parameter
- Parse command supports default field
- No performance regression in existing queries
- Documentation updated with examples
- Migration guide available
Related Issues