[FEATURE] Support Default Field in PPL Commands #4111

@RyanL1997

Problem Statement

OpenSearch PPL commands currently have no concept of a default field, which limits the text processing and analysis commands that can be implemented. Many PPL commands under development need to operate on a default field when no field is specified explicitly, and the current implementation offers no way to do that.

This limitation affects multiple commands and makes PPL less intuitive than it could be: operating on the primary event data without naming a field is a common pattern in log analysis workflows.

Current State

Affected Commands

  • rex/regex commands: need to extract patterns from raw events without an explicit field specification
  • Other text processing commands in development

Current Behavior

# This doesn't work - no default field to operate on
source=logs | rex "(?<error>ERROR: .*)"

# Must explicitly specify field every time
source=logs | rex field=message "(?<error>ERROR: .*)"

User Impact

  • Verbose query syntax requiring repeated field specifications
  • Unable to process raw log data directly

Proposed Solution

1. Introduce Default Field Concept

Implement a configurable default field (e.g., _source or message) that commands can use when no field is explicitly specified.

2. Command Support

Update commands to check for and use the default field when no field parameter is provided:

# When no field specified, use default field
source=logs | rex "(?<level>ERROR|WARN|INFO)"

# Explicit field still works
source=logs | rex field=custom_field "(?<level>ERROR|WARN|INFO)"
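
The intended behavior can be illustrated outside of PPL with a minimal Python sketch. It is not the plugin's implementation: the `rex` function, the `DEFAULT_FIELD` constant, and the choice of `message` as the default are all assumptions made for illustration. Python spells named groups `(?P<name>...)`, so the sketch first translates the `(?<name>...)` form used in the queries above.

```python
import re

# Hypothetical default field name; the issue suggests e.g. "_source" or "message".
DEFAULT_FIELD = "message"

# Matches the opening of a Java-style named group such as "(?<level>".
# Lookbehinds like "(?<=" and "(?<!" are not matched because a letter must follow "<".
_JAVA_NAMED_GROUP = re.compile(r"\(\?<([A-Za-z][A-Za-z0-9]*)>")


def rex(row, pattern, field=None):
    """Return a copy of row extended with the named groups matched in the target field."""
    # Use the explicit field when given, otherwise fall back to the default field.
    target = field if field is not None else DEFAULT_FIELD
    # Translate (?<name>...) into Python's (?P<name>...) before compiling.
    compiled = re.compile(_JAVA_NAMED_GROUP.sub(r"(?P<\1>", pattern))
    match = compiled.search(str(row.get(target, "")))
    result = dict(row)
    if match:
        result.update(match.groupdict())
    return result


row = {"message": "2024-01-01 ERROR disk full", "custom_field": "WARN retry"}

# No field given: the default field ("message") is searched.
print(rex(row, "(?<level>ERROR|WARN|INFO)"))

# Explicit field: the default is ignored.
print(rex(row, "(?<level>ERROR|WARN|INFO)", field="custom_field"))
```

In this sketch a row whose target field is missing or does not match is returned unchanged; whether the real command should instead emit null-valued fields is left open here.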

3. Configuration Options

  • Cluster-level / Index-level configuration for default field name
  • Fallback chain (e.g., try _raw, then message, then first text field)
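
One way to read the two options above is sketched below in Python. The function name, the parameters, and the rule that an index-level setting takes precedence over a cluster-level one are assumptions, not decisions made in this issue; the fallback order simply follows the example in the list.

```python
# Hypothetical fallback order, taken from the example in the list above.
FALLBACK_CHAIN = ("_raw", "message")


def pick_default_field(mapping, index_setting=None, cluster_setting=None):
    """Pick a default field name, or return None if nothing suitable exists.

    mapping maps field names to type names, in index mapping order.
    """
    # Assumed precedence: index-level setting, then cluster-level setting,
    # then the built-in fallback chain.
    candidates = [index_setting, cluster_setting, *FALLBACK_CHAIN]
    for name in candidates:
        if name is not None and name in mapping:
            return name
    # Last resort: the first field whose type is "text".
    for name, field_type in mapping.items():
        if field_type == "text":
            return name
    return None


print(pick_default_field({"host": "keyword", "message": "text"}))
print(pick_default_field({"host": "keyword", "body": "text"}))
```

A setting that names a field missing from the mapping is skipped silently in this sketch; a real implementation might prefer to report it as a configuration error.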

Use Cases

Log Analysis

# Extract log level from raw events
source=application_logs | rex "(?<level>\\w+):\\s+(?<msg>.*)"

# Parse structured logs without field specification  
source=apache_logs | rex "(?<ip>\\d+\\.\\d+\\.\\d+\\.\\d+).*\\[(?<timestamp>[^\\]]+)\\]"

Security Analysis

# Extract security events from raw logs
source=security_logs | regex "failed.*authentication" | rex "user\\s+(?<user>\\w+)"

Technical Considerations

1. Field Resolution Strategy

  • Check if field parameter is provided
  • If not, look for configured default field
  • If default field doesn't exist, return appropriate error
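
The three steps above can be sketched as a small Python function. The names `resolve_field` and `DefaultFieldError` and the error messages are hypothetical, chosen only to make the precedence explicit.

```python
class DefaultFieldError(ValueError):
    """Hypothetical error raised when no usable field can be resolved."""


def resolve_field(explicit_field, configured_default, available_fields):
    """Return the field a command should operate on, following the steps above."""
    # 1. An explicit field= argument always wins.
    if explicit_field is not None:
        return explicit_field
    # 2. Otherwise look for a configured default field.
    if configured_default is None:
        raise DefaultFieldError(
            "no field specified and no default field is configured"
        )
    # 3. A configured default that is missing from the index is an error.
    if configured_default not in available_fields:
        raise DefaultFieldError(
            f"default field '{configured_default}' does not exist in the index"
        )
    return configured_default


print(resolve_field("custom_field", "message", {"message", "custom_field"}))
print(resolve_field(None, "message", {"message", "host"}))
```

Validation of an explicit field that does not exist is deliberately left out, since existing queries already handle that case today.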

2. Backward Compatibility

  • Existing queries with explicit fields must continue working
  • Default behavior should not break current implementations

Benefits

  1. Improved Usability: Simpler, more intuitive query syntax
  2. Reduced Verbosity: Cleaner queries for common use cases
  3. Consistency: Uniform behavior across text processing commands

Risks and Mitigation

Risk 1: Ambiguous Field Resolution

Mitigation: Clear precedence rules and error messages

Risk 2: Breaking Changes

Mitigation: Optional feature with explicit opt-in

Risk 3: Performance Overhead

Mitigation: Resolve the default field once at query compile time, so no per-row runtime cost is added
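
A rough illustration of this mitigation, as a Python sketch rather than the actual planner: the target field is chosen and the pattern compiled once when the command is planned, and the per-row function only uses the already-resolved values. `plan_rex` and its `default_field` parameter are hypothetical, and the pattern uses Python's `(?P<name>...)` spelling for named groups.

```python
import re


def plan_rex(pattern, field=None, default_field="message"):
    """Resolve the target field and compile the pattern once, at planning time."""
    target = field if field is not None else default_field
    compiled = re.compile(pattern)

    def apply(row):
        # Per-row work: a single lookup and a single search, no field resolution.
        match = compiled.search(str(row.get(target, "")))
        return {**row, **(match.groupdict() if match else {})}

    return apply


# Planning happens once; the returned function is then applied to every row.
extract_level = plan_rex(r"(?P<level>ERROR|WARN|INFO)")
rows = [{"message": "WARN slow query"}, {"message": "request served"}]
print([extract_level(r) for r in rows])
```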

Success Criteria

  • Default field configuration available in PPL settings
  • Rex/regex commands work without explicit field parameter
  • Parse command supports default field
  • No performance regression in existing queries
  • Documentation updated with examples
  • Migration guide available
