[FEATURE] New expand_field PPL Command #3016

@YANG-DB

Add a new PPL command, expand_field, that brings array and nested-object expansion to PPL.

Is your feature request related to a problem? Please describe.
OpenSearch's Piped Processing Language (PPL) currently lacks an efficient way to expand arrays and nested objects into separate events, similar to SQL's UNNEST or JSON expansion functions. This limitation hinders the analysis of complex data structures, particularly when working with JSON logs or documents containing arrays or nested objects.

Describe the solution you'd like
We propose adding a new command to PPL that would allow users to expand arrays and nested objects into separate events, similar to SQL's UNNEST function, but with additional flexibility.

The functionality should:

  1. Expand array fields or nested objects into separate events (similar to SQL's UNNEST)
  2. Retain all other fields from the original event in each new event (addressing a limitation of SQL UNNEST)
  3. Support nested fields and complex JSON structures (going beyond basic SQL capabilities)
  4. Allow for subsequent processing of each expanded value in the PPL pipeline
  5. Work seamlessly with unstructured or semi-structured data (unlike SQL, which typically requires predefined schemas)
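
To make requirements 1 and 2 concrete, here is a minimal Python sketch of the intended semantics. It is not the PPL engine's implementation. The function name mirrors the proposed command, but the behavior for edge cases is an assumption this issue does not specify: an empty array yields no events (as with CROSS JOIN UNNEST), and a missing or non-array field passes the event through unchanged.

```python
from typing import Any, Dict, Iterable, Iterator

Event = Dict[str, Any]


def expand_field(events: Iterable[Event], field: str) -> Iterator[Event]:
    """Yield one event per element of event[field], keeping all other fields."""
    for event in events:
        value = event.get(field)
        if isinstance(value, list):
            for element in value:
                # Copy the original event and replace the array with one element.
                yield {**event, field: element}
        else:
            # Assumption: missing or non-array values pass through unchanged.
            yield dict(event)


# Hypothetical sample events, for illustration only.
events = [
    {"host": "web-1", "items": [{"id": 1, "status": "active"},
                                {"id": 2, "status": "closed"}]},
    {"host": "web-2", "items": []},   # empty array: produces no events
    {"host": "web-3"},                # field missing: passes through
]

expanded = list(expand_field(events, "items"))
for e in expanded:
    print(e)
```

Each output event still carries host, so later pipeline stages can filter or aggregate on the original fields alongside the expanded value.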

SQL-like example and comparison:
Consider this SQL-like syntax:

SELECT * FROM my_index
CROSS JOIN UNNEST(items) AS expanded_item
WHERE expanded_item.status = 'active'

The proposed OpenSearch PPL equivalent might look like:

source = my_index
| parse my_nested_field `?<items>[*]` as items
| expand_field items
| where items.status = "active"
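
As a rough illustration of what this pipeline is meant to compute, the Python sketch below walks the same three steps over in-memory documents. It is only a model of the semantics. The helper names (get_path, run_pipeline), the sample documents, and the reading of the parse step as "lift my_nested_field.items into a top-level items field" are all assumptions, not part of the proposal.

```python
from typing import Any, Dict, Iterable, Iterator, Optional

Event = Dict[str, Any]


def get_path(event: Event, path: str) -> Optional[Any]:
    """Look up a dotted path such as 'items.status'; return None if absent."""
    current: Any = event
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def run_pipeline(docs: Iterable[Event]) -> Iterator[Event]:
    for doc in docs:
        # parse step (assumed meaning): extract the array under my_nested_field.
        items = get_path(doc, "my_nested_field.items")
        if not isinstance(items, list):
            continue
        for item in items:
            # expand_field step: one event per element, other fields retained.
            event = {**doc, "items": item}
            # where step: keep only events whose expanded item is active.
            if get_path(event, "items.status") == "active":
                yield event


# Hypothetical documents, for illustration only.
docs = [
    {"id": "a", "my_nested_field": {"items": [
        {"sku": "x1", "status": "active"},
        {"sku": "x2", "status": "inactive"},
    ]}},
    {"id": "b", "my_nested_field": {"items": [
        {"sku": "y1", "status": "active"},
    ]}},
]

result = list(run_pipeline(docs))
for r in result:
    print(r["id"], r["items"]["sku"])
```

The filter runs after expansion, so it applies to each array element individually rather than to the document as a whole, which is the same effect the WHERE clause has in the SQL version.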

Key differences and advantages:

  1. No need for explicit JOIN syntax, making it more intuitive for log analysis
  2. Automatic handling of nested structures without need for complex JSON parsing functions
  3. Ability to work with dynamic schemas and unstructured data

Describe alternatives you've considered
Current alternatives include:

  1. Using complex JSON path queries, which can be cumbersome
  2. Processing the data outside of OpenSearch, reducing real-time analysis capabilities

Additional context
This feature would bridge the gap between SQL's structured data handling and the need for flexible, real-time analysis of semi-structured log data. It combines the power of SQL's UNNEST with the flexibility required for log and event processing.

Potential Impact

  • Simplified queries for complex data structures in logs and events
  • Enhanced real-time analytics capabilities for nested JSON data
  • Improved performance compared to client-side processing of nested structures
  • Better alignment with SQL-like functionality while maintaining PPL's simplicity

Proposed Implementation
The new command (e.g., expand_field) could be added to the PPL engine, combining the semantics of SQL's UNNEST with the flexibility needed for unstructured log data.

Metadata

Assignees: No one assigned
Labels: PPL (Piped processing language), enhancement (New feature or request)
Status: Done
Milestone: No milestone