Skip to content

[RFC] Support multisearch command for PPL #4348

@ahkcs

Description

@ahkcs

Problem Statement

Users need a first-class way to execute multiple independent subsearches and view the combined results in strict time order. Achieving this today requires manual composition (e.g., append + explicit sort) and careful command selection, which is error-prone, harder to optimize, and easy to
misuse (e.g., mixing non-streaming operations that stall pipelines).

Current State

  • PPL provides append, which concatenates results sequentially in the order subsearches are written.
  • append command can only support one single subsearch
  • There is no built-in interleaving by timestamp; users must add a sort as a separate step.

Short-Term Goal

Current multisearch behavior:

  • Uses UNION ALL + ORDER BY @timestamp DESC approach for result combination
  • Supports all PPL commands in subsearches (no artificial streaming restrictions)
  • Provides timestamp-based interleaving when @timestamp field is available
  • Falls back to sequential concatenation when timestamp field is missing

Long-Term Goals

  • Provide a first-class generating command that:
    • Runs two or more subsearches and globally interleaves results by timestamp (DESC).
    • Implements concurrent subsearch execution similar to SPL's streaming architecture for improved performance.
    • Provides real-time result merging capabilities for live data scenarios.

Current Implementation

The multisearch command is now available with the following syntax:

source=<index> | multisearch [ <subsearch> ] [ <subsearch> ] ... [ <subsearch> ]

Key semantics:

  • Position: Can be used after a source command (not necessarily first in pipeline).
  • Cardinality: Requires ≥ 2 subsearches.
  • Command support: All PPL commands are now supported in subsearches.
  • Result combination: UNION ALL of all subsearch outputs, followed by ORDER BY @timestamp DESC.
  • Time field: Uses @timestamp by default for ordering results.

Examples

  -- Age group analysis with stats (now supported)
  source=accounts | multisearch
      [source=accounts | where age < 30 | eval age_group = 'young']
      [source=accounts | where age >= 30 | eval age_group = 'adult']
  | stats count by age_group
  -- Time interleaving across two indices
  source=logs | multisearch
      [source=service_logs | where category IN ('A', 'B')]
      [source=metrics_logs | where category IN ('E', 'F')]
  | head 5
  -- Complex aggregations in subsearches (now supported)
  source=data | multisearch
      [source=sales | stats sum(revenue) by region | eval type = 'sales']
      [source=costs | stats sum(expense) by region | eval type = 'costs']
  | sort region

Next Steps for Long-Term Implementation

  • Concurrent Execution Architecture: Investigate implementing SPL-style concurrent subsearch execution using Calcite's parallel query capabilities.
  • Real-time Merging: Explore streaming result merging for live data scenarios where results arrive continuously.
  • Performance Optimization: Benchmark current UNION ALL + SORT approach vs potential concurrent execution benefits.

Metadata

Metadata

Assignees

Labels

PPLPiped processing languageRFCRequest For Commentscalcitecalcite migration releatedv3.4.0

Type

No type

Projects

Status

New

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions