generated from amazon-archives/__template_Custom
-
Notifications
You must be signed in to change notification settings - Fork 176
Closed
Labels
PPLPiped processing languagePiped processing languageRFCRequest For CommentsRequest For Commentscalcitecalcite migration releatedcalcite migration releatedv3.4.0
Description
Problem Statement
Users need a first-class way to execute multiple independent subsearches and view the combined results in strict time order. Achieving this today requires manual composition (e.g., append + explicit sort) and careful command selection, which is error-prone, harder to optimize, and easy to
misuse (e.g., mixing non-streaming operations that stall pipelines).
Current State
- PPL provides
append, which concatenates results sequentially in the order subsearches are written. appendcommand can only support one single subsearch- There is no built-in interleaving by timestamp; users must add a
sortas a separate step.
Short-Term Goal
Current multisearch behavior:
- Uses UNION ALL + ORDER BY @timestamp DESC approach for result combination
- Supports all PPL commands in subsearches (no artificial streaming restrictions)
- Provides timestamp-based interleaving when @timestamp field is available
- Falls back to sequential concatenation when timestamp field is missing
Long-Term Goals
- Provide a first-class generating command that:
- Runs two or more subsearches and globally interleaves results by timestamp (DESC).
- Implements concurrent subsearch execution similar to SPL's streaming architecture for improved performance.
- Provides real-time result merging capabilities for live data scenarios.
Current Implementation
The multisearch command is now available with the following syntax:
source=<index> | multisearch [ <subsearch> ] [ <subsearch> ] ... [ <subsearch> ]
Key semantics:
- Position: Can be used after a source command (not necessarily first in pipeline).
- Cardinality: Requires ≥ 2 subsearches.
- Command support: All PPL commands are now supported in subsearches.
- Result combination: UNION ALL of all subsearch outputs, followed by ORDER BY @timestamp DESC.
- Time field: Uses @timestamp by default for ordering results.
Examples
-- Age group analysis with stats (now supported)
source=accounts | multisearch
[source=accounts | where age < 30 | eval age_group = 'young']
[source=accounts | where age >= 30 | eval age_group = 'adult']
| stats count by age_group
-- Time interleaving across two indices
source=logs | multisearch
[source=service_logs | where category IN ('A', 'B')]
[source=metrics_logs | where category IN ('E', 'F')]
| head 5
-- Complex aggregations in subsearches (now supported)
source=data | multisearch
[source=sales | stats sum(revenue) by region | eval type = 'sales']
[source=costs | stats sum(expense) by region | eval type = 'costs']
| sort region
Next Steps for Long-Term Implementation
- Concurrent Execution Architecture: Investigate implementing SPL-style concurrent subsearch execution using Calcite's parallel query capabilities.
- Real-time Merging: Explore streaming result merging for live data scenarios where results arrive continuously.
- Performance Optimization: Benchmark current UNION ALL + SORT approach vs potential concurrent execution benefits.
Metadata
Metadata
Assignees
Labels
PPLPiped processing languagePiped processing languageRFCRequest For CommentsRequest For Commentscalcitecalcite migration releatedcalcite migration releatedv3.4.0
Type
Projects
Status
New
Status
Done