Skip to content

[RFC] Support permissive mode in PPL #4349

@penghuo

Description

@penghuo

Problem Statement

OpenSearch PPL is strong-schema and fail-fast. When data is dirty or drifting, queries abort instead of degrading gracefully. At scale, this makes analytics and operations brittle.

  • Type/data mismatch. PPL evaluates expressions on strictly typed fields derived from mappings. Mixed representations—e.g., "1" (string), [1] (array) trigger cast/class errors that terminate the query and break dashboards. Operators lack TRY/SAFE semantics, so evaluation cannot proceed null-on-error. Example: source=log* | eval xx=flags+1 fails when any shard holds "1" or [1] in flags; desired behavior is 2 or NULL with an error counter.
  • Schema-missing (dynamic:false) fields are invisible to PPL. PPL relies on mappings for field addressability and typing. When objects are mapped with dynamic:false, their subfields are present in _source but unmapped, so PPL cannot reference them, even though the Query DSL can project them from _source. This creates a usability gap, forces reindex/mapping changes, or pushes users back to DSL. It also pressures teams toward mapping explosion for discoverability. Example: object event with dynamic:false; fields event.user returns 400 in PPL, while DSL with "_source": ["event.user"] succeeds.
  • Explict cast required when access extract field, When using the REX/PARSE/SPATH command to extract numeric values and then performing arithmetic operations on them, the system fails because the extracted values are treated as strings rather than numbers.
  • Cross-index type conflicts. In observability datasets, schemas drift: the same logical field may be string in some indices and long in others (often from JSON/text extraction). When a PPL query spans those indices, the engine effectively picks one variant to type the field, which is non-deterministic and error-prone.

Requirements

  • Coverage: Handle type/data mismatches, dynamic:false invisibility, and cross-index type conflicts under permissive TRY/SAFE semantics.
  • Performance: No performance regression.
  • Reusability: Engine-agnostic design reusable for PPL on other engines, including Spark.

Expected Behaviour

Setting: ppl.query.permissive=true, default is false, true enable permissive mode.
Scenario: intA * stringB where intA is mapped as integer and stringB is unmapped.
Analyze stage, The planner resolves intA as type INTEGER. Since stringB is unmapped, the planner treats it as ANY (unknown at plan time).
Function resolution stage, The engine selects the multiplication function with signature (INTEGER, INTEGER). It automatically inserts safe casts so that both operands conform to the required types. The final expression is *( safe_cast(intA AS INTEGER), safe_cast(stringB AS INTEGER) ).
Data scan and load, Rows are read from OpenSearch as Object[]. No premature casting occurs during scanning.
Data processing (compiled expression), For each row, the engine attempts to convert intA and stringB to integers. If conversion fails for either operand, the value becomes NULL and an error counter is incremented for telemetry. If both casts succeed, the product is computed. If either cast fails, the result is NULL.


public Object current() {
  final Object[] current = (Object[]) inputEnumerator.current();
  final Integer v1 = current[0] == null ? null : (Integer) new org.apache.calcite.linq4j.function.Function0() {
    public Object apply() {
      try {
        return org.apache.calcite.runtime.SqlFunctions.toInt(current[0]);
      } catch (Exception e) {
        return null;
      }
    }
  }.apply()

  final Integer v2 = current[1] == null ? null : (Integer) new org.apache.calcite.linq4j.function.Function0() {
    public Object apply() {
      try {
        return org.apache.calcite.runtime.SqlFunctions.toInt(current[1]);
      } catch (Exception e) {
        return null;
      }
    }
  }.apply();
  return new Object[] {Integer.valueOf(v1 * v2)};
}

Results, PPL correctly process data and tolerate type/data mismatch. e.g.

intA stringB intA * stringB comments
2 "3" 6
"2" "3" 6 "2" can convert to int, return 6
2 "3a" null 3a can not be cast to int, return null
[2, 3, 4] "3" null array [2,3,4] can not be cast to int, return null

Decision Points

1.Type to use for missing or unknown fields
Two models are viable.

  • String (Java String) may require extra parsing when numeric or boolean semantics are needed.
  • Any (Java Object) minimizes unnecessary conversions but demands strict runtime checks. (Preferred)

2.Function resolution for unknown fields
Every function has one or more signatures. During resolution, the engine maps input field types, including unknowns, to compatible target types and inserts casts as needed. To handle type and data mismatches, choose a consistent approach:

  • Normalize through strings by applying safe_cast(field AS STRING) followed by safe_cast(STRING AS expected_type) to harmonize arrays, numerics, and booleans.
  • Keep values as Object , always surround inputs with safe_cast(field AS expected_type) so that failures yield NULL instead of exceptions. (Preferred)

3.Data loaded from OpenSearch
There are two loading strategies.

  • Convert all values to strings when constructing rows. This simplifies downstream logic but can increase CPU cost for numeric workloads.
  • Preserve native Java types in rows and cast on demand. (Current approach)

4.Output schema publication
When mappings are reliable, publish dataset-driven types. When mappings are unavailable or inconsistent, derive the output schema from operator requirements and the inserted casts. Fields affected by TRY semantics should be nullable, and the response should include drift and casting counters.

Metadata

Metadata

Assignees

No one assigned

    Labels

    PPLPiped processing languageRFCRequest For CommentsenhancementNew feature or request

    Type

    No type

    Projects

    Status

    New

    Status

    Not Started

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions