Collaborator

@yuancu yuancu commented Sep 8, 2025

Description

Before this PR, pushed-down scripts did not work when their arguments had user-defined types (UDTs). For example, the following simple query fails if pushdown is enabled: source=bank | eval t = unix_timestamp(birthdate) | stats count() by t

Error response
{
  "error": {
    "reason": "Error occurred in OpenSearch engine: all shards failed",
    "details": "Shard[0]: org.opensearch.sql.exception.ExpressionEvaluationException: invalid to get doubleValue from value of type STRING\n\nFor more details, please send request for Json format to see the raw response from OpenSearch engine.",
    "type": "SearchPhaseExecutionException"
  },
  "status": 500
}

The problem arises from the serialization process. UDTs are serialized by Calcite's RelJson serializer, which keeps only a type's SqlTypeName for later restoration. Because many UDTs are mapped to SqlTypeName.VARCHAR, they are restored as plain VARCHAR types instead of the original UDTs.

This PR fixes the issue by implementing custom logic when serializing & de-serializing UDTs.
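
The idea can be sketched as follows. This is a minimal, self-contained model and not the PR's actual code: FieldType, Udt, toJson, and fromJson are hypothetical names, and plain maps stand in for the JSON that RelJson produces. The only point it illustrates is that writing an extra "udt" key next to the SQL type name lets the reader restore the original UDT instead of a plain VARCHAR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UdtJsonSketch {
    // Hypothetical UDT tags, named after the EXPR_* types mentioned in this PR.
    enum Udt { EXPR_DATE, EXPR_TIME, EXPR_TIMESTAMP, EXPR_IP }

    // Hypothetical simplified field type: SQL type name plus an optional UDT tag.
    record FieldType(String sqlTypeName, Udt udt, boolean nullable) {}

    // Serialize: besides the SQL type name, record the UDT tag when there is one.
    static Map<String, Object> toJson(FieldType type) {
        Map<String, Object> json = new LinkedHashMap<>();
        if (type.udt() != null) {
            json.put("udt", type.udt().name());
        }
        json.put("type", type.sqlTypeName());
        json.put("nullable", type.nullable());
        return json;
    }

    // Deserialize: restore the UDT if the "udt" key is present,
    // otherwise fall back to the plain SQL type.
    static FieldType fromJson(Map<String, Object> json) {
        Object udt = json.get("udt");
        return new FieldType(
            (String) json.get("type"),
            udt == null ? null : Udt.valueOf((String) udt),
            (Boolean) json.get("nullable"));
    }
}
```

Without the "udt" key, a VARCHAR-backed timestamp and a real string would be indistinguishable after the round trip, which is the failure described above.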

Related Issues

Resolves #4063 , resolves #4322, and resolves #4340

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@yuancu yuancu added the bug Something isn't working label Sep 8, 2025
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Collaborator Author

yuancu commented Sep 9, 2025

Another issue arises after supporting serializing & pushing down UDTs: testQ12 and testQ21 in CalcitePPLTpchIT fail with reason: NoSuchMethodException[org.apache.calcite.runtime.SqlFunctions.lt(java.lang.Object,java.lang.Object)]

Before this PR, UDTs were pushed down as strings, so date comparisons resolved to SqlFunctions.lt(java.lang.String,java.lang.String). Now that they are restored as UDTs, the comparison can no longer be resolved to a proper implementation.
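
To illustrate why the failure surfaces as a NoSuchMethodException: a reflective lookup by exact parameter types only succeeds when an overload with exactly those types exists. The sketch below uses a hypothetical Fns class as a stand-in for a helper class that only has typed overloads; it does not call Calcite.

```java
public class OverloadLookupSketch {
    // Hypothetical stand-in for a runtime helper class with typed overloads only.
    public static class Fns {
        public static boolean lt(String a, String b) {
            return a.compareTo(b) < 0;
        }

        public static boolean lt(long a, long b) {
            return a < b;
        }
    }

    // Returns true if Fns has an lt overload whose two parameters are exactly argType.
    static boolean hasLt(Class<?> argType) {
        try {
            // Class.getMethod matches the formal parameter types exactly.
            Fns.class.getMethod("lt", argType, argType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

Here hasLt(String.class) succeeds while hasLt(Object.class) does not, which mirrors the situation where operands typed as Object have no matching lt overload.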

For example, Q12 query contains timestamp comparison:

source = orders
| join ON o_orderkey = l_orderkey lineitem
| where l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_shipmode in ('MAIL', 'SHIP')
    and l_receiptdate >= date('1994-01-01')
    and l_receiptdate < date_add(date('1994-01-01'), interval 1 year)
| stats sum(case(o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH', 1 else 0)) as high_line_count,
        sum(case(o_orderpriority != '1-URGENT' and o_orderpriority != '2-HIGH', 1 else 0)) as low_line_count
        by l_shipmode
| fields l_shipmode, high_line_count, low_line_count
| sort l_shipmode

Before this PR, the comparison script is like below:

rowType: {
  "fields": [
    {
      "type": "VARCHAR",
      "nullable": true,
      "precision": -1,
      "name": "l_receiptdate"
    },
    {
      "type": "VARCHAR",
      "nullable": true,
      "precision": -1,
      "name": "l_shipmode"
    },
    {
      "type": "VARCHAR",
      "nullable": true,
      "precision": -1,
      "name": "l_commitdate"
    },
...
  ],
  "nullable": false
},
  "op": {
    "name": "<",
    "kind": "LESS_THAN",
    "syntax": "BINARY"
  },
...
}

With this PR, the type becomes UDT:

rowType: {
  "fields": [
    {
      "udt": "EXPR_TIMESTAMP",
      "type": "VARCHAR",
      "nullable": true,
      "precision": -1,
      "name": "l_receiptdate"
    },
    {
      "type": "VARCHAR",
      "nullable": true,
      "precision": -1,
      "name": "l_shipmode"
    },
    {
      "udt": "EXPR_TIMESTAMP",
      "type": "VARCHAR",
      "nullable": true,
      "precision": -1,
      "name": "l_commitdate"
    },
    ...
  ],
  "nullable": false
},
  "op": {
    "name": "<",
    "kind": "LESS_THAN",
    "syntax": "BINARY"
  },
...
}

The problem arises from where l_commitdate < l_receiptdate and l_shipdate < l_commitdate: the script fails to resolve a comparison between UDTs. In Calcite without push-down enabled, timestamp UDT comparison is resolved to string comparison since the underlying type of EXPR_TIMESTAMP is a RexLiteral storing a string constant (see RexToLixTranslator.java#L1250). I don't know why it does not work when the UDT is pushed down.

Working on fixing it.
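
As a side note on why comparing timestamps as strings can still give correct results: for a fixed-width, zero-padded format, lexicographic order coincides with chronological order. The sketch below assumes the format yyyy-MM-dd HH:mm:ss, which is an assumption made for illustration; the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampStringOrderSketch {
    // Assumed fixed-width timestamp format (illustrative only).
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Compare the raw strings lexicographically.
    static boolean ltAsString(String a, String b) {
        return a.compareTo(b) < 0;
    }

    // Compare the parsed timestamps chronologically.
    static boolean ltAsTimestamp(String a, String b) {
        return LocalDateTime.parse(a, FMT).isBefore(LocalDateTime.parse(b, FMT));
    }
}
```

Both methods agree as long as every value uses the same zero-padded layout; mixed formats would break the equivalence.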

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Comment on lines 90 to 92
// UDTs are not comparable when pushed-down as scripts. We set their types to strings as a
// workaround. Refer to this comment for more details:
// https://github.com/opensearch-project/sql/pull/4245#issuecomment-3268673999
Contributor

Is the pushed-down generated code the same as that before pushdown? I notice we probably use the Calcite default BinaryImplementor to generate the comparison code. The primitive it compares depends on the reflected type. That's a possible root cause because we're not sure if the reflected UDT is the same as before.

Collaborator Author

@yuancu yuancu Sep 16, 2025

The data type differs; the rest, including the comparison code, remains the same.

For example, for this query: source = lineitem | where l_commitdate < l_receiptdate

Plan without this PR
{
  "calcite": {
    "logical": """LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(l_receiptdate=[$0], l_returnflag=[$1], l_tax=[$2], l_shipmode=[$3], l_suppkey=[$4], l_shipdate=[$5], l_commitdate=[$6], l_partkey=[$7], l_orderkey=[$8], l_quantity=[$9], l_comment=[$10], l_linestatus=[$11], l_extendedprice=[$12], l_linenumber=[$13], l_discount=[$14], l_shipinstruct=[$15])
    LogicalFilter(condition=[<($6, $0)])
      CalciteLogicalIndexScan(table=[[OpenSearch, lineitem]])
""",
    "physical": """CalciteEnumerableIndexScan(table=[[OpenSearch, lineitem]], PushDownContext=[[PROJECT->[l_receiptdate, l_returnflag, l_tax, l_shipmode, l_suppkey, l_shipdate, l_commitdate, l_partkey, l_orderkey, l_quantity, l_comment, l_linestatus, l_extendedprice, l_linenumber, l_discount, l_shipinstruct], SCRIPT-><($6, $0), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":10000,"timeout":"1m","query":{"script":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXNyABFqYXZhLnV0aWwuQ29sbFNlcleOq7Y6G6gRAwABSQADdGFneHAAAAADdwQAAAAGdAAHcm93VHlwZXQGdnsKICAiZmllbGRzIjogWwogICAgewogICAgICAidHlwZSI6ICJWQVJDSEFSIiwKICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgInByZWNpc2lvbiI6IC0xLAogICAgICAibmFtZSI6ICJsX3JlY2VpcHRkYXRlIgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJwcmVjaXNpb24iOiAtMSwKICAgICAgIm5hbWUiOiAibF9yZXR1cm5mbGFnIgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiRE9VQkxFIiwKICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgIm5hbWUiOiAibF90YXgiCiAgICB9LAogICAgewogICAgICAidHlwZSI6ICJWQVJDSEFSIiwKICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgInByZWNpc2lvbiI6IC0xLAogICAgICAibmFtZSI6ICJsX3NoaXBtb2RlIgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiQklHSU5UIiwKICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgIm5hbWUiOiAibF9zdXBwa2V5IgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJwcmVjaXNpb24iOiAtMSwKICAgICAgIm5hbWUiOiAibF9zaGlwZGF0ZSIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAicHJlY2lzaW9uIjogLTEsCiAgICAgICJuYW1lIjogImxfY29tbWl0ZGF0ZSIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJuYW1lIjogImxfcGFydGtleSIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJuYW1lIjogImxfb3JkZXJrZXkiCiAgICB9LAogICAgewogICAgICAidHlwZSI6ICJET1VCTEUiLAogICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAibmFtZSI6ICJsX3F1YW50aXR5IgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICJudWxs
YWJsZSI6IHRydWUsCiAgICAgICJwcmVjaXNpb24iOiAtMSwKICAgICAgIm5hbWUiOiAibF9jb21tZW50IgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJwcmVjaXNpb24iOiAtMSwKICAgICAgIm5hbWUiOiAibF9saW5lc3RhdHVzIgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiRE9VQkxFIiwKICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgIm5hbWUiOiAibF9leHRlbmRlZHByaWNlIgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiSU5URUdFUiIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJuYW1lIjogImxfbGluZW51bWJlciIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIkRPVUJMRSIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJuYW1lIjogImxfZGlzY291bnQiCiAgICB9LAogICAgewogICAgICAidHlwZSI6ICJWQVJDSEFSIiwKICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgInByZWNpc2lvbiI6IC0xLAogICAgICAibmFtZSI6ICJsX3NoaXBpbnN0cnVjdCIKICAgIH0KICBdLAogICJudWxsYWJsZSI6IGZhbHNlCn10AARleHBydADKewogICJvcCI6IHsKICAgICJuYW1lIjogIjwiLAogICAgImtpbmQiOiAiTEVTU19USEFOIiwKICAgICJzeW50YXgiOiAiQklOQVJZIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiaW5wdXQiOiA2LAogICAgICAibmFtZSI6ICIkNiIKICAgIH0sCiAgICB7CiAgICAgICJpbnB1dCI6IDAsCiAgICAgICJuYW1lIjogIiQwIgogICAgfQogIF0KfXQACmZpZWxkVHlwZXNzcgARamF2YS51dGlsLkhhc2hNYXAFB9rBwxZg0QMAAkYACmxvYWRGYWN0b3JJAAl0aHJlc2hvbGR4cD9AAAAAAAAYdwgAAAAgAAAAEHQADWxfcmVjZWlwdGRhdGVzcgA6b3JnLm9wZW5zZWFyY2guc3FsLm9wZW5zZWFyY2guZGF0YS50eXBlLk9wZW5TZWFyY2hEYXRlVHlwZZ4tUq4QfcqvAgABTAAHZm9ybWF0c3QAEExqYXZhL3V0aWwvTGlzdDt4cgA6b3JnLm9wZW5zZWFyY2guc3FsLm9wZW5zZWFyY2guZGF0YS50eXBlLk9wZW5TZWFyY2hEYXRhVHlwZcJjvMoC+gU1AgADTAAMZXhwckNvcmVUeXBldAArTG9yZy9vcGVuc2VhcmNoL3NxbC9kYXRhL3R5cGUvRXhwckNvcmVUeXBlO0wAC21hcHBpbmdUeXBldABITG9yZy9vcGVuc2VhcmNoL3NxbC9vcGVuc2VhcmNoL2RhdGEvdHlwZS9PcGVuU2VhcmNoRGF0YVR5cGUkTWFwcGluZ1R5cGU7TAAKcHJvcGVydGllc3QAD0xqYXZhL3V0aWwvTWFwO3hwfnIAKW9yZy5vcGVuc2VhcmNoLnNxbC5kYXRhLnR5cGUuRXhwckNvcmVUeXBlAAAAAAAAAAASAAB4cgAOamF2YS5sYW5nLkVudW0AAAAAAAAAABIAAHhwdAAJVElNRVNUQU1QfnIARm9yZy5vcGVuc2VhcmNoLnNxbC5vcGVuc2VhcmNoLmRhdGEudHlwZS5PcGVuU2VhcmNoRGF0YVR5cGUkTWFwcGluZ1R5cGUAAAAAAAAAABIAAHhxAH4AEnQABERhdGVzcgA8c2hhZGVkLmNvbS5nb29nbGUuY29tbW9uLmNv
bGxlY3QuSW1tdXRhYmxlTWFwJFNlcmlhbGl6ZWRGb3JtAAAAAAAAAAACAAJMAARrZXlzdAASTGphdmEvbGFuZy9PYmplY3Q7TAAGdmFsdWVzcQB+ABl4cHVyABNbTGphdmEubGFuZy5PYmplY3Q7kM5YnxBzKWwCAAB4cAAAAAB1cQB+ABsAAAAAc3EAfgAAAAAAAXcEAAAAAHh0AAxsX3JldHVybmZsYWd+cQB+ABF0AAZTVFJJTkd0AAVsX3RheH5xAH4AEXQABkRPVUJMRXQACmxfc2hpcG1vZGVxAH4AIHQACWxfc3VwcGtleX5xAH4AEXQABExPTkd0AApsX3NoaXBkYXRlc3EAfgAKcQB+ABNxAH4AFnEAfgAacQB+AB50AAxsX2NvbW1pdGRhdGVzcQB+AApxAH4AE3EAfgAWcQB+ABpxAH4AHnQACWxfcGFydGtleXEAfgAndAAKbF9vcmRlcmtleXEAfgAndAAKbF9xdWFudGl0eXEAfgAjdAAJbF9jb21tZW50c3IAOm9yZy5vcGVuc2VhcmNoLnNxbC5vcGVuc2VhcmNoLmRhdGEudHlwZS5PcGVuU2VhcmNoVGV4dFR5cGWtg6OTBOMxRAIAAUwABmZpZWxkc3EAfgAPeHEAfgAMfnEAfgARdAAHVU5LTk9XTn5xAH4AFXQABFRleHRxAH4AGnNxAH4AAAAAAAN3BAAAAAB4dAAMbF9saW5lc3RhdHVzcQB+ACB0AA9sX2V4dGVuZGVkcHJpY2VxAH4AI3QADGxfbGluZW51bWJlcn5xAH4AEXQAB0lOVEVHRVJ0AApsX2Rpc2NvdW50cQB+ACN0AA5sX3NoaXBpbnN0cnVjdHEAfgAgeHg=\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp":1758005112179638000}},"boost":1.0}},"_source":{"includes":["l_receiptdate","l_returnflag","l_tax","l_shipmode","l_suppkey","l_shipdate","l_commitdate","l_partkey","l_orderkey","l_quantity","l_comment","l_linestatus","l_extendedprice","l_linenumber","l_discount","l_shipinstruct"],"excludes":[]},"sort":[{"_doc":{"order":"asc"}}]}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
"""
  }
}
Plan with this PR
{
  "calcite": {
    "logical": """LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(l_receiptdate=[$0], l_returnflag=[$1], l_tax=[$2], l_shipmode=[$3], l_suppkey=[$4], l_shipdate=[$5], l_commitdate=[$6], l_partkey=[$7], l_orderkey=[$8], l_quantity=[$9], l_comment=[$10], l_linestatus=[$11], l_extendedprice=[$12], l_linenumber=[$13], l_discount=[$14], l_shipinstruct=[$15])
    LogicalFilter(condition=[<($6, $0)])
      CalciteLogicalIndexScan(table=[[OpenSearch, lineitem]])
""",
    "physical": """CalciteEnumerableIndexScan(table=[[OpenSearch, lineitem]], PushDownContext=[[PROJECT->[l_receiptdate, l_returnflag, l_tax, l_shipmode, l_suppkey, l_shipdate, l_commitdate, l_partkey, l_orderkey, l_quantity, l_comment, l_linestatus, l_extendedprice, l_linenumber, l_discount, l_shipinstruct], SCRIPT-><($6, $0), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":10000,"timeout":"1m","query":{"script":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXNyABFqYXZhLnV0aWwuQ29sbFNlcleOq7Y6G6gRAwABSQADdGFneHAAAAADdwQAAAAGdAAHcm93VHlwZXQG03sKICAiZmllbGRzIjogWwogICAgewogICAgICAidWR0IjogIkVYUFJfVElNRVNUQU1QIiwKICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJwcmVjaXNpb24iOiAtMSwKICAgICAgIm5hbWUiOiAibF9yZWNlaXB0ZGF0ZSIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAicHJlY2lzaW9uIjogLTEsCiAgICAgICJuYW1lIjogImxfcmV0dXJuZmxhZyIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIkRPVUJMRSIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJuYW1lIjogImxfdGF4IgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJwcmVjaXNpb24iOiAtMSwKICAgICAgIm5hbWUiOiAibF9zaGlwbW9kZSIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJuYW1lIjogImxfc3VwcGtleSIKICAgIH0sCiAgICB7CiAgICAgICJ1ZHQiOiAiRVhQUl9USU1FU1RBTVAiLAogICAgICAidHlwZSI6ICJWQVJDSEFSIiwKICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgInByZWNpc2lvbiI6IC0xLAogICAgICAibmFtZSI6ICJsX3NoaXBkYXRlIgogICAgfSwKICAgIHsKICAgICAgInVkdCI6ICJFWFBSX1RJTUVTVEFNUCIsCiAgICAgICJ0eXBlIjogIlZBUkNIQVIiLAogICAgICAibnVsbGFibGUiOiB0cnVlLAogICAgICAicHJlY2lzaW9uIjogLTEsCiAgICAgICJuYW1lIjogImxfY29tbWl0ZGF0ZSIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJuYW1lIjogImxfcGFydGtleSIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJuYW1lIjogImxfb3JkZXJrZXkiCiAgICB9LAogICAgewogICAgICAidHlwZSI6ICJET1VCTEUiLAogICAgICAi
bnVsbGFibGUiOiB0cnVlLAogICAgICAibmFtZSI6ICJsX3F1YW50aXR5IgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJwcmVjaXNpb24iOiAtMSwKICAgICAgIm5hbWUiOiAibF9jb21tZW50IgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJwcmVjaXNpb24iOiAtMSwKICAgICAgIm5hbWUiOiAibF9saW5lc3RhdHVzIgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiRE9VQkxFIiwKICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgIm5hbWUiOiAibF9leHRlbmRlZHByaWNlIgogICAgfSwKICAgIHsKICAgICAgInR5cGUiOiAiSU5URUdFUiIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJuYW1lIjogImxfbGluZW51bWJlciIKICAgIH0sCiAgICB7CiAgICAgICJ0eXBlIjogIkRPVUJMRSIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJuYW1lIjogImxfZGlzY291bnQiCiAgICB9LAogICAgewogICAgICAidHlwZSI6ICJWQVJDSEFSIiwKICAgICAgIm51bGxhYmxlIjogdHJ1ZSwKICAgICAgInByZWNpc2lvbiI6IC0xLAogICAgICAibmFtZSI6ICJsX3NoaXBpbnN0cnVjdCIKICAgIH0KICBdLAogICJudWxsYWJsZSI6IGZhbHNlCn10AARleHBydADKewogICJvcCI6IHsKICAgICJuYW1lIjogIjwiLAogICAgImtpbmQiOiAiTEVTU19USEFOIiwKICAgICJzeW50YXgiOiAiQklOQVJZIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAiaW5wdXQiOiA2LAogICAgICAibmFtZSI6ICIkNiIKICAgIH0sCiAgICB7CiAgICAgICJpbnB1dCI6IDAsCiAgICAgICJuYW1lIjogIiQwIgogICAgfQogIF0KfXQACmZpZWxkVHlwZXNzcgARamF2YS51dGlsLkhhc2hNYXAFB9rBwxZg0QMAAkYACmxvYWRGYWN0b3JJAAl0aHJlc2hvbGR4cD9AAAAAAAAYdwgAAAAgAAAAEHQADWxfcmVjZWlwdGRhdGVzcgA6b3JnLm9wZW5zZWFyY2guc3FsLm9wZW5zZWFyY2guZGF0YS50eXBlLk9wZW5TZWFyY2hEYXRlVHlwZZ4tUq4QfcqvAgABTAAHZm9ybWF0c3QAEExqYXZhL3V0aWwvTGlzdDt4cgA6b3JnLm9wZW5zZWFyY2guc3FsLm9wZW5zZWFyY2guZGF0YS50eXBlLk9wZW5TZWFyY2hEYXRhVHlwZcJjvMoC+gU1AgADTAAMZXhwckNvcmVUeXBldAArTG9yZy9vcGVuc2VhcmNoL3NxbC9kYXRhL3R5cGUvRXhwckNvcmVUeXBlO0wAC21hcHBpbmdUeXBldABITG9yZy9vcGVuc2VhcmNoL3NxbC9vcGVuc2VhcmNoL2RhdGEvdHlwZS9PcGVuU2VhcmNoRGF0YVR5cGUkTWFwcGluZ1R5cGU7TAAKcHJvcGVydGllc3QAD0xqYXZhL3V0aWwvTWFwO3hwfnIAKW9yZy5vcGVuc2VhcmNoLnNxbC5kYXRhLnR5cGUuRXhwckNvcmVUeXBlAAAAAAAAAAASAAB4cgAOamF2YS5sYW5nLkVudW0AAAAAAAAAABIAAHhwdAAJVElNRVNUQU1QfnIARm9yZy5vcGVuc2VhcmNoLnNxbC5vcGVuc2VhcmNoLmRh
dGEudHlwZS5PcGVuU2VhcmNoRGF0YVR5cGUkTWFwcGluZ1R5cGUAAAAAAAAAABIAAHhxAH4AEnQABERhdGVzcgA8c2hhZGVkLmNvbS5nb29nbGUuY29tbW9uLmNvbGxlY3QuSW1tdXRhYmxlTWFwJFNlcmlhbGl6ZWRGb3JtAAAAAAAAAAACAAJMAARrZXlzdAASTGphdmEvbGFuZy9PYmplY3Q7TAAGdmFsdWVzcQB+ABl4cHVyABNbTGphdmEubGFuZy5PYmplY3Q7kM5YnxBzKWwCAAB4cAAAAAB1cQB+ABsAAAAAc3EAfgAAAAAAAXcEAAAAAHh0AAxsX3JldHVybmZsYWd+cQB+ABF0AAZTVFJJTkd0AAVsX3RheH5xAH4AEXQABkRPVUJMRXQACmxfc2hpcG1vZGVxAH4AIHQACWxfc3VwcGtleX5xAH4AEXQABExPTkd0AApsX3NoaXBkYXRlc3EAfgAKcQB+ABNxAH4AFnEAfgAacQB+AB50AAxsX2NvbW1pdGRhdGVzcQB+AApxAH4AE3EAfgAWcQB+ABpxAH4AHnQACWxfcGFydGtleXEAfgAndAAKbF9vcmRlcmtleXEAfgAndAAKbF9xdWFudGl0eXEAfgAjdAAJbF9jb21tZW50c3IAOm9yZy5vcGVuc2VhcmNoLnNxbC5vcGVuc2VhcmNoLmRhdGEudHlwZS5PcGVuU2VhcmNoVGV4dFR5cGWtg6OTBOMxRAIAAUwABmZpZWxkc3EAfgAPeHEAfgAMfnEAfgARdAAHVU5LTk9XTn5xAH4AFXQABFRleHRxAH4AGnNxAH4AAAAAAAN3BAAAAAB4dAAMbF9saW5lc3RhdHVzcQB+ACB0AA9sX2V4dGVuZGVkcHJpY2VxAH4AI3QADGxfbGluZW51bWJlcn5xAH4AEXQAB0lOVEVHRVJ0AApsX2Rpc2NvdW50cQB+ACN0AA5sX3NoaXBpbnN0cnVjdHEAfgAgeHg=\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp":1758005410734926000}},"boost":1.0}},"_source":{"includes":["l_receiptdate","l_returnflag","l_tax","l_shipmode","l_suppkey","l_shipdate","l_commitdate","l_partkey","l_orderkey","l_quantity","l_comment","l_linestatus","l_extendedprice","l_linenumber","l_discount","l_shipinstruct"],"excludes":[]},"sort":[{"_doc":{"order":"asc"}}]}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
"""
  }
}

The comment above shows part of the decoded scripts; their main difference lies in the operand types.
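
For readers who want to inspect such a payload themselves: the script field in the plans above starts with rO0AB, which is what a Base64-encoded Java serialization stream looks like (magic bytes AC ED followed by version 00 05). Below is a minimal sketch of that encoding round trip. It assumes nothing about the project's own serializer, and ScriptPayloadSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

public class ScriptPayloadSketch {
    // Java-serialize a value and Base64-encode the resulting bytes.
    static String encode(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Base64-decode a payload and read back the serialized object.
    static Object decode(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Deserializing the real payload would additionally need the classes it references (for example the OpenSearch data type classes visible in the encoded text) on the classpath.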

I guess part of the reason that the UDT comparison is not supported is that we are not using PPLFuncImpTable when resolving functions in the script?

// RelJsonSerializer.java#L54
private static final SqlOperatorTable pplSqlOperatorTable =
    SqlOperatorTables.chain(
        PPLBuiltinOperators.instance(),
        SqlStdOperatorTable.instance(),
        // Add a list of necessary SqlLibrary if needed
        SqlLibraryOperatorTableFactory.INSTANCE.getOperatorTable(
            SqlLibrary.MYSQL, SqlLibrary.BIG_QUERY, SqlLibrary.SPARK, SqlLibrary.POSTGRESQL));

By the way, we did not implement a special comparison operator for date-time UDTs as we did for IP. Still, comparison has somehow been working with BinaryImplementor:

// RexImpTable.java#L3181
final Type type0 = argValueList.get(0).getType();
final Type type1 = argValueList.get(1).getType();
final SqlBinaryOperator op = (SqlBinaryOperator) call.getOperator();
final RelDataType relDataType0 = call.getOperands().get(0).getType();

type0 and type1 are resolved to VARCHAR for date time UDTs here.

Contributor

@songkant-aws songkant-aws Sep 17, 2025

I pulled the code and ran the example PPL query, and I think I found the root cause. The visible error occurs when Calcite code generation cannot find a less-than function signature matching the provided Linq4j expressions.
See this error:

RuntimeException[while resolving method 'lt[class java.lang.Object, class java.lang.Object]' in class class org.apache.calcite.runtime.SqlFunctions]

Calcite's SqlFunctions has no lt(...) overload with that signature, so resolution fails for our UDT. See the Calcite code: https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java#L2183-L2248

When we hardcode the UDT type as STRING, it generates the following code for runtime evaluation. The inputs are clearly typed as String.

public Object[] apply(Object root0) {
  final String input_value = ((org.apache.calcite.DataContext) root0).get("l_commitdate") == null ? null : ((org.apache.calcite.DataContext) root0).get("l_commitdate").toString();
  final String input_value0 = ((org.apache.calcite.DataContext) root0).get("l_receiptdate") == null ? null : ((org.apache.calcite.DataContext) root0).get("l_receiptdate").toString();
  return new Object[] {
      input_value == null || input_value0 == null ? null : Boolean.valueOf(org.apache.calcite.runtime.SqlFunctions.lt(input_value, input_value0))};
}

Then what causes the wrong method resolution when we set the field to the ExprSqlType.Timestamp UDT? In the working generated code above, input_value and input_value0 are declared as String. In the failing case they must have been declared as Object, which means we resolved the wrong Java class for the UDT. This reminded me that when compiling the script code we might have used an incorrect type factory for the inputs, because we simply copied the Calcite code.

This line causes the issue: https://github.com/opensearch-project/sql/blob/main/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/CalciteScriptEngine.java#L131-L132. It uses default JavaTypeFactory instead of our own OpenSearchTypeFactory that tunes UDT logic. Our OpenSearchTypeFactory overrides the getJavaClass method to resolve UDT's Java class: https://github.com/opensearch-project/sql/blob/main/core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java#L323-L329

So I think a simple fix is to remove the problematic line and pass OpenSearchTypeFactory.TYPE_FACTORY to our ScriptInputGetter. We also need to make sure this does not affect other cases.
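
The effect of swapping the type factory can be modeled with a small sketch. Everything here is hypothetical (SketchType, DefaultFactory, UdtAwareFactory) and only mimics the behavior described in this thread: a default factory that does not know the UDT falls back to Object, while a UDT-aware factory overrides getJavaClass so that, as the working generated code suggests, a timestamp UDT resolves to String.

```java
public class TypeFactorySketch {
    // Hypothetical simplified type: a SQL type name plus an optional UDT tag.
    record SketchType(String sqlTypeName, String udt) {}

    // Stand-in for a default factory that knows nothing about UDTs.
    static class DefaultFactory {
        Class<?> getJavaClass(SketchType type) {
            if (type.udt() != null) {
                // Modeled assumption: an unrecognized UDT falls back to Object.
                return Object.class;
            }
            return "VARCHAR".equals(type.sqlTypeName()) ? String.class : Object.class;
        }
    }

    // Stand-in for a UDT-aware factory that overrides the Java class mapping.
    static class UdtAwareFactory extends DefaultFactory {
        @Override
        Class<?> getJavaClass(SketchType type) {
            if ("EXPR_TIMESTAMP".equals(type.udt())) {
                // Modeled assumption: the timestamp UDT is represented as a String.
                return String.class;
            }
            return super.getJavaClass(type);
        }
    }
}
```

With the default factory the operands would be typed as Object and no lt overload matches; with the UDT-aware one they are typed as String and the existing string comparison applies.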

Collaborator Author

Thank you for helping debug this!

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…lcite's type factory with OpenSearchTypeFactory

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
}

@Test
void testSerializeUDT() {
Contributor

[Minor and Non-blocking] It's better to add some cases of serializing and deserializing nested structure like List[String, UDT_TIMESTAMP, UDT_IP] or Map{(key1, Integer), (key2, UDT_TIMESTAMP) }

Collaborator Author

Thanks for the suggestion! Tests added

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
loadExpectedPlan("explain_agg_script_timestamp_push.json"),
explainQueryToString(
String.format(
"source=%s | eval t = unix_timestamp(birthdate) | stats count() by t | sort t |"
Member

can you add the IT:

source=%s | eval t = date_add(birthdate, interval 1 day) | stats count() by span(t, 1d)

Collaborator Author

Added

}

@Test
public void testStatsCountOnFunctionsWithUDTArg() throws IOException {
Member

ditto

Collaborator Author

Added

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@penghuo penghuo merged commit 69a718b into opensearch-project:main Sep 23, 2025
33 checks passed
@yuancu yuancu deleted the issues/4063 branch September 24, 2025 01:58
joshuali925 pushed a commit that referenced this pull request Sep 24, 2025

* update docs

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix renaming to same name

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix behavior for consecutive wildcards/address comments

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* add back import

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix doc

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix doc

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix formatting

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

---------

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>

* Add support for `median(<value>)` (#4234)

* First revision

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>

* Fixing documentation

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>

* Removing unnecessary comments

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>

* Fixing stats.rst documentation

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>

* Fixing documentation

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>

* Addressing comments

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>

---------

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <900908alvarezaaron@gmail.com>
Co-authored-by: Aaron Alvarez <aaarone@amazon.com>

* Dynamic source selector (#4116)

Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>

* Add gitignore (#4258)

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Support join field list and join options (#3803)

* Support join field list and join options

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Add SPL-compatible syntax setting

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Revert SPL settings

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Support max=n option

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* support max=n in sql-like join syntax

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Add Explain IT for new join syntax

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Refactor the user doc

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix conflicts

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix conflicts

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Disable the collapse pushdown

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* refactor

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Support first/last aggregate functions for PPL (#4223)

* Support first/last aggregation functions for PPL

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Support null

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* remove legacy

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* update doc

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix doctest

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix stats.rst file

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fixes

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* move pushdown logic to AggregateAnalyzer

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix IT and update null handling

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* add test cases for null handling

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* handle parallelism

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Simplify CalciteExplainIT and add UT for AggregateAnalyzer

Signed-off-by: Kai Huang <ahkcs@amazon.com>

# Conflicts:
#	opensearch/src/test/java/org/opensearch/sql/opensearch/request/AggregateAnalyzerTest.java

* fixes

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Fix gitignore to ignore symbolic link (#4263)

add comment

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Push down limit operator into aggregation bucket size (#4228)

* Push down limit operator into aggregation bucket size

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix robust issue in OpenSearchLimitIndexScanRule

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Refine comments

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix the IT issue caused by merging conflict (#4270)

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Print links to test logs after integTest (#4273)

* Print links to test logs after integTest

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* print even when tests failed

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

---------

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* [Feature] Implementation of mode `sed` and `offset_field` in rex PPL command (#4241)

* [Feature] Implementation of mode sed and offset_field in rex PPL command

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* update rex rst doc

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* chen - address comment and merge grammar in parser

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* chen - limit offset field only in extraction mode

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* chen - specify exception type of o_f UDF

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* chen - add exception type of o_f UDF - 2

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* chen - add exception type of o_f UDF - also fix the test

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* chen - alphabetical order of o_f return

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

---------

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* Add earliest/latest aggregate function for eventstats PPL command (#4212)

* Add earliest/latest aggregate function for eventstats command

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* update docs

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Minor refactoring

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix doctest

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Simplify logics

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Revert visitWindowFunction

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Add sort to some examples

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Refactor tests

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix argument validation error (WIP)

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Add argument validation for window functions

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix validation

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix tests

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix tests and refactor

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix test

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix merge issue

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

---------

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Speed up aggregation pushdown for single group-by expression (#3550)

* Speed up aggregation pushdown for single group-by expression

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Add configs nullable_bucket

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* revert typo

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix conflicts error

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix unit tests

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix order

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix UT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix UT in windows

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix compile error of conflicts

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Add more ITs after merging push down limit to agg buckets

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* address comments

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Clear sorts in source builder for aggregation pushdown

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Delete the TODO of v2, it's resolved now

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix doctest

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Introduce YAML formatter for better testing/debugging (#4274)

* Implement YamlFormatter

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Enable YAML based plan comparison in tests

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix line break issue in Windows

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Minor fix in test case

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix line break issue

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix comment

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

---------

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* doctest: Use 1.0 branch instead of main (#4219)

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Fix doctest (#4292)

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Search Command Revamp (#4152)

Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>

* `mvjoin` support in PPL Calcite (#4217)

* mvjoin support in PPL Calcite

Signed-off-by: ps48 <pshenoy36@gmail.com>

* fix texts

Signed-off-by: ps48 <pshenoy36@gmail.com>

* update docs

Signed-off-by: ps48 <pshenoy36@gmail.com>

* update doc examples

Signed-off-by: ps48 <pshenoy36@gmail.com>

* rebase main, update test

Signed-off-by: ps48 <pshenoy36@gmail.com>

* update test with real array fields

Signed-off-by: ps48 <pshenoy36@gmail.com>

* use verifyQueryThrowsException in CalcitePPLFunctionTypeTest

Signed-off-by: ps48 <pshenoy36@gmail.com>

* spotless check fix

Signed-off-by: ps48 <pshenoy36@gmail.com>

* remove string,string registration for mvjoin

Signed-off-by: ps48 <pshenoy36@gmail.com>

* remove string,string test

Signed-off-by: ps48 <pshenoy36@gmail.com>

---------

Signed-off-by: ps48 <pshenoy36@gmail.com>

* strftime function implementation (#4106)

Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>

* Add non-numeric field support for max/min functions (#4281)

* add non-numeric support for max/min

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix mixed field behavior

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* update doc

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* update doc

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* update formatting

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* add tests

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix formatting

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* empty

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix formatting

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* support ip type max/min

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix formatting

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* use tophitsparser

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* remove v2 explain

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* check for numeric fields for native max/min

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* change names

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix type checking

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

---------

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>

* Add `values` stats function with UDAF (#4276)

* Add `values` stats function

Signed-off-by: ps48 <pshenoy36@gmail.com>

* add settings for max values

Signed-off-by: ps48 <pshenoy36@gmail.com>

* update functiontypetest IT

Signed-off-by: ps48 <pshenoy36@gmail.com>

* update documentation for values settings

Signed-off-by: ps48 <pshenoy36@gmail.com>

* update the rst docs, remove settingsholder

Signed-off-by: ps48 <pshenoy36@gmail.com>

* update AST additions

Signed-off-by: ps48 <pshenoy36@gmail.com>

* updated the IT testValuesFunctionGroupBy

Signed-off-by: ps48 <pshenoy36@gmail.com>

---------

Signed-off-by: ps48 <pshenoy36@gmail.com>

* Support ISO8601-formatted string in PPL (#4246)

* Support parsing ISO 8601 datetime format for timestamp value

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Modify tests for ISO 8601 timestamp input

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add support of iso 8601 date string to date and time

- add an IT for date time comparison with iso 8601 formatted literal

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

---------

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Push down project operator with non-identity projections into scan (#4279)

* Support project push down after aggregation

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Push down project operator with non-identity projections into scan

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Also changing plan from merging main

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix 4296

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add spotless precommit hook + license check (#4306)

* Add spotless precommit hook

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Decouple plugin spotless versions + upgrade spotless

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Enable license headers everywhere

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Remove a redundant comment

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Fix removed additional licenses

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

---------

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Add Ryan as a maintainer (#4257)

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Spotless precommit: apply instead of check (#4320)

* Add merge_group trigger to test workflows (#4216)

* Update grammar files and developer guide (#4301)

* Update grammar files and developer guide

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Fix geopoint issue in complex data types (#4325)

Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>

* [Doc] Correct the comparison table for rex doc (#4321)

* [Doc] Correct the comparison table for rex doc

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* peng - remove non support feature from comparison table

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

---------

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* Add splunk to ppl cheat sheet (#3726)

* update with latest ppl commands and function improvement

Signed-off-by: Peng Huo <penghuo@gmail.com>

* Address comments

Signed-off-by: Peng Huo <penghuo@gmail.com>

---------

Signed-off-by: Peng Huo <penghuo@gmail.com>

* Date/Time based Span aggregation should never present a null bucket (#4327)

* Updating coalesce documentation (#4305)

Co-authored-by: Aaron Alvarez <aaarone@amazon.com>

* Support serializing & deserializing UDTs when pushing down scripts (#4245)

* Support serializing & deserializing UDTs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Update explain ITs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Push down UDT types as string types for comparison operators

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Separate test cases and add an ignored IT

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Correct the handling of UDT in CalciteScriptEngine by substituting calcite's type factory with OpenSearchTypeFactory

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Fix deserialization for IP

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Remove testExplainPushDownScriptsContainingUDT in v2

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Enable testLimitAfterAggregation in CalcitePPLAggregationIT

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Unit test serialize map and array types

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Fix deeper level deserialization of UDTs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add a yaml test for issue 4322

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add a test case for issue 4340

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Remove redundant classes

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

---------

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* change Anonymizer to mask PPL (#4352)

* change Anonymizer

Signed-off-by: xinyual <xinyual@amazon.com>

* fix case

Signed-off-by: xinyual <xinyual@amazon.com>

---------

Signed-off-by: xinyual <xinyual@amazon.com>

* [Feature][Enhancement] Enhance patterns command with additional sample_logs output field (#4155)

* Enhance patterns command with additional sample_logs output field

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Reorder agg fields for simple_pattern

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Test fix after previous fix to not drop group by list

Signed-off-by: Songkan Tang <songkant@amazon.com>

---------

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Optimize count aggregation performance by utilizing native doc_count in v3 (#4337)

* Optimize bucket aggregation performance by utilizing native doc_count in v3

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix UT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix issue of count(FIELD)

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix comments

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix typo

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* revert the doc_count pushdown for count(FIELD) by EXPR

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Support pushdown count aggregation in no bucket aggregation to hits.total.value

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* No index found with given index pattern should throw IndexNotFoundException (#4369)

* No index found with given index pattern should throw IndexNotFoundException

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Add UT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Push down stats with bins on time field into auto_date_histogram (#4329)

* Push down stats with bins on time field into auto_date_histogram

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Prevent pushing down multiple group-by with bins in advance.

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Remove useless code

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT after merging main

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Kai Huang <105710027+ahkcs@users.noreply.github.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <900908alvarezaaron@gmail.com>
Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: ps48 <pshenoy36@gmail.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Co-authored-by: Kai Huang <105710027+ahkcs@users.noreply.github.com>
Co-authored-by: Songkan Tang <songkant@amazon.com>
Co-authored-by: Simeon Widdis <sawiddis@gmail.com>
Co-authored-by: Chen Dai <daichen@amazon.com>
Co-authored-by: Jialiang Liang <jiallian@amazon.com>
Co-authored-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
Co-authored-by: Aaron Alvarez <900908alvarezaaron@gmail.com>
Co-authored-by: Aaron Alvarez <aaarone@amazon.com>
Co-authored-by: Vamsi Manohar <reddyvam@amazon.com>
Co-authored-by: Tomoyuki MORITA <moritato@amazon.com>
Co-authored-by: Lantao Jin <ltjin@amazon.com>
Co-authored-by: qianheng <qianheng@amazon.com>
Co-authored-by: Shenoy Pratik <sgguruda@amazon.com>
Co-authored-by: Yuanchun Shen <yuanchu@amazon.com>
Co-authored-by: Peng Huo <penghuo@gmail.com>
Co-authored-by: Xinyuan Lu <xinyual@amazon.com>