Support serializing external OpenSearch UDFs at pushdown time #4618
Merged
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…r types are UNDEFINED Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
qianheng-aws
approved these changes
Nov 3, 2025
LantaoJin
approved these changes
Nov 3, 2025
The backport to …
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4618-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a003e8c7399b62ebc3a57c07bb98f11d843ceb82
# Push it to GitHub
git push --set-upstream origin backport/backport-4618-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev
Then, create a pull request where the …
yuancu
added a commit
to yuancu/sql-plugin
that referenced
this pull request
Nov 4, 2025
…arch-project#4618) * Supports serilizing external OpenSearch UDFs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Correct subfield access logical when calling ITEM Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Resolve types of generated structs based on their values because their types are UNDEFINED Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Add explain and integration tests for geoip Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> --------- Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> (cherry picked from commit a003e8c)
Merged
8 tasks
yuancu
added a commit
that referenced
this pull request
Nov 4, 2025
…ushdown time (#4618) (#4725) * Support serializing external OpenSearch UDFs at pushdown time (#4618) * Supports serilizing external OpenSearch UDFs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Correct subfield access logical when calling ITEM Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Resolve types of generated structs based on their values because their types are UNDEFINED Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Add explain and integration tests for geoip Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> --------- Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> (cherry picked from commit a003e8c) * Update plan for testGeoIpPushedInAgg Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Mask position of fields to fix an undeterministic position of a field in the result plan for java21 Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> --------- Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
expani
pushed a commit
to vinaykpud/sql
that referenced
this pull request
Nov 4, 2025
…arch-project#4618) * Supports serilizing external OpenSearch UDFs Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Correct subfield access logical when calling ITEM Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Resolve types of generated structs based on their values because their types are UNDEFINED Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> * Add explain and integration tests for geoip Signed-off-by: Yuanchun Shen <yuanchu@amazon.com> --------- Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Labels
backport 2.19-dev
backport-failed
backport-manually
Filed a PR to backport manually.
bug
Something isn't working
Description
Previously, externally registered OpenSearch UDFs could not be serialized because they were not registered in `RelJsonSerializer`. This PR collects these UDFs in a `SqlOperatorTable` and registers it in `RelJsonSerializer`.
Blocker
Blocked by #2813 (or potentially other issues that restrict grouping by struct fields)
With this change the UDF is serialized, but grouping by a generated struct still fails after pushdown.
`source=weblogs | where host='1.2.3.4' | eval info = geoip('my-datasource', host) | stats count() by info`:
{ "error": { "reason": "There was internal problem at backend", "details": "java.sql.SQLException: exception while executing query: class java.lang.String cannot be cast to class java.util.Map (java.lang.String and java.util.Map are in module java.base of loader 'bootstrap')", "type": "RuntimeException" }, "status": 500 }
In this case, the result map is converted to a string when used as a group key.
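The failure mode can be reproduced in isolation with plain Java. The sketch below is not the plugin's code: `GroupKeyDemo`, `toBucketKey`, `readAsMap`, and `sampleInfo` are hypothetical names, and `toBucketKey` only assumes that some step stringifies the struct before it becomes a bucket key. It shows that a stringified `Map` has the `{key=value, ...}` shape seen in the bucket key, and that casting it back to `Map` raises the `ClassCastException` reported above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class GroupKeyDemo {
    // Hypothetical stand-in for the step that turns a script result into a bucket key.
    // Assumption: the struct is stringified somewhere along this path.
    static Object toBucketKey(Object scriptResult) {
        return String.valueOf(scriptResult);
    }

    // Mimics a consumer that expects the struct to still be a Map.
    static String readAsMap(Object key) {
        try {
            Map<?, ?> struct = (Map<?, ?>) key;
            return "map with " + struct.size() + " entries";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    // Small sample shaped like part of the geoip result in this PR.
    static Map<String, Object> sampleInfo() {
        Map<String, Object> info = new LinkedHashMap<>();
        info.put("continent_name", "Oceania");
        info.put("country_iso_code", "AU");
        return info;
    }

    public static void main(String[] args) {
        Map<String, Object> info = sampleInfo();
        Object key = toBucketKey(info);
        System.out.println(key);             // {continent_name=Oceania, country_iso_code=AU}
        System.out.println(readAsMap(info)); // map with 2 entries
        System.out.println(readAsMap(key));  // ClassCastException
    }
}
```

A scalar sub-field such as `info.get("country_iso_code")` is already a `String`, so stringifying it does not change its type.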
Running the DSL with the script directly returns the bucket key as a string.
Query:
{"from":0,"size":0,"timeout":"1m","query":{"term":{"host":{"value":"1.2.3.4","boost":1.0}}},"_source":{"includes":["host"],"excludes":[]},"aggregations":{"composite_buckets":{"composite":{"size":10000,"sources":[{"info":{"terms":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXNyABFqYXZhLnV0aWwuQ29sbFNlcleOq7Y6G6gRAwABSQADdGFneHAAAAADdwQAAAAGdAAHcm93VHlwZXQAknsKICAiZmllbGRzIjogWwogICAgewogICAgICAidWR0IjogIkVYUFJfSVAiLAogICAgICAidHlwZSI6ICJPVEhFUiIsCiAgICAgICJudWxsYWJsZSI6IHRydWUsCiAgICAgICJuYW1lIjogImhvc3QiCiAgICB9CiAgXSwKICAibnVsbGFibGUiOiBmYWxzZQp9dAAEZXhwcnQCw3sKICAib3AiOiB7CiAgICAibmFtZSI6ICJHRU9JUCIsCiAgICAia2luZCI6ICJPVEhFUl9GVU5DVElPTiIsCiAgICAic3ludGF4IjogIkZVTkNUSU9OIgogIH0sCiAgIm9wZXJhbmRzIjogWwogICAgewogICAgICAibGl0ZXJhbCI6ICJteS1kYXRhc291cmNlIiwKICAgICAgInR5cGUiOiB7CiAgICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICAgIm51bGxhYmxlIjogZmFsc2UsCiAgICAgICAgInByZWNpc2lvbiI6IC0xCiAgICAgIH0KICAgIH0sCiAgICB7CiAgICAgICJpbnB1dCI6IDAsCiAgICAgICJuYW1lIjogIiQwIgogICAgfQogIF0sCiAgImNsYXNzIjogIm9yZy5vcGVuc2VhcmNoLnNxbC5leHByZXNzaW9uLmZ1bmN0aW9uLlVzZXJEZWZpbmVkRnVuY3Rpb25CdWlsZGVyJDEiLAogICJ0eXBlIjogewogICAgInR5cGUiOiAiTUFQIiwKICAgICJudWxsYWJsZSI6IGZhbHNlLAogICAgImtleSI6IHsKICAgICAgInR5cGUiOiAiVkFSQ0hBUiIsCiAgICAgICJudWxsYWJsZSI6IGZhbHNlLAogICAgICAicHJlY2lzaW9uIjogLTEKICAgIH0sCiAgICAidmFsdWUiOiB7CiAgICAgICJ0eXBlIjogIkFOWSIsCiAgICAgICJudWxsYWJsZSI6IGZhbHNlLAogICAgICAicHJlY2lzaW9uIjogLTEsCiAgICAgICJzY2FsZSI6IC0yMTQ3NDgzNjQ4CiAgICB9CiAgfSwKICAiZGV0ZXJtaW5pc3RpYyI6IHRydWUsCiAgImR5bmFtaWMiOiBmYWxzZQp9dAAKZmllbGRUeXBlc3NyABFqYXZhLnV0aWwuSGFzaE1hcAUH2sHDFmDRAwACRgAKbG9hZEZhY3RvckkACXRocmVzaG9sZHhwP0AAAAAAAAx3CAAAABAAAAABdAAEaG9zdH5yAClvcmcub3BlbnNlYXJjaC5zcWwuZGF0YS50eXBlLkV4cHJDb3JlVHlwZQAAAAAAAAAAEgAAeHIADmphdmEubGFuZy5FbnVtAAAAAAAAAAASAAB4cHQAAklQeHg=\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp":1761646601217183000}},"missing_bucket":true,"missing_order":"first","order":"asc"}}}]}}}}Result:
{ "took": 137, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 1, "relation": "eq" }, "max_score": null, "hits": [] }, "aggregations": { "composite_buckets": { "after_key": { "info": "{continent_name=Oceania, country_iso_code=AU, country_name=Australia, location=-33.4940,143.2104, time_zone=Australia/Sydney}" }, "buckets": [ { "key": { "info": "{continent_name=Oceania, country_iso_code=AU, country_name=Australia, location=-33.4940,143.2104, time_zone=Australia/Sydney}" }, "doc_count": 1 } ] } } }
For reference, the result of `source=weblogs | where host='1.2.3.4' | eval info = geoip('my-datasource', host)` is:
{ "schema": [ { "name": "host", "type": "ip" }, { "name": "method", "type": "string" }, { "name": "bytes", "type": "string" }, { "name": "response", "type": "string" }, { "name": "url", "type": "string" }, { "name": "info", "type": "struct" } ], "datarows": [ [ "1.2.3.4", "GET", "1234", "200", "/history/voyager1/", { "continent_name": "Oceania", "country_iso_code": "AU", "country_name": "Australia", "location": "-33.4940,143.2104", "time_zone": "Australia/Sydney" } ] ], "total": 1, "size": 1 }
Workaround
Instead of grouping directly by the struct generated by `geoip`, I created tests that group results by sub-fields of the struct returned by the `geoip` function. For example, if the result object `info` is `{"str": "a string", "num": 1}`, I group results by `info.str`.
Related Issues
Resolves #4478
Check List
`--signoff` or `-s`.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.