
@yuancu (Collaborator) commented Jun 6, 2025

Description

Support expand command with Calcite.

Note: expand only explodes nested arrays, not maps.

Related Issues

Resolves #3711

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

yuancu added 8 commits June 3, 2025 20:01
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@yuancu (Collaborator, Author) commented Jun 6, 2025

TODOs:

  • Test expanding empty arrays
  • Throw errors when expanding a non-array field (e.g. strings stored in a string field)
  • Add unit test (challenging since nested arrays are not available in Calcite's test data)
  • Update docs (following that of eventstats)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
LantaoJin previously approved these changes Jun 9, 2025
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
LantaoJin previously approved these changes Jun 9, 2025
@LantaoJin added the PPL (Piped processing language) and calcite (calcite migration related) labels Jun 9, 2025
============
(From 3.1.0)

Use the ``expand`` command on a nested array field to transform a single
Collaborator

nested array -> array? Why emphasize nested?

Collaborator Author

This is because the boundary between array and object is blurry in OpenSearch. For example, a string field can store either a string or an array of strings, but there is no array-of-string type. Arrays only exist for the nested type, which stores an array of structs.

The current implementation does not support expanding an array of strings stored in a string field: it reads that field as a string, because when generating the logical plan it cannot tell whether the field holds a single string or an array of strings. That's why I specifically mentioned nested arrays.
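To illustrate the ambiguity described above, here is a minimal Java sketch. It assumes a planner that derives a field's type only from the index mapping (text/keyword as STRING, nested as ARRAY, following this thread); PlanTimeTypes, PlanType, and planTimeType are hypothetical names, not the plugin's API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plan-time types come only from the mapping, not from document values.
public class PlanTimeTypes {
  public enum PlanType { STRING, ARRAY, OTHER }

  // Assumed mapping-to-type rules, following the discussion in this thread.
  public static PlanType planTimeType(String mappingType) {
    return switch (mappingType) {
      case "text", "keyword" -> PlanType.STRING;
      case "nested" -> PlanType.ARRAY;
      default -> PlanType.OTHER;
    };
  }

  public static void main(String[] args) {
    Map<String, String> mapping = Map.of("employees", "keyword");
    // OpenSearch accepts both a single value and an array for the same keyword field.
    List<Object> documentValues = List.of("Jack", List.of("Jack", "Ben"));
    PlanType type = planTimeType(mapping.get("employees"));
    for (Object value : documentValues) {
      // The planner reports the same type for every document, even when the value is a list.
      System.out.println(type + " <- " + value);
    }
  }
}
```

Because the type is fixed before any document is read, the second document's list is invisible to the plan, which is why only nested fields can be expanded.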


expand <field> [as alias]

* field: The field to be expanded (exploded). Currently only nested arrays are
Collaborator

Currently only nested arrays are supported?

It is not a limitation of the Expand command. The Expand command is not aware of the OpenSearch nested data type, right?
Is it because the PPL engine maps the OpenSearch nested data type to the PPL Array data type?

Collaborator Author

Yes, the expand command itself is not aware of the nested type. This limitation originates from the fact that only nested fields are read as arrays in visitExpand in CalciteRelNodeVisitor. Primitive fields storing arrays are read as primitive types.
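One possible shape for the plan-time check listed in the TODOs (throwing an error when expanding a non-array field) is sketched below. It is a hypothetical illustration, not the code in visitExpand: the schema is a plain Map, the assumption that nested fields resolve to ARRAY comes from this thread, and ExpandCheck, FieldType, and checkExpandable are made-up names.

```java
import java.util.Map;

// Hypothetical plan-time validation for expand; not the actual CalciteRelNodeVisitor code.
public class ExpandCheck {
  public enum FieldType { STRING, BIGINT, STRUCT, ARRAY }

  // Throws if the field is missing or its resolved type is not ARRAY.
  public static void checkExpandable(Map<String, FieldType> schema, String field) {
    FieldType type = schema.get(field);
    if (type == null) {
      throw new IllegalArgumentException("Field not found: " + field);
    }
    if (type != FieldType.ARRAY) {
      // A string field that happens to hold ["a", "b"] still resolves to STRING here.
      throw new IllegalArgumentException(
          "Cannot expand field '" + field + "' of type " + type + "; only arrays can be expanded");
    }
  }

  public static void main(String[] args) {
    Map<String, FieldType> schema = Map.of("address", FieldType.ARRAY, "name", FieldType.STRING);
    checkExpandable(schema, "address"); // accepted: assumed to be a nested field read as ARRAY
    try {
      checkExpandable(schema, "name");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```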

schema("age", "bigint"),
schema("id", "bigint"),
schema("address", "struct"));
verifyDataRows(
@penghuo (Collaborator) Jun 9, 2025

nit: the test should be specific and focus only on the empty array.

Collaborator Author

Fixed

Comment on lines 237 to 238
// The element type of struct and array is currently set to ANY.
// We set them using the runtime type as a workaround.
Collaborator

The element type of struct and array is currently set to ANY.

It is a short-term solution, right? If so, add a TODO and an issue link?

Collaborator Author

Yes. Added. Thanks for pointing this out.
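A minimal sketch of the runtime-type workaround from the code comment on lines 237 to 238, assuming that a declared element type of ANY is replaced by a type inferred from the runtime Java value. TypeResolver, ElementType, and resolve are hypothetical names, and the Java-class-to-type mapping is an assumption for illustration only.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "use the runtime type when the declared type is ANY" workaround.
public class TypeResolver {
  public enum ElementType { ANY, STRING, BIGINT, DOUBLE, BOOLEAN, STRUCT, ARRAY }

  public static ElementType resolve(ElementType declared, Object runtimeValue) {
    if (declared != ElementType.ANY) {
      return declared; // A concrete declared type is kept as-is.
    }
    // Assumed mapping from Java runtime classes to element types.
    if (runtimeValue instanceof String) return ElementType.STRING;
    if (runtimeValue instanceof Integer || runtimeValue instanceof Long) return ElementType.BIGINT;
    if (runtimeValue instanceof Float || runtimeValue instanceof Double) return ElementType.DOUBLE;
    if (runtimeValue instanceof Boolean) return ElementType.BOOLEAN;
    if (runtimeValue instanceof Map<?, ?>) return ElementType.STRUCT;
    if (runtimeValue instanceof List<?>) return ElementType.ARRAY;
    return ElementType.ANY; // null or an unrecognized class: nothing better to infer.
  }

  public static void main(String[] args) {
    List<Object> elements = List.of(Map.of("city", "x"), 42L, "text");
    for (Object element : elements) {
      System.out.println(element + " -> " + resolve(ElementType.ANY, element));
    }
  }
}
```

Once the declared element types are concrete, the first branch makes this fallback a no-op, which matches the short-term nature of the workaround.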

// be used by the right side to correlate with the left side.
// Using left join to keep the records where the array field is empty. The corresponding
// field in the result will be null.
relBuilder.correlate(JoinRelType.LEFT, correlVariable.get().id, List.of(arrayFieldRex));
Collaborator

Create an issue to track enforcing limitations on the Expand command.

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
yuancu added 3 commits June 10, 2025 09:26
…orkaround

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@yuancu (Collaborator, Author) commented Jun 10, 2025

Just to confirm: when the field to expand is an empty array, should I keep the document in the result? @LantaoJin @qianheng-aws @penghuo

The current implementation keeps it, with the corresponding expanded value set to null. (This corresponds to a left join.)

E.g. for

{"company": "OldEdge", "employees": ["Jack", "Ben"]}
{"company": "NewEdge", "employees": []}

expanding employees results in

{"company": "OldEdge", "employees": "Jack"}
{"company": "OldEdge", "employees": "Ben"}
{"company": "NewEdge", "employees": null}

Note that, technically, expanding string arrays is not supported yet.
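The behavior described in the example above can be sketched on in-memory rows in plain Java. This is not the Calcite plan: the keepEmpty flag stands in for the choice between JoinRelType.LEFT (keep the row with a null expanded value) and an inner correlate (drop the row), and treating a null field like an empty array is an assumption. ExpandSemantics and expand are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of expand semantics over in-memory rows; not the Calcite plan itself.
public class ExpandSemantics {
  /**
   * Emits one output row per array element, with the element replacing the array value.
   * keepEmpty = true mimics the LEFT correlate (empty/null arrays yield one row with null);
   * keepEmpty = false mimics an inner correlate (such rows are dropped).
   */
  public static List<Map<String, Object>> expand(
      List<Map<String, Object>> rows, String field, boolean keepEmpty) {
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map<String, Object> row : rows) {
      Object value = row.get(field);
      List<?> elements;
      if (value == null) {
        elements = List.of(); // Assumption: a null or missing field behaves like [].
      } else if (value instanceof List<?> list) {
        elements = list;
      } else {
        throw new IllegalArgumentException("Field '" + field + "' is not an array");
      }
      if (elements.isEmpty()) {
        if (keepEmpty) {
          Map<String, Object> copy = new LinkedHashMap<>(row);
          copy.put(field, null); // Left-join behavior: keep the row, null out the field.
          out.add(copy);
        }
        continue;
      }
      for (Object element : elements) {
        Map<String, Object> copy = new LinkedHashMap<>(row);
        copy.put(field, element);
        out.add(copy);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> rows =
        List.of(
            Map.of("company", "OldEdge", "employees", List.of("Jack", "Ben")),
            Map.of("company", "NewEdge", "employees", List.of()));
    expand(rows, "employees", true).forEach(System.out::println);
    expand(rows, "employees", false).forEach(System.out::println);
  }
}
```

With keepEmpty = true the sample rows yield three rows, the last being NewEdge with employees set to null; with keepEmpty = false the NewEdge row is dropped.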

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
penghuo previously approved these changes Jun 10, 2025
@LantaoJin merged commit 939dfe7 into opensearch-project:main Jun 10, 2025
22 checks passed
penghuo pushed a commit that referenced this pull request Jun 16, 2025
* Create scaffolds for implementing expand command

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* WIP: Implementing expand command

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add array json and index mapping

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* WIP: Implementing expand command

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Implement a minimal viable version of expand

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Support expand with alias

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add doc for expand

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* WIP: add unit tests for expand command

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Remove unused logical expand & add test cases

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Use left join in expand to keep documents where their expanded array field is empty

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add unit test for expand command

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Update expand doc format

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Fix: delete test doc with refresh

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Improve expand documentation

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Fix typos

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add a version section to expand documentation

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Make expand empty array IT more specific

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add a limitation section in expand doc & link an issue tracker for a workaround

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Tweak expand command doc

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Test expand null field

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

---------

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
@LantaoJin mentioned this pull request Jun 23, 2025
@yuancu deleted the cmd-expand branch August 25, 2025 03:01

Labels

calcite (calcite migration related), PPL (Piped processing language)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Support expand command with Calcite

3 participants