[FLINK-37923][sql] Introduce VARIANT type and PARSE_JSON to Flink SQL #26655

Closed
wants to merge 3 commits into from

Conversation

Contributor

@Sxnan Sxnan commented Jun 9, 2025

What is the purpose of the change

This pull request introduces the Variant data structure to represent semi-structured data, a new SQL type VARIANT, and a built-in function, PARSE_JSON, that parses a JSON string into the VARIANT type.

Brief change log

  • Patch Calcite to support variant type
  • Introduce Variant and BinaryVariant
  • Introduce variant type and PARSE_JSON to Flink SQL

Verifying this change

This change added tests and can be verified.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): yes
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? JavaDocs

Collaborator

flinkbot commented Jun 9, 2025

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@Sxnan Sxnan changed the title from "[FLINK-37923][sql] Patch calcite to support variant type" to "[FLINK-37923][sql] Introduce VARIANT type and PARSE_JSON to Flink SQL" Jun 10, 2025
@Sxnan Sxnan marked this pull request as ready for review June 10, 2025 03:18
Contributor

@twalthr twalthr left a comment

Awesome work @Sxnan! I left a couple of comments that should improve consistency when integrating this type into the existing type system.

@Sxnan Sxnan force-pushed the intro-variant branch 5 times, most recently from 316415e to e2adf87 Compare June 12, 2025 14:58
Contributor Author

Sxnan commented Jun 12, 2025

@twalthr Thanks for the detailed review! I updated the PR accordingly, and all the comments should be addressed. Please take another look.

@Sxnan Sxnan requested a review from twalthr June 12, 2025 15:04
Contributor

twalthr commented Jun 18, 2025

Thanks @Sxnan. I added it to my list for tomorrow. I'm sure it can still make it before the feature freeze.

Contributor Author

Sxnan commented Jun 20, 2025

Hi @twalthr, could you take another look at your earliest convenience, in case we need to make some final adjustments to the PR before the feature freeze tomorrow?

Contributor

@twalthr twalthr left a comment

Thank you @Sxnan. I think the PR should definitely make it into the 2.1 release. I added my last set of comments.


/** Variant represent a semi-structured data. */
@PublicEvolving
public interface Variant extends Serializable {
Contributor

I could imagine users asking for this so that they can more easily pass a Variant to the constructor of a ProcessFunction or keep it in a member variable of a ProcessFunction. This is also the reason why Row is Serializable. But this can be a follow-up. Given all the utilities we have, serializability should not be too difficult to implement with custom readObject/writeObject methods.
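For readers unfamiliar with the readObject/writeObject hooks mentioned here, the following is a minimal sketch of the general Java technique, not the code from this PR. The class `BinaryBackedValue`, its two byte-array fields, and the `roundTrip` helper are hypothetical names chosen for illustration; how the PR's Variant implementation actually stores or serializes its data is not shown in this conversation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical binary-backed value; a stand-in, not the PR's Variant implementation. */
final class BinaryBackedValue implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: skipped by default serialization and written by hand below.
    private transient byte[] value;
    private transient byte[] metadata;

    BinaryBackedValue(byte[] value, byte[] metadata) {
        this.value = value.clone();
        this.metadata = metadata.clone();
    }

    byte[] value() {
        return value.clone();
    }

    byte[] metadata() {
        return metadata.clone();
    }

    // Called by ObjectOutputStream instead of the default field-by-field logic.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(value.length);
        out.write(value);
        out.writeInt(metadata.length);
        out.write(metadata);
    }

    // Mirror of writeObject: reads the length-prefixed arrays back in the same order.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        value = new byte[in.readInt()];
        in.readFully(value);
        metadata = new byte[in.readInt()];
        in.readFully(metadata);
    }

    /** Serializes and deserializes the given instance in memory (illustration only). */
    static BinaryBackedValue roundTrip(BinaryBackedValue original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (BinaryBackedValue) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }
}
```

A class written this way can be held in a field of any Serializable object, such as a user function, because Java serialization invokes the two private hooks automatically. Length-prefixing each array keeps the read side simple and independent of the stream's block boundaries.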

Contributor Author

Sxnan commented Jun 20, 2025

Hi @twalthr, thanks for the review! I updated the PR accordingly in the last two fixup commits. Please take another look.

Contributor

@twalthr twalthr left a comment

Thanks for this great feature @Sxnan!

Contributor Author

Sxnan commented Jun 20, 2025

Fixup commits are squashed. Will merge after the tests pass.

Contributor

@gustavodemorais gustavodemorais left a comment

Very nice feature, @Sxnan 🙂 I somehow didn't spot the docs. Are they covered in another PR?

Contributor Author

Sxnan commented Jun 20, 2025

@gustavodemorais Yes, it will be covered later in another PR.

@Sxnan Sxnan closed this in 2b13a56 Jun 20, 2025
@Sxnan Sxnan deleted the intro-variant branch June 21, 2025 13:38