[FLINK-33211][table] support flink table lineage #24618

Merged (1 commit) on Jul 3, 2024

HuangZhenQiu
Contributor

What is the purpose of the change

  1. Add a table lineage vertex to each transformation in the planner. The final LineageGraph is generated from the transformations and stored in the StreamGraph. The lineage graph will be published to the lineage listener in a follow-up PR.
  2. Deprecated table sources and sinks are not covered, because they do not carry enough information to derive a name and namespace for the lineage dataset.
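
To illustrate the idea in the two points above, here is a minimal, self-contained sketch. It does not use Flink's classes: `LineageSketch`, `Vertex`, `Node`, and `buildEdges` are hypothetical names invented for this example, and the namespaces are made up. It only shows the general shape of the approach, assuming each plan node may carry an optional lineage vertex and that nodes without one (such as deprecated sources) are skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: these types are hypothetical stand-ins, not Flink's classes. */
class LineageSketch {

    /** A dataset identified by a namespace and a name. */
    static final class Vertex {
        final String namespace;
        final String name;

        Vertex(String namespace, String name) {
            this.namespace = namespace;
            this.name = name;
        }

        @Override
        public String toString() {
            return namespace + "/" + name;
        }
    }

    /** A plan node; lineage is null when no name/namespace could be derived. */
    static final class Node {
        final String description;
        final Vertex lineage;
        final List<Node> inputs;

        Node(String description, Vertex lineage, Node... inputs) {
            this.description = description;
            this.lineage = lineage;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Builds "source -> sink" edges by walking upstream from every sink node. */
    static List<String> buildEdges(List<Node> sinks) {
        List<String> edges = new ArrayList<>();
        for (Node sink : sinks) {
            if (sink.lineage == null) {
                continue; // sink without lineage info, e.g. a deprecated sink
            }
            Set<Vertex> sources = new LinkedHashSet<>();
            collectSources(sink, sources);
            for (Vertex source : sources) {
                edges.add(source + " -> " + sink.lineage);
            }
        }
        return edges;
    }

    /** Leaf nodes are sources; leaves without a vertex are silently skipped. */
    private static void collectSources(Node node, Set<Vertex> out) {
        if (node.inputs.isEmpty()) {
            if (node.lineage != null) {
                out.add(node.lineage);
            }
            return;
        }
        for (Node input : node.inputs) {
            collectSources(input, out);
        }
    }

    public static void main(String[] args) {
        Node orders = new Node("scan orders", new Vertex("kafka://broker", "orders"));
        Node legacy = new Node("legacy scan", null); // no lineage vertex attached
        Node join = new Node("join", null, orders, legacy);
        Node sink = new Node("sink report", new Vertex("jdbc://db", "report"), join);
        buildEdges(Arrays.asList(sink)).forEach(System.out::println);
    }
}
```

In this sketch the legacy scan contributes no edge, which mirrors how sources without a usable name and namespace are left out of the lineage graph.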

Brief change log

  • Add the table lineage interfaces and default implementations.
  • Create lineage vertices and add them to the transformations during the physical-plan-to-transformation conversion.

Verifying this change

  1. Added TableLineageGraphTest for both stream and batch.
  2. Added LineageGraph verification in TransformationsTest for legacy sources.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@HuangZhenQiu
Contributor Author

@flinkbot run azure

@HuangZhenQiu HuangZhenQiu force-pushed the support-table-lineage branch 3 times, most recently from 34f3667 to 3a01cfd Compare April 3, 2024 20:18
@flinkbot
Collaborator

flinkbot commented Apr 4, 2024

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@HuangZhenQiu
Contributor Author

@flinkbot run azure

@HuangZhenQiu HuangZhenQiu force-pushed the support-table-lineage branch from 3a01cfd to 4820faf Compare April 7, 2024 21:55
@HuangZhenQiu HuangZhenQiu force-pushed the support-table-lineage branch from 4820faf to 16bb67b Compare April 20, 2024 22:36
@PatrickRen PatrickRen self-requested a review April 23, 2024 04:05
Contributor

@PatrickRen PatrickRen left a comment

@HuangZhenQiu Thanks for the contribution! I left some comments.

Also, I found a lot of one-line changes that remove a blank line in the file header. Could you split them into a separate hotfix commit, or simply revert them, since they are not really necessary?

@HuangZhenQiu HuangZhenQiu force-pushed the support-table-lineage branch 2 times, most recently from e5f8a17 to 5f29a04 Compare April 29, 2024 04:32
@HuangZhenQiu
Contributor Author

@PatrickRen
Thanks for reviewing the PR. For testing purposes, I only added lineage provider implementations for the values-related source functions and input format. I will add a lineage provider for Hive in a separate PR.

@HuangZhenQiu HuangZhenQiu force-pushed the support-table-lineage branch from 5f29a04 to 375fe2d Compare April 29, 2024 04:49
@HuangZhenQiu HuangZhenQiu force-pushed the support-table-lineage branch from 375fe2d to c8aa9ee Compare April 29, 2024 18:03
@HuangZhenQiu
Contributor Author

@davidradl
Thanks for reviewing this PR. This PR mainly handles source/sink-level lineage; column-level lineage will need further discussion in the community. I have resolved most of your comments.

@HuangZhenQiu HuangZhenQiu force-pushed the support-table-lineage branch 2 times, most recently from 47ec379 to 81da89d Compare May 6, 2024 17:46
@HuangZhenQiu
Contributor Author

@PatrickRen
I have removed the schema facet and config facets, since this information is already provided by CatalogBaseTable. This greatly reduced the size of the PR. Would you please take another round of review?
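
As a rough illustration of this design choice, the sketch below shows a lineage vertex that keeps a reference to the table metadata and reads schema and options from it on demand, rather than copying them into separate facets. All names here (`FacetFreeVertexSketch`, `TableInfo`, `TableVertex`, the `values://` namespace) are hypothetical and made up for the example; `TableInfo` is only a stand-in and is not Flink's CatalogBaseTable.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: hypothetical types, not the classes added in this PR. */
class FacetFreeVertexSketch {

    /** Hypothetical stand-in for catalog table metadata (columns and options). */
    static final class TableInfo {
        final List<String> columns;
        final Map<String, String> options;

        TableInfo(List<String> columns, Map<String, String> options) {
            this.columns = Collections.unmodifiableList(columns);
            this.options = Collections.unmodifiableMap(options);
        }
    }

    /** Lineage vertex that references table metadata instead of duplicating it. */
    static final class TableVertex {
        final String namespace;
        final String name;
        final TableInfo table;

        TableVertex(String namespace, String name, TableInfo table) {
            this.namespace = namespace;
            this.name = name;
            this.table = table;
        }

        /** Schema is read from the referenced table, so no schema facet is needed. */
        List<String> schema() {
            return table.columns;
        }

        /** Options are read from the referenced table, so no config facet is needed. */
        String option(String key) {
            return table.options.get(key);
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "values");
        TableInfo info = new TableInfo(Arrays.asList("id", "amount"), options);
        TableVertex vertex = new TableVertex("values://", "orders", info);
        System.out.println(vertex.schema() + " " + vertex.option("connector"));
    }
}
```

A consumer of the lineage graph can then look up whatever table details it needs through the vertex, which keeps the lineage types themselves small.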

@HuangZhenQiu HuangZhenQiu force-pushed the support-table-lineage branch from 5188f87 to edac6c7 Compare June 9, 2024 01:08
@FangYongs FangYongs self-requested a review June 17, 2024 08:06
Contributor

@X-czh X-czh left a comment

@HuangZhenQiu Thanks for the PR! I've left a few comments, PTAL

@HuangZhenQiu HuangZhenQiu force-pushed the support-table-lineage branch from edac6c7 to 2e1322b Compare June 27, 2024 05:46
@HuangZhenQiu
Contributor Author

@X-czh Thanks for the feedback. I removed the unused ColumnLineage interface and refined the code as you suggested.

@X-czh
Contributor

X-czh commented Jun 30, 2024

@HuangZhenQiu Hi, it'll be better not to squash your commits before the review is finished, so that reviewers can easily track what has changed since the last review.

@X-czh
Contributor

X-czh commented Jun 30, 2024

@HuangZhenQiu LGTM. @FangYongs Could you help take a final look and merge it?

@FangYongs
Contributor

Thanks @HuangZhenQiu, +1

@FangYongs FangYongs merged commit 960363c into apache:master Jul 3, 2024
XComp added a commit that referenced this pull request Jul 3, 2024
superdiaodiao pushed a commit to superdiaodiao/flink that referenced this pull request Jul 4, 2024
superdiaodiao pushed a commit to superdiaodiao/flink that referenced this pull request Jul 4, 2024
snuyanzin pushed a commit to snuyanzin/flink that referenced this pull request Jul 21, 2024
snuyanzin pushed a commit to snuyanzin/flink that referenced this pull request Jul 21, 2024

6 participants