Flink: Support nested projection #3991

Closed

hililiwei
Contributor

Supports nested projections in Flink.
Adds a transform step in RowDataFileScanTaskReader that flattens the nested data read from the file based on the field index paths.
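
To illustrate what flattening by field index paths means, here is a minimal sketch in plain Java. It is not the PR's code: nested rows are modeled as `Object[]` instead of Flink `RowData`, and `NestedFlattener` and `flatten` are hypothetical names chosen for this example.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the transform step; not the PR's actual code. */
class NestedFlattener {
  /**
   * Follows each index path through nested rows (modeled as Object[])
   * and collects the leaf values into one flat output row.
   */
  static Object[] flatten(Object[] row, int[][] paths) {
    Object[] flat = new Object[paths.length];
    for (int i = 0; i < paths.length; i++) {
      Object current = row;
      for (int index : paths[i]) {
        if (current == null) {
          break; // a null parent struct yields a null leaf
        }
        current = ((Object[]) current)[index];
      }
      flat[i] = current;
    }
    return flat;
  }

  public static void main(String[] args) {
    // Row layout: id, location<lat, lon>
    Object[] row = {1L, new Object[] {52.5, 13.4}};
    // Project location.lon first, then id
    int[][] paths = {{1, 1}, {0}};
    System.out.println(Arrays.toString(flatten(row, paths))); // prints [13.4, 1]
  }
}
```

Each output column carries one path, so the output order is independent of the field order in the file.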

@hililiwei
Contributor Author

ping @stevenzwu, could you please take a look?

@hililiwei hililiwei requested a review from openinx March 8, 2022 06:04
@hililiwei hililiwei force-pushed the nestProject branch 2 times, most recently from a3ecbb5 to fbd0bbb Compare March 8, 2022 15:31
@hililiwei hililiwei marked this pull request as draft March 9, 2022 02:43
@hililiwei hililiwei force-pushed the nestProject branch 2 times, most recently from 68abcf8 to 08e548d Compare March 9, 2022 08:55
@hililiwei hililiwei marked this pull request as ready for review March 9, 2022 09:05
@hililiwei hililiwei force-pushed the nestProject branch 6 times, most recently from 97e62e9 to 44206f2 Compare March 10, 2022 06:54
@hililiwei hililiwei requested a review from openinx March 10, 2022 07:22
@hililiwei hililiwei force-pushed the nestProject branch 2 times, most recently from b966083 to 1805d20 Compare March 29, 2022 07:10
@hililiwei
Contributor Author

hililiwei commented Mar 29, 2022

Hi @openinx, based on feedback from our team's internal trial, I have pushed an updated version. When you are free, could you please take a look? Thank you. 😃

@hililiwei hililiwei marked this pull request as draft April 2, 2022 01:24
@github-actions github-actions bot added the API label Apr 13, 2022
@hililiwei hililiwei marked this pull request as ready for review April 13, 2022 13:26
@hililiwei hililiwei requested a review from openinx April 13, 2022 13:26
@@ -208,21 +218,30 @@ public FlinkInputFormat buildFormat() {

    if (projectedSchema == null) {
      contextBuilder.project(icebergSchema);
    } else if (projectedFields != null) {
      // Push down the nested projection so that don't need to get extra fields when get data from files.
      Schema icebergProjectionSchema = projectSchema(icebergSchema);
Contributor Author

Push the nested projection down so that reading data from the files no longer fetches extra fields.

// Push down the nested projection so that don't need to get extra fields when get data from files.
Schema icebergProjectionSchema = projectSchema(icebergSchema);
contextBuilder.project(icebergProjectionSchema);
projectedFields = mappingNestProjectedFields(icebergProjectionSchema);
Contributor Author

Get the mapping between the underlying Iceberg nested schema and the Flink source output schema.
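
As a rough illustration of such a mapping, the sketch below resolves each output column to an index path inside the already pruned schema. It is an assumption-laden example, not the PR's `mappingNestProjectedFields`: the `Field` type, the dotted column names, and the method names are all hypothetical, and field names containing dots are not handled.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; the PR's mappingNestProjectedFields may work differently. */
class ProjectionMapping {
  /** Minimal schema node: a name plus child fields (none for a primitive). */
  static final class Field {
    final String name;
    final List<Field> children;

    Field(String name, Field... children) {
      this.name = name;
      this.children = Arrays.asList(children);
    }
  }

  /** Resolves a dotted column name to its index path within the pruned schema. */
  static int[] indexPath(List<Field> prunedSchema, String dottedName) {
    String[] parts = dottedName.split("\\.");
    int[] path = new int[parts.length];
    List<Field> level = prunedSchema;
    for (int depth = 0; depth < parts.length; depth++) {
      int found = -1;
      for (int i = 0; i < level.size(); i++) {
        if (level.get(i).name.equals(parts[depth])) {
          found = i;
          break;
        }
      }
      if (found < 0) {
        throw new IllegalArgumentException("Unknown field: " + dottedName);
      }
      path[depth] = found;
      level = level.get(found).children;
    }
    return path;
  }

  /** Builds one index path per output column, in output order. */
  static int[][] mapping(List<Field> prunedSchema, List<String> outputColumns) {
    int[][] paths = new int[outputColumns.size()][];
    for (int i = 0; i < paths.length; i++) {
      paths[i] = indexPath(prunedSchema, outputColumns.get(i));
    }
    return paths;
  }

  public static void main(String[] args) {
    // Original schema: id, location<lat, lon>. After pruning, only id and location<lon> remain.
    List<Field> pruned = Arrays.asList(new Field("id"), new Field("location", new Field("lon")));
    int[][] paths = mapping(pruned, Arrays.asList("location.lon", "id"));
    System.out.println(Arrays.deepToString(paths)); // prints [[1, 0], [0]]
  }
}
```

Note that `lon` sits at index 0 inside the pruned `location` struct, not index 1 as in the full schema, which is why the paths have to be computed against the pruned schema.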

-    return getters[pos].getFieldOrNull(rowData);
+    return valueCache.computeIfAbsent(pos, key -> {
+      Object value = null;
+      if (projectedFields != null) {
Contributor Author

Fetch the value from the Iceberg nested struct data according to projectedFields and the field's position in the Flink output schema.
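
The lookup described here can be sketched as a lazily cached field accessor. This is a simplified stand-in, not the PR's implementation: rows are plain `Object[]`, and `CachedNestedRow`, `replaceRow`, and the `NULL` sentinel are hypothetical. The sentinel is there because `Map.computeIfAbsent` records no mapping when the function returns null, so null leaves would otherwise be recomputed on every access.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of lazy, cached access to projected nested fields. */
class CachedNestedRow {
  // Sentinel: computeIfAbsent stores nothing when the function returns null.
  private static final Object NULL = new Object();

  private final int[][] paths;
  private final Map<Integer, Object> cache = new HashMap<>();
  private Object[] row;
  int traversals = 0; // counts path walks, for demonstration only

  CachedNestedRow(int[][] paths) {
    this.paths = paths;
  }

  /** Points the wrapper at a new underlying row and drops stale cached values. */
  void replaceRow(Object[] newRow) {
    this.row = newRow;
    cache.clear();
  }

  /** Returns the value at the given output position, walking the path at most once per row. */
  Object getField(int pos) {
    Object cached = cache.computeIfAbsent(pos, key -> {
      traversals++;
      Object current = row;
      for (int index : paths[key]) {
        if (current == null) {
          break; // null parent struct: the leaf is null as well
        }
        current = ((Object[]) current)[index];
      }
      return current == null ? NULL : current;
    });
    return cached == NULL ? null : cached;
  }

  public static void main(String[] args) {
    CachedNestedRow r = new CachedNestedRow(new int[][] {{1, 1}, {1, 0}});
    r.replaceRow(new Object[] {1L, new Object[] {null, 13.4}});
    r.getField(0);
    r.getField(0);
    r.getField(1); // null leaf, still cached through the sentinel
    r.getField(1);
    System.out.println(r.traversals); // prints 2
  }
}
```

Any cache of this kind has to be cleared whenever the wrapped row is replaced, otherwise values from the previous row leak into the next one.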

@hililiwei
Contributor Author

Hi @openinx, could you please take a look when you are available?

@hililiwei hililiwei requested a review from openinx May 23, 2022 13:09
@hililiwei hililiwei force-pushed the nestProject branch 3 times, most recently from f00e08f to 7adb426 Compare August 15, 2022 11:15
@hililiwei
Contributor Author

cc @stevenzwu @Fokko @chenjunjiedada @singhpk234, could you please take a look when you get a chance?


github-actions bot commented Aug 7, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 7, 2024

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Aug 15, 2024