Skip to content

Fix potential premature finish in Delta CDF function #17999

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

homar
Copy link
Member

@homar homar commented Jun 21, 2023

Description

Additional context and related issues

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Jun 21, 2023
if (deltaLakePageSource.isFinished()) {
return FINISHED;
} else {
return TableFunctionProcessorState.Processed.produced(null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah sorry for this

return FINISHED;
if (deltaLakePageSource.isFinished()) {
return FINISHED;
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fmt expects line break before else

}
else {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will never get used to this format :|

@electrum
Copy link
Member

Commit message title is too long: https://cbea.ms/git-commit/

@electrum
Copy link
Member

The commit message should be something like "Fix XXX for Delta Lake".

@homar homar force-pushed the homar/prevent_table_changes_from_skipping_read_data branch from 6b74505 to 48fd353 Compare June 21, 2023 22:25
@homar homar changed the title PageSource.getNextPage may return null and still be able to read more data Fix potential premature finish in Delta CDF function Jun 21, 2023
@github-actions github-actions bot added the delta-lake Delta Lake connector label Jun 21, 2023
@homar homar requested review from alexjo2144 and findepi June 21, 2023 22:57
while (!deltaLakePageSource.isFinished()) {
Page page = deltaLakePageSource.getNextPage();
if (page == null) {
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind CPS.getNextPage returning nullable is to break procesisng producing a page takes too long.

I think we should yield here. There are 3 ways we can go about it

  • returning TableFunctionProcessorState.Blocked.blocked(NOT_BLOCKED)
  • returning TableFunctionProcessorState.Processed.produced(EMPTY_PAGE)
  • introducing new TableFunctionProcessorState.Yielded

let's go with TableFunctionProcessorState.Processed.produced(EMPTY_PAGE) for a quick fix
and let's follow up with a PR introducing TableFunctionProcessorState.Yielded state

@homar homar force-pushed the homar/prevent_table_changes_from_skipping_read_data branch from 48fd353 to 5220eb9 Compare June 22, 2023 09:32
@homar homar requested a review from findepi June 22, 2023 09:33
if (deltaLakePageSource.isFinished()) {
return FINISHED;
}
else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant else

if (deltaLakePageSource.isFinished()) {
return FINISHED;
}
else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant else

@findepi
Copy link
Member

findepi commented Jun 22, 2023

i tried to understand whether deltaLakePageSource.getNextPage() can actually return a null.
I.e. whether we're dealing with a user-facing correctness bug.

this led me to this question: https://github.com/trinodb/trino/pull/13246/files#r1239083632
cc @raunaqmorarka

@homar homar force-pushed the homar/prevent_table_changes_from_skipping_read_data branch from 5220eb9 to 9ddb23b Compare June 23, 2023 16:50
@findepi findepi merged commit 274be3a into trinodb:master Jul 3, 2023
@github-actions github-actions bot added this to the 421 milestone Jul 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed delta-lake Delta Lake connector
Development

Successfully merging this pull request may close these issues.

4 participants