[Bug] When spark writes data to the paimon table, data is lost due to some task retries #4831

Closed
1 of 2 tasks
xyk0930 opened this issue Jan 3, 2025 · 2 comments
Labels
bug Something isn't working

Comments

xyk0930 commented Jan 3, 2025

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

0.9

Compute Engine

Spark 3.5.1

Minimal reproduce step

  1. Save data to the Paimon table:
    dataset.write().mode(mode).format("paimon").save(path);
  2. While the stage (collect at PaimonSparkWriter.scala:195) is running, some nodes are lost and the tasks are retried.
  3. The number of rows written across the two attempts differs from the number returned by the final query:
    9314203 + 6211188 = 15525391
    But querying the Paimon table returns only 15476552 rows, which is 48839 fewer.
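
The arithmetic in step 3 can be checked with a few lines of plain Java. This is only a sketch of the comparison: the three numbers are the ones reported above, while the class and method names are made up for illustration.

```java
final class RowCountCheck {
    // Row counts reported in the issue for the two task attempts.
    static final long FIRST_ATTEMPT_ROWS = 9_314_203L;
    static final long SECOND_ATTEMPT_ROWS = 6_211_188L;
    // Row count returned by querying the Paimon table afterwards.
    static final long QUERIED_ROWS = 15_476_552L;

    /** Rows reported as written minus rows visible in the table. */
    static long missingRows(long writtenRows, long queriedRows) {
        return writtenRows - queriedRows;
    }

    public static void main(String[] args) {
        long written = FIRST_ATTEMPT_ROWS + SECOND_ATTEMPT_ROWS;
        System.out.println("written = " + written);                              // 15525391
        System.out.println("missing = " + missingRows(written, QUERIED_ROWS));   // 48839
    }
}
```

A positive result from `missingRows` means fewer rows are visible in the table than the write metrics reported.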

What doesn't meet your expectations?

When I increased the executor memory, the tasks did not retry and the job ended up writing 15,525,244 rows. My guess is that a retried task overwrites the files written by the first attempt, but there may be some other cause.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
xyk0930 added the bug label on Jan 3, 2025

xyk0930 commented Jan 3, 2025

In step 1, the save mode is overwrite.

xyk0930 commented Jan 6, 2025

Using a Spark checkpoint solves this problem.
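
The thread does not show how the checkpoint was applied, so the following is only a sketch of one plausible way to do it, assuming the dataset is checkpointed right before the write from step 1. `Dataset.checkpoint()` and `SparkContext.setCheckpointDir` are standard Spark APIs. The class name, method name, `checkpointDir`, and `tablePath` are hypothetical, and the Paimon Spark runtime is assumed to be on the classpath.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

final class CheckpointedPaimonWrite {
    /**
     * Materializes the dataset with a checkpoint, then writes it to Paimon.
     * checkpointDir and tablePath are hypothetical, caller-supplied locations.
     */
    static void write(SparkSession spark, Dataset<Row> dataset,
                      String checkpointDir, String tablePath) {
        // On a cluster the checkpoint directory should be on shared,
        // reliable storage such as HDFS.
        spark.sparkContext().setCheckpointDir(checkpointDir);

        // checkpoint() is eager by default: it computes the data once,
        // saves it under checkpointDir, and returns a Dataset whose
        // logical plan no longer depends on the original lineage.
        Dataset<Row> materialized = dataset.checkpoint();

        // Same write as in step 1, with the overwrite mode made explicit.
        materialized.write()
                .mode(SaveMode.Overwrite)
                .format("paimon")
                .save(tablePath);
    }
}
```

Because the checkpointed dataset is read back from saved files, a retried write task should see the same input as the first attempt. Whether that is the actual reason the row counts matched afterwards is not established in this thread.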

xyk0930 closed this as completed on Jan 6, 2025