[spark] Support scan mode when query with incremental_between #5392

Open
wants to merge 1 commit into
base: master

JackeyLee007
Contributor

Purpose

Linked issue: close #xxx

When querying with paimon_incremental_between_timestamp, we want to switch among different scan modes.

  • delta or changelog, if every single change is needed.
  • diff, if merge is needed.

Our micro/small-batch job runs hourly and needs to know when a record was INSERTed and also when it was DELETEd. This requires delta or changelog mode. In diff mode, the +I and -D operations on the same record can be merged, so the -D operation is never returned.

However, when querying the main table rather than the audit_log table, the merged result is what we expect: only the INSERTed and UPDATEd records. So we also need diff mode.
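
To make the difference concrete, here is a minimal toy model in plain Python. It is not Paimon's implementation: `delta_scan`, `diff_scan`, the `Change` tuple, and the sample keys are all hypothetical names used only to illustrate why a +I followed by a -D inside one window disappears from a merged (diff-style) result.

```python
from typing import Dict, List, Tuple

# A change is (row_kind, key, value); "+I" inserts or overwrites, "-D" deletes.
# This layout is an assumption for illustration, not Paimon's internal format.
Change = Tuple[str, str, str]


def delta_scan(changes: List[Change]) -> List[Change]:
    """Delta-style read: return every change in the window, unmerged."""
    return list(changes)


def diff_scan(start_state: Dict[str, str], changes: List[Change]) -> List[Change]:
    """Diff-style read: compare the table before and after the window."""
    end_state = dict(start_state)
    for kind, key, value in changes:
        if kind == "-D":
            end_state.pop(key, None)
        else:
            end_state[key] = value
    # Keep only rows that are new or changed at the end of the window.
    return [
        ("+I", key, value)
        for key, value in sorted(end_state.items())
        if start_state.get(key) != value
    ]


# k1 is inserted and then deleted inside the same window.
window = [("+I", "k1", "a"), ("+I", "k2", "b"), ("-D", "k1", "a")]
delta_result = delta_scan(window)
diff_result = diff_scan({}, window)
print(delta_result)
print(diff_result)
```

In this sketch the delta-style result still contains both the +I and the -D for `k1`, while the diff-style result contains only `k2`, which matches the behaviour described above.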

Tests

API and Format

Documentation

@Zouxxyy
Contributor

Zouxxyy commented Apr 3, 2025

Can you try setting spark.paimon.incremental-between-scan-mode = xx?
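
A rough sketch of what that suggestion might look like in practice, written as a small Python helper that only builds the SQL strings. The helper name `incremental_between_sql`, the table name, the timestamps, and the list of mode values (`auto`, `delta`, `changelog`, `diff`) are assumptions for illustration; check the Paimon documentation for your version before relying on them.

```python
from typing import List

# Assumed mode names; verify against the Paimon docs for your version.
ASSUMED_MODES = ("auto", "delta", "changelog", "diff")


def incremental_between_sql(table: str, start: str, end: str, scan_mode: str) -> List[str]:
    """Build the SET statement and the incremental query as plain strings.

    No escaping is done here, so only pass trusted, literal values.
    """
    if scan_mode not in ASSUMED_MODES:
        raise ValueError(f"unexpected scan mode: {scan_mode}")
    return [
        f"SET spark.paimon.incremental-between-scan-mode = {scan_mode}",
        f"SELECT * FROM paimon_incremental_between_timestamp('{table}', '{start}', '{end}')",
    ]


# Hypothetical table name and time window.
statements = incremental_between_sql(
    "my_db.my_table", "2025-04-01 00:00:00", "2025-04-01 01:00:00", "delta"
)
for statement in statements:
    print(statement)
```

With a Spark session that has Paimon configured, each string could then be passed to `spark.sql(...)` in order, switching `scan_mode` to `diff` for the merged view of the main table.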


2 participants