
API: Introduce a new IncrementalAppendScan interface #4580

Merged
merged 11 commits into from
May 13, 2022

Conversation

stevenzwu
Contributor

@stevenzwu stevenzwu commented Apr 18, 2022

During review of the Flink FLIP-27 source PR #4329, we agreed that the streaming start strategy should be inclusive. Hence we would need TableScan#appendsBetween to support a nullable fromSnapshotId. Right now, fromSnapshotId is a primitive long.
#4329 (comment)

Initially, @rdblue and I were thinking of just overloading appendsBetween with a Long fromSnapshotId. That would cause a compile error due to ambiguous type resolution. As we want to maintain binary backward compatibility, I tried adding a new method named appendsInRange in PR #4529, which is not as intuitive a name as appendsBetween.

This PR takes a different direction from PR #4529, suggested by @rdblue. Instead of modifying the TableScan interface, we can consider introducing a new IncrementalTableScan interface and a Table#newIncrementalScan method.

  • We can decouple the regular TableScan from the new IncrementalTableScan. This also avoids the need to throw UnsupportedOperationException from some TableScan methods.
  • More complex incremental scans (like reading CDC data) can be added to the new IncrementalTableScan API in the future.

To avoid code duplication, a new super interface Scan<T extends Scan> was extracted as the parent of the current TableScan and the new IncrementalTableScan. I ran japi-compliance-checker to check binary compatibility after the interface refactoring and will attach the result.
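As a rough illustration of the extracted hierarchy (names and methods below are simplified and hypothetical, not the exact Iceberg API), a self-referential type parameter on the shared Scan interface lets fluent methods return the concrete scan type:

```java
// Sketch of the refactoring idea: a shared Scan super interface whose
// self-referential type parameter preserves the concrete scan type when
// chaining. Illustrative only, not the real Iceberg interfaces.
class ScanHierarchyDemo {
  interface Scan<ThisT extends Scan<ThisT>> {
    ThisT caseSensitive(boolean caseSensitive);
  }

  interface IncrementalTableScan extends Scan<IncrementalTableScan> {
    IncrementalTableScan fromSnapshotInclusive(long snapshotId);
  }

  static class SimpleIncrementalScan implements IncrementalTableScan {
    boolean caseSensitive = true;
    long fromSnapshotId = -1L;

    public IncrementalTableScan caseSensitive(boolean cs) {
      this.caseSensitive = cs;
      return this;
    }

    public IncrementalTableScan fromSnapshotInclusive(long id) {
      this.fromSnapshotId = id;
      return this;
    }
  }

  // Chaining the shared caseSensitive method does not lose the
  // IncrementalTableScan type, so fromSnapshotInclusive is still callable.
  static long demo() {
    SimpleIncrementalScan scan = new SimpleIncrementalScan();
    scan.caseSensitive(false).fromSnapshotInclusive(42L);
    return scan.fromSnapshotId;
  }
}
```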

@github-actions github-actions bot added the API label Apr 18, 2022
@stevenzwu stevenzwu changed the title Core: Introduce a new IncrementalTableScan interface API: Introduce a new IncrementalTableScan interface Apr 18, 2022
@stevenzwu
Contributor Author

Here is the japi-compliance-checker report for the iceberg-api jar before and after this change:

api-compatibility-report.pdf

@stevenzwu
Contributor Author

@rdblue @openinx @yittg @aokolnychyi @flyrain @flashJd can you please take a look at this approach and see if it makes sense?

@@ -107,24 +68,6 @@ default TableScan select(String... columns) {
return select(Lists.newArrayList(columns));
Contributor Author

I couldn't move this select method to the Scan interface; the API compatibility check tool shows that the return type changes from TableScan to Scan, probably due to varargs.

@stevenzwu
Contributor Author

cc @hameizi the author of PR #3095

*
* Default behavior for incremental scan fails if there are overwrite operations in the incremental snapshot range
*/
IncrementalTableScan ignoreOverwrites();
Contributor Author

For V2 tables, an incremental scan is potentially interested in append, overwrite, and delete operations. Append is always included. I guess we need to discuss what the natural API is to control overwrite and delete.

This ignoreOverwrites API might be too restrictive; then we would also need to expose an includeOverwrites. I am wondering if we should have an API like

include(DataOperations... operations) // default is append only; for CDC reads, overwrite and delete can be added
fail(DataOperations... operations) // default is to fail on nothing
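A minimal sketch of how that include/fail configuration could behave (the DataOperation enum and class name here are hypothetical stand-ins, not the Iceberg API):

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical configuration object illustrating the include/fail idea
// from the comment above; not the actual Iceberg scan interface.
class IncrementalScanConfigDemo {
  enum DataOperation { APPEND, OVERWRITE, DELETE }

  // Defaults per the comment: only appends are included, fail on nothing
  private final Set<DataOperation> included = EnumSet.of(DataOperation.APPEND);
  private final Set<DataOperation> failOn = EnumSet.noneOf(DataOperation.class);

  IncrementalScanConfigDemo include(DataOperation... operations) {
    for (DataOperation op : operations) {
      included.add(op);
    }
    return this;
  }

  IncrementalScanConfigDemo fail(DataOperation... operations) {
    for (DataOperation op : operations) {
      failOn.add(op);
    }
    return this;
  }

  Set<DataOperation> included() { return included; }
  Set<DataOperation> failOn() { return failOn; }
}
```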

Contributor Author

Or maybe we just need the appendsOnly API above. Otherwise, overwrite and delete snapshots are also included automatically for V2 tables.

Contributor

> or maybe we should just need the appendsOnly API above. otherwise, overwrite and delete snapshots are also included automatically for V2 tables.

I agree, because I think that for V2 tables, including overwrite and delete snapshots should be the default behavior.

Contributor

For the logical completeness of filtering snapshot type, should we have the following 3 methods?

  1. ignoreAppend
  2. ignoreOverwrite
  3. ignoreDelete

And appendsOnly is basically a combination of ignoreOverwrite and ignoreDelete.

Contributor Author
Contributor Author

@stevenzwu stevenzwu Apr 19, 2022


Agreed, the 3 ignore methods are complete and flexible. Wondering if we need the flexibility, though. E.g., does append + overwrite or overwrite + delete ever make sense? Do we only need two modes: (1) appendsOnly, (2) append + overwrite + delete?

Do we need fail APIs (like failOverwrite)? My current take is no. Let's just skip the snapshots we are not interested in.

Contributor

Yes, it is flexible. append + overwrite makes sense for users who only want to get inserted rows with some additional filtering. overwrite + delete makes sense for getting only deleted rows.
I'm not aware of a use case for failOverwrite. We may skip it for now.

Collaborator

@chenjunjiedada chenjunjiedada Apr 22, 2022


How about dataBetween(Long startSnapshotId, long endSnapshotId, List<RowKind> rowKinds)? Then we could produce the exact data according to the given row kinds.

Contributor Author

@chenjunjiedada RowKind is a Flink API.

Based on the conversation, it seems most people prefer a fluent-style API for the scan builder, like fromSnapshotId(long fromSnapshotId).

Collaborator

I meant we could borrow the RowKind definition for the produced data, like what @flyrain mentioned before. We could use +I to target append + overwrite with some filter, and use -D, -U to target delete and some data in overwrite.

* @param fromSnapshotId the start snapshot id (exclusive)
* @return an incremental table scan from {@code fromSnapshotId} exclusive
*/
IncrementalTableScan fromSnapshotId(long fromSnapshotId);
Member

Would it be more generic to add an inclusive flag to this fromSnapshotId method? (So that we can meet the requirement for both including and excluding fromSnapshotId in the incremental scan.)

Contributor

Agreed. Based on this, should we add a method like useSnapshot to process just one snapshot? I think it's useful for reading an Iceberg table in streaming mode.

Contributor Author

Actually, fromSnapshotId has exclusive behavior (from, to] for the incremental scan, as the toSnapshotId will become the fromSnapshotId in the next scan.

For the inclusive behavior, we were mainly talking about the starting strategy. E.g., if we set a specific start snapshot id, we want to include the files in that snapshot (if it is an append). To support that, we need the incremental scan to support a nullable fromSnapshotId, as we will just pass in the parent snapshot id (which can be null). That wasn't possible with TableScan#appendsBetween(long fromSnapshotId, long toSnapshotId).

Contributor

If we want to add exclusive behavior, then we should add alternative methods like afterSnapshotId. To me, fromSnapshotId should be inclusive of the snapshot that is identified. We can come up with better names for these, like fromSnapshotInclusive and fromSnapshotExclusive if you like those better.
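A small sketch of the inclusive/exclusive semantics over a linear lineage (the helper below is hypothetical and only illustrates the naming suggested above; it assumes both ids are present in the lineage):

```java
import java.util.List;

// Illustrative helper, not Iceberg code: resolve which snapshot ids fall in
// the scan range over a linear lineage ordered oldest-first.
class SnapshotRangeDemo {
  // The to-snapshot is always inclusive; the from-snapshot is inclusive or
  // exclusive depending on which fluent method name was used.
  static List<Long> range(List<Long> lineage, long fromId, boolean fromInclusive, long toId) {
    int start = lineage.indexOf(fromId) + (fromInclusive ? 0 : 1);
    int end = lineage.indexOf(toId) + 1;
    return lineage.subList(start, end);
  }
}
```

With lineage [1, 2, 3, 4], fromSnapshotExclusive(2) would cover snapshots 3 and 4, while fromSnapshotInclusive(2) would cover 2, 3, and 4.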

@@ -148,7 +89,9 @@ default TableScan select(String... columns) {
* @return a table scan which can read append data from {@code fromSnapshotId}
* exclusive and up to {@code toSnapshotId} inclusive
*/
TableScan appendsBetween(long fromSnapshotId, long toSnapshotId);
default TableScan appendsBetween(long fromSnapshotId, long toSnapshotId) {
throw new UnsupportedOperationException("Incremental scan is not supported");
Member

We will need to keep this implementation for at least one minor release?

Member

According to the Apache project compatibility rules.

Contributor Author

This change is not strictly required for this PR; I am OK to revert it. I added it here because I was thinking about removing the duplicated UnsupportedOperationException code in many TableScan impl classes.

We can mark those two appends methods as deprecated once the new IncrementalScan impls are ready. Yes, we can follow the compatibility rules.

@flyrain
Contributor

flyrain commented Apr 19, 2022

Thanks @stevenzwu for the PR. I'm OK with the change, but I doubt that CDC can use the IncrementalTableScan interface. Basically, CDC requires much finer control of planning; check my CDC PR (#4539) for more details. We can keep evolving the IncrementalTableScan interface to make it suitable for CDC in the future, but it is hard to connect them at this moment. We may focus on the incremental scan itself in this PR.

@stevenzwu
Contributor Author

stevenzwu commented Apr 19, 2022

@flyrain This is just a starting point. I am sure the current IncrementalScan interface is NOT good for CDC reads today, which need more complex planning control. That was also part of the motivation when Ryan suggested it. Can CDC reads leverage this IncrementalScan interface in the future once it is enhanced? Will this direction work for CDC reads?

*
* Default behavior for incremental scan fails if there are overwrite operations in the incremental snapshot range
*/
IncrementalScan ignoreOverwrites();
Collaborator

Maybe we could add another option, IncrementalScan rowKinds(RowKind... rowKinds), to support the CDC case.

Contributor Author

RowKind is a Flink API. We can't use it here.

Collaborator

Can we borrow the definition of the data kinds from the Flink side? The incremental scan actually targets those four kinds of data, right?

@@ -148,7 +89,9 @@ default TableScan select(String... columns) {
* @return a table scan which can read append data from {@code fromSnapshotId}
* exclusive and up to {@code toSnapshotId} inclusive
*/
TableScan appendsBetween(long fromSnapshotId, long toSnapshotId);
Contributor

I think we should probably deprecate this because we want people to move to incremental. (Eventually)

Contributor Author

Yes, once the new IncrementalAppendScan is implemented, we can mark these two appends methods as deprecated.

/**
* Only interested in snapshots with append operation
*/
IncrementalScan appendsOnly();
Contributor

@rdblue rdblue Apr 22, 2022


As we talked about, I think it makes sense to remove these two methods since the default for scanning appends is to ignore deletes and overwrites and to read only append snapshots.

/**
* API for configuring an incremental table scan
*/
public interface IncrementalScan extends Scan<IncrementalScan> {
Contributor

IncrementalAppendScan?

*
* @return an incremental scan for appends only snapshots
*/
default IncrementalAppendScan newIncrementalAppendScan() {
Contributor Author

@flyrain @aokolnychyi @openinx @chenjunjiedada @yittg @hameizi After discussing with @rdblue, we think it is probably cleaner to have separate newScan methods for appends-only and CDC reads.

In the future, we can add a Table#newIncrementalChangelogScan method and an IncrementalChangelogScan interface.

/**
* API for configuring an incremental table scan for appends only snapshots
*/
public interface IncrementalAppendScan extends Scan<IncrementalAppendScan> {
Contributor

I think this can be abstracted into IncrementalScan, because not only append-only but also changelog incremental scans need to specify the from and to snapshots. We can define the IncrementalScan interface and then return different concrete implementations from Table through different methods, for example:

public interface IncrementalScan extends Scan<IncrementalScan> {...}

abstract class BaseIncrementalScan implements IncrementalScan {...}
public class AppendOnlyIncrementalScan extends BaseIncrementalScan {...}
public class ChangelogIncrementalScan extends BaseIncrementalScan {...}

public interface Table {
   ...
  IncrementalScan newAppendIncrementalScan();
  IncrementalScan newChangelogIncrementalScan();
  ...
}

Contributor Author

@Reo-LEI conceptually I agree with you. @rdblue prefers to do the refactoring when we come to the changelog incremental scan.

The reason we didn't add BaseIncrementalScan is that right now there would be no difference between BaseIncrementalScan and IncrementalAppendScan. In the future, we can extract BaseIncrementalScan, or we can have IncrementalChangelogScan extend IncrementalAppendScan. Personally, I also prefer the BaseIncrementalScan approach.

/**
* Optional. if from snapshot id (inclusive or exclusive) is not provided,
* the oldest ancestor of the {@link IncrementalAppendScan#toSnapshot(long)}
* will be included as the from snapshot.
Contributor

Javadoc should start with a brief description of the method, then follow that with new paragraphs explaining more about the method's behavior. That's because Javadoc is going to pull out the first part as the description and the rest is available when you navigate to the method details.

IncrementalAppendScan fromSnapshotExclusive(long fromSnapshotId);

/**
* Required
Contributor

I think this needs a short Javadoc description.

* <p>
* If the start snapshot (inclusive or exclusive) is not provided,
* the oldest ancestor of the {@link IncrementalAppendScan#toSnapshot(long)}
* will be included as the start snapshot.
Contributor

@stevenzwu, this isn't true. If the starting snapshot is not set, then it defaults to null, which will scan from the start of table history or fail if table history has expired.

Contributor Author

@rdblue yes, it defaults to null. The described behavior is based on IncrementalDataTableScan using SnapshotUtil.ancestorsBetween, which scans from the oldest ancestor of the toSnapshotId. I think the current behavior makes sense. The start of table history may not be an ancestor of the toSnapshotId, right?
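For illustration, the ancestor walk being described can be sketched like this (a simplified stand-in for SnapshotUtil.ancestorsBetween, not the actual implementation):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Illustrative only: walk parent pointers from toSnapshotId back toward
// fromSnapshotId (exclusive). A null fromSnapshotId walks all the way to
// the oldest ancestor of toSnapshotId.
class AncestorWalkDemo {
  static List<Long> ancestorsBetween(Map<Long, Long> parentOf, Long fromSnapshotId, long toSnapshotId) {
    Deque<Long> result = new ArrayDeque<>();
    Long current = toSnapshotId;
    while (current != null && !current.equals(fromSnapshotId)) {
      result.addFirst(current); // collect in oldest-first order
      current = parentOf.get(current); // null when there is no parent
    }
    return new ArrayList<>(result);
  }
}
```

For the S3 -> S4 -> S6 -> S7 lineage discussed below, a null fromSnapshotId with toSnapshotId = S7 would yield [S3, S4, S6, S7].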

Contributor

I don't think the proposed behavior makes sense for this interface. Otherwise, there is no way to incrementally scan from the start of the table. If I want to start from the beginning of history, I need to specify starting snapshot null. But there's no way to do that without leaving out the "from" snapshot. If that's how to configure scanning from the start of history, then this can't scan from the oldest known snapshot by default.

Contributor Author

If the oldest table snapshot is not an ancestor of the toSnapshot, what does that mean? I thought an incremental scan is only meaningful along a linear ancestor line.

E.g., we have two disjoint lineages:
S1 -> S2 ----------> S5
S3 -> S4 -> S6 -> S7

If the toSnapshotId is set to S7 and fromSnapshotId is not set, I thought we want to scan [S3, S4, S6, S7]. Is that correct?

Contributor

The problem is when the history has expired, not when there is no ancestor relationship. When the starting point is not an ancestor, that's a different problem that results in an exception.

Contributor Author

Just to make sure I understand you correctly: if fromSnapshotId is not set and defaults to null, we want to start from the snapshot with the oldest timestamp using the Table API Iterable<Snapshot> snapshots(). If the oldest snapshot is not an ancestor of the current table snapshot, we throw an exception.

I assume we don't want to use this Table API to find the oldest snapshot (by timestamp):

List<HistoryEntry> history()

Contributor

If no fromSnapshot method is called, the incremental read should start from the beginning of table history, the null snapshot. So we need to find a snapshot with parent-snapshot-id=null.

Contributor Author

If the table history has disjoint lineage lines, then we can have multiple snapshots with parent-snapshot-id=null. I guess we can then use the timestamp to break the tie.
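A hedged sketch of that resolution rule (the Snap record and field names below are illustrative stand-ins for Iceberg's snapshot metadata, not the real classes):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative only: when no fromSnapshot is given, start from a snapshot
// with no parent, using the oldest timestamp to break ties between
// disjoint lineages, per the discussion above.
class StartSnapshotDemo {
  // Minimal stand-in for snapshot metadata
  record Snap(long snapshotId, Long parentSnapshotId, long timestampMillis) {}

  static Optional<Snap> startingSnapshot(List<Snap> snapshots) {
    return snapshots.stream()
        .filter(s -> s.parentSnapshotId() == null) // roots of lineages
        .min(Comparator.comparingLong(Snap::timestampMillis)); // oldest wins
  }
}
```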

Contributor Author

@rdblue I updated the Javadoc based on the discussion here.

Contributor Author

@rdblue can you take another look and see if the comments are addressed adequately?

@stevenzwu stevenzwu changed the title API: Introduce a new IncrementalTableScan interface API: Introduce a new IncrementalAppendScan interface Apr 29, 2022
* Refine the incremental scan with the start snapshot inclusive.
* <p>
* If the start snapshot (inclusive or exclusive) is not provided,
* the oldest snapshot will be used as the start snapshot.
Contributor

I don't think this statement is clear enough. The table's first snapshot is used. You could argue that's the "oldest" but I think it is better to be clear that you're asking to process snapshots back to the start of the table. Clarifying on the next line helps, but it exposes an internal detail about how we track history: a snapshot with no parent is the starting snapshot.

Contributor Author

Can you see if the latest change is more clear?

@rdblue rdblue merged commit beed94d into apache:master May 13, 2022
@stevenzwu stevenzwu deleted the refactorScanAPI branch July 26, 2022 18:01