-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Move convertToRawFiles method from TableScan to Split #2361
Conversation
} | ||
|
||
@Override | ||
public List<RawFile> rawFiles() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have two choices:
Optional<List<RawFile>> convertToRawFiles
.boolean needsMerging()
andList<RawFile> rawFiles()
, if need to merge, rawFiles should throws exception.
Maybe choice first is better.
@@ -89,6 +92,16 @@ public long rowCount() { | |||
return rowCount; | |||
} | |||
|
|||
@Override | |||
public boolean needsMerging() { | |||
return rawFiles.isEmpty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, thinking about supporting chang-log tables for SR or Spark native engine, needsMerge
should not depend on whether the list of rawFile
is empty. We can depend on whether table is a change-log table or an append-on table for now. Further, even this's a change-log table, It should be judged by whether these documents overlap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, we just introduce method to Split
, this is a common method, just for non-merging reading, we can try to add more to cover more cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Purpose
It would be more natural to check if a split can be read without merging directly from that split, not from other sources.
This PR moves
convertToRawFiles
method fromTableScan
toSplit
.Tests
This PR is a refactor. Existing tests should cover the correctness.
API and Format
No.
Documentation
No.