Skip to content
This repository was archived by the owner on Jun 14, 2024. It is now read-only.

Conversation

sezruby
Copy link
Collaborator

@sezruby sezruby commented Sep 11, 2020

What changes were proposed in this pull request?

Why are the changes needed?

The current signature design is not appropriate for hybrid scan.
For delete support, we need to figure out the list of deleted source files and check the other source files still have the same metadata when index build time or not.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test

@sezruby sezruby changed the title Implement getCandidateIndex for hybrid scan Modify getCandidateIndex for hybrid scan Sep 11, 2020
@rapoth rapoth requested review from apoorvedave1 and pirz and removed request for apoorvedave1 September 11, 2020 22:48
files.exists(entry.allSourceFileInfos.contains)
}

// TODO: the following check only considers indexes in ACTIVE state for usage. Update
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember there is already an issue for this. If so, can we just link to the issue containing the description. @apoorvedave1 to confirm if there is an issue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems #65; I'll add the link here.

Copy link
Contributor

@imback82 imback82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few minor comments, but looking fine to me.

def getCandidateIndexes(
indexManager: IndexManager,
plan: LogicalPlan,
hybridScanEnabled: Boolean = false): Seq[IndexLogEntry] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we remove default value? Rationale: this is an user-facing config, which is set to false by default. If we later change the default to true, we need to remember to update this to true as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's temporary for this PR - I'll remove the default value assignment & use the config value instead in the next pr.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. In that case, please leave a review comment on your intention. That helps reviewers.

def getCandidateIndexes(indexManager: IndexManager, plan: LogicalPlan): Seq[IndexLogEntry] = {
def getCandidateIndexes(
indexManager: IndexManager,
plan: LogicalPlan,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this plan is always the result of RuleUtils.getLogicalRelation meaning this plan is expected to be in a fixed shape (linear, etc.). What would be the best way to address this? cc: @apoorvedave1 / @pirz

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @rapoth I modified #160 a bit for this, but I think we need a new uber-issue for "arbitrary source plan" though it will be handled later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Can you also open an uber issue for that when you get a chance please? (even if it is a work-in-progress issue, let's just capture all the work as we know so far)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'll 👍

Hdfs(Properties(Content(Directory("/")))),
"schema",
"format",
Map())),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: #142 (comment)

Copy link
Collaborator Author

@sezruby sezruby Sep 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required - empty root directory throws an exception during the test.

@sezruby sezruby requested a review from imback82 September 15, 2020 00:44
@sezruby
Copy link
Collaborator Author

sezruby commented Sep 15, 2020

@apoorvedave1 @pirz Could you review this PR? Thanks!

Copy link
Contributor

@imback82 imback82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @sezruby!

@imback82 imback82 merged commit b7e70af into microsoft:master Sep 15, 2020
@imback82
Copy link
Contributor

Merged to master. @apoorvedave1 / @pirz Feel free to leave any comments if I missed something. Also, please take a look at #153 (comment). I think we need to follow up on it.

@rapoth
Copy link
Contributor

rapoth commented Sep 15, 2020

Thanks @sezruby! Can you create a follow-up item for #153 (comment) (and put in the necessary documentation) if it does not already exist?

@sezruby sezruby deleted the hybridscan_3candidate branch September 17, 2020 13:29
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants