-
Notifications
You must be signed in to change notification settings - Fork 115
Modify getCandidateIndex for hybrid scan #153
Conversation
- only applicable without threshold
files.exists(entry.allSourceFileInfos.contains) | ||
} | ||
|
||
// TODO: the following check only considers indexes in ACTIVE state for usage. Update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember there is already an issue for this. If so, can we just link to the issue containing the description. @apoorvedave1 to confirm if there is an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems #65; I'll add the link here.
src/main/scala/com/microsoft/hyperspace/index/rules/RuleUtils.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few minor comments, but looking fine to me.
src/main/scala/com/microsoft/hyperspace/index/rules/RuleUtils.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/microsoft/hyperspace/index/rules/RuleUtils.scala
Outdated
Show resolved
Hide resolved
def getCandidateIndexes( | ||
indexManager: IndexManager, | ||
plan: LogicalPlan, | ||
hybridScanEnabled: Boolean = false): Seq[IndexLogEntry] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we remove default value? Rationale: this is an user-facing config, which is set to false by default. If we later change the default to true, we need to remember to update this to true as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's temporary for this PR - I'll remove the default value assignment & use the config value instead in the next pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. In that case, please leave a review comment on your intention. That helps reviewers.
src/main/scala/com/microsoft/hyperspace/index/rules/RuleUtils.scala
Outdated
Show resolved
Hide resolved
def getCandidateIndexes(indexManager: IndexManager, plan: LogicalPlan): Seq[IndexLogEntry] = { | ||
def getCandidateIndexes( | ||
indexManager: IndexManager, | ||
plan: LogicalPlan, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently this plan
is always the result of RuleUtils.getLogicalRelation
meaning this plan
is expected to be in a fixed shape (linear, etc.). What would be the best way to address this? cc: @apoorvedave1 / @pirz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! Can you also open an uber issue for that when you get a chance please? (even if it is a work-in-progress issue, let's just capture all the work as we know so far)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I'll 👍
Hdfs(Properties(Content(Directory("/")))), | ||
"schema", | ||
"format", | ||
Map())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: #142 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required - empty root directory throws an exception during the test.
src/test/scala/com/microsoft/hyperspace/index/rules/RuleUtilsTest.scala
Outdated
Show resolved
Hide resolved
src/test/scala/com/microsoft/hyperspace/index/rules/RuleUtilsTest.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/microsoft/hyperspace/index/rules/RuleUtils.scala
Outdated
Show resolved
Hide resolved
@apoorvedave1 @pirz Could you review this PR? Thanks! |
src/test/scala/com/microsoft/hyperspace/index/rules/RuleUtilsTest.scala
Outdated
Show resolved
Hide resolved
src/test/scala/com/microsoft/hyperspace/index/rules/RuleUtilsTest.scala
Outdated
Show resolved
Hide resolved
src/test/scala/com/microsoft/hyperspace/index/rules/RuleUtilsTest.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @sezruby!
Merged to master. @apoorvedave1 / @pirz Feel free to leave any comments if I missed something. Also, please take a look at #153 (comment). I think we need to follow up on it. |
Thanks @sezruby! Can you create a follow-up item for #153 (comment) (and put in the necessary documentation) if it does not already exist? |
What changes were proposed in this pull request?
lazy val allSourceFileInfos: Set[FileInfo]
lazy val
because this source file list of an index referred several times when applying Rules.Set
instead ofSeq
becauseisHybridScanCandidate()
Why are the changes needed?
The current signature design is not appropriate for hybrid scan.
For delete support, we need to figure out the list of deleted source files and check the other source files still have the same metadata when index build time or not.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test