-
Notifications
You must be signed in to change notification settings - Fork 24.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix score count validation in reranker response #111212
Merged
demjened
merged 14 commits into
elastic:main
from
demjened:demjened/fix-check-rerank-score-vs-input
Jul 29, 2024
Merged
Changes from 8 commits
Commits
Show all changes
14 commits
Select commit
Hold shift + click to select a range
a9805ae
Fix rerank score validation
demjened fb5a671
Update docs/changelog/111212.yaml
demjened 7e3f153
Add test case for invalid document indices in reranker result
demjened 7aa9a63
Preemptive top_n config check
demjened 58a032c
Reorg code + refine tests
demjened c4d4d97
Add support for Google Vertex AI task settings
demjened f30311d
Spotless
demjened 4edc52e
Make top N eval async
demjened 6ea9eb7
Update test
demjened 7b91518
Fix broken unit test
demjened 30ed86b
Clean up tests
demjened be485ac
Spotless
demjened ca032ee
Add size check + compare against rankWindowSize
demjened 71bc9bf
Fix import
demjened File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also add a test that throws this exception instead of the IOOB ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benwtrent @pmpailis I added a test case for simulating invalid indices (7e3f153).
Note I don't think it actually covers the use case, because it's a sub-case of the reranker input/output count mismatch. With the bugfix this is now caught before the actual doc index assignment happens that would trigger the IOOB. (We do not have specific handling of N inputs -> N outputs -> index pointing outside 0..N-1, I'd consider that a reranker error.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@demjened AH, ok, is there a way to parse this
top_n
setting and validate that it matches the window size earlier in the request? That seems like we should return "Bad Request" in that scenario.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benwtrent Not without issuing a GET call before each inference call to check the inference endpoint's configuration, I'm afraid. The rerank retriever framework only exposes hooks for creation and submission of an inference request, and parsing the results, but it doesn't know about the config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to fix this. If there are things that are invalid for a configuration & a search request, we need to fix this.
Calling a local GET to retrieve some minor documentation is way cheaper than calling some external service when the we know the request will fail. I would hold off on merging or progressing this until we can eagerly validate the search request as we know ahead of time that it won't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benwtrent @Mikep86 @pmpailis I'll summarize the latest changes here that accommodates all your suggestions (thanks for those):
task_settings.top_n < rank_window_size
and preemptively fail before running the inference, 2. the reranker actually returns <>N output on N inputs. These are tested intestRerankTopNConfigurationAndRankWindowSizeMismatch()
andtestRerankInputSizeAndInferenceResultsMismatch()
respectively.top_n
from the fetched inference endpoint config is vendor-specific. I'm not happy about this, but the emptyTaskSettings
interface doesn't give me much options to get the top N other than casting after a type check.my-inference-id-task-settings-top-3 -> top_n=3
). Again I'm not proud of this implementation, butGetInferenceModelAction
has a very limited interface to pass control parameters in in order to mock behavior in a test.Let me know if you feel there are still things to improve and must go in this PR; I want to timebox the remaining effort as the bug has been fixed and I want to make sure this gets merged by 8.15.0 BC5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heads up that @demjened & I discussed offline yesterday and we're going to investigate doing the
task_settings.top_n < rank_window_size
check in ML codeThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Mikep86 I've been thinking about this; while moving this logic to the ML code would make it cleaner and would remove coupling, I'm worried it's not a good idea from a product perspective.
For reference, here's the code you suggested that would replace the preemptive top_n check from the reranker:
The problem is we're invalidating a normal use case. It is a valid scenario that a rerank endpoint is configured to return a maximum of N hits for >N inputs. It's only an issue from the reranker retriever's perspective, where the a
model.top_k < retriever.rank_window_size
can lead to partial or incorrect results.So we have 3 options to implement the check:
XxxService
-s in ML code. Problem: invalidating valid use case.doInfer(..., inputSizeMustMatchTopN)
+ the code above? Problem: complex, requires refactoring of the interface and transport objects.WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is valid for the the reranker API to return fewer docs as that is the whole point of
topN
. Otherwise, why is it even configurable?The retriever also sets window size. For now, we should enforce that window size is <= topN. Maybe we can adjust this in the future, but for retrievers, this is a sane default & safe behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, wasn't thinking of rerank endpoint usage outside of this context.