Add composite embedders and pooling for hf models #1104
Branch updated from 9416155 to f75e2dc.
Walkthrough
The changes add support for composite embedders and pooling strategies for HuggingFace embedders. A new …
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Case
    participant API as Meilisearch API
    participant Index as Index Logic
    participant Embedder as CompositeEmbedder
    Test->>API: Update embedders with composite config
    API->>Index: update_embedders()
    Index->>Embedder: Instantiate CompositeEmbedder
    Embedder->>Index: Return composite embedder instance
    Index->>API: Update complete
    Test->>API: Get embedder settings
    API->>Index: get_settings()
    Index->>Embedder: Retrieve composite embedder
    Embedder->>Index: Return composite embedder structure
    Index->>API: Return embedder settings
    API->>Test: Respond with composite embedder info
```
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/settings/test_settings_embedders.py (1)
194-223: Well-structured test for composite embedders
The test correctly uses the `enable_composite_embedders` fixture and properly validates the structure and types of the composite embedder components.

Consider these improvements:
- Remove the `print(embedders)` statement on line 214, as it's not needed for the test.
- Update the assertion on line 214 from `embedders.embedders["composite"]` to `embedders.embedders["default"]` to match the key used when creating the embedder on line 203.
- Add a test to verify the `pooling` attribute of `HuggingFaceEmbedder`, which was also added in this PR.

```diff
-    print(embedders)
-    assert embedders.embedders["composite"].source == "composite"
+    assert embedders.embedders["default"].source == "composite"
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (4)
- meilisearch/index.py (4 hunks)
- meilisearch/models/embedders.py (5 hunks)
- tests/conftest.py (1 hunk)
- tests/settings/test_settings_embedders.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
- tests/conftest.py (1)
  - meilisearch/_httprequests.py (1): `patch` (99-107)
🔇 Additional comments (9)
tests/conftest.py (1)
277-294: Implementation looks good!
The implementation of the `enable_composite_embedders` fixture follows the established pattern of other experimental feature toggles in this file. It properly enables the feature before yielding control to the test and disables it afterward, using appropriate HTTP PATCH requests with authentication and timeout settings.
meilisearch/index.py (4)
35-35: Good addition to the imports
The `CompositeEmbedder` import has been properly added to the existing imports from the `meilisearch.models.embedders` module.
981-982: Properly implemented condition handling for composite embedders
The addition of the composite embedder handling in the `get_settings` method follows the existing pattern for other embedder types and is implemented correctly.
940-941: Consistent implementation for get_embedders method
The implementation for handling composite embedders in the `get_embedders` method is consistent with the implementation in `get_settings` and other embedder types.
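The `source`-keyed branching that these methods share can be illustrated with a small standalone sketch. Everything below is hypothetical: plain dataclasses stand in for the SDK's models, `embedder_from_dict` is not an SDK function, and rejecting a composite nested inside another composite is a choice made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class UserProvidedEmbedder:
    dimensions: int
    source: str = "userProvided"


@dataclass
class HuggingFaceEmbedder:
    model: Optional[str] = None
    pooling: Optional[str] = None
    document_template: Optional[str] = None
    source: str = "huggingFace"


@dataclass
class CompositeEmbedder:
    search_embedder: "EmbedderType"
    indexing_embedder: "EmbedderType"
    source: str = "composite"


EmbedderType = Union[UserProvidedEmbedder, HuggingFaceEmbedder, CompositeEmbedder]


def embedder_from_dict(data: Dict[str, Any], *, allow_composite: bool = True) -> EmbedderType:
    """Pick the model class from the camelCase `source` field (hypothetical helper)."""
    source = data.get("source")
    if source == "composite":
        # Only a top-level embedder may be composite; its two halves are
        # parsed recursively with nesting disabled.
        if not allow_composite:
            raise ValueError("a composite embedder cannot contain another composite")
        return CompositeEmbedder(
            search_embedder=embedder_from_dict(data["searchEmbedder"], allow_composite=False),
            indexing_embedder=embedder_from_dict(data["indexingEmbedder"], allow_composite=False),
        )
    if source == "userProvided":
        return UserProvidedEmbedder(dimensions=data["dimensions"])
    if source == "huggingFace":
        return HuggingFaceEmbedder(
            model=data.get("model"),
            pooling=data.get("pooling"),
            document_template=data.get("documentTemplate"),
        )
    raise ValueError(f"unsupported embedder source: {source!r}")
```

Because `composite` is the only source that contains other embedders, it is the only branch that recurses; every other source maps one-to-one onto a model class.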
1985-1986: Consistent implementation for update_embedders method
The implementation for handling composite embedders in the `update_embedders` method matches the pattern established in the other methods and for other embedder types.
tests/settings/test_settings_embedders.py (1)
3-9: Added necessary imports for the new test
The imports have been properly updated to include `pytest` and the `CompositeEmbedder` class from the embedders module.
meilisearch/models/embedders.py (3)
24-40: Well-defined enumeration for pooling strategies
The `PoolingType` enumeration provides a clear set of options for HuggingFace embedder pooling strategies with good documentation.
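A string-valued enum along these lines would fit the description. The member names and the `useModel` / `forceMean` / `forceCls` values are assumptions based on Meilisearch's documented `pooling` options for the `huggingFace` source; `meilisearch/models/embedders.py` holds the SDK's actual definition.

```python
from enum import Enum


class PoolingType(str, Enum):
    """Pooling strategy for a Hugging Face embedder (hypothetical mirror of the SDK enum)."""

    # Use the pooling method declared in the model's own configuration.
    USE_MODEL = "useModel"
    # Always average the token embeddings (mean pooling).
    FORCE_MEAN = "forceMean"
    # Always take the embedding of the [CLS] token.
    FORCE_CLS = "forceCls"
```

Mixing in `str` lets each member compare equal to its raw value, so `PoolingType("forceMean")` parses a settings response and the member can be placed directly into a JSON payload.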
101-114: Proper integration of pooling parameter into HuggingFaceEmbedder
The `pooling` parameter has been correctly added to the `HuggingFaceEmbedder` class with appropriate typing and documentation.
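On the wire, the new field is a camelCase `pooling` key inside a `huggingFace` embedder's settings. The helper below is hypothetical (not part of the SDK) and shows one way a pooling value could be validated and serialized; the accepted values are assumed to match Meilisearch's documented options, and the model name in the usage is a placeholder.

```python
from typing import Any, Dict, Optional

# Assumed to match Meilisearch's accepted `pooling` values.
ALLOWED_POOLING = {"useModel", "forceMean", "forceCls"}


def huggingface_settings(
    model: str,
    pooling: Optional[str] = None,
    document_template: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a `huggingFace` embedder settings dict (hypothetical helper)."""
    if pooling is not None and pooling not in ALLOWED_POOLING:
        raise ValueError(f"unknown pooling strategy: {pooling!r}")
    settings: Dict[str, Any] = {"source": "huggingFace", "model": model}
    # Only include optional fields the caller actually set.
    if pooling is not None:
        settings["pooling"] = pooling
    if document_template is not None:
        settings["documentTemplate"] = document_template
    return settings


print(huggingface_settings("placeholder/model", pooling="forceMean"))
```

Validating on the client side catches typos such as `"mean"` before a settings task is enqueued, although the server remains the final authority on which values it accepts.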
253-253: EmbedderType properly updated
The `EmbedderType` union alias has been correctly updated to include the new `CompositeEmbedder` type.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Hey there! The code looks nice, but the tests are failing. Can you check the CI?
Thanks a lot for contributing to Meilisearch! <3
Haha, just seeing now that a diff suggested by coderabbitai, which I accepted, messed up the class. Ultimately my bad, but also weird.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/settings/test_settings_embedders.py (1)
225-234: Replace `getattr` calls with direct attribute access.
The static analysis correctly identifies that `getattr` with constant attribute names is unnecessary and less readable than direct attribute access.

```diff
-    # ensure search_embedder has no document_template
-    assert getattr(embedders.embedders["composite"].search_embedder, "document_template") is None
-    assert (
-        getattr(
-            embedders.embedders["composite"].search_embedder,
-            "document_template_max_bytes",
-        )
-        is None
-    )
-    assert getattr(embedders.embedders["composite"].indexing_embedder, "document_template")
+    # ensure search_embedder has no document_template
+    assert embedders.embedders["composite"].search_embedder.document_template is None
+    assert embedders.embedders["composite"].search_embedder.document_template_max_bytes is None
+    assert embedders.embedders["composite"].indexing_embedder.document_template
```

🧰 Tools
🪛 Ruff (0.11.9)
- 226-226: Do not call `getattr` with a constant attribute value. It is not any safer than normal property access. Replace `getattr` with attribute access (B009)
- 228-231: Do not call `getattr` with a constant attribute value. It is not any safer than normal property access. Replace `getattr` with attribute access (B009)
- 234-234: Do not call `getattr` with a constant attribute value. It is not any safer than normal property access. Replace `getattr` with attribute access (B009)
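For context on B009: `getattr(obj, "name")` with a literal name behaves exactly like `obj.name`, including raising `AttributeError` when the attribute is missing, so it buys no safety. `getattr` earns its place only with a computed name or a default. `SearchEmbedder` below is a made-up stand-in for the sub-embedder objects in the test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchEmbedder:
    # Hypothetical stand-in for the composite's search sub-embedder.
    document_template: Optional[str] = None
    document_template_max_bytes: Optional[int] = None


emb = SearchEmbedder()

# Flagged by B009: a constant name gives no extra safety over attribute access.
assert getattr(emb, "document_template") is None
# Preferred equivalent.
assert emb.document_template is None

# Legitimate use 1: the attribute name is only known at runtime.
for field in ("document_template", "document_template_max_bytes"):
    assert getattr(emb, field) is None

# Legitimate use 2: a default is supplied for an attribute that may not exist.
assert getattr(emb, "pooling", "unset") == "unset"
```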
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- meilisearch/models/embedders.py (5 hunks)
- tests/settings/test_settings_embedders.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- meilisearch/models/embedders.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
- tests/settings/test_settings_embedders.py (3)
  - meilisearch/models/embedders.py (5): `CompositeEmbedder` (216-252), `HuggingFaceEmbedder` (78-114), `OpenAiEmbedder` (42-75), `PoolingType` (24-39), `UserProvidedEmbedder` (195-213)
  - tests/conftest.py (1): `empty_index` (109-117)
  - meilisearch/index.py (4): `update_embedders` (1950-1997), `update` (106-128), `wait_for_task` (232-259), `get_embedders` (1908-1948)
🔇 Additional comments (3)
tests/settings/test_settings_embedders.py (3)
3-11: LGTM! Imports are correctly added for new functionality.
The new imports support the composite embedder feature and pooling functionality being tested.
197-197: LGTM! Proper use of pytest fixture for experimental feature.
The `enable_composite_embedders` fixture correctly enables the experimental composite embedders feature for this test.
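The enable-then-disable pattern behind this fixture can be sketched with only the standard library. The URL, master key, and helper names are placeholders, and the `compositeEmbedders` flag on `PATCH /experimental-features` is assumed to be the toggle Meilisearch uses; the actual fixture in `tests/conftest.py` may be written differently.

```python
import json
import urllib.request
from contextlib import contextmanager
from typing import Callable, Iterator

BASE_URL = "http://127.0.0.1:7700"  # placeholder: local Meilisearch instance
MASTER_KEY = "masterKey"  # placeholder: test master key


def feature_toggle_request(enabled: bool) -> urllib.request.Request:
    """Build (without sending) a PATCH request that toggles composite embedders."""
    body = json.dumps({"compositeEmbedders": enabled}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/experimental-features",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {MASTER_KEY}",
            "Content-Type": "application/json",
        },
    )


def send_request(request: urllib.request.Request, timeout: float) -> None:
    """Send the request and close the response; urlopen raises on HTTP errors."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()


@contextmanager
def composite_embedders_enabled(
    send: Callable[[urllib.request.Request, float], None] = send_request,
    timeout: float = 10.0,
) -> Iterator[None]:
    """Enable the feature, hand control to the caller, then disable it again."""
    send(feature_toggle_request(True), timeout)
    try:
        yield
    finally:
        # try/finally restores the flag even if the test body raises.
        send(feature_toggle_request(False), timeout)
```

In a `conftest.py`, a `@pytest.fixture` would then only need `with composite_embedders_enabled(): yield`. Injecting `send` keeps the toggle logic testable without a running server.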
198-224: LGTM! Comprehensive test structure for composite embedder.
The test correctly:
- Creates a composite embedder configuration
- Updates the index and waits for completion
- Verifies the embedder types and structure
- Tests serialization roundtrip
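The same expectations can also be checked against a raw settings payload, which is handy when no server is available. `check_composite_settings` is a hypothetical helper and the sample dict (model name and template included) is made up; it encodes the checks the test performs: a `composite` source, concrete sub-embedders, no document template on the search side, and a lossless serialization round trip.

```python
import json
from typing import Any, Dict


def check_composite_settings(settings: Dict[str, Any]) -> None:
    """Raise AssertionError if `settings` is not a well-formed composite embedder."""
    assert settings["source"] == "composite"
    search = settings["searchEmbedder"]
    indexing = settings["indexingEmbedder"]
    # Both halves must be concrete (non-composite) embedders.
    assert search["source"] != "composite"
    assert indexing["source"] != "composite"
    # The search side embeds queries, so it is expected to carry no template;
    # the indexing side renders documents and needs one.
    assert search.get("documentTemplate") is None
    assert search.get("documentTemplateMaxBytes") is None
    assert indexing.get("documentTemplate")
    # Serialization round trip must not lose or alter anything.
    assert json.loads(json.dumps(settings)) == settings


SAMPLE = {
    "source": "composite",
    "searchEmbedder": {
        "source": "huggingFace",
        "model": "placeholder/model",
        "pooling": "forceMean",
    },
    "indexingEmbedder": {
        "source": "huggingFace",
        "model": "placeholder/model",
        "pooling": "forceMean",
        "documentTemplate": "{{doc.title}}",
    },
}

check_composite_settings(SAMPLE)
```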
LGTM!
Thanks for contributing to Meilisearch <3!
bors merge
Pull Request
Related issue
Fixes #1099
What does this PR do?
… `CompositeEmbedder`, adds `pooling: PoolingOpt` to `HuggingFaceEmbedder`s … `/indexes/{index_uid}/settings/embedders`
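For orientation, a request body for `/indexes/{index_uid}/settings/embedders` that combines both additions might look like the dict below. The embedder name `"default"`, the model, and the template are placeholders, and the shape (`searchEmbedder` / `indexingEmbedder` with camelCase keys) is assumed from Meilisearch's composite embedder settings. `configure` simply forwards the dict to anything exposing `update_embedders`, such as an SDK `Index`.

```python
from typing import Any, Dict

# Hypothetical body for /indexes/{index_uid}/settings/embedders.
# The embedder name, model, and template are placeholders.
COMPOSITE_EMBEDDERS: Dict[str, Any] = {
    "default": {
        "source": "composite",
        # Embeds search queries; carries no document template.
        "searchEmbedder": {
            "source": "huggingFace",
            "model": "placeholder/model",
            "pooling": "useModel",
        },
        # Embeds documents at indexing time, so it renders a template.
        "indexingEmbedder": {
            "source": "huggingFace",
            "model": "placeholder/model",
            "pooling": "useModel",
            "documentTemplate": "A movie titled {{doc.title}}",
        },
    }
}


def configure(index: Any) -> Any:
    """Send the settings through any object exposing `update_embedders(body)`."""
    return index.update_embedders(COMPOSITE_EMBEDDERS)
```

With a real server, the composite embedders experimental feature has to be enabled first, and the returned task should be awaited before reading the settings back.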
PR checklist
Please check if your PR fulfills the following requirements:
Thank you so much for contributing to Meilisearch!
Summary by CodeRabbit
New Features
Tests