Add expansion_type parameter to API call from eland_import_hub_model #802

Open
wants to merge 4 commits into
base: main
from

daixque
Contributor

@daixque daixque commented Jul 22, 2025

Overview

This PR is related to the Elasticsearch change: elastic/elasticsearch#131679

That PR adds Elasticsearch support for sparse embeddings from non-ELSER models. A notable example of a sparse vector model is SPLADE, which is a reference model for ELSER.
To tell Elasticsearch that the target model is a SPLADE-type model, this PR adds a new expansion_type parameter to the create trained model API call.

How Eland detects the SPLADE model

Eland identifies a SPLADE model by checking the dimensions of the output tensor: if the second dimension is greater than 1, the model is treated as a SPLADE model. This works because SPLADE models typically output embeddings per token, unlike ELSER.
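As a rough illustration of the rule described above (not the actual Eland code), the check could be sketched like this. The helper names `unwrap_first_output` and `second_dim_exceeds_one`, the `FakeTensor` stand-in, and the example shapes are all hypothetical and exist only for this sketch; the real change works on the traced model's sample output.

```python
from typing import Any, Sequence


class FakeTensor:
    """Hypothetical stand-in for a tensor; only carries a shape."""

    def __init__(self, shape: Sequence[int]) -> None:
        self.shape = tuple(shape)


def unwrap_first_output(sample: Any) -> Any:
    # Some models return a tuple of outputs; take the first element,
    # mirroring the tuple check in the snippet under review.
    return sample[0] if isinstance(sample, tuple) else sample


def second_dim_exceeds_one(shape: Sequence[int]) -> bool:
    # The rule from the PR description: a second dimension greater
    # than 1 is taken as the sign of a SPLADE-style output.
    return len(shape) >= 2 and shape[1] > 1


# Illustrative shapes only; these are assumptions, not measured outputs.
per_token_output = (FakeTensor((1, 12, 30522)),)
single_vector_output = FakeTensor((1, 1, 30522))

is_splade = second_dim_exceeds_one(unwrap_first_output(per_token_output).shape)
is_not_splade = second_dim_exceeds_one(unwrap_first_output(single_vector_output).shape)
```

Keeping the shape test separate from the tuple unwrapping makes the heuristic easy to unit test without loading a real model.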

Related

elif self._task_type == "text_expansion":
    sample_embedding = self._traceable_model.sample_output()
    if type(sample_embedding) is tuple:
        text_embedding = sample_embedding[0]
Member

Please rename text_embedding to sparse_embedding

Contributor Author

@daixque daixque Jul 23, 2025

In the current codebase, text_expansion is used everywhere; sparse_embedding does not appear at all.
https://github.com/search?q=repo%3Aelastic%2Feland%20text_expansion&type=code

% grep -inR "text_expansion" eland tests | grep -v "Binary file"
eland/ml/pytorch/transformers.py:76:    "text_expansion",
eland/ml/pytorch/transformers.py:97:    "text_expansion": TextExpansionInferenceOptions,
eland/ml/pytorch/transformers.py:557:        elif self._task_type == "text_expansion":
eland/ml/pytorch/transformers.py:747:        if self._task_type == "text_expansion":
eland/ml/pytorch/nlp_ml_model.py:320:        super().__init__(configuration_type="text_expansion")
tests/ml/pytorch/test_pytorch_model_config_pytest.py:149:            "text_expansion",
tests/ml/pytorch/test_pytorch_model_config_pytest.py:217:            if task_type == "text_expansion":

Should I rename everything? That would change the CLI interface. Should we keep --task-type=text_expansion for compatibility? (I feel the rename should be a separate PR.)

Member

Ah ok thanks. Yes the rename is not necessary in this PR

Contributor Author

Got it, thanks
