Support google/embeddinggemma-300m and Qwen/Qwen3-Reranker-0.6B #642

@AimoreRRD

Description

Model description

google/embeddinggemma-300m

embedding-server  | INFO:     Waiting for application startup.
embedding-server  | INFO     2025-09-18 15:07:20,029 infinity_emb INFO:        infinity_server.py:84
embedding-server  |          Creating 2 engines:                                                    
embedding-server  |          ['google/embeddinggemma-300m',                                         
embedding-server  |          'Qwen/Qwen3-Reranker-0.6B']                                            
embedding-server  | INFO     2025-09-18 15:07:20,031 infinity_emb INFO:              telemetry.py:34
embedding-server  |          DO_NOT_TRACK=1 registered. Anonymized usage statistics                 
embedding-server  |          are disabled.                                                          
embedding-server  | INFO     2025-09-18 15:07:20,034 infinity_emb INFO:           select_model.py:66
embedding-server  |          model=`google/embeddinggemma-300m` selected, using                     
embedding-server  |          engine=`torch` and device=`cuda`                                       
embedding-server  | ERROR:    Traceback (most recent call last):
embedding-server  |   File "/app/.venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1082, in from_pretrained
embedding-server  |     config_class = CONFIG_MAPPING[config_dict["model_type"]]
embedding-server  |   File "/app/.venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 784, in __getitem__
embedding-server  |     raise KeyError(key)
embedding-server  | KeyError: 'gemma3_text'
embedding-server  | 
embedding-server  | During handling of the above exception, another exception occurred:
embedding-server  | 
embedding-server  | Traceback (most recent call last):
embedding-server  |   File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
embedding-server  |     async with self.lifespan_context(app) as maybe_state:
embedding-server  |   File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
embedding-server  |     return await anext(self.gen)
embedding-server  |   File "/app/infinity_emb/infinity_server.py", line 88, in lifespan
embedding-server  |     app.engine_array = AsyncEngineArray.from_args(engine_args_list)  # type: ignore
embedding-server  |   File "/app/infinity_emb/engine.py", line 306, in from_args
embedding-server  |     return cls(engines=tuple(engines))
embedding-server  |   File "/app/infinity_emb/engine.py", line 71, in from_args
embedding-server  |     engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False)
embedding-server  |   File "/app/infinity_emb/engine.py", line 56, in __init__
embedding-server  |     self._model_replicas, self._min_inference_t, self._max_inference_t = select_model(
embedding-server  |   File "/app/infinity_emb/inference/select_model.py", line 83, in select_model
embedding-server  |     loaded_engine = unloaded_engine.value(engine_args=engine_args_copy)
embedding-server  |   File "/app/infinity_emb/transformer/embedder/sentence_transformer.py", line 62, in __init__
embedding-server  |     attempt_bt = check_if_bettertransformer_possible(engine_args)
embedding-server  |   File "/app/infinity_emb/transformer/acceleration.py", line 40, in check_if_bettertransformer_possible
embedding-server  |     config = AutoConfig.from_pretrained(
embedding-server  |   File "/app/.venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1084, in from_pretrained
embedding-server  |     raise ValueError(
embedding-server  | ValueError: The checkpoint you are trying to load has model type `gemma3_text` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
embedding-server  | 
embedding-server  | You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
embedding-server  | 
embedding-server  | ERROR:    Application startup failed. Exiting.
WARN[0010] optional dependency "embedding-server" failed to start: container embedding-server exited (3) 
embedding-server exited with code 3
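
For context, the failure visible in the traceback is a plain registry lookup: `AutoConfig.from_pretrained` maps `config_dict["model_type"]` through `CONFIG_MAPPING`, and an installed transformers release that predates the architecture simply has no `gemma3_text` entry, so the `KeyError` is re-raised as the `ValueError` shown above. A minimal sketch of that pattern (the mapping contents here are illustrative, not the real `CONFIG_MAPPING`):

```python
# Illustrative sketch of the registry-lookup pattern behind the error above.
# The real CONFIG_MAPPING in transformers is far larger and version-dependent.
CONFIG_MAPPING = {
    "bert": "BertConfig",
    "llama": "LlamaConfig",
    # "gemma3_text" and "qwen3" only exist in newer transformers releases
}

def config_class_for(model_type: str) -> str:
    """Resolve a model_type to its config class name, mimicking AutoConfig."""
    try:
        return CONFIG_MAPPING[model_type]
    except KeyError:
        raise ValueError(
            f"The checkpoint you are trying to load has model type `{model_type}` "
            "but Transformers does not recognize this architecture."
        ) from None

try:
    config_class_for("gemma3_text")
except ValueError as err:
    print(err)
```

This is why the error message recommends upgrading transformers: the fix is simply a release whose registry contains the new model type.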

Qwen/Qwen3-Reranker-0.6B

embedding-server  | INFO:     Started server process [1]
embedding-server  | INFO:     Waiting for application startup.
embedding-server  | INFO     2025-09-18 15:11:31,772 infinity_emb INFO:        infinity_server.py:84
embedding-server  |          Creating 1 engines: ['Qwen/Qwen3-Reranker-0.6B']                       
embedding-server  | INFO     2025-09-18 15:11:31,774 infinity_emb INFO:              telemetry.py:34
embedding-server  |          DO_NOT_TRACK=1 registered. Anonymized usage statistics                 
embedding-server  |          are disabled.                                                          
embedding-server  | INFO     2025-09-18 15:11:31,777 infinity_emb INFO:           select_model.py:66
embedding-server  |          model=`Qwen/Qwen3-Reranker-0.6B` selected, using                       
embedding-server  |          engine=`torch` and device=`cuda`                                       
embedding-server  | ERROR:    Traceback (most recent call last):
embedding-server  |   File "/app/.venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1082, in from_pretrained
embedding-server  |     config_class = CONFIG_MAPPING[config_dict["model_type"]]
embedding-server  |   File "/app/.venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 784, in __getitem__
embedding-server  |     raise KeyError(key)
embedding-server  | KeyError: 'qwen3'
embedding-server  | 
embedding-server  | During handling of the above exception, another exception occurred:
embedding-server  | 
embedding-server  | Traceback (most recent call last):
embedding-server  |   File "/app/.venv/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
embedding-server  |     async with self.lifespan_context(app) as maybe_state:
embedding-server  |   File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
embedding-server  |     return await anext(self.gen)
embedding-server  |   File "/app/infinity_emb/infinity_server.py", line 88, in lifespan
embedding-server  |     app.engine_array = AsyncEngineArray.from_args(engine_args_list)  # type: ignore
embedding-server  |   File "/app/infinity_emb/engine.py", line 306, in from_args
embedding-server  |     return cls(engines=tuple(engines))
embedding-server  |   File "/app/infinity_emb/engine.py", line 71, in from_args
embedding-server  |     engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False)
embedding-server  |   File "/app/infinity_emb/engine.py", line 56, in __init__
embedding-server  |     self._model_replicas, self._min_inference_t, self._max_inference_t = select_model(
embedding-server  |   File "/app/infinity_emb/inference/select_model.py", line 83, in select_model
embedding-server  |     loaded_engine = unloaded_engine.value(engine_args=engine_args_copy)
embedding-server  |   File "/app/infinity_emb/transformer/embedder/sentence_transformer.py", line 62, in __init__
embedding-server  |     attempt_bt = check_if_bettertransformer_possible(engine_args)
embedding-server  |   File "/app/infinity_emb/transformer/acceleration.py", line 40, in check_if_bettertransformer_possible
embedding-server  |     config = AutoConfig.from_pretrained(
embedding-server  |   File "/app/.venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1084, in from_pretrained
embedding-server  |     raise ValueError(
embedding-server  | ValueError: The checkpoint you are trying to load has model type `qwen3` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
embedding-server  | 
embedding-server  | You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
embedding-server  | 
embedding-server  | ERROR:    Application startup failed. Exiting.
WARN[0011] optional dependency "embedding-server" failed to start: container embedding-server exited (3) 
embedding-server exited with code 3
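
Both failures point at the same root cause: the transformers pinned in the infinity image predates the `gemma3_text` and `qwen3` model types. As a rough pre-flight sanity check, a version gate like the following could flag the problem before startup. The minimum versions are assumptions (Gemma 3 support is believed to have landed around transformers 4.50 and Qwen3 around 4.51); verify them against the transformers release notes before relying on this:

```python
# Hypothetical pre-flight check: compare an installed transformers version
# against assumed minimum releases for the new model types. The minimums
# below are guesses; confirm them in the transformers release notes.
ASSUMED_MIN_VERSION = {
    "gemma3_text": (4, 50, 0),  # assumption: Gemma 3 support
    "qwen3": (4, 51, 0),        # assumption: Qwen3 support
}

def parse_version(v: str) -> tuple:
    """Parse a dotted release string like '4.44.2' into a comparable tuple."""
    return tuple(int(part) for part in v.split(".")[:3])

def supports_model_type(installed: str, model_type: str) -> bool:
    """Return True if the installed version should recognize model_type."""
    minimum = ASSUMED_MIN_VERSION.get(model_type)
    if minimum is None:
        return True  # unknown model type: assume the registry already has it
    return parse_version(installed) >= minimum

# e.g. an image pinning an older transformers would fail the first check
print(supports_model_type("4.44.2", "gemma3_text"))  # False
print(supports_model_type("4.51.0", "qwen3"))        # True
```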

Open source status & Hugging Face transformers

  • The model implementation is available on transformers
  • The model weights are available on huggingface-hub
  • I verified that the model currently does not run on the latest version (`pip install infinity_emb[all] --upgrade`)
  • I made the authors of the model aware that I want to use it with infinity_emb and checked whether they are aware of the issue
