If your model imports modules that initialize CUDA, consider lazy-importing them inside the registration function to avoid errors like `RuntimeError: Cannot re-initialize CUDA in forked subprocess`:
```python
# The entrypoint of your plugin
def register():
    from vllm import ModelRegistry

    ModelRegistry.register_model(
        "YourModelForCausalLM",
        "your_code:YourModelForCausalLM"
    )
```
!!! warning
    If your model is a multimodal model, ensure the model class implements the [SupportsMultiModal][vllm.model_executor.models.interfaces.SupportsMultiModal] interface.
    Read more about that [here][supports-multimodal].
!!! note
    Although you can directly put these code snippets in your script using `vllm.LLM`, the recommended way is to place these snippets in a vLLM plugin. This ensures compatibility with various vLLM features like distributed inference and the API server.
For more information on adding entry points to your package, please check the [official documentation](https://setuptools.pypa.io/en/latest/userguide/entry_point.html).
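As a sketch, the `register` function above could be exposed as an entry point from your package's `pyproject.toml`. The entry-point group name `vllm.general_plugins` and the names `register_your_model` and `your_code` are assumptions for illustration; check the vLLM plugin documentation for the group name your version expects:

```toml
# Hypothetical pyproject.toml fragment exposing register() as a vLLM plugin.
# "vllm.general_plugins" is assumed to be the entry-point group vLLM scans;
# "your_code" must be the importable module containing register().
[project.entry-points."vllm.general_plugins"]
register_your_model = "your_code:register"
```

Once the package is installed, vLLM discovers and calls the entry point at startup, so the model class is registered before any engine or API-server process needs it.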