If implemented, this feature request would resolve #258. Since the discussion in that thread, Ollama has added support for importing Hugging Face models, and llama-cpp-python (mentioned in comment 1) has had more problems with its binary package distribution than you can shake a stick at. I reckon that using Ollama as the main vehicle for resolving #258 is asking for trouble, because it uses a non-standard folder structure independent of the GGUF standard and requires manual installation (and, on Linux, manual updating).
Recently the Nexa SDK launched, providing a more enterprise-focused local inference package. Notably for EDSL purposes, the entire SDK can be installed with standard Python tools and managed from Python. Integrating Nexa would allow GGUF text generation and vision support now, with the possibility of supporting agent-directed image generation in the future (cf., e.g., Weng, Huang, et al., 2024; Thompson & Lindo, 2024).
Proposed implementation:
1. Make nexaai installable alongside EDSL via five optional dependency extras ([local-cuda], [local-rocm], etc.), one for each binary extra-index URL, to allow easy installation with the main package. (Unfortunately there seems to be no easy replacement for setup.py install_scripts that could probe the install environment and calculate the correct repository; but see the update below.)
2. Define the NEXA_CACHE_ROOT environment variable to create an EDSL-specific GGUF cache, as sketched below. (Optionally, automated preinstallation of EDSL-supported models could be restricted to holders of a valid EXPECTED_PARROT_API_KEY, with basic certification/testing of certain models.)
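A minimal sketch of step 2, assuming (per nexa/constants.py, linked in the refs below) that the cache root is read from the environment when nexa is first imported; the ~/.edsl/gguf location is only an illustrative default:
import os
from pathlib import Path

edsl_gguf_cache = Path.home() / ".edsl" / "gguf"  # hypothetical EDSL default location
edsl_gguf_cache.mkdir(parents=True, exist_ok=True)
os.environ.setdefault("NEXA_CACHE_ROOT", str(edsl_gguf_cache))

# import nexa only after the variable is set, so the EDSL-specific cache is picked up
from nexa.general import pull_model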
3. Check for and preload a starter local model, with code similar to:
import os
from pathlib import Path

from edsl.enums import SupportedLocalModels, LocalModelConfig  # fictional right now
from nexa.general import pull_model

local_path = Path(os.getenv("EDSL_LOCAL_MODELS") or LocalModelConfig["ModelStore"].value)
starter_model = SupportedLocalModels()  # fictional; returns, e.g. "llama3.2" per nexa/constants.py
hf = None  # unless pulling the starter_model from huggingface
ms = None  # unless pulling the starter_model from modelscope
pull_model(starter_model, hf, ms, local_download_path=local_path)
4. Boot up the server instance with something like the following (bearing in mind my code is only good enough for government work):
import os
from pathlib import Path

from edsl.enums import LocalModelConfig  # fictional right now
from nexa.constants import ModelType
from nexa.gguf.server.nexa_service import run_nexa_ai_service as ManagedNexaServer

local_path = Path(os.getenv("EDSL_LOCAL_MODELS") or LocalModelConfig["ModelStore"].value)
local_port = LocalModelConfig["ServerPort"].value  # a custom port reduces the risk of conflicts with other processes, e.g. a pre-existing nexa sdk
run_type = ModelType["NLP"].value

ManagedNexaServer(
    model_path_arg=str(local_path),
    is_local_path_arg=True,
    model_type_arg=run_type,
    huggingface=False,
    modelscope=False,
    projector_local_path_arg=None,
    port=local_port,
)
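One wrinkle with the call above: run_nexa_ai_service appears to wrap a blocking uvicorn app (see nexa_service.py in the refs below), so EDSL would presumably launch it in a background thread or subprocess and wait for the port to accept connections before dispatching questions. A rough readiness-check sketch, reusing local_port from the snippet above:
import socket
import time

# assuming ManagedNexaServer(...) has been started in a background thread or
# subprocess, poll until the server socket accepts connections
deadline = time.monotonic() + 30  # allow the model up to ~30 s to load
while time.monotonic() < deadline:
    try:
        with socket.create_connection(("127.0.0.1", local_port), timeout=1):
            break  # server is up
    except OSError:
        time.sleep(0.5)
else:
    raise RuntimeError(f"Nexa server did not come up on port {local_port}")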
5. Create another inference class that mostly reuses the Ollama code, with the localhost port number retrieved from edsl.enums as above.
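Assuming the local Nexa server exposes an OpenAI-compatible API on that port (which is what reusing the Ollama code implies), step 5 could look roughly like the sketch below; the import path, base class, and class attributes are guesses for illustration and should be copied from whatever the existing Ollama service in EDSL actually does:
from edsl.enums import LocalModelConfig  # fictional right now
from edsl.inference_services.OpenAIService import OpenAIService  # hypothetical import path

class NexaService(OpenAIService):
    """Routes requests to the locally managed Nexa server instead of a remote API."""

    _inference_service_ = "nexa"           # illustrative service name
    _env_key_name_ = "NEXA_LOCAL_API_KEY"  # a localhost server needs no real key
    _base_url_ = f"http://localhost:{LocalModelConfig['ServerPort'].value}/v1"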
An update to step 1, as my earlier assertion may be incorrect:
(Unfortunately there seems to be no easy replacement for setup.py install_scripts that could probe the install environment and calculate the correct repository.)
According to this answer on Stack Overflow, setup.py can be used to provision variables for pyproject.toml. That means one could have basic detection code for the various backends:
# setup.py
import os
import shutil
import sys

from setuptools import setup

# setup for the indexes
NEXA_INDEX_CUDA = "https://github.nexa.ai/whl/cu124"
NEXA_INDEX_METAL = "https://github.nexa.ai/whl/metal"
NEXA_INDEX_ROCM = "https://github.nexa.ai/whl/rocm621"
NEXA_INDEX_VULKAN = "https://github.nexa.ai/whl/vulkan"

# get a user-defined index (good for troubleshooting/future-proofing)
custom_wheels = os.environ.get("EDSL_EXTRA_WHEELS")

# probe the environment
host_platform = sys.platform
nvidia_driver = shutil.which("nvidia-smi")
amd_driver = shutil.which("rocminfo")

# pick an edsl_extra_wheels variable for pyproject.toml, preferring a user override
if custom_wheels:
    edsl_extra_wheels = custom_wheels
elif host_platform == "darwin":
    edsl_extra_wheels = NEXA_INDEX_METAL
elif nvidia_driver:
    edsl_extra_wheels = NEXA_INDEX_CUDA
elif amd_driver and host_platform == "linux":
    edsl_extra_wheels = NEXA_INDEX_ROCM
else:
    edsl_extra_wheels = NEXA_INDEX_VULKAN

# hand edsl_extra_wheels through to the build here, per the linked answer
setup()
Refs:
https://github.com/NexaAI/nexa-sdk/blob/main/nexa/gguf/server/nexa_service.py#L500
https://github.com/NexaAI/nexa-sdk/blob/main/nexa/cli/entry.py#L151
https://github.com/NexaAI/nexa-sdk/blob/main/nexa/constants.py#L6
https://github.com/NexaAI/nexa-sdk/blob/main/nexa/cli/entry.py#L632