[Feature Request] Nexa AI support #1380

Open

iwr-redmond opened this issue Dec 11, 2024 · 1 comment
iwr-redmond commented Dec 11, 2024

This feature request, if implemented, would resolve #258. Since the discussion in that thread, Ollama has added support for importing Hugging Face models, and llama-cpp-python (mentioned in comment 1) has had more problems with its binary package distribution than you can shake a stick at. I reckon that using Ollama as the main vehicle for resolving #258 is asking for trouble, because it uses a non-standard folder structure independent of the GGUF standard and requires manual installation (and, on Linux, manual updating).

Recently the Nexa SDK launched, providing a more enterprise-focused local inference package. Notably for EDSL purposes, the entire SDK can be installed with standard Python tools and managed from Python. Integrating Nexa would allow for GGUF text-generation and vision support now, with the possibility of supporting agent-directed image generation in the future (cf., e.g., Weng, Huang, et al., 2024; Thompson & Lindo, 2024).

Proposed implementation:

  1. Make nexaai installable alongside EDSL via five optional dependencies ([local-cuda], [local-rocm], etc.) representing the five binary extra-index URLs, to allow for easy installation with the main package. (Unfortunately, there seems to be no easy replacement for setup.py install_scripts that could probe the install environment and calculate the correct repository.)

  2. Define the NEXA_CACHE_ROOT environment variable to create an EDSL-specific GGUF cache; a rough sketch follows below. (Optionally, you could restrict automated preinstallation of EDSL-supported models to holders of a valid EXPECTED_PARROT_API_KEY, and provide basic certification/testing of certain models.)

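For illustration, here is a minimal sketch of how EDSL might set up that cache before importing the Nexa SDK; the EDSL_LOCAL_MODELS override and the default location are assumptions that match the snippets below:

import os
from pathlib import Path

# hypothetical EDSL-specific cache location, overridable via EDSL_LOCAL_MODELS
cache_root = Path(os.getenv("EDSL_LOCAL_MODELS", Path.home() / ".edsl" / "local_models"))
cache_root.mkdir(parents=True, exist_ok=True)

# point the Nexa SDK at the EDSL-managed cache (assumes the SDK reads
# NEXA_CACHE_ROOT when it is imported)
os.environ["NEXA_CACHE_ROOT"] = str(cache_root)
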
  3. Check for and preload a starter local model, with code similar to:

import os
from pathlib import Path

from edsl.enums import SupportedLocalModels, LocalModelConfig  # fictional right now
from nexa.general import pull_model

# download directory for EDSL-managed GGUF models
local_path = Path(os.getenv("EDSL_LOCAL_MODELS") or LocalModelConfig["ModelStore"].value)
starter_model = SupportedLocalModels()  # fictional; returns, e.g., "llama3.2" per nexa/constants.py
hf = None  # unless pulling the starter_model from huggingface
ms = None  # unless pulling the starter_model from modelscope

pull_model(starter_model, hf, ms, local_download_path=local_path)

  4. Boot up the server instance with something like the following (bearing in mind my code is only good enough for government work):

import os
from pathlib import Path

from edsl.enums import LocalModelConfig  # fictional right now
from nexa.constants import ModelType
from nexa.gguf.server.nexa_service import run_nexa_ai_service as ManagedNexaServer

local_path = Path(os.getenv("EDSL_LOCAL_MODELS") or LocalModelConfig["ModelStore"].value)
local_port = LocalModelConfig["ServerPort"].value  # custom port to reduce the risk of conflicts with other processes, e.g. a pre-existing Nexa SDK
run_type = ModelType["NLP"].value

ManagedNexaServer(
    model_path_arg=str(local_path),
    is_local_path_arg=True,
    model_type_arg=run_type,
    huggingface=False,
    modelscope=False,
    projector_local_path_arg=None,
    port=local_port,
)
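
Since run_nexa_ai_service presumably blocks while the HTTP server is running, EDSL would likely want to start it off the main thread and talk to it over localhost; a rough sketch using only the standard library and the names defined above:

import threading

# assumption: the service call blocks until the server is stopped, so run it
# in a daemon thread and let EDSL carry on in the main thread
server_thread = threading.Thread(
    target=ManagedNexaServer,
    kwargs=dict(
        model_path_arg=str(local_path),
        is_local_path_arg=True,
        model_type_arg=run_type,
        huggingface=False,
        modelscope=False,
        projector_local_path_arg=None,
        port=local_port,
    ),
    daemon=True,
)
server_thread.start()
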
  5. Create another inference class that mostly reuses the Ollama code, with the localhost port number retrieved from edsl.enums as above (see the sketch of those fictional enums below).

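For completeness, the fictional edsl.enums entries used in the snippets above might look something like this (names, model list, and values are purely illustrative):

from enum import Enum

class SupportedLocalModels(str, Enum):
    """Illustrative starter models; identifiers follow nexa/constants.py."""
    LLAMA_3_2 = "llama3.2"

class LocalModelConfig(Enum):
    """Illustrative defaults for the EDSL-managed model store and local server."""
    ModelStore = "~/.edsl/local_models"  # default GGUF download directory (expand at runtime)
    ServerPort = 18181  # non-default port to avoid clashing with an existing Nexa SDK server
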
Refs:
https://github.com/NexaAI/nexa-sdk/blob/main/nexa/gguf/server/nexa_service.py#L500
https://github.com/NexaAI/nexa-sdk/blob/main/nexa/cli/entry.py#L151
https://github.com/NexaAI/nexa-sdk/blob/main/nexa/constants.py#L6
https://github.com/NexaAI/nexa-sdk/blob/main/nexa/cli/entry.py#L632

iwr-redmond commented Dec 13, 2024

An update to step 1, as my earlier assertion may be incorrect:

> (Unfortunately, there seems to be no easy replacement for setup.py install_scripts that could probe the install environment and calculate the correct repository.)

According to this answer on Stack Overflow, setup.py can be used to provision variables for pyproject.toml. That means one could include basic detection code for the various backends:

# setup.py
import os
import shutil
import sys

from setuptools import setup

# Nexa SDK binary wheel indexes
NEXA_INDEX_CUDA = "https://github.nexa.ai/whl/cu124"
NEXA_INDEX_METAL = "https://github.nexa.ai/whl/metal"
NEXA_INDEX_ROCM = "https://github.nexa.ai/whl/rocm621"
NEXA_INDEX_VULKAN = "https://github.nexa.ai/whl/vulkan"

# get a user-defined index (good for troubleshooting/future-proofing)
custom_wheels = os.environ.get("EDSL_EXTRA_WHEELS")

# probe the environment
host_platform = sys.platform
nvidia_driver = shutil.which("nvidia-smi")
amd_driver = shutil.which("rocminfo")

# pick the extra wheel index, exposed to pyproject.toml as edsl_extra_wheels
if custom_wheels:
    edsl_extra_wheels = custom_wheels
elif host_platform == "darwin":
    edsl_extra_wheels = NEXA_INDEX_METAL
elif nvidia_driver:
    edsl_extra_wheels = NEXA_INDEX_CUDA
elif amd_driver and host_platform == "linux":
    edsl_extra_wheels = NEXA_INDEX_ROCM
else:
    edsl_extra_wheels = NEXA_INDEX_VULKAN

setup()
