[Feature Request] Nexa AI support #1380

Open

iwr-redmond opened this issue Dec 11, 2024 · 1 comment
iwr-redmond commented Dec 11, 2024

This feature request, if implemented, would resolve #258. Since the discussion in that thread, Ollama has added support for importing Hugging Face models, and llama-cpp-python (mentioned in comment 1) has had more problems with its binary package distribution than you can shake a stick at. I reckon that using Ollama as the main vehicle for resolving #258 is asking for trouble, because it uses a non-standard folder structure independent of the GGUF standard and requires manual installation (and, on Linux, manual updating).

Recently the Nexa SDK launched, providing a more enterprise-focused local inference package. Notably for EDSL purposes, the entire SDK can be installed with standard Python tools and managed from Python. Integrating Nexa would allow for GGUF text-generation and vision support now, with the possibility of supporting agent-directed image generation in the future (cf., e.g., Weng, Huang, et al., 2024; Thompson & Lindo, 2024).

Proposed implementation:

  1. Make nexaai installable alongside EDSL via five optional dependencies ([local-cuda], [local-rocm], etc.) representing the five binary extra-index URLs, to allow for easy installation with the main package. (Unfortunately, there seems to be no easy replacement for setup.py install_scripts that could probe the install environment and calculate the correct repository.)

  2. Define the NEXA_CACHE_ROOT environment variable to create an EDSL-specific GGUF cache; a rough sketch follows below. (Optionally, you could restrict automated preinstallation of EDSL-supported models to holders of a valid EXPECTED_PARROT_API_KEY, and provide basic certification/testing of certain models.)

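For illustration, here is a minimal sketch of how EDSL might set up that cache before importing the Nexa SDK; the EDSL_LOCAL_MODELS override and the default location are assumptions that match the snippets below:

import os
from pathlib import Path

# hypothetical EDSL-specific cache location, overridable via EDSL_LOCAL_MODELS
cache_root = Path(os.getenv("EDSL_LOCAL_MODELS", Path.home() / ".edsl" / "local_models"))
cache_root.mkdir(parents=True, exist_ok=True)

# point the Nexa SDK at the EDSL-managed cache (assumes the SDK reads
# NEXA_CACHE_ROOT when it is imported)
os.environ["NEXA_CACHE_ROOT"] = str(cache_root)
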
  3. Check for and preload a starter local model, with code similar to:

import os
from pathlib import Path

from edsl.enums import SupportedLocalModels, LocalModelConfig  # fictional right now
from nexa.general import pull_model

# download directory for EDSL-managed GGUF models
local_path = Path(os.getenv("EDSL_LOCAL_MODELS") or LocalModelConfig["ModelStore"].value)
starter_model = SupportedLocalModels()  # fictional; returns, e.g., "llama3.2" per nexa/constants.py
hf = None  # unless pulling the starter_model from huggingface
ms = None  # unless pulling the starter_model from modelscope

pull_model(starter_model, hf, ms, local_download_path=local_path)

  4. Boot up the server instance with something like the following (bearing in mind my code is only good enough for government work):

import os
from pathlib import Path

from edsl.enums import LocalModelConfig  # fictional right now
from nexa.constants import ModelType
from nexa.gguf.server.nexa_service import run_nexa_ai_service as ManagedNexaServer

local_path = Path(os.getenv("EDSL_LOCAL_MODELS") or LocalModelConfig["ModelStore"].value)
local_port = LocalModelConfig["ServerPort"].value  # custom port to reduce the risk of conflicts with other processes, e.g. a pre-existing Nexa SDK
run_type = ModelType["NLP"].value

ManagedNexaServer(
    model_path_arg=str(local_path),
    is_local_path_arg=True,
    model_type_arg=run_type,
    huggingface=False,
    modelscope=False,
    projector_local_path_arg=None,
    port=local_port,
)
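
Since run_nexa_ai_service presumably blocks while the HTTP server is running, EDSL would likely want to start it off the main thread and talk to it over localhost; a rough sketch using only the standard library and the names defined above:

import threading

# assumption: the service call blocks until the server is stopped, so run it
# in a daemon thread and let EDSL carry on in the main thread
server_thread = threading.Thread(
    target=ManagedNexaServer,
    kwargs=dict(
        model_path_arg=str(local_path),
        is_local_path_arg=True,
        model_type_arg=run_type,
        huggingface=False,
        modelscope=False,
        projector_local_path_arg=None,
        port=local_port,
    ),
    daemon=True,
)
server_thread.start()
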
  5. Create another inference class that mostly reuses the Ollama code, with the localhost port number retrieved from edsl.enums as above (see the sketch of those fictional enums below).

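For completeness, the fictional edsl.enums entries used in the snippets above might look something like this (names, model list, and values are purely illustrative):

from enum import Enum

class SupportedLocalModels(str, Enum):
    """Illustrative starter models; identifiers follow nexa/constants.py."""
    LLAMA_3_2 = "llama3.2"

class LocalModelConfig(Enum):
    """Illustrative defaults for the EDSL-managed model store and local server."""
    ModelStore = "~/.edsl/local_models"  # default GGUF download directory (expand at runtime)
    ServerPort = 18181  # non-default port to avoid clashing with an existing Nexa SDK server
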
Refs:
https://github.com/NexaAI/nexa-sdk/blob/main/nexa/gguf/server/nexa_service.py#L500
https://github.com/NexaAI/nexa-sdk/blob/main/nexa/cli/entry.py#L151
https://github.com/NexaAI/nexa-sdk/blob/main/nexa/constants.py#L6
https://github.com/NexaAI/nexa-sdk/blob/main/nexa/cli/entry.py#L632

iwr-redmond commented Dec 13, 2024

An update to step 1, as my earlier assertion may be incorrect:

> (Unfortunately, there seems to be no easy replacement for setup.py install_scripts that could probe the install environment and calculate the correct repository.)

According to this answer on Stack Overflow, setup.py can be used to provision variables for pyproject.toml. That means one could include basic detection code for the various backends:

# setup.py
import os
import shutil
import sys

from setuptools import setup

# Nexa SDK binary wheel indexes
NEXA_INDEX_CUDA = "https://github.nexa.ai/whl/cu124"
NEXA_INDEX_METAL = "https://github.nexa.ai/whl/metal"
NEXA_INDEX_ROCM = "https://github.nexa.ai/whl/rocm621"
NEXA_INDEX_VULKAN = "https://github.nexa.ai/whl/vulkan"

# get a user-defined index (good for troubleshooting/future-proofing)
custom_wheels = os.environ.get("EDSL_EXTRA_WHEELS")

# probe the environment
host_platform = sys.platform
nvidia_driver = shutil.which("nvidia-smi")
amd_driver = shutil.which("rocminfo")

# pick the extra wheel index, exposed to pyproject.toml as edsl_extra_wheels
if custom_wheels:
    edsl_extra_wheels = custom_wheels
elif host_platform == "darwin":
    edsl_extra_wheels = NEXA_INDEX_METAL
elif nvidia_driver:
    edsl_extra_wheels = NEXA_INDEX_CUDA
elif amd_driver and host_platform == "linux":
    edsl_extra_wheels = NEXA_INDEX_ROCM
else:
    edsl_extra_wheels = NEXA_INDEX_VULKAN

setup()
