
Add Nutanix AI Endpoint #346

Open · wants to merge 6 commits into main

Conversation

@jinan-zhou commented Oct 29, 2024

Add Nutanix AI Endpoint

This PR adds Nutanix AI Endpoint as a provider.
The distribution container is at https://hub.docker.com/repository/docker/jinanz/distribution-nutanix/general.

Setup instructions

Please refer to llama_stack/templates/nutanix/doc_template.md for details

Feature/Issue validation/testing/test plan

  • Test non-streaming inference

Query

curl -X POST http://localhost:1740/inference/chat_completion -H "Content-Type: application/json" -d '{"model":"Llama3.1-8B-Instruct","messages":[{"content":"How far is the sun? Answer in one sentence.", "role": "user"}],"stream":false}'

Response

{"completion_message":{"role":"assistant","content":"The average distance from the Earth to the sun is approximately 93 million miles (149.6 million kilometers), which is about 8 minutes and 20 seconds away from Earth at the speed of light.","stop_reason":"end_of_turn","tool_calls":[]},"logprobs":null}
  • Test streaming inference

Query

curl -X POST http://localhost:1740/inference/chat_completion -H "Content-Type: application/json" -d '{"model":"Llama3.1-8B-Instruct","messages":[{"content":"How far is the moon? Answer in one sentence.", "role": "user"}],"stream":true}'

Response

data: {"event":{"event_type":"start","delta":"","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":"The","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":" average","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":" distance","logprobs":null,"stop_reason":null}}

...

data: {"event":{"event_type":"progress","delta":" distance","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"progress","delta":".\"","logprobs":null,"stop_reason":null}}

data: {"event":{"event_type":"complete","delta":"","logprobs":null,"stop_reason":"end_of_turn"}}

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Oct 29, 2024
@jinan-zhou (Author):

Hi Llama Stack team, your reviews are much appreciated!
@ashwinb @yanxi0830 @hardikjshah @dltn @raghotham

@jinan-zhou force-pushed the nai branch 3 times, most recently from ad23f19 to 8878396 on November 22, 2024 00:22
Review thread on `class NutanixInferenceAdapter(ModelRegistryHelper, Inference):`
Contributor:

@ashwinb this is almost the same code as fireworks and databricks. what do you think of having a common base class?
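
(For concreteness, a rough sketch of the kind of shared base class being suggested. Every name below is hypothetical and is not an existing Llama Stack class; today each adapter carries this plumbing itself.)

```python
# Illustrative only: one possible shape for a shared OpenAI-compatible adapter
# base. Class and method names are made up for this sketch.
from abc import ABC, abstractmethod
from typing import Any, Dict


class OpenAICompatInferenceAdapter(ABC):
    """Common plumbing for providers exposing an OpenAI-compatible endpoint."""

    def __init__(self, base_url: str, api_key: str) -> None:
        self.base_url = base_url
        self.api_key = api_key

    @abstractmethod
    def convert_request(self, request: Any) -> Dict[str, Any]:
        """Map a Llama Stack chat_completion request to the provider payload."""

    @abstractmethod
    def convert_response(self, raw: Dict[str, Any]) -> Any:
        """Map the provider response back to a Llama Stack completion message."""
```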

Contributor:

@mattf Yes, I think we need to start consolidating more on the code side.

we have some tests now but we also need to put down some more requirements for when a new inference provider comes in. here are some things we are thinking about:

  • support for structured decoding -- kind of table stakes now
  • proper support for tool calling (either directly, or by allowing the legacy completions API so Llama Stack can format the prompt)
  • support for vision models

otherwise we cannot claim to the user that "you can just use Llama Stack, pick-and-choose any provider, and get a consistent experience"

Author:

Thank you for the feedback. While the code does share similarities with Fireworks and Databricks, there are important differences, and we anticipate adding new features that will further differentiate our implementation from those of other vendors.

I believe it may be more efficient for each vendor to maintain their own Llama Stack adapter. The duplication of code within each adapter, in this context, is manageable and can even be beneficial. Adopting a "Do Repeat Yourself" approach for these adapters aligns with maintaining clarity and flexibility, especially given the unique requirements and evolution of individual providers.

That said, I’m open to further discussions if there’s a strong case for a shared base class or alternative approach. Let me know your thoughts!

Contributor:

@jinan-zhou thank you for the thoughtful argument. i think you're right that abstracting the providers now is too early. i raised the topic only to start a discussion, not to block or slow your valuable contribution.

@jinan-zhou requested a review from mattf on December 3, 2024 20:37
@mattf (Contributor) left a comment:

lgtm

@jinan-zhou (Author) commented Dec 4, 2024

@mattf Thank you so much!
Could you approve the PR?
Also @ashwinb @raghotham

@mattf (Contributor) commented Dec 5, 2024

> @mattf Thank you so much! Could you approve the PR? Also @ashwinb @raghotham

i'm not authorized. @ashwinb or @raghotham certainly can
