# Add Remote Inference Adapter for Groq #432
@ashwinb @yanxi0830 I'm keen to work on this if nobody else is.
PR is ready for chat completions: #609
# What does this PR do?

Contributes towards issue #432:

- Groq text chat completions
- Streaming
- All the sampling params that Groq supports

A lot of inspiration taken from @mattf's good work at #355.

**What this PR does not do**

- Tool calls (future PR)
- Adding the llama-guard model
- Seeing if we can add embeddings

### PR Train

- #609 👈
- #630

## Test Plan

<details>
<summary>Environment</summary>

```bash
export GROQ_API_KEY=<api_key>
wget https://raw.githubusercontent.com/aidando73/llama-stack/240e6e2a9c20450ffdcfbabd800a6c0291f19288/build.yaml
wget https://raw.githubusercontent.com/aidando73/llama-stack/92c9b5297f9eda6a6e901e1adbd894e169dbb278/run.yaml

# Build and run environment
pip install -e . \
  && llama stack build --config ./build.yaml --image-type conda \
  && llama stack run ./run.yaml \
  --port 5001
```

</details>

<details>
<summary>Manual tests</summary>

Using this Jupyter notebook to test manually: https://github.com/aidando73/llama-stack/blob/2140976d76ee7ef46025c862b26ee87585381d2a/hello.ipynb

Use this code to test passing in the API key from provider_data:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(
    base_url="http://localhost:5001",
)

response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[
        {"role": "user", "content": "Hello, world client!"},
    ],
    # Test passing in groq_api_key from the client
    # Need to comment out the groq_api_key in the run.yaml file
    x_llama_stack_provider_data='{"groq_api_key": "<api-key>"}',
    # stream=True,
)
response
```

</details>

<details>
<summary>Integration</summary>

`pytest llama_stack/providers/tests/inference/test_text_inference.py -v -k groq` (run in same environment)

```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_model_list[llama_3b-groq] PASSED [ 6%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion[llama_3b-groq] SKIPPED (Other inf...) [ 12%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[llama_3b-groq] SKIPPED [ 18%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[llama_3b-groq] PASSED [ 25%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_structured_output[llama_3b-groq] SKIPPED (Ot...) [ 31%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[llama_3b-groq] PASSED [ 37%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[llama_3b-groq] SKIPPED [ 43%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling_streaming[llama_3b-groq] SKIPPED [ 50%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_model_list[llama_8b-groq] PASSED [ 56%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion[llama_8b-groq] SKIPPED (Other inf...) [ 62%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[llama_8b-groq] SKIPPED [ 68%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[llama_8b-groq] PASSED [ 75%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_structured_output[llama_8b-groq] SKIPPED (Ot...) [ 81%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[llama_8b-groq] PASSED [ 87%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[llama_8b-groq] SKIPPED [ 93%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling_streaming[llama_8b-groq] SKIPPED [100%]

======================================= 6 passed, 10 skipped, 160 deselected, 7 warnings in 2.05s ========================================
```

</details>

<details>
<summary>Unit tests</summary>

`pytest llama_stack/providers/tests/inference/groq/ -v`

```
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_sets_model PASSED [ 5%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_converts_user_message PASSED [ 10%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_converts_system_message PASSED [ 15%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_converts_completion_message PASSED [ 20%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_does_not_include_logprobs PASSED [ 25%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_does_not_include_response_format PASSED [ 30%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_does_not_include_repetition_penalty PASSED [ 35%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_stream PASSED [ 40%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_n_is_1 PASSED [ 45%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_if_max_tokens_is_0_then_it_is_not_included PASSED [ 50%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_max_tokens_if_set PASSED [ 55%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_temperature PASSED [ 60%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_top_p PASSED [ 65%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertNonStreamChatCompletionResponse::test_returns_response PASSED [ 70%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertNonStreamChatCompletionResponse::test_maps_stop_to_end_of_message PASSED [ 75%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertNonStreamChatCompletionResponse::test_maps_length_to_end_of_message PASSED [ 80%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertStreamChatCompletionResponse::test_returns_stream PASSED [ 85%]
llama_stack/providers/tests/inference/groq/test_init.py::TestGroqInit::test_raises_runtime_error_if_config_is_not_groq_config PASSED [ 90%]
llama_stack/providers/tests/inference/groq/test_init.py::TestGroqInit::test_returns_groq_adapter PASSED [ 95%]
llama_stack/providers/tests/inference/groq/test_init.py::TestGroqConfig::test_api_key_defaults_to_env_var PASSED [100%]

==================================================== 20 passed, 11 warnings in 0.08s =====================================================
```

</details>

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section?
- [x] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
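A usage sketch for the streaming path this PR adds, reusing the client and server from the manual test above. The chunk field layout (`chunk.event.delta`) is an assumption and may differ between `llama_stack_client` versions:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5001")

# stream=True yields incremental chunks instead of a single response object
response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Write a haiku about inference"}],
    stream=True,
)

for chunk in response:
    # Each chunk wraps an event whose delta carries the newest text fragment
    # (exact field names may vary by client version)
    print(chunk.event.delta, end="", flush=True)
```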
Chat completions support was merged, but there's a typo - PR to fix: #719
# What does this PR do?

Contributes towards #432. RE: #609 - I missed this one while refactoring. Fixes:

```python
Traceback (most recent call last):
  File "/Users/aidand/dev/llama-stack/llama_stack/distribution/server/server.py", line 191, in endpoint
    return await maybe_await(value)
  File "/Users/aidand/dev/llama-stack/llama_stack/distribution/server/server.py", line 155, in maybe_await
    return await value
  File "/Users/aidand/dev/llama-stack/llama_stack/providers/utils/telemetry/trace_protocol.py", line 101, in async_wrapper
    result = await method(self, *args, **kwargs)
  File "/Users/aidand/dev/llama-stack/llama_stack/distribution/routers/routers.py", line 156, in chat_completion
    return await provider.chat_completion(**params)
  File "/Users/aidand/dev/llama-stack/llama_stack/providers/utils/telemetry/trace_protocol.py", line 101, in async_wrapper
    result = await method(self, *args, **kwargs)
  File "/Users/aidand/dev/llama-stack/llama_stack/providers/remote/inference/groq/groq.py", line 127, in chat_completion
    response = self._get_client().chat.completions.create(**request)
  File "/Users/aidand/dev/llama-stack/llama_stack/providers/remote/inference/groq/groq.py", line 143, in _get_client
    return Groq(api_key=self.config.api_key)
AttributeError: 'GroqInferenceAdapter' object has no attribute 'config'. Did you mean: '_config'?
```

## Test Plan

Environment:

```shell
export GROQ_API_KEY=<api-key>

# build.yaml and run.yaml files
wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/build.yaml
wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/run.yaml

# Create environment if not already
conda create --prefix ./envs python=3.10
conda activate ./envs

# Build
pip install -e . && llama stack build --config ./build.yaml --image-type conda

# Activate built environment
conda activate llamastack-groq
```

<details>
<summary>Manual</summary>

```bash
llama stack run ./run.yaml --port 5001
```

Via this Jupyter notebook: https://github.com/aidando73/llama-stack/blob/9165502582cd7cb178bc1dcf89955b45768ab6c1/hello.ipynb

</details>

## Sources

Please link relevant resources if necessary.

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section?
- [x] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
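For readers skimming the traceback: the fix is a one-line attribute rename. A minimal sketch following the names in the traceback (class skeleton only; the real adapter carries more state):

```python
from groq import Groq

class GroqInferenceAdapter:
    def __init__(self, config):
        # Refactoring renamed the attribute from `config` to `_config`...
        self._config = config

    def _get_client(self) -> Groq:
        # ...but _get_client still read `self.config`, which raised the
        # AttributeError above. The fix is the leading underscore:
        return Groq(api_key=self._config.api_key)
```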
OK, that's merged now - @cheesecake100201 @sanjayk-github-dev, chat completions for Groq are now available. Please try this out and let me know what you think.

**Running the server**

```bash
# Clone the meta-llama/llama-stack repo
# See https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository for more instructions
git clone git@github.com:meta-llama/llama-stack
cd llama-stack
git checkout main && git pull

# API key at https://console.groq.com/keys
export GROQ_API_KEY=<api-key>

# Create environment if not already
conda create --prefix ./envs python=3.10
conda activate ./envs

# build.yaml and run.yaml files
wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/build.yaml
wget https://raw.githubusercontent.com/aidando73/llama-stack/13d914d4a1f98fc07b72266cf6bdd67eb184d10f/run.yaml

# Build and run
pip install -e . \
  && llama stack build --config ./build.yaml --image-type conda \
  && llama stack run ./run.yaml \
  --port 5001
```

**Hitting the server**

Here are some example requests: https://github.com/aidando73/llama-stack/blob/13d914d4a1f98fc07b72266cf6bdd67eb184d10f/hello.ipynb

**Next steps**

I haven't built a distro for this. I'm waiting to see if there's enough demand for it before doing so. If this is something you'd like, please let us know 🙏.
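If you don't want to open the notebook, a minimal request against the server above looks roughly like this (the `completion_message` field layout reflects my reading of the client library and may differ between versions):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5001")

response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Hello, Groq!"}],
)
# Non-streaming responses carry the final message directly
print(response.completion_message.content)
```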
Yeah please - if you can get the distro done, it could be very handy.
Thanks @aidando73!
Closing since this is almost done. |
# What does this PR do?

Contributes to issue #432:

- Adds tool calls to Groq provider
- Enables tool call integration tests

### PR Train

- #609
- #630 👈

## Test Plan

Environment:

```shell
export GROQ_API_KEY=<api-key>

# build.yaml and run.yaml files
wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/build.yaml
wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/run.yaml

# Create environment if not already
conda create --prefix ./envs python=3.10
conda activate ./envs

# Build
pip install -e . && llama stack build --config ./build.yaml --image-type conda

# Activate built environment
conda activate llamastack-groq
```

<details>
<summary>Unit tests</summary>

```shell
# Setup
conda activate llamastack-groq
pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py -vv -k groq -s

# Result
llama_stack/providers/tests/inference/groq/test_groq_utils.py .....................
======================================== 21 passed, 1 warning in 0.05s ========================================
```

</details>

<details>
<summary>Integration tests</summary>

```shell
# Run
conda activate llamastack-groq
pytest llama_stack/providers/tests/inference/test_text_inference.py -k groq -s

# Result
llama_stack/providers/tests/inference/test_text_inference.py .sss.s.ss.sss.s...
========================== 8 passed, 10 skipped, 180 deselected, 7 warnings in 2.73s ==========================
```

</details>

<details>
<summary>Manual</summary>

```bash
llama stack run ./run.yaml --port 5001
```

Via this Jupyter notebook: https://github.com/aidando73/llama-stack/blob/9165502582cd7cb178bc1dcf89955b45768ab6c1/hello.ipynb

</details>

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section?
- [x] Updated relevant documentation. (no relevant documentation it seems)
- [x] Wrote necessary unit or integration tests.
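As a usage sketch of the tool-call support described above: the `get_weather` tool here is hypothetical, and the exact tool schema the client accepts may vary by version:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5001")

response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Sydney?"}],
    tools=[
        {
            "tool_name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up the current weather for a city",
            "parameters": {
                "city": {
                    "param_type": "string",
                    "description": "Name of the city",
                    "required": True,
                },
            },
        }
    ],
)

# If the model chose to call the tool, the message carries tool_calls
print(response.completion_message.tool_calls)
```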
### 🚀 The feature, motivation and pitch
It would be very helpful to have an out-of-the-box Inference Provider for GroqCloud.
Thanks!
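For context on what such a provider would wrap: Groq exposes an OpenAI-compatible chat completions API through its Python SDK. A minimal sketch of calling it directly (the model name is illustrative; check the Groq console for currently available Llama models):

```python
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

completion = client.chat.completions.create(
    model="llama3-8b-8192",  # illustrative model ID
    messages=[{"role": "user", "content": "Hello from GroqCloud!"}],
)
print(completion.choices[0].message.content)
```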
### Alternatives
No response
### Additional context
No response