api/inference/v1alpha1/backendruntime_types.go (2 additions, 1 deletion)

@@ -35,7 +35,8 @@ type BackendRuntimeArg struct {
 // BackendRuntimeSpec defines the desired state of BackendRuntime
 type BackendRuntimeSpec struct {
 	// Commands represents the default command of the backendRuntime.
-	Commands []string `json:"commands"`
+	// +optional
+	Commands []string `json:"commands,omitempty"`
 	// Image represents the default image registry of the backendRuntime.
 	// It will work together with version to make up a real image.
 	Image string `json:"image"`
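Together with the CRD change below, which drops `commands` from the `required` list, marking the field `+optional` with `omitempty` means a BackendRuntime can be created without a default command. A minimal sketch of what `omitempty` does during serialization: the struct is trimmed to the two fields visible in this hunk, and the image and command values are illustrative only, not taken from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed-down sketch of BackendRuntimeSpec, keeping only the two fields
// visible in this hunk; the real type has additional required fields.
type BackendRuntimeSpec struct {
	// Commands represents the default command of the backendRuntime.
	// +optional
	Commands []string `json:"commands,omitempty"`
	// Image represents the default image registry of the backendRuntime.
	Image string `json:"image"`
}

func main() {
	// With `omitempty`, a nil Commands slice is dropped from the marshaled
	// object, so a manifest no longer has to set it (illustrative image name).
	withoutCommands, _ := json.Marshal(BackendRuntimeSpec{Image: "lmsysorg/sglang"})
	fmt.Println(string(withoutCommands)) // {"image":"lmsysorg/sglang"}

	// When commands are provided, they are serialized exactly as before.
	withCommands, _ := json.Marshal(BackendRuntimeSpec{
		Commands: []string{"python3", "-m", "sglang.launch_server"},
		Image:    "lmsysorg/sglang",
	})
	fmt.Println(string(withCommands))
}
```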
config/crd/bases/inference.llmaz.io_backendruntimes.yaml (1 deletion)

@@ -231,7 +231,6 @@ spec:
 It will be appended to the image as a tag.
 type: string
 required:
-- commands
 - image
 - resources
 - version
docs/examples/sglang/playground.yaml (2 additions, 2 deletions)

@@ -1,10 +1,10 @@
 apiVersion: inference.llmaz.io/v1alpha1
 kind: Playground
 metadata:
-  name: qwen2-05b
+  name: qwen2-0--5b
 spec:
   replicas: 1
   modelClaim:
-    modelName: qwen2-05b
+    modelName: qwen2-0--5b
   backendRuntimeConfig:
     name: sglang
docs/support-backends.md (4 additions, 4 deletions)

@@ -1,13 +1,13 @@
 # All Kinds of Supported Inference Backends
 
-## vLLM
+## llama.cpp
 
-[vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs
+[llama.cpp](https://github.com/ggerganov/llama.cpp) is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud.
 
 ## SGLang
 
 [SGLang](https://github.com/sgl-project/sglang) is yet another fast serving framework for large language models and vision language models.
 
-## llama.cpp
+## vLLM
 
-[llama.cpp](https://github.com/ggerganov/llama.cpp) is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud.
+[vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs