onboard BGE-Base-EN-v1.5 and All-MiniLM-L6-v2 #29

Merged: 5 commits, Jan 31, 2025
19 changes: 19 additions & 0 deletions examples/inference/text_embedding/embeddings.py
@@ -0,0 +1,19 @@
from openai import OpenAI

# The server URL is stored in the .vLLM_model-variant_url file in the corresponding model directory.
client = OpenAI(base_url="http://gpu031:8081/v1", api_key="EMPTY")

model_name = "bge-base-en-v1.5"

input_texts = [
"The chef prepared a delicious meal.",
]

# test single embedding
embedding_response = client.embeddings.create(
model=model_name,
input=input_texts,
encoding_format="float",
)

print(embedding_response)
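The script above prints the whole response object; in practice you usually want the raw vector, e.g. for cosine similarity. A minimal stdlib sketch of the normalization step (the stand-in vector is hypothetical — with the client above you would pass `embedding_response.data[0].embedding`):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

# Stand-in vector for illustration; use embedding_response.data[0].embedding in practice.
vector = [3.0, 4.0]
unit = l2_normalize(vector)
print(unit)  # [0.6, 0.8]
```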
8 changes: 7 additions & 1 deletion vec_inf/cli/_cli.py
@@ -138,6 +138,13 @@ def launch(

models_df = utils.load_models_df()

models_df = models_df.with_columns(
pl.col("model_type").replace("Reward Modeling", "Reward_Modeling")
)
models_df = models_df.with_columns(
pl.col("model_type").replace("Text Embedding", "Text_Embedding")
)

if model_name in models_df["model_name"].to_list():
default_args = utils.load_default_args(models_df, model_name)
for arg in default_args:
@@ -148,7 +155,6 @@
else:
model_args = models_df.columns
model_args.remove("model_name")
model_args.remove("model_type")
for arg in model_args:
if locals()[arg] is not None:
renamed_arg = arg.replace("_", "-")
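The two `with_columns` calls above rewrite the human-readable model types into underscore form so they can be passed safely as command-line values. The same normalization, sketched in plain Python without polars:

```python
def normalize_model_type(model_type: str) -> str:
    """Map spaced display names to underscore identifiers for CLI/env use."""
    replacements = {
        "Reward Modeling": "Reward_Modeling",
        "Text Embedding": "Text_Embedding",
    }
    # Types without spaces (e.g. "LLM", "VLM") pass through unchanged.
    return replacements.get(model_type, model_type)

print(normalize_model_type("Text Embedding"))  # Text_Embedding
print(normalize_model_type("LLM"))             # LLM
```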
1 change: 0 additions & 1 deletion vec_inf/cli/_utils.py
@@ -139,7 +139,6 @@ def load_default_args(models_df: pl.DataFrame, model_name: str) -> dict:
row_data = models_df.filter(models_df["model_name"] == model_name)
default_args = row_data.to_dicts()[0]
default_args.pop("model_name", None)
default_args.pop("model_type", None)
return default_args


17 changes: 16 additions & 1 deletion vec_inf/launch_server.sh
@@ -6,6 +6,7 @@ while [[ "$#" -gt 0 ]]; do
case $1 in
--model-family) model_family="$2"; shift ;;
--model-variant) model_variant="$2"; shift ;;
--model-type) model_type="$2"; shift ;;
--partition) partition="$2"; shift ;;
--qos) qos="$2"; shift ;;
--time) walltime="$2"; shift ;;
@@ -25,7 +26,7 @@
shift
done

required_vars=(model_family model_variant partition qos walltime num_nodes num_gpus max_model_len vocab_size data_type venv log_dir model_weights_parent_dir)
required_vars=(model_family model_variant model_type partition qos walltime num_nodes num_gpus max_model_len vocab_size data_type venv log_dir model_weights_parent_dir)

for var in "${required_vars[@]}"; do
if [ -z "${!var}" ]; then
@@ -36,6 +37,7 @@

export MODEL_FAMILY=$model_family
export MODEL_VARIANT=$model_variant
export MODEL_TYPE=$model_type
export JOB_PARTITION=$partition
export QOS=$qos
export WALLTIME=$walltime
@@ -48,6 +50,17 @@ export VENV_BASE=$venv
export LOG_DIR=$log_dir
export MODEL_WEIGHTS_PARENT_DIR=$model_weights_parent_dir

if [[ "$model_type" == "LLM" || "$model_type" == "VLM" ]]; then
export VLLM_TASK="generate"
elif [[ "$model_type" == "Reward_Modeling" ]]; then
export VLLM_TASK="reward"
elif [[ "$model_type" == "Text_Embedding" ]]; then
export VLLM_TASK="embed"
else
echo "Error: Unknown model_type: $model_type"
exit 1
fi

if [ -n "$max_num_seqs" ]; then
export VLLM_MAX_NUM_SEQS=$max_num_seqs
else
@@ -101,6 +114,8 @@ echo Num Nodes: $NUM_NODES
echo GPUs per Node: $NUM_GPUS
echo QOS: $QOS
echo Walltime: $WALLTIME
echo Model Type: $MODEL_TYPE
echo Task: $VLLM_TASK
echo Data Type: $VLLM_DATA_TYPE
echo Max Model Length: $VLLM_MAX_MODEL_LEN
echo Max Num Seqs: $VLLM_MAX_NUM_SEQS
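The new `if/elif` chain in `launch_server.sh` maps each model type onto the vLLM task it should run. The same mapping, sketched as a Python lookup table for reference:

```python
# Mirror of the shell branch in launch_server.sh: model type -> vLLM task.
TASK_BY_MODEL_TYPE = {
    "LLM": "generate",
    "VLM": "generate",
    "Reward_Modeling": "reward",
    "Text_Embedding": "embed",
}

def vllm_task(model_type: str) -> str:
    """Return the vLLM task for a model type, failing loudly on unknown input."""
    try:
        return TASK_BY_MODEL_TYPE[model_type]
    except KeyError:
        raise ValueError(f"Unknown model_type: {model_type}")

print(vllm_task("Text_Embedding"))  # embed
```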
12 changes: 12 additions & 0 deletions vec_inf/models/README.md
@@ -194,6 +194,18 @@ More profiling metrics coming soon!
|:----------:|:----------:|:----------:|:----------:|
| [`e5-mistral-7b-instruct`](https://huggingface.co/intfloat/e5-mistral-7b-instruct) | 1x a40 | - tokens/s | - tokens/s |

### [BAAI: bge](https://huggingface.co/BAAI)
| Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
|:----------:|:----------:|:----------:|:----------:|
| [`bge-base-en-v1.5`](https://huggingface.co/BAAI/bge-base-en-v1.5) | 1x A40 | - tokens/s | - tokens/s |

### [Sentence Transformers: MiniLM](https://huggingface.co/sentence-transformers)
| Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
|:----------:|:----------:|:----------:|:----------:|
| [`all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) | 1x A40 | - tokens/s | - tokens/s |

## Reward Modeling Models

### [Qwen: Qwen2.5-Math](https://huggingface.co/collections/Qwen/qwen25-math-66eaa240a1b7d5ee65f1da3e)
2 changes: 2 additions & 0 deletions vec_inf/models/models.csv
@@ -71,3 +71,5 @@ Qwen2.5-Math-RM-72B,Qwen2.5,Math-RM-72B,Reward Modeling,4,1,152064,4096,256,true
QwQ-32B-Preview,QwQ,32B-Preview,LLM,2,1,152064,32768,256,true,false,m2,08:00:00,a40,auto,singularity,default,/model-weights
Pixtral-12B-2409,Pixtral,12B-2409,VLM,1,1,131072,8192,256,true,false,m2,08:00:00,a40,auto,singularity,default,/model-weights
e5-mistral-7b-instruct,e5,mistral-7b-instruct,Text Embedding,1,1,32000,4096,256,true,false,m2,08:00:00,a40,auto,singularity,default,/model-weights
bge-base-en-v1.5,bge,base-en-v1.5,Text Embedding,1,1,30522,512,256,true,false,m2,08:00:00,a40,auto,singularity,default,/model-weights
all-MiniLM-L6-v2,all-MiniLM,L6-v2,Text Embedding,1,1,30522,512,256,true,false,m2,08:00:00,a40,auto,singularity,default,/model-weights
4 changes: 3 additions & 1 deletion vec_inf/multinode_vllm.slurm
@@ -12,7 +12,7 @@ nvidia-smi
source ${SRC_DIR}/find_port.sh

if [ "$VENV_BASE" = "singularity" ]; then
export SINGULARITY_IMAGE=/projects/aieng/public/vector-inference_0.6.4.post1.sif
export SINGULARITY_IMAGE=/projects/aieng/public/vector-inference_latest.sif
export VLLM_NCCL_SO_PATH=/vec-inf/nccl/libnccl.so.2.18.1
module load singularity-ce/3.8.2
singularity exec $SINGULARITY_IMAGE ray stop
@@ -103,6 +103,7 @@ if [ "$VENV_BASE" = "singularity" ]; then
--max-logprobs ${VLLM_MAX_LOGPROBS} \
--max-model-len ${VLLM_MAX_MODEL_LEN} \
--max-num-seqs ${VLLM_MAX_NUM_SEQS} \
--task ${VLLM_TASK} \
${ENFORCE_EAGER}
else
source ${VENV_BASE}/bin/activate
@@ -118,5 +119,6 @@
--max-logprobs ${VLLM_MAX_LOGPROBS} \
--max-model-len ${VLLM_MAX_MODEL_LEN} \
--max-num-seqs ${VLLM_MAX_NUM_SEQS} \
--task ${VLLM_TASK} \
${ENFORCE_EAGER}
fi
4 changes: 3 additions & 1 deletion vec_inf/vllm.slurm
@@ -23,7 +23,7 @@ fi

# Activate vllm venv
if [ "$VENV_BASE" = "singularity" ]; then
export SINGULARITY_IMAGE=/projects/aieng/public/vector-inference_0.6.4.post1.sif
export SINGULARITY_IMAGE=/projects/aieng/public/vector-inference_latest.sif
export VLLM_NCCL_SO_PATH=/vec-inf/nccl/libnccl.so.2.18.1
module load singularity-ce/3.8.2
singularity exec $SINGULARITY_IMAGE ray stop
@@ -39,6 +39,7 @@ if [ "$VENV_BASE" = "singularity" ]; then
--trust-remote-code \
--max-model-len ${VLLM_MAX_MODEL_LEN} \
--max-num-seqs ${VLLM_MAX_NUM_SEQS} \
--task ${VLLM_TASK} \
${ENFORCE_EAGER}
else
source ${VENV_BASE}/bin/activate
@@ -53,5 +54,6 @@
--trust-remote-code \
--max-model-len ${VLLM_MAX_MODEL_LEN} \
--max-num-seqs ${VLLM_MAX_NUM_SEQS} \
--task ${VLLM_TASK} \
${ENFORCE_EAGER}
fi
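With `--task ${VLLM_TASK}` passed through to vLLM, an embedding server can be queried and its vectors compared with cosine similarity. A self-contained sketch of that comparison (the vectors here are stand-ins for `embedding_response.data[i].embedding` from the example script):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Stand-in vectors; in practice use embedding_response.data[i].embedding.
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 1.0, 0.0]
print(round(cosine_similarity(v1, v2), 3))  # 0.5
```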