Open
Labels
question (Further information is requested)
Description
Describe the issue
Code:
# Copyright (c) 2024 Microsoft
# Licensed under The MIT License [see LICENSE for details]
from vllm import LLM, SamplingParams
from minference import MInference
import logging
import time


def read_content_from_file(file_path, num_chars=5000):
    """Return up to num_chars characters from file_path, or "" on failure."""
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
            content = file.read(num_chars)
        return content
    except FileNotFoundError:
        logging.error(f"File {file_path} not found.")
        return ""
    except Exception as e:
        logging.error(f"An error occurred while reading the file: {e}")
        return ""


# The appended instruction means: "Please summarize the plot of the story above."
content = read_content_from_file("./question.txt", 12000) + ",请总结上面的故事梗概。"

# Send the same prompt 50 times.
prompts = []
for _ in range(50):
    prompts.extend([content])
sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=1,
)
model_name = "/xxx/model/Qwen2-7B-Instruct"
llm = LLM(
    model_name,
    max_num_seqs=1,
    enforce_eager=True,
    tensor_parallel_size=1,
    max_model_len=128000,
)
start_time = time.time()
outputs = llm.generate(prompts, sampling_params)
end_time = time.time()
elapsed_time = end_time - start_time
print(f"vllm Generating text took {elapsed_time:.2f} seconds.")
# Patch MInference Module
minference_patch = MInference("vllm", model_name)
llm = minference_patch(llm)
start_time = time.time()
outputs = llm.generate(prompts, sampling_params)
end_time = time.time()
elapsed_time = end_time - start_time
print(f"minference Generating text took {elapsed_time:.2f} seconds.")
Execution results:
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:41<00:00, 1.22it/s]
vllm Generating text took 41.57 seconds.
Patched model for minference with vLLM..
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [01:34<00:00, 1.90s/it]
minference Generating text took 95.37 seconds.
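For reference, the two timings can be reduced to a per-prompt latency and a slowdown ratio. This is only arithmetic on the numbers printed in the logs; the variable names are illustrative. Since `max_tokens=1`, each request is presumably dominated by the prefill pass, so the per-prompt figure is roughly a prefill latency.

```python
# Timings copied from the logs in this issue (seconds for 50 prompts).
num_prompts = 50
vllm_seconds = 41.57
minference_seconds = 95.37

# Average wall-clock time per prompt for each run.
vllm_per_prompt = vllm_seconds / num_prompts
minference_per_prompt = minference_seconds / num_prompts

# How many times slower the patched run was.
slowdown = minference_seconds / vllm_seconds

print(f"vLLM:       {vllm_per_prompt:.2f} s/prompt")
print(f"MInference: {minference_per_prompt:.2f} s/prompt")
print(f"Slowdown:   {slowdown:.2f}x")
```

This gives about 0.83 s per prompt for plain vLLM against about 1.91 s per prompt after patching, a slowdown of roughly 2.29x.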
Why is MInference slower than vLLM here (95.37 seconds vs. 41.57 seconds for the same 50 prompts)?
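One way to make the comparison more robust is to time each configuration several times after a warm-up run, so that any one-time setup cost is not counted. The sketch below is a minimal, hypothetical helper (`time_generate` and `dummy_generate` are not part of vLLM or MInference); whether warm-up accounts for any of the gap reported here is not known.

```python
import time
from statistics import mean


def time_generate(generate_fn, warmup_runs=1, timed_runs=3):
    """Return per-run wall-clock times (seconds) of generate_fn after warm-up."""
    for _ in range(warmup_runs):
        generate_fn()  # result discarded: absorbs one-time setup costs
    durations = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        generate_fn()
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload so the sketch runs on its own. In the script from this
# issue it would be: lambda: llm.generate(prompts, sampling_params)
def dummy_generate():
    time.sleep(0.01)


durations = time_generate(dummy_generate)
print(f"mean {mean(durations):.3f} s over {len(durations)} runs")
```

Running the helper once before patching and once after patching keeps the measurement procedure identical for both configurations.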