[Question] Use api #540
Question

How can I use API requests to call the model from an upper-level application?

Comments
The serve cli module might give you some hints:

```python
# Generation step from llava/serve/cli.py; model, tokenizer, conv, input_ids,
# image_tensor, streamer, and stopping_criteria are all set up earlier in
# that script.
with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        do_sample=True,
        temperature=args.temperature,
        max_new_tokens=args.max_new_tokens,
        streamer=streamer,
        use_cache=True,
        stopping_criteria=[stopping_criteria])

# Decode only the newly generated tokens and store the reply in the
# conversation history.
outputs = tokenizer.decode(output_ids[0, input_ids.shape[1]:]).strip()
conv.messages[-1][-1] = outputs
```

Reference: https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/cli.py
I'm trying to modify the client to give the user the option to upload new images, but after uploading the third image I get an OOM error. I'm running the garbage collector and clearing the VRAM cache, but maybe I'm doing it wrong.
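For reference, the kind of cleanup that comment describes usually looks roughly like this (a guess at the poster's approach, not their actual code):

```python
import gc
import torch

def free_vram():
    gc.collect()              # drop dangling Python references first
    torch.cuda.empty_cache()  # return cached blocks to the CUDA driver
```

As the replies below point out, this alone won't help if the prompt itself keeps growing.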
Has this been resolved? I'm running into the same problem.
The reason is that each request's prompt stacks the previous image and text prompt tokens on top of the new ones. Just re-initialize the `conv` object every time you make a request.
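A minimal sketch of that fix, assuming the same setup as `llava/serve/cli.py`, where the conversation template name comes from `args.conv_mode`:

```python
# Rebuild the conversation state before each request so the prompt only
# carries the current image and question, not the whole chat history.
from llava.conversation import conv_templates

def fresh_conv(conv_mode):
    # .copy() returns a template with an empty message list.
    return conv_templates[conv_mode].copy()

# Inside the chat loop, before appending the new user message:
# conv = fresh_conv(args.conv_mode)
```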
@vicgarfield Thanks bro, everything is working beautifully and memory is now being cleaned up like it should be. I'm on loop 9 and not accumulating any extra VRAM, running it with only 12 GB of VRAM. Here I left the code.

To run the model:
@userbox020 Perfect!
And `tcli.py` must be in `../LLaVA/llava/serve` 👍 Thanks
VERY GOOD!!!
Hey, |
+1 for this
Sorry bro, I confused the posts, it wasn't an API, just some tricks to avoid OOM on 8 GB GPUs. But adding an API should be easy enough: you can create a separate script, use the subprocess module, and save or stream the output, then use any Python HTTP server to publish or stream it (see the sketch below). ChatGPT should be able to produce a good code template for that as well.
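A minimal sketch of that subprocess-plus-HTTP-server idea. The model path, port, and JSON field names here are assumptions, and driving the interactive CLI over stdin is a simplification, not the project's official serving API:

```python
# Hypothetical wrapper: shells out to the LLaVA serve CLI per request and
# returns its stdout as JSON. Illustration only; it blocks per request and
# reloads the model on every call.
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class LlavaHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))  # {"image": ..., "prompt": ...}
        proc = subprocess.run(
            ["python", "-m", "llava.serve.cli",
             "--model-path", "liuhaotian/llava-v1.5-7b",  # assumed checkpoint
             "--image-file", body["image"]],
            input=body["prompt"] + "\n",  # fed to the CLI's stdin prompt
            capture_output=True, text=True)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"output": proc.stdout}).encode())

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), LlavaHandler).serve_forever()
```

For anything beyond a demo you would want to load the model once and keep it resident (for example via the repo's `llava.serve` worker scripts) rather than paying the model load on every request.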