Closed
Description
For those who, like me, are not familiar with C:
it would be great if a new endpoint were added to server.cpp for batch inference.
For example:
endpoint: /completions
post: {"prompts": ["promptA", "promptB", "promptC"]}
response: {"results": ["sequenceA", "sequenceB", "sequenceC"]}
This is easy to do with Hugging Face Transformers (as I do right now), but it is quite inefficient. I hope to use llama.cpp to improve efficiency one day. Because I am not familiar with C, I cannot use baby llama; I can only use JavaScript to exchange data with server.cpp.
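To make the request concrete, here is a minimal sketch of how a JavaScript client might call such an endpoint. It assumes the /completions route and the "prompts"/"results" fields from the example above, which are only a proposal and not an existing server.cpp API. The function names and the base URL are hypothetical too.

```javascript
// Sketch only: the /completions route and the "prompts"/"results" fields
// are the shapes proposed in this issue, not an existing server.cpp API.

// Build the proposed request body from a list of prompts.
function buildBatchRequest(prompts) {
  if (!Array.isArray(prompts) || prompts.length === 0) {
    throw new Error("at least one prompt is required");
  }
  return { prompts: prompts };
}

// Check the proposed response body and return the generated sequences.
// One result per prompt is expected, in the same order as the prompts.
function parseBatchResponse(body, expectedCount) {
  const results = body ? body.results : undefined;
  if (!Array.isArray(results) || results.length !== expectedCount) {
    throw new Error("expected " + expectedCount + " results in the response");
  }
  return results;
}

// Send all prompts in one POST. baseUrl is a placeholder for wherever
// the server is listening; fetch is built into browsers and Node 18+.
async function completeBatch(baseUrl, prompts) {
  const response = await fetch(baseUrl + "/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildBatchRequest(prompts)),
  });
  if (!response.ok) {
    throw new Error("server returned HTTP " + response.status);
  }
  return parseBatchResponse(await response.json(), prompts.length);
}
```

With an endpoint like this, a call such as completeBatch(baseUrl, ["promptA", "promptB", "promptC"]) would replace three separate requests with one.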