Skip to content

llama : add batched inference endpoint to server #3478

Closed
@niubi-AI

Description

for those not familiar with C like me.
it would be great if a new endpoint added to server.cpp to make batch inference.
for example:
endpoint: /completions
post: {"prompts":["promptA","promptB","promptC"]}
response:{"results":["sequenceA","sequenceB","sequenceC"]}

it is easy to do so with Hugging Face Transformers (as i do right now), but it's quite inefficient,hope to use llama.cpp to increase the efficiency oneday, cause I am not familiar with C, so can not use baby llama. I can only use javascript to Interact data with server.cpp。

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions