Standalone Server #21

Closed

Description

@abetlen

Since the server is one of the goals / highlights of this project, I'm planning to move it into a subpackage, e.g. llama-cpp-python[server] or something like that.
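
For reference, a minimal sketch of how the optional server dependencies might be declared as a setuptools extra. The dependency list (fastapi, uvicorn, sse-starlette) and package layout are assumptions, not a final decision:

```python
# Hypothetical setup.py fragment; dependency names and versions are assumptions.
from setuptools import setup

setup(
    name="llama_cpp_python",
    version="0.1.0",  # placeholder
    packages=["llama_cpp", "llama_cpp.server"],
    extras_require={
        # Installed via: pip install llama-cpp-python[server]
        "server": ["fastapi", "uvicorn", "sse-starlette"],
    },
)
```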

Work that needs to be done first:

  • Ensure compatibility with OpenAI
    • Response objects match
    • Request objects match
    • Loaded model appears under /v1/models endpoint
    • Test OpenAI client libraries (see the sketch after this list)
    • Unsupported parameters should be silently ignored
  • Ease-of-use
    • Integrate server as a subpackage
    • CLI tool to run the server

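A rough sketch of how the compatibility items above could be exercised with the official openai Python client (pre-1.0 interface). The port (8000), the model id, the launch command in the comment, and the assumption that no real API key is checked are all placeholders:

```python
# Sketch only: assumes the server is already running locally, e.g. something like
#   python -m llama_cpp.server --model ./models/7B/ggml-model.bin
# (module name and flags are assumptions until the CLI lands).
import openai

openai.api_key = "sk-not-needed"              # local server should not check this
openai.api_base = "http://localhost:8000/v1"  # port is an assumption

# The loaded model should show up under /v1/models
print(openai.Model.list())

# Basic completion request through the OpenAI client library
completion = openai.Completion.create(
    model="ggml-model",  # placeholder model id
    prompt="Q: What is the capital of France? A:",
    max_tokens=16,
)
print(completion["choices"][0]["text"])
```
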
Future work

  • Prompt caching to improve latency
  • Support multiple models in the same server
  • Add tokenization endpoints to make it easier for small clients to calculate context window sizes (rough sketch below)
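
As a starting point for the tokenization idea, a minimal FastAPI sketch. The route name /v1/tokenize, the request schema, and the model path are assumptions, not an agreed design:

```python
# Hypothetical /v1/tokenize endpoint; not part of the current server.
from fastapi import FastAPI
from pydantic import BaseModel
from llama_cpp import Llama

app = FastAPI()
llama = Llama(model_path="./models/7B/ggml-model.bin")  # path is illustrative

class TokenizeRequest(BaseModel):
    text: str

@app.post("/v1/tokenize")
def tokenize(request: TokenizeRequest):
    tokens = llama.tokenize(request.text.encode("utf-8"))
    # Returning the count lets small clients check prompts against the context window
    return {"tokens": tokens, "count": len(tokens)}
```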
