
Add batched inference #771

Open
@abetlen

Description

  • Use llama_decode instead of the deprecated llama_eval in the Llama class (see the low-level sketch after this list)
  • Implement batched inference support for the generate and create_completion methods of the Llama class (see the hypothetical API sketch below)
  • Add support for streaming / infinite completion (see the usage sketch below)
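A minimal sketch of what the llama_eval to llama_decode migration involves at the low-level binding layer. This is not the library's implementation: it assumes `ctx` is an already-initialized llama_context and that each prompt is already tokenized, and llama_batch_init's signature has shifted across llama.cpp versions.

```python
# Rough sketch: fill one llama_batch with several prompts so a single
# llama_decode call evaluates them together, each in its own KV-cache sequence.
import llama_cpp

def decode_prompts(ctx, prompts):
    total = sum(len(p) for p in prompts)
    # n_tokens capacity, embd=0 (token input), one sequence id per prompt
    batch = llama_cpp.llama_batch_init(total, 0, len(prompts))
    try:
        i = 0
        for seq, tokens in enumerate(prompts):
            for pos, tok in enumerate(tokens):
                batch.token[i] = tok
                batch.pos[i] = pos
                batch.n_seq_id[i] = 1
                batch.seq_id[i][0] = seq  # each prompt gets its own sequence
                # Request logits only for the last token of each sequence.
                batch.logits[i] = 1 if pos == len(tokens) - 1 else 0
                i += 1
        batch.n_tokens = total
        if llama_cpp.llama_decode(ctx, batch) != 0:
            raise RuntimeError("llama_decode failed")
    finally:
        llama_cpp.llama_batch_free(batch)
```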
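From the caller's side, batched completion might look like the sketch below. The list-of-prompts argument to create_completion is hypothetical (it is exactly what this issue asks to implement); only the shape of the desired feature is shown, and the model path is a placeholder.

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/model.gguf")  # placeholder path

# Hypothetical: create_completion accepting a list of prompts and decoding
# them in one batch instead of one at a time. Not part of the current API.
results = llm.create_completion(
    ["Q: 2 + 2 = ? A:", "Q: What is the capital of France? A:"],
    max_tokens=16,
)
for r in results:
    print(r["choices"][0]["text"])
```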
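Streaming already works per request via stream=True; the open part is "infinite" completion, i.e. continuing past the context window by evicting old KV-cache entries rather than stopping. A usage sketch with the existing flag, where the run-forever behavior is the assumed addition:

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/model.gguf")  # placeholder path

# stream=True exists today and yields OpenAI-style chunks. The requested
# feature is letting this loop continue once n_ctx fills, by shifting the
# KV cache instead of terminating (that behavior is the assumption here).
for chunk in llm.create_completion("Once upon a time", max_tokens=512, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
```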
