
Using web-hosted model for inference #44

Open

Description

Currently, the NousResearch/Llama-2-7b-chat-hf model appears to run locally on my machine, which can take quite a while for long prompts. I'd like to use more AI-optimized hardware to speed this up.

Is it possible to use a web-hosted version of the model, or a different web-hosted model entirely?
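
For context, if the project is loading the model through Hugging Face transformers, one option might be to call a hosted inference endpoint instead of running the weights locally. Below is a minimal sketch using the `huggingface_hub` `InferenceClient`; the model id, token, and prompt are placeholders, and I'm not sure how this repo currently wires up generation, so treat it as an illustration rather than a drop-in change.

```python
# Sketch: querying a hosted endpoint instead of loading the model locally.
# Assumes a Hugging Face API token and that the target model is available
# through the hosted Inference API (model id and prompt are placeholders).
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="NousResearch/Llama-2-7b-chat-hf",  # or any other hosted model id
    token="hf_xxx",  # personal access token (placeholder)
)

prompt = "Explain what this repository does in two sentences."

# text_generation sends the prompt to the remote endpoint; nothing runs locally.
output = client.text_generation(prompt, max_new_tokens=256, temperature=0.7)
print(output)
```

Whether this fits would depend on how the repo constructs its pipeline, but swapping the local `transformers` call for a remote client like this is one way to offload inference to AI-optimized hardware.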

Metadata

Labels

question: Further information is requested
