What is the correct way to use quantized versions of Vicuna or Guanaco? #210

Closed
@armsp

Description

I have been trying to use quantized versions of these models so they fit on my GPU, which has at most 6 GB of VRAM, but nothing seems to work. How would I go about using 5-bit versions that stay under 6 GB of memory?
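For context, a back-of-envelope sketch of why a 5-bit 7B model should fit in 6 GB while a 13B one would not. The figures here are rough assumptions, not measured values: weight storage dominates, a 5-bit quant (e.g. llama.cpp's q5_K_M) averages roughly 5.5 bits per parameter once quantization scales are included, and KV cache plus runtime overhead is folded into a flat ~0.5 GB margin.

```python
def estimate_vram_gb(n_params_billion: float,
                     bits_per_param: float = 5.5,
                     overhead_gb: float = 0.5) -> float:
    """Approximate VRAM footprint in GB for a quantized model.

    Assumes weights dominate memory use; bits_per_param and
    overhead_gb are rough ballpark values, not measurements.
    """
    weights_gb = n_params_billion * 1e9 * bits_per_param / 8 / 1e9
    return weights_gb + overhead_gb

# A 7B Vicuna/Guanaco at ~5.5 bits/param: ~4.81 GB of weights,
# so it should fit under the 6 GB budget with room for overhead.
print(round(estimate_vram_gb(7), 2))   # → 5.31

# A 13B model at the same quantization clearly exceeds 6 GB:
print(round(estimate_vram_gb(13), 2))  # → 9.44
```

By this estimate, a 7B model at 5-bit quantization is the largest that plausibly fits in 6 GB; anything bigger would need partial CPU offloading.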
