Would anybody be kind enough to create a simple vanilla example of how to fine-tune Llama 2 using LoRA adapters so that the result can later be used with vLLM for inference? There is some confusion about whether or not to use quantization when loading the model for fine-tuning, since apparently vLLM does not work with quantized models. Also, which LoraConfig `target_modules` work with vLLM? And should the adapter be merged back into the base model?
A simple vanilla example would really help a lot.
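For concreteness, here is the rough workflow I have in mind; this is a sketch, not something I have verified end to end. It assumes PEFT + Transformers, loads the base model unquantized in fp16 (so the merged weights stay vLLM-compatible), and the model name, `target_modules`, and hyperparameters are placeholders:

```python
# Minimal sketch (unverified): LoRA fine-tune of Llama 2, then merge for vLLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumption: HF-format Llama 2 checkpoint

# Load unquantized (fp16) so the adapter can be merged into plain weights.
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base)

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Llama attention projections; if the adapter is merged afterwards,
    # the choice of target_modules presumably no longer matters to vLLM,
    # since vLLM only ever sees an ordinary merged checkpoint.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()

# ... run a normal training loop here (e.g. transformers.Trainer) ...

# Merge the LoRA weights back into the base model and save a plain
# HF checkpoint that vLLM should be able to load directly.
merged = model.merge_and_unload()
merged.save_pretrained("llama2-7b-lora-merged")
tokenizer.save_pretrained("llama2-7b-lora-merged")
```

and then, if merging is indeed the right approach, serving the merged directory with vLLM would look something like:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="llama2-7b-lora-merged")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```

Is this the intended workflow, or am I missing something?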