Would anybody be kind enough to create a simple vanilla example of how to fine-tune Llama 2 using LoRA adapters so that the result can later be used with vLLM for inference? There is some confusion about whether or not to use quantization when loading the model for fine-tuning, since apparently vLLM does not work with quantized models. Also, which LoraConfig `target_modules` work with vLLM? And should the adapter be merged back into the base model?
A simple vanilla example would really help a lot.
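For concreteness, here is the rough workflow I have in mind; this is a sketch, not something I have verified end to end. It assumes PEFT + Transformers, loads the base model unquantized in fp16 (so the merged weights stay vLLM-compatible), and the model name, `target_modules`, and hyperparameters are placeholders:

```python
# Minimal sketch (unverified): LoRA fine-tune of Llama 2, then merge for vLLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumption: HF-format Llama 2 checkpoint

# Load unquantized (fp16) so the adapter can be merged into plain weights.
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base)

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Llama attention projections; if the adapter is merged afterwards,
    # the choice of target_modules presumably no longer matters to vLLM,
    # since vLLM only ever sees an ordinary merged checkpoint.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()

# ... run a normal training loop here (e.g. transformers.Trainer) ...

# Merge the LoRA weights back into the base model and save a plain
# HF checkpoint that vLLM should be able to load directly.
merged = model.merge_and_unload()
merged.save_pretrained("llama2-7b-lora-merged")
tokenizer.save_pretrained("llama2-7b-lora-merged")
```

and then, if merging is indeed the right approach, serving the merged directory with vLLM would look something like:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="llama2-7b-lora-merged")
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```

Is this the intended workflow, or am I missing something?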