Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
The current LoRA implementation in SGLang maps LoRA weights to modules by a (layer index, op_type) tuple, where op_type is an operation type such as qkv_proj, o_proj, gate_up, etc. This works fine for most standard cases; however, there are some limitations:
1. For models with more than one attention stack (e.g., VLMs), there can be multiple modules with the same (layer index, op_type), e.g., one from the vision tower and another from the language model. Currently SGLang cannot handle such cases correctly and usually fails during loading due to incorrect mapping.
2. Users cannot enable/disable LoRA application at module level, e.g., when a user wants to apply LoRA only to the language model but not the vision tower (common), or only to some layers but not others (less common?).
3. (Less common?) Models with non-standard LoRA weight / module names are not supported.
Proposal:
- (Short-term) Add an optional hook `should_apply_lora` to allow a model to customize LoRA application at the model level. This would unblock most cases in 1 & 2. For example, for most VLMs, LoRA should be applied only to the language model, not the vision tower; in these cases, model authors could simply disable LoRA application for modules in the vision tower, which would address the current LoRA loading failures caused by incorrect mapping.
- (Long-term) Generalize the hook to `map_lora_module_name`, allowing the model owner to map a given module to a specific LoRA weight name, or return None when LoRA should not be applied. This would address 3 and some less common cases in 1 (e.g., when LoRA needs to be applied to both the vision tower and the language model). A sketch of both hooks follows this list.
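A minimal sketch of what the two hooks might look like, assuming a Python model class. The class name, module-name prefixes, and the loader-side `resolve_lora_target` helper are hypothetical illustrations of the idea, not SGLang's actual loader code:

```python
from typing import Optional


class ExampleVLMForConditionalGeneration:  # hypothetical VLM model class
    def should_apply_lora(self, module_name: str) -> bool:
        # Short-term hook: skip LoRA for the vision tower entirely,
        # so only language-model modules receive LoRA weights.
        # The "vision_tower." prefix is an assumed naming convention.
        return not module_name.startswith("vision_tower.")

    def map_lora_module_name(self, module_name: str) -> Optional[str]:
        # Long-term hook: map a module to the LoRA weight name it should
        # load from the adapter, or return None to skip LoRA for it.
        if module_name.startswith("vision_tower."):
            return None
        # Illustrative mapping: strip an assumed language-model prefix so
        # module names line up with how the adapter names its weights.
        return module_name.removeprefix("language_model.")  # Python 3.9+


def resolve_lora_target(model, module_name: str) -> Optional[str]:
    # Illustrative loader-side logic: prefer the long-term hook when a
    # model defines it, fall back to the boolean hook, and default to
    # applying LoRA as today when neither hook exists.
    if hasattr(model, "map_lora_module_name"):
        return model.map_lora_module_name(module_name)
    if hasattr(model, "should_apply_lora"):
        return module_name if model.should_apply_lora(module_name) else None
    return module_name
```

One possible precedence, shown in the helper above: the long-term hook, when defined, subsumes the short-term boolean hook, so models could migrate from one to the other without the loader needing both answers.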
(cc @Fridge003)
Related resources
No response