[Feature] Customized mapping for LoRA weight names #6608

@lifuhuang

Description

Motivation

The current LoRA implementation in SGL maps LoRA weights to modules by a (layer index, op_type) tuple, where op_type is an operation such as qkv_proj, o_proj, gate_up, etc. This works fine for most standard cases; however, there are some limitations:

  1. For models with more than one attention stack (e.g., VLMs), there can be multiple modules sharing the same (layer index, op_type), e.g., one from the vision tower and one from the language model. Currently SGL cannot handle such cases correctly and usually fails during loading due to incorrect mapping (see the sketch after this list).
  2. Users cannot enable/disable LoRA application at the module level, e.g., if a user only wants to apply LoRA to the language model but not the vision tower (common), or only to some layers but not others (less common?), there is no way to do that today.
  3. (Less common?) Models with non-standard LoRA weight / module names.
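
To make limitation 1 concrete, here is a minimal illustrative sketch (the module names and dict layout are hypothetical, not SGL's actual data structures) of how a (layer index, op_type) key collides for a VLM:

```python
# Two modules from different towers share layer index 0 and op_type "qkv_proj",
# so the second entry silently overwrites the first in a (layer_idx, op_type) map.
lora_weight_map = {}

modules = [
    ("vision_tower.layers.0.qkv_proj", 0, "qkv_proj"),
    ("language_model.layers.0.qkv_proj", 0, "qkv_proj"),
]

for full_name, layer_idx, op_type in modules:
    lora_weight_map[(layer_idx, op_type)] = full_name

print(lora_weight_map)
# {(0, 'qkv_proj'): 'language_model.layers.0.qkv_proj'}  <- vision entry lost
```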

Proposal:

  • (Short-term) Add an optional hook should_apply_lora to let each model customize where LoRA is applied. This would unblock most cases in 1 & 2. For example, for most VLMs, LoRA should only be applied to the language model and not the vision tower; in these cases, model authors could simply disable LoRA application for modules in the vision tower, which would address the current LoRA loading failures due to incorrect mapping (a sketch of both hooks follows this list).
  • (Long-term) Generalize the hook to map_lora_module_name, allowing the model owner to map a given module to a specific LoRA weight name, or to return None when LoRA should not be applied. This would address 3 and some less common cases in 1 (e.g., when LoRA needs to be applied to both the vision tower and the language model).
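
A minimal sketch of what the two proposed hooks could look like on a model class (the names, signatures, and module-name prefixes are assumptions for illustration, not the final SGL API):

```python
from typing import Optional


class MyVLMForCausalLM:
    # Short-term hook: decide per-module whether LoRA should be applied.
    def should_apply_lora(self, module_name: str) -> bool:
        # Skip the vision tower; apply LoRA only to the language model.
        return not module_name.startswith("vision_tower.")

    # Long-term hook: map a module to the LoRA weight name it should load,
    # or return None to skip LoRA for that module entirely.
    def map_lora_module_name(self, module_name: str) -> Optional[str]:
        if module_name.startswith("vision_tower."):
            return None
        # e.g., strip the "language_model." prefix so the module matches the
        # weight names used by a standard language-model-only LoRA adapter.
        return module_name.removeprefix("language_model.")
```

During adapter loading, the LoRA manager could consult these hooks per module instead of relying solely on the (layer index, op_type) key.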

(cc @Fridge003 )

Related resources

No response
