Add support LoRA layer keys from adapter config#424
Conversation
| func loraLinearLayers() -> LoRALinearLayers | ||
| /// Typically, this includes all transformer layers. | ||
| /// Must be defined explicitly since we can't unify it across all models. | ||
| var loraLayers: [Module] { get } |
There was a problem hiding this comment.
Overall I like what you built! I wonder if we could make this a non-breaking change?
For example, the old function returned [(Module, [String])] which is pretty close to [Module] + [String]. We could have default implementations of these new properties use that and split it up. I am trying to think how that would go -- do we need a new protocol for the new methods and a conditional implementation for the old protocol? Deprecate the old method/protocol. Make extension LanguageModel where Self: LoRAModel conform to the new protocol (not even sure that is possible).
Anyway, the change for each model is trivial but it would be nice to make it non-breaking if we could.
There was a problem hiding this comment.
Thanks for your feedback! After thinking about it, I don’t see a way to implement this without introducing breaking changes. Could you clarify which part you’d like to keep from breaking? I’ll explain what currently breaks and why it can’t work without changing the core logic.
1. Adapters Compatibility
When keys are not specified the Python package behaves differently before and after 0.28.3:
| mlx-lm | Default LoRA layers |
|---|---|
| <0.28.3 | self_attn.q/v_proj |
| ≥0.28.3 | self_attn.q/v/k/o_proj, mlp.up/gate/down_proj |
The current Swift implementation mirrors the older Python behavior:
public func loraLinearLayers() -> LoRALinearLayers {
model.layers.map { ($0.attention, ["q_proj", "v_proj"]) }
}If we keep the Swift code as is, it won’t work with adapters trained using the latest mlx-lm Python package. Even though both versions treat keys as null, their behavior differs. Since the Python version already introduced a breaking change, the Swift version has to do the same to stay aligned. Unfortunately, that means breaking compatibility with adapters trained using mlx-lm versions earlier than 0.28.3.
2. Swift Code Compatibility
The old code returns an array of (Module, [String]) pairs, which describes a module and its child keys. This works with self_attn and we can even somehow extend it to support all self_attn.q/v/k/o_proj layers. But if we want to adapt the mlp part, this structure becomes a problem.
public func loraLinearLayers() -> LoRALinearLayers {
model.layers.map { ($0.attention, ["q_proj", "k_proj", "v_proj", "o_proj"]) } // How to include `mlp`?
}We could try to aggregate it like in the issue workaround:
public func loraLinearLayers() -> LoRALinearLayers {
let attention = model.layers.map { ($0.attention, ["q_proj", "k_proj", "v_proj", "o_proj"]) }
let mlp = model.layers.map { ($0.mlp, ["up_proj", "gate_proj", "down_proj"]) }
return attention + mlp // This looks fine but it won't work
}But this approach is actually buggy because when we take the last num_layers specified in the adapter config, we only get part of the required modules:
let layersToAdapt = model // Assume we have a model with 36 transformer blocks
.loraLinearLayers() // 72 items total (36 attention + 36 mlp)
.suffix(config.numLayers) // Suffixing 36 items only gives a subset of the modulesSo while [(Module, [String])] looks similar to [Module] + [String] they actually differ. In the updated code, we need the LoRAModel protocol only to unify layers property access across different models, since it varies between model.layers, transformer.layers, languageModel.model.layers, etc.
We could provide default implementations that split this up.
The old implementation can’t be split into the new structure because it’s conceptually incompatible, and for the same reason, we can’t provide default values based on the old code. Old code returns very specific parts instead of complete transformer blocks. Given these issues, I don’t see a way to achieve correct behavior without updating each model implementation.
There was a problem hiding this comment.
OK, I buy that, but I am still worried about breakage. We can certainly ship it as-is and just document the changes, but ideally people could upgrade into it as needed.
What if we left the existing protocol and made a new protocol and made both work? The downside here is that it might be confusing which one to use. I guess we could deprecate the old one to suggest to avoid it.
Another problem might be that the structure of the LoRA layers isn't full encapsulated, this is in the calling code:
LoRATrain.convert(model: context.model, layers: lora.loraLinearLayers(loraLayers))There was a problem hiding this comment.
and with your change:
// Load LoRA adapter from directory or create a new one
let modelAdapter: ModelAdapter
do {
modelAdapter = try LoRAContainer.from(directory: adapter)
} catch {
modelAdapter = try await modelContainer.perform { context in
return try LoRAContainer.from(
model: context.model, configuration: LoRAConfiguration(numLayers: loraLayers))it is entirely inside the LoRAContainer.from method -- if we update this in the future it will be contained in there.
There was a problem hiding this comment.
So I think I am in agreement with what you have done -- we will just document the breaking change and how to adopt it.
|
FastVLM has been merged (#423) since this started -- can you rebase with |
davidkoski
left a comment
There was a problem hiding this comment.
Changes look great -- just waiting on FastVLM (has been added since this started) to be added or build will break on main.
74315b7 to
4c548ec
Compare
|
@davidkoski done 👍 |
davidkoski
left a comment
There was a problem hiding this comment.
Thank you for this huge effort!
* Add LoRA layer keys from config
Note: this is a source breaking change to adapt to a change on the mlx-lm side. Previously you would write:
```swift
public func loraLinearLayers() -> LoRALinearLayers {
model.layers.map { ($0.selfAttn, ["q_proj", "v_proj"]) }
}
```
now this should read:
```swift
public var loraLayers: [Module] {
model.layers
}
```
What
Added support for the
keysproperty in the adapter config. Previously, these keys were ignored, and hardcoded values were used instead. With this change,LoRAConfigurationnow correctly applies adapters based on the config-defined keys, and hardcoded keys have been removed from model implementations.By default, models now target all
Linearlayers when keys are not specified. This aligns with recent changes in the Python package ml-explore/mlx-lm#515. For example, running the example LoRA training code (without specifying keys) using the latest Python package will create LoRA adapters for allself_attn.*_projandmlp.*_projlayers. In older versions, onlyself_attn.q_projandself_attn.v_projwere targeted.Resolves
#405