Skip to content

Add support LoRA layer keys from adapter config#424

Merged
davidkoski merged 5 commits into
ml-explore:mainfrom
petrukha-ivan:lora-layer-keys-config
Oct 29, 2025
Merged

Add support LoRA layer keys from adapter config#424
davidkoski merged 5 commits into
ml-explore:mainfrom
petrukha-ivan:lora-layer-keys-config

Conversation

@petrukha-ivan
Copy link
Copy Markdown
Contributor

What

Added support for the keys property in the adapter config. Previously, these keys were ignored, and hardcoded values were used instead. With this change, LoRAConfiguration now correctly applies adapters based on the config-defined keys, and hardcoded keys have been removed from model implementations.

By default, models now target all Linear layers when keys are not specified. This aligns with recent changes in the Python package ml-explore/mlx-lm#515. For example, running the example LoRA training code (without specifying keys) using the latest Python package will create LoRA adapters for all self_attn.*_proj and mlp.*_proj layers. In older versions, only self_attn.q_proj and self_attn.v_proj were targeted.

Resolves

#405

func loraLinearLayers() -> LoRALinearLayers
/// Typically, this includes all transformer layers.
/// Must be defined explicitly since we can't unify it across all models.
var loraLayers: [Module] { get }
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like what you built! I wonder if we could make this a non-breaking change?

For example, the old function returned [(Module, [String])] which is pretty close to [Module] + [String]. We could have default implementations of these new properties use that and split it up. I am trying to think how that would go -- do we need a new protocol for the new methods and a conditional implementation for the old protocol? Deprecate the old method/protocol. Make extension LanguageModel where Self: LoRAModel conform to the new protocol (not even sure that is possible).

Anyway, the change for each model is trivial but it would be nice to make it non-breaking if we could.

Copy link
Copy Markdown
Contributor Author

@petrukha-ivan petrukha-ivan Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your feedback! After thinking about it, I don’t see a way to implement this without introducing breaking changes. Could you clarify which part you’d like to keep from breaking? I’ll explain what currently breaks and why it can’t work without changing the core logic.

1. Adapters Compatibility

When keys are not specified the Python package behaves differently before and after 0.28.3:

mlx-lm Default LoRA layers
<0.28.3 self_attn.q/v_proj
≥0.28.3 self_attn.q/v/k/o_proj, mlp.up/gate/down_proj

The current Swift implementation mirrors the older Python behavior:

public func loraLinearLayers() -> LoRALinearLayers {
    model.layers.map { ($0.attention, ["q_proj", "v_proj"]) }
}

If we keep the Swift code as is, it won’t work with adapters trained using the latest mlx-lm Python package. Even though both versions treat keys as null, their behavior differs. Since the Python version already introduced a breaking change, the Swift version has to do the same to stay aligned. Unfortunately, that means breaking compatibility with adapters trained using mlx-lm versions earlier than 0.28.3.

2. Swift Code Compatibility

The old code returns an array of (Module, [String]) pairs, which describes a module and its child keys. This works with self_attn and we can even somehow extend it to support all self_attn.q/v/k/o_proj layers. But if we want to adapt the mlp part, this structure becomes a problem.

public func loraLinearLayers() -> LoRALinearLayers {
    model.layers.map { ($0.attention, ["q_proj", "k_proj", "v_proj", "o_proj"]) } // How to include `mlp`?
}

We could try to aggregate it like in the issue workaround:

public func loraLinearLayers() -> LoRALinearLayers {
    let attention = model.layers.map { ($0.attention, ["q_proj", "k_proj", "v_proj", "o_proj"]) }
    let mlp = model.layers.map { ($0.mlp, ["up_proj", "gate_proj", "down_proj"]) }
    return attention + mlp // This looks fine but it won't work
}

But this approach is actually buggy because when we take the last num_layers specified in the adapter config, we only get part of the required modules:

let layersToAdapt = model // Assume we have a model with 36 transformer blocks
    .loraLinearLayers() // 72 items total (36 attention + 36 mlp)
    .suffix(config.numLayers) // Suffixing 36 items only gives a subset of the modules

So while [(Module, [String])] looks similar to [Module] + [String] they actually differ. In the updated code, we need the LoRAModel protocol only to unify layers property access across different models, since it varies between model.layers, transformer.layers, languageModel.model.layers, etc.

We could provide default implementations that split this up.

The old implementation can’t be split into the new structure because it’s conceptually incompatible, and for the same reason, we can’t provide default values based on the old code. Old code returns very specific parts instead of complete transformer blocks. Given these issues, I don’t see a way to achieve correct behavior without updating each model implementation.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I buy that, but I am still worried about breakage. We can certainly ship it as-is and just document the changes, but ideally people could upgrade into it as needed.

What if we left the existing protocol and made a new protocol and made both work? The downside here is that it might be confusing which one to use. I guess we could deprecate the old one to suggest to avoid it.

Another problem might be that the structure of the LoRA layers isn't full encapsulated, this is in the calling code:

            LoRATrain.convert(model: context.model, layers: lora.loraLinearLayers(loraLayers))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and with your change:

        // Load LoRA adapter from directory or create a new one
        let modelAdapter: ModelAdapter
        do {
            modelAdapter = try LoRAContainer.from(directory: adapter)
        } catch {
            modelAdapter = try await modelContainer.perform { context in
                return try LoRAContainer.from(
                    model: context.model, configuration: LoRAConfiguration(numLayers: loraLayers))

it is entirely inside the LoRAContainer.from method -- if we update this in the future it will be contained in there.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think I am in agreement with what you have done -- we will just document the breaking change and how to adopt it.

@davidkoski
Copy link
Copy Markdown
Collaborator

FastVLM has been merged (#423) since this started -- can you rebase with main and convert that one? Thanks!

Copy link
Copy Markdown
Collaborator

@davidkoski davidkoski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look great -- just waiting on FastVLM (has been added since this started) to be added or build will break on main.

@petrukha-ivan petrukha-ivan force-pushed the lora-layer-keys-config branch from 74315b7 to 4c548ec Compare October 28, 2025 17:43
@petrukha-ivan
Copy link
Copy Markdown
Contributor Author

@davidkoski done 👍

Copy link
Copy Markdown
Collaborator

@davidkoski davidkoski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this huge effort!

@davidkoski davidkoski merged commit 881ad5a into ml-explore:main Oct 29, 2025
4 checks passed
atdrendel pushed a commit to shareup/mlx-swift-examples that referenced this pull request Dec 6, 2025
* Add LoRA layer keys from config

Note: this is a source breaking change to adapt to a change on the mlx-lm side.  Previously you would write:

```swift
    public func loraLinearLayers() -> LoRALinearLayers {
        model.layers.map { ($0.selfAttn, ["q_proj", "v_proj"]) }
    }
```

now this should read:

```swift
    public var loraLayers: [Module] {
        model.layers
    }
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants