
Support block-modular architecture #242

Open
@tscholak

Description


🎯 Goal (What & Why)

Enable fully modular, per-block configuration in Fast-LLM, as a follow-up to the hybrid-architecture support introduced in #194.

Currently, hybrid models (e.g., interleaving Mamba 1, Mamba 2, and Transformer blocks) are limited by global block-type configurations: all transformer blocks share one config, and all SSM blocks another. This is too rigid.

We want to:

  • Allow different configurations per block, even for the same type.
  • Support named blocks with configurable weight sharing.
  • Enable expressive, fine-grained architectures, useful for:
    • Exploring different attention mechanisms in a single model.
    • Tying weights across repeated block instances.
    • Designing sparse, pruned, or ablation-based stacks.
    • Preparing for model export pipelines with heterogeneous block stacks.

This would eliminate the current one-size-fits-all limitation and make model stacks in Fast-LLM truly composable and expressive.

🚀 Execution Plan

This is a config and model-construction feature. The main change is replacing the global transformer and ssm sections with a new per-block format.

Key Ideas

  • Add model.blocks: a dict of named block configs. The names are arbitrary (alice, bob, claire, potato, ...); see the examples below.
  • Add block_pattern: a list specifying the block sequence by name.
  • Add num_layers: the total depth of the model. The pattern is repeated until this depth is reached.
  • Allow block-level options like:
    • kind: transformer | ssm | ...
    • attention_type, sliding_window, dt_rank, etc.
    • shared_weights: true for parameter sharing
    • lora: ...
  • Blocks reused by name share configuration; if shared_weights: true, they also reuse parameters. A sketch of the resulting schema follows below.
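
To make these ideas concrete, here is a minimal sketch of what the schema and its validation could look like, using plain Python dataclasses. The class and field names are assumptions for illustration only; the actual Fast-LLM config classes will look different.

from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class BlockConfig:
    kind: str                                     # "transformer" | "ssm" | ...
    shared_weights: bool = False                  # reuse one parameter set for every occurrence
    options: dict = field(default_factory=dict)   # attention_type, dt_rank, lora, ...

@dataclass
class ModelConfig:
    blocks: dict[str, BlockConfig]
    block_pattern: list[str]
    num_layers: int

    def validate(self) -> None:
        # Every pattern entry must resolve to a named block.
        unknown = [name for name in self.block_pattern if name not in self.blocks]
        if unknown:
            raise ValueError(f"block_pattern references undefined blocks: {unknown}")
        if not self.block_pattern:
            raise ValueError("block_pattern must not be empty")
        if self.num_layers < 1:
            raise ValueError("num_layers must be positive")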

Minimal Implementation Path

  1. Define new schema and validate it (e.g., every pattern entry must resolve to a block).
  2. Update model construction to instantiate blocks from model.blocks, repeat pattern to reach num_layers.
  3. Add weight-sharing logic: instantiate shared blocks once and reuse their parameters across layers (see the construction sketch after this list).
  4. Add support for block-level LoRA injection.
  5. Maintain backwards compatibility: for existing models, fall back to current global transformer/ssm layout if model.blocks is absent. Save new checkpoints using the new format.
  6. Extend test coverage:
    • Stacks with different transformer configs
    • Mixed MQA/GQA/sliding-window blocks
    • Interleaved SSM and transformer blocks
    • Shared and unshared weights
  7. Update documentation with examples and migration guide.
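
A minimal sketch of steps 2 and 3, reusing the hypothetical ModelConfig/BlockConfig classes from the schema sketch above. build_block is a placeholder factory; Fast-LLM's actual block classes and construction path will differ.

import torch.nn as nn

def build_block(config: BlockConfig) -> nn.Module:
    # Placeholder: in practice this would dispatch on config.kind
    # ("transformer", "ssm", ...) and the block-level options.
    return nn.Identity()

def build_stack(model_config: ModelConfig) -> nn.ModuleList:
    model_config.validate()
    shared: dict[str, nn.Module] = {}  # one instance per block with shared_weights: true
    layers = []
    for layer_index in range(model_config.num_layers):
        # Repeat the pattern until num_layers is reached.
        name = model_config.block_pattern[layer_index % len(model_config.block_pattern)]
        block_config = model_config.blocks[name]
        if block_config.shared_weights:
            # Reuse the same module instance so its parameters are tied across layers.
            if name not in shared:
                shared[name] = build_block(block_config)
            layers.append(shared[name])
        else:
            layers.append(build_block(block_config))
    return nn.ModuleList(layers)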

Example Config: One block

model:
  blocks:
    default_transformer:
      kind: transformer
      attention_type: mqa
      use_flash_attention: true
      num_heads: 16
      hidden_size: 4096

  block_pattern: ["default_transformer"]
  num_layers: 48

Example Config: Many blocks

model:
  blocks:
    alice:
      kind: transformer
      attention_type: mqa
      sliding_window: false

    bob:
      kind: transformer
      attention_type: gqa
      sliding_window: true
      shared_weights: true

    claire:
      kind: ssm
      variant: mamba1
      dt_rank: auto

    dave:
      kind: ssm
      variant: discrete_mamba2
      state_size: 16

  block_pattern: ["alice", "bob", "claire", "dave", "bob"]
  num_layers: 15

Here:

  • The 5-block pattern repeats 3 times across the model's 15 layers.
  • bob appears 6 times but defines its weights only once (shared_weights: true), as the snippet below illustrates.
  • Each block can be configured independently.
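
A quick check of this arithmetic (plain Python, no Fast-LLM code involved):

from collections import Counter

pattern = ["alice", "bob", "claire", "dave", "bob"]
num_layers = 15

expanded = [pattern[i % len(pattern)] for i in range(num_layers)]
print(num_layers // len(pattern))    # 3 -> full repetitions of the pattern
print(Counter(expanded)["bob"])      # 6 -> occurrences of bob, sharing one parameter set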

📌 Acceptance Criteria

  • model.blocks is supported with flexible per-block config.
  • block_pattern resolves correctly and builds a full stack of layers.
  • Shared weights reduce parameter count where shared_weights: true is set.
  • Legacy config format (transformer, ssm) remains supported for the time being, with a deprecation warning.
  • Unit tests validate the following (see the test sketch after this list):
    • Per-block config behaviour
    • Mixed block types
    • Shared vs non-shared blocks
  • Documentation updated with clear example configs and usage patterns.
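
For the shared-vs-non-shared case, a unit test could look roughly like this, assuming the hypothetical BlockConfig/ModelConfig/build_stack sketches above; the real tests would go through Fast-LLM's own model-construction entry points.

def test_shared_blocks_reuse_parameters():
    config = ModelConfig(
        blocks={
            "alice": BlockConfig(kind="transformer"),
            "bob": BlockConfig(kind="transformer", shared_weights=True),
        },
        block_pattern=["alice", "bob"],
        num_layers=4,
    )
    stack = build_stack(config)
    # Layers 1 and 3 are both "bob" and must be the same module instance;
    # layers 0 and 2 are both "alice" but must be independent copies.
    assert stack[1] is stack[3]
    assert stack[0] is not stack[2]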

🛠️ Project Management

  • Assign the project to the Fast-LLM project.
  • Set the Estimate field (in days).
  • Use the Size field to categorize the PR size (Large).
  • Assign an owner when opening the issue.
