@mtaran commented on Oct 1, 2025

Summary

This PR adds support for the base (non-instruct) version of Qwen3-0.6B to TransformerLens.

Motivation

TransformerLens currently supports Qwen/Qwen3-0.6B (the instruct version) but not the base version, Qwen/Qwen3-0.6B-Base. The two models share the same architecture and differ only in their weights, so adding support only requires registering the new name in the model lists.

Changes

  • Added Qwen/Qwen3-0.6B-Base to the OFFICIAL_MODEL_NAMES list
  • Added the corresponding alias qwen3-0.6b-base to the MODEL_ALIASES dictionary
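Registering a name and an alias is all the PR needs, because an alias is only a lookup key that resolves to an official Hugging Face name. The sketch below illustrates that idea under stated assumptions. The two-entry registries, the `qwen3-0.6b` alias for the instruct model, and the helpers `build_alias_map` and `resolve_model_name` are hypothetical stand-ins, not TransformerLens's actual code. The lookup is case-insensitive.

```python
# Hypothetical, trimmed-down registries for illustration only; the real
# OFFICIAL_MODEL_NAMES and MODEL_ALIASES in TransformerLens hold many more entries.
OFFICIAL_MODEL_NAMES = [
    "Qwen/Qwen3-0.6B",
    "Qwen/Qwen3-0.6B-Base",  # the kind of entry this PR adds
]

MODEL_ALIASES = {
    "Qwen/Qwen3-0.6B": ["qwen3-0.6b"],  # hypothetical alias for the instruct model
    "Qwen/Qwen3-0.6B-Base": ["qwen3-0.6b-base"],  # the alias this PR adds
}


def build_alias_map(official_names, aliases):
    """Map every lowercased official name and alias to its official name."""
    alias_map = {}
    for official in official_names:
        alias_map[official.lower()] = official
        for alias in aliases.get(official, []):
            alias_map[alias.lower()] = official
    return alias_map


def resolve_model_name(name):
    """Return the official name for `name`, or raise ValueError if unregistered."""
    alias_map = build_alias_map(OFFICIAL_MODEL_NAMES, MODEL_ALIASES)
    try:
        return alias_map[name.lower()]
    except KeyError:
        raise ValueError(f"{name!r} is not a supported model name or alias") from None


print(resolve_model_name("qwen3-0.6b-base"))  # Qwen/Qwen3-0.6B-Base
print(resolve_model_name("Qwen/Qwen3-0.6B"))  # Qwen/Qwen3-0.6B
```

Without the new entries, the first call would raise `ValueError`. This is why an architecture-identical model can be enabled by editing the lists alone.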

Testing

Both models were verified to:

  1. Load successfully via HookedTransformer.from_pretrained()
  2. Load distinct weights (as expected for a base model versus an instruct model)

Test output:

Instruct model first embedding weight sum: -0.012408
Base model first embedding weight sum: -0.173449

Additional Context

🤖 Generated with Claude Code

bryce13950 and others added 6 commits June 12, 2025 11:19
This commit adds support for the base (non-instruct) version of Qwen3-0.6B.
The base model (Qwen/Qwen3-0.6B-Base) and instruct model (Qwen/Qwen3-0.6B)
share the same architecture but have different weights. The base model is
suitable for fine-tuning, while the instruct model is optimized for
instruction-following and chat.

Changes:
- Added "Qwen/Qwen3-0.6B-Base" to OFFICIAL_MODEL_NAMES
- Added alias "qwen3-0.6b-base" to MODEL_ALIASES

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add Qwen/Qwen3-0.6B-Base to the free_compatible list in the
Colab_Compatibility notebook to ensure all models in OFFICIAL_MODEL_NAMES
are accounted for in the test suite.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Update the model count in Colab_Compatibility notebook output
from 216 to 217 to reflect the addition of Qwen3-0.6B-Base.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
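The second and third commits keep the Colab_Compatibility notebook in sync with the model registry: every entry in OFFICIAL_MODEL_NAMES must appear in one of the notebook's lists, and the reported total changes from 216 to 217. A coverage check of that kind might look like the sketch below. Only the name `free_compatible` comes from this PR. The `incompatible` bucket, the four-model registry, and the `check_coverage` helper are hypothetical, and the sketch does not reproduce the notebook's actual lists.

```python
from collections import Counter

# Hypothetical stand-ins for the registry and the notebook's compatibility lists.
OFFICIAL_MODEL_NAMES = [
    "gpt2",
    "Qwen/Qwen3-0.6B",
    "Qwen/Qwen3-0.6B-Base",
    "meta-llama/Llama-2-70b-hf",
]

COMPATIBILITY_BUCKETS = {
    "free_compatible": ["gpt2", "Qwen/Qwen3-0.6B", "Qwen/Qwen3-0.6B-Base"],
    "incompatible": ["meta-llama/Llama-2-70b-hf"],  # hypothetical bucket name
}


def check_coverage(official_names, buckets):
    """Return sorted lists of (missing, extra, duplicated) model names."""
    # Count how many buckets each model name appears in.
    counts = Counter(name for names in buckets.values() for name in names)
    official = set(official_names)
    present = set(counts)
    missing = sorted(official - present)  # registered but not in any bucket
    extra = sorted(present - official)  # in a bucket but not registered
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    return missing, extra, duplicated


missing, extra, duplicated = check_coverage(OFFICIAL_MODEL_NAMES, COMPATIBILITY_BUCKETS)
# The notebook's analogous total went from 216 to 217 in this PR.
print(f"{len(OFFICIAL_MODEL_NAMES)} models registered")
assert not (missing or extra or duplicated), (missing, extra, duplicated)
```

If `Qwen/Qwen3-0.6B-Base` is added to the registry but not to `free_compatible`, the check reports it as missing. That is the gap the second commit closes.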
@jlarson4 changed the base branch from main to dev on January 15, 2026 14:13
@jlarson4 merged commit f53185c into TransformerLensOrg:dev on Jan 15, 2026
13 checks passed
jlarson4 added a commit that referenced this pull request Feb 3, 2026
Cherry-picked from v2.17.0 commit f53185c
Adapted for dev-3.x supported_models.py structure

- Added Qwen/Qwen3-0.6B-Base to OFFICIAL_MODEL_NAMES
- Added qwen3-0.6b-base alias to MODEL_ALIASES

Original commit: f53185c Add support for Qwen/Qwen3-0.6B-Base model (#1075)