
[bug] Inconsistent init_method_std in test_load_distributed_checkpoint_dp2 #88

Closed
@jlamypoirier

Description

🐞 Describe the Bug

test_load_distributed_checkpoint_dp2 fails with:

E           ValueError: Config diff:
E             init_method_std_embed`: `0.022` != `0.0625`
E             transformer.init_method_std`: `0.022` != `0.0625`
E             transformer.init_method_std_attn_proj`: `0.011` != `0.03125`
E             transformer.init_method_std_mlp_2`: `0.011` != `0.03125`
E             transformer.init_method_std_mlp_1`: `0.022` != `0.0625`
E             transformer.init_method_std_qkv`: `0.022` != `0.0625`
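Every field in the diff changes by the same factor (about 2.84). In both configs, the attn_proj and mlp_2 values are exactly half of the base value. That pattern suggests a single architecture parameter is driving all of these defaults, rather than six unrelated mistakes. The check below runs on the logged numbers only. The field names come from the log; the helper functions are hypothetical.

```python
import math

# Values copied from the config diff: (left-hand value, right-hand value).
DIFF = {
    "init_method_std_embed": (0.022, 0.0625),
    "transformer.init_method_std": (0.022, 0.0625),
    "transformer.init_method_std_attn_proj": (0.011, 0.03125),
    "transformer.init_method_std_mlp_2": (0.011, 0.03125),
    "transformer.init_method_std_mlp_1": (0.022, 0.0625),
    "transformer.init_method_std_qkv": (0.022, 0.0625),
}


def ratios_between_configs(diff):
    """Hypothetical helper: ratio right/left for every field in the diff."""
    return {name: right / left for name, (left, right) in diff.items()}


def proj_to_base_ratio(diff, side):
    """Hypothetical helper: attn_proj std divided by the base std within one config (side 0 or 1)."""
    return (diff["transformer.init_method_std_attn_proj"][side]
            / diff["transformer.init_method_std"][side])


ratios = ratios_between_configs(DIFF)
first = next(iter(ratios.values()))
# True if every field differs between the two configs by the same factor.
same_factor = all(math.isclose(r, first, rel_tol=1e-9) for r in ratios.values())
print(f"common factor: {first:.3f}, same for all fields: {same_factor}")
print("attn_proj / base:", proj_to_base_ratio(DIFF, 0), proj_to_base_ratio(DIFF, 1))
```

The diff only prints rounded values, so this is a consistency check and does not prove the cause.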

There must be some inconsistency between the config creation and loading methods. The bug is harmless here, because we're loading an already initialized checkpoint, but it could be hiding a bigger problem.
Likely reason: the non-architecture config is validated before the pretrained architecture is loaded, so its defaults are derived from the wrong architecture.
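The suspected ordering problem can be sketched with plain dataclasses. This is a minimal illustration under stated assumptions, not the project's actual config code. The class names, the `hidden_size` field, its default of 512, and the `hidden_size ** -0.5` default formula are all hypothetical.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class ArchitectureConfig:
    # Hypothetical architecture field that the init-std default depends on.
    hidden_size: int = 512


@dataclasses.dataclass
class TrainingConfig:
    architecture: ArchitectureConfig = dataclasses.field(default_factory=ArchitectureConfig)
    # None means "derive a default from the architecture during validation".
    init_method_std: Optional[float] = None

    def validate(self) -> "TrainingConfig":
        # Assumed default rule, for illustration only.
        if self.init_method_std is None:
            self.init_method_std = self.architecture.hidden_size ** -0.5
        return self


def load_buggy(pretrained: ArchitectureConfig) -> TrainingConfig:
    # Defaults are filled in while the default architecture is still in place...
    config = TrainingConfig().validate()
    # ...and the pretrained architecture only replaces it afterwards, so the
    # derived std no longer matches the architecture it is paired with.
    config.architecture = pretrained
    return config


def load_fixed(pretrained: ArchitectureConfig) -> TrainingConfig:
    # Install the pretrained architecture first, then derive defaults from it.
    return TrainingConfig(architecture=pretrained).validate()


pretrained = ArchitectureConfig(hidden_size=768)
print("buggy:", load_buggy(pretrained).init_method_std)  # derived from hidden_size=512
print("fixed:", load_fixed(pretrained).init_method_std)  # derived from hidden_size=768
```

Both loaders end up with the same architecture, yet their derived defaults differ. A config comparison between them would report a diff similar in shape to the one above.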

🔄 Steps to Reproduce

Run test_load_distributed_checkpoint_dp2.

🎯 Expected Behavior

The test passes.
