Description
🐞 Describe the Bug
`test_load_distributed_checkpoint_dp2` fails with:
E ValueError: Config diff:
E `init_method_std_embed`: `0.022` != `0.0625`
E `transformer.init_method_std`: `0.022` != `0.0625`
E `transformer.init_method_std_attn_proj`: `0.011` != `0.03125`
E `transformer.init_method_std_mlp_2`: `0.011` != `0.03125`
E `transformer.init_method_std_mlp_1`: `0.022` != `0.0625`
E `transformer.init_method_std_qkv`: `0.022` != `0.0625`
There must be an inconsistency between the config creation and loading methods. The bug is harmless in this case, since we are loading an already initialized checkpoint and the init std values are never used, but it could be hiding a bigger problem.
Likely reason: the non-architecture config is validated before the pretrained architecture is loaded, so the wrong architecture is used to set the defaults.
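The numbers in the diff fit this explanation: `0.0625` is exactly `256 ** -0.5` and `0.022` is approximately `2048 ** -0.5`, so the defaults look like they were derived from two different hidden sizes. That is an inference from the values, not something confirmed in the code. Below is a minimal sketch of the suspected ordering problem. `ToyConfig`, its fields and both loader functions are hypothetical stand-ins, not the real config classes.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConfig:
    # Hypothetical stand-in for the real config; 2048 is an assumed default.
    hidden_size: int = 2048
    init_method_std: Optional[float] = None

    def validate(self) -> "ToyConfig":
        # Fill the derived default only if it was not set explicitly.
        # The hidden_size ** -0.5 rule is an assumption inferred from the diff.
        if self.init_method_std is None:
            self.init_method_std = self.hidden_size ** -0.5
        return self


def load_validate_first(pretrained_arch: dict) -> ToyConfig:
    # Suspected buggy order: the default is frozen from the non-pretrained
    # architecture, and the pretrained architecture is applied afterwards.
    config = ToyConfig().validate()
    config.hidden_size = pretrained_arch["hidden_size"]
    return config


def load_architecture_first(pretrained_arch: dict) -> ToyConfig:
    # Expected order: apply the pretrained architecture, then validate.
    config = ToyConfig(hidden_size=pretrained_arch["hidden_size"])
    return config.validate()


if __name__ == "__main__":
    arch = {"hidden_size": 256}
    print(round(load_validate_first(arch).init_method_std, 3))
    print(load_architecture_first(arch).init_method_std)
```

With a pretrained hidden size of 256, the two orderings give different `init_method_std` values for the same final architecture, which is the kind of mismatch the config diff reports.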
🔄 Steps to Reproduce
Run `test_load_distributed_checkpoint_dp2`.
🎯 Expected Behavior
Tests pass