Description
I also posted this in the PEFT repo, but since the issue is related to transformers as well, I am asking my question here too.
The PEFT issue is huggingface/peft#1245.
Hello, and sorry for the naive question.
I noticed that model.generate() behaves differently when I run inference right after training with trainer.model
versus after merge_and_unload(). (All generation parameters are the same.)
So I compared the two objects with a simple print.
The difference is in the object that wraps the model.
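For context, the first path looks roughly like this; it is only a sketch assuming a standard QLoRA setup, and the tokenizer path, prompt, and generation settings are placeholders rather than my exact values:

```python
from transformers import AutoTokenizer

# Hypothetical tokenizer path and prompt, just to make the snippet self-contained;
# `trainer` is the Trainer object left over from the fine-tuning run.
tokenizer = AutoTokenizer.from_pretrained("path/to/base-model")
inputs = tokenizer("Some evaluation prompt", return_tensors="pt").to(trainer.model.device)

# Path 1: generate directly with the PEFT-wrapped model returned by the Trainer
peft_model = trainer.model  # PeftModelForCausalLM, LoRA adapters still attached
peft_model.eval()
out = peft_model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```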
model = trainer.model
PeftModelForCausalLM(
(base_model): LoraModel(
(model): LlamaForCausalLM(
(model): LlamaModel(
(embed_tokens): ModulesToSaveWrapper(
(original_module): Embedding(32008, 5120)
(modules_to_save): ModuleDict(
(default): Embedding(32008, 5120)
)
)
(layers): ModuleList(
(0-39): 40 x LlamaDecoderLayer(
(self_attn): LlamaAttention(
(q_proj): Linear4bit(
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=5120, out_features=64, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=64, out_features=5120, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(base_layer): Linear4bit(in_features=5120, out_features=5120, bias=False)
)
(k_proj): Linear4bit(
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=5120, out_features=64, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=64, out_features=5120, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(base_layer): Linear4bit(in_features=5120, out_features=5120, bias=False)
)
(v_proj): Linear4bit(
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=5120, out_features=64, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=64, out_features=5120, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(base_layer): Linear4bit(in_features=5120, out_features=5120, bias=False)
)
(o_proj): Linear4bit(
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=5120, out_features=64, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=64, out_features=5120, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(base_layer): Linear4bit(in_features=5120, out_features=5120, bias=False)
)
(rotary_emb): LlamaRotaryEmbedding()
)
(mlp): LlamaMLP(
(gate_proj): Linear4bit(
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=5120, out_features=64, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=64, out_features=13824, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(base_layer): Linear4bit(in_features=5120, out_features=13824, bias=False)
)
(up_proj): Linear4bit(
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=5120, out_features=64, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=64, out_features=13824, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(base_layer): Linear4bit(in_features=5120, out_features=13824, bias=False)
)
(down_proj): Linear4bit(
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=13824, out_features=64, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=64, out_features=5120, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(base_layer): Linear4bit(in_features=13824, out_features=5120, bias=False)
)
(act_fn): SiLUActivation()
)
(input_layernorm): LlamaRMSNorm()
(post_attention_layernorm): LlamaRMSNorm()
)
)
(norm): LlamaRMSNorm()
)
(lm_head): ModulesToSaveWrapper(
(original_module): Linear(in_features=5120, out_features=32008, bias=False)
(modules_to_save): ModuleDict(
(default): Linear(in_features=5120, out_features=32008, bias=False)
)
)
)
)
)
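The second object is produced roughly along these lines; the paths, dtype, and BitsAndBytesConfig values are placeholders for the sketch, not my exact code:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Merge the LoRA weights into the base model and drop the PEFT wrappers
merged = AutoPeftModelForCausalLM.from_pretrained(
    "path/to/adapter-checkpoint", torch_dtype=torch.bfloat16
).merge_and_unload()
merged.save_pretrained("path/to/merged-model")

# Reload the merged checkpoint quantized to 4-bit, matching the printed structure below
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "path/to/merged-model", quantization_config=bnb_config, device_map="auto"
)
```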
model = AutoModelForCausalLM.from_pretrained(...) (after merging the LoRA adapter)
LlamaForCausalLM(
(model): LlamaModel(
(embed_tokens): Embedding(32008, 5120)
(layers): ModuleList(
(0-39): 40 x LlamaDecoderLayer(
(self_attn): LlamaAttention(
(q_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
(k_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
(v_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
(o_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
(rotary_emb): LlamaRotaryEmbedding()
)
(mlp): LlamaMLP(
(gate_proj): Linear4bit(in_features=5120, out_features=13824, bias=False)
(up_proj): Linear4bit(in_features=5120, out_features=13824, bias=False)
(down_proj): Linear4bit(in_features=13824, out_features=5120, bias=False)
(act_fn): SiLUActivation()
)
(input_layernorm): LlamaRMSNorm()
(post_attention_layernorm): LlamaRMSNorm()
)
)
(norm): LlamaRMSNorm()
)
(lm_head): Linear(in_features=5120, out_features=32008, bias=False)
)
I think both setups should behave exactly the same way, but when I run inference with model.generate(), the first one (PeftModelForCausalLM) produces noticeably more accurate outputs. I would like to know why; is there a theoretical or engineering reason for this?
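If it helps, here is a minimal sanity check one could run to see whether the difference already shows up in the raw logits rather than only in generate(); it reuses the hypothetical tokenizer, peft_model, and merged model objects from the sketches above:

```python
import torch

# Compare a single forward pass of both models on the same prompt
inputs = tokenizer("Some evaluation prompt", return_tensors="pt").to(peft_model.device)

with torch.no_grad():
    logits_peft = peft_model(**inputs).logits                # trainer.model path
    logits_merged = model(**inputs.to(model.device)).logits  # merged-and-reloaded path

# A large gap here would suggest the weights themselves differ, e.g. rounding from
# merging into and re-quantizing the 4-bit base, or the modules_to_save copies of
# embed_tokens / lm_head not being the ones used after the merge.
print((logits_peft - logits_merged.to(logits_peft.device)).abs().max())
```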
Thanks for reading my long question!