
Weight mismatch error while training ATEPC on my custom data #310

Open
Ibrokhimsadikov opened this issue Apr 20, 2023 · 5 comments

@Ibrokhimsadikov

When training the ATEPC model with both my custom and the predefined datasets, I get the error below.

I followed this notebook:
https://github.com/yangheng95/PyABSA/blob/v2/examples-v2/aspect_term_extraction/Aspect_Term_Extraction.ipynb

RuntimeError: Error(s) in loading state_dict for FAST_LCF_ATEPC:
size mismatch for bert4global.embeddings.word_embeddings.weight: copying a param with shape torch.Size([251000, 768]) from checkpoint, the shape in current model is torch.Size([128100, 768]).
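
For what it is worth, the two row counts in the mismatch line up with known backbone vocabulary sizes, which can be checked with a small transformers snippet (a hypothetical diagnostic, not from the notebook):

from transformers import AutoConfig

# 251000 is the vocab_size of microsoft/mdeberta-v3-base (the checkpoint side);
# 128100 is the vocab_size of microsoft/deberta-v3-base (the current model side).
for name in ("microsoft/mdeberta-v3-base", "microsoft/deberta-v3-base"):
    print(name, AutoConfig.from_pretrained(name).vocab_size)

This suggests the checkpoint was trained with an mDeBERTa backbone while config.pretrained_bert resolved to a DeBERTa-v3 model, so one embedding matrix cannot be copied into the other.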

@yangheng95
Owner

Please provide the information requested in the bug report template: https://github.com/yangheng95/PyABSA/issues/new?assignees=&labels=&template=bug_report.md&title=

@KadriMufti

Version
I installed pyabsa 2.4.1, torch 1.13.1, and transformers 4.27.2.

Describe the bug
Hello, I have the same issue. I am trying to finetune your latest multilingual model on my own Arabic dataset, starting from the multilingual checkpoint. I am sure the problem is not the dataset; I will paste the error log below. I get an error with any of the following options for config.pretrained_bert (their vocabulary sizes are compared in the sketch after this list), and also when I leave config.pretrained_bert unset (see below). The error is always about the state_dict:

  • "yangheng/deberta-v3-base-absa-v1.1"
  • "yangheng/deberta-v3-large-absa-v1.1"
  • "MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"
  • "microsoft/mdeberta-v3-base"
  • "bert-base-multilingual-uncased" (this is the default I think)

Sample data:

بصراحة O -100
أنا O -100
ما O -100
أحب O -100
الكاتب O -100
اللي O -100
يدخل O -100
اللغة B-ASP negative
العامية I-ASP negative
في O -100
كتاباته O -100
مع O -100
اني O -100
أمارس O -100
هذا O -100
الخطأ O -100

روايه B-ASP negative
حزينه O -100
قد O -100
لاتستحق O -100
عناء O -100
القراءه O -100

Code To Reproduce

import warnings
warnings.filterwarnings("ignore")
import json
import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4"
from pyabsa import DatasetItem, ModelSaveOption, DeviceTypeOption
from pyabsa import AspectTermExtraction as ATEPC


my_dataset = DatasetItem(
    "my_dataset",
    [
        "/app/path/CustomDatasetArabic/custom.train.txt.atepc",
        "/app/path/100.CustomDatasetArabic/custom.test.txt.atepc",
    ],
)

config = ATEPC.ATEPCConfigManager.get_atepc_config_multilingual()
config.model = ATEPC.ATEPCModelList.FAST_LCF_ATEPC
config.evaluate_begin = 4
config.max_seq_len = 500
config.num_epoch = 5
config.batch_size = 16
config.patience = 2
config.log_step = -1
config.seed = [1]
config.show_metric = True
config.verbose = False  # if verbose == True, PyABSA outputs the model structure and several processed data examples
config.notice = (
    "This is a finetuned aspect term extraction model, based on ATEPC_MULTILINGUAL_CHECKPOINT, using Arabic data HAAD."  # for memos usage
)
# config.pretrained_bert = "yangheng/deberta-v3-base-absa-v1.1"
# config.pretrained_bert = "yangheng/deberta-v3-large-absa-v1.1"
# config.pretrained_bert = "MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"
# config.pretrained_bert = "microsoft/mdeberta-v3-base"
# config.pretrained_bert = "bert-base-multilingual-uncased"

trainer = ATEPC.ATEPCTrainer(
    config=config,
    dataset=my_dataset,
    from_checkpoint="multilingual",  # if you want to resume training from our pretrained checkpoints, you can pass the checkpoint name here
    auto_device=DeviceTypeOption.AUTO,  # use cuda if available
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,  # save state dict only instead of the whole model
    load_aug=False,  # augmentation datasets exist for the integrated datasets; set load_aug=True to use them and improve performance
    path_to_save="/app/path/NEW_ATEPC_MULTILINGUAL_CHECKPOINT"
)

Expected behavior
I was expecting to see the model being trained and then saved. What should I do?

Screenshots

---------------------------------------------------------------------------
RuntimeError: Error(s) in loading state_dict for FAST_LCF_ATEPC:
	Missing key(s) in state_dict: "bert4global.embeddings.position_embeddings.weight", "bert4global.embeddings.token_type_embeddings.weight", "bert4global.encoder.layer.0.attention.self.query.weight", "bert4global.encoder.layer.0.attention.self.query.bias", "bert4global.encoder.layer.0.attention.self.key.weight", "bert4global.encoder.layer.0.attention.self.key.bias", "bert4global.encoder.layer.0.attention.self.value.weight", "bert4global.encoder.layer.0.attention.self.value.bias", "bert4global.encoder.layer.1.attention.self.query.weight", "bert4global.encoder.layer.1.attention.self.query.bias", "bert4global.encoder.layer.1.attention.self.key.weight", "bert4global.encoder.layer.1.attention.self.key.bias", "bert4global.encoder.layer.1.attention.self.value.weight", "bert4global.encoder.layer.1.attention.self.value.bias", "bert4global.encoder.layer.2.attention.self.query.weight", "bert4global.encoder.layer.2.attention.self.query.bias", "bert4global.encoder.layer.2.attention.self.key.weight", "bert4global.encoder.layer.2.attention.self.key.bias", "bert4global.encoder.layer.2.attention.self.value.weight", "bert4global.encoder.layer.2.attention.self.value.bias", "bert4global.encoder.layer.3.attention.self.query.weight", "bert4global.encoder.layer.3.attention.self.query.bias", "bert4global.encoder.layer.3.attention.self.key.weight", "bert4global.encoder.layer.3.attention.self.key.bias", "bert4global.encoder.layer.3.attention.self.value.weight", "bert4global.encoder.layer.3.attention.self.value.bias", "bert4global.encoder.layer.4.attention.self.query.weight", "bert4global.encoder.layer.4.attention.self.query.bias", "bert4global.encoder.layer.4.attention.self.key.weight", "bert4global.encoder.layer.4.attention.self.key.bias", "bert4global.encoder.layer.4.attention.self.value.weight", "bert4global.encoder.layer.4.attention.self.value.bias", "bert4global.encoder.layer.5.attention.self.query.weight", "bert4global.encoder.layer.5.attention.self.query.bias", "bert4global.encoder.layer.5.attention.self.key.weight", "bert4global.encoder.layer.5.attention.self.key.bias", "bert4global.encoder.layer.5.attention.self.value.weight", "bert4global.encoder.layer.5.attention.self.value.bias", "bert4global.encoder.layer.6.attention.self.query.weight", "bert4global.encoder.layer.6.attention.self.query.bias", "bert4global.encoder.layer.6.attention.self.key.weight", "bert4global.encoder.layer.6.attention.self.key.bias", "bert4global.encoder.layer.6.attention.self.value.weight", "bert4global.encoder.layer.6.attention.self.value.bias", "bert4global.encoder.layer.7.attention.self.query.weight", "bert4global.encoder.layer.7.attention.self.query.bias", "bert4global.encoder.layer.7.attention.self.key.weight", "bert4global.encoder.layer.7.attention.self.key.bias", "bert4global.encoder.layer.7.attention.self.value.weight", "bert4global.encoder.layer.7.attention.self.value.bias", "bert4global.encoder.layer.8.attention.self.query.weight", "bert4global.encoder.layer.8.attention.self.query.bias", "bert4global.encoder.layer.8.attention.self.key.weight", "bert4global.encoder.layer.8.attention.self.key.bias", "bert4global.encoder.layer.8.attention.self.value.weight", "bert4global.encoder.layer.8.attention.self.value.bias", "bert4global.encoder.layer.9.attention.self.query.weight", "bert4global.encoder.layer.9.attention.self.query.bias", "bert4global.encoder.layer.9.attention.self.key.weight", "bert4global.encoder.layer.9.attention.self.key.bias", "bert4global.encoder.layer.9.attention.self.value.weight", 
"bert4global.encoder.layer.9.attention.self.value.bias", "bert4global.encoder.layer.10.attention.self.query.weight", "bert4global.encoder.layer.10.attention.self.query.bias", "bert4global.encoder.layer.10.attention.self.key.weight", "bert4global.encoder.layer.10.attention.self.key.bias", "bert4global.encoder.layer.10.attention.self.value.weight", "bert4global.encoder.layer.10.attention.self.value.bias", "bert4global.encoder.layer.11.attention.self.query.weight", "bert4global.encoder.layer.11.attention.self.query.bias", "bert4global.encoder.layer.11.attention.self.key.weight", "bert4global.encoder.layer.11.attention.self.key.bias", "bert4global.encoder.layer.11.attention.self.value.weight", "bert4global.encoder.layer.11.attention.self.value.bias", "bert4global.pooler.dense.weight", "bert4global.pooler.dense.bias". 
	Unexpected key(s) in state_dict: "bert4global.encoder.rel_embeddings.weight", "bert4global.encoder.LayerNorm.weight", "bert4global.encoder.LayerNorm.bias", "bert4global.encoder.layer.0.attention.self.query_proj.weight", "bert4global.encoder.layer.0.attention.self.query_proj.bias", "bert4global.encoder.layer.0.attention.self.key_proj.weight", "bert4global.encoder.layer.0.attention.self.key_proj.bias", "bert4global.encoder.layer.0.attention.self.value_proj.weight", "bert4global.encoder.layer.0.attention.self.value_proj.bias", "bert4global.encoder.layer.1.attention.self.query_proj.weight", "bert4global.encoder.layer.1.attention.self.query_proj.bias", "bert4global.encoder.layer.1.attention.self.key_proj.weight", "bert4global.encoder.layer.1.attention.self.key_proj.bias", "bert4global.encoder.layer.1.attention.self.value_proj.weight", "bert4global.encoder.layer.1.attention.self.value_proj.bias", "bert4global.encoder.layer.2.attention.self.query_proj.weight", "bert4global.encoder.layer.2.attention.self.query_proj.bias", "bert4global.encoder.layer.2.attention.self.key_proj.weight", "bert4global.encoder.layer.2.attention.self.key_proj.bias", "bert4global.encoder.layer.2.attention.self.value_proj.weight", "bert4global.encoder.layer.2.attention.self.value_proj.bias", "bert4global.encoder.layer.3.attention.self.query_proj.weight", "bert4global.encoder.layer.3.attention.self.query_proj.bias", "bert4global.encoder.layer.3.attention.self.key_proj.weight", "bert4global.encoder.layer.3.attention.self.key_proj.bias", "bert4global.encoder.layer.3.attention.self.value_proj.weight", "bert4global.encoder.layer.3.attention.self.value_proj.bias", "bert4global.encoder.layer.4.attention.self.query_proj.weight", "bert4global.encoder.layer.4.attention.self.query_proj.bias", "bert4global.encoder.layer.4.attention.self.key_proj.weight", "bert4global.encoder.layer.4.attention.self.key_proj.bias", "bert4global.encoder.layer.4.attention.self.value_proj.weight", "bert4global.encoder.layer.4.attention.self.value_proj.bias", "bert4global.encoder.layer.5.attention.self.query_proj.weight", "bert4global.encoder.layer.5.attention.self.query_proj.bias", "bert4global.encoder.layer.5.attention.self.key_proj.weight", "bert4global.encoder.layer.5.attention.self.key_proj.bias", "bert4global.encoder.layer.5.attention.self.value_proj.weight", "bert4global.encoder.layer.5.attention.self.value_proj.bias", "bert4global.encoder.layer.6.attention.self.query_proj.weight", "bert4global.encoder.layer.6.attention.self.query_proj.bias", "bert4global.encoder.layer.6.attention.self.key_proj.weight", "bert4global.encoder.layer.6.attention.self.key_proj.bias", "bert4global.encoder.layer.6.attention.self.value_proj.weight", "bert4global.encoder.layer.6.attention.self.value_proj.bias", "bert4global.encoder.layer.7.attention.self.query_proj.weight", "bert4global.encoder.layer.7.attention.self.query_proj.bias", "bert4global.encoder.layer.7.attention.self.key_proj.weight", "bert4global.encoder.layer.7.attention.self.key_proj.bias", "bert4global.encoder.layer.7.attention.self.value_proj.weight", "bert4global.encoder.layer.7.attention.self.value_proj.bias", "bert4global.encoder.layer.8.attention.self.query_proj.weight", "bert4global.encoder.layer.8.attention.self.query_proj.bias", "bert4global.encoder.layer.8.attention.self.key_proj.weight", "bert4global.encoder.layer.8.attention.self.key_proj.bias", "bert4global.encoder.layer.8.attention.self.value_proj.weight", "bert4global.encoder.layer.8.attention.self.value_proj.bias", 
"bert4global.encoder.layer.9.attention.self.query_proj.weight", "bert4global.encoder.layer.9.attention.self.query_proj.bias", "bert4global.encoder.layer.9.attention.self.key_proj.weight", "bert4global.encoder.layer.9.attention.self.key_proj.bias", "bert4global.encoder.layer.9.attention.self.value_proj.weight", "bert4global.encoder.layer.9.attention.self.value_proj.bias", "bert4global.encoder.layer.10.attention.self.query_proj.weight", "bert4global.encoder.layer.10.attention.self.query_proj.bias", "bert4global.encoder.layer.10.attention.self.key_proj.weight", "bert4global.encoder.layer.10.attention.self.key_proj.bias", "bert4global.encoder.layer.10.attention.self.value_proj.weight", "bert4global.encoder.layer.10.attention.self.value_proj.bias", "bert4global.encoder.layer.11.attention.self.query_proj.weight", "bert4global.encoder.layer.11.attention.self.query_proj.bias", "bert4global.encoder.layer.11.attention.self.key_proj.weight", "bert4global.encoder.layer.11.attention.self.key_proj.bias", "bert4global.encoder.layer.11.attention.self.value_proj.weight", "bert4global.encoder.layer.11.attention.self.value_proj.bias". 
	size mismatch for bert4global.embeddings.word_embeddings.weight: copying a param with shape torch.Size([251000, 768]) from checkpoint, the shape in current model is torch.Size([105879, 768]).
	size mismatch for dense.weight: copying a param with shape torch.Size([3, 768]) from checkpoint, the shape in current model is torch.Size([4, 768]).
	size mismatch for dense.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([4]).
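
The key names tell the same story as the shapes: the "Unexpected key(s)" (query_proj/key_proj/value_proj, encoder.rel_embeddings, encoder.LayerNorm) are DeBERTa-style modules, while the "Missing key(s)" (query/key/value, position_embeddings, pooler) are vanilla BERT modules. A hypothetical way to confirm this is to diff the parameter names of the two backbone families directly:

from transformers import AutoModel

# Load both backbones and compare their parameter names (modulo the
# bert4global. prefix PyABSA adds).
bert_keys = set(AutoModel.from_pretrained("bert-base-multilingual-uncased").state_dict())
deberta_keys = set(AutoModel.from_pretrained("microsoft/mdeberta-v3-base").state_dict())
print(sorted(deberta_keys - bert_keys))  # rel_embeddings, *_proj, ... = "Unexpected"
print(sorted(bert_keys - deberta_keys))  # position_embeddings, query/key/value, ... = "Missing"

So the checkpoint state_dict is mDeBERTa-shaped while the freshly built model is BERT-shaped.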

@yangheng95
Owner

Please run pip install pyabsa -U and see if the issue is fixed.

@KadriMufti

I have reinstalled as you suggested and the result has not changed; I still get the error.

RuntimeError: Error(s) in loading state_dict for FAST_LCF_ATEPC:
	size mismatch for bert4global.embeddings.word_embeddings.weight: copying a param with shape torch.Size([251000, 768]) from checkpoint, the shape in current model is torch.Size([105879, 768]).
	size mismatch for dense.weight: copying a param with shape torch.Size([3, 768]) from checkpoint, the shape in current model is torch.Size([4, 768]).
	size mismatch for dense.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([4]).

Currently config.model is FAST_LCF_ATEPC. Should I change it to something else, like FAST_LCFS_ATEPC or LCFS_ATEPC_LARGE?

Also, you wrote in the documentation here:

There are three types of APC models for aspect term extraction, which are based on the local context focus mechanism. Notice: when you select a model, please make sure to carefully manage the configurations, e.g., for GloVe-based models you need to set hidden_dim and embed_dim manually. We already provide some pre-defined configurations.

Should I set hidden_dim and embed_dim manually if that would solve the problem, and if so, how? For reference, the predefined multilingual config is:

_atepc_config_multilingual = {
    "model": LCF_ATEPC,
    "optimizer": "adamw",
    "learning_rate": 0.00002,
    "pretrained_bert": "bert-base-multilingual-uncased",
    "use_bert_spc": True,
    "cache_dataset": True,
    "warmup_step": -1,
    "show_metric": False,
    "max_seq_len": 80,
    "SRD": 3,
    "use_syntax_based_SRD": False,
    "lcf": "cdw",
    "window": "lr",
    "dropout": 0.5,
    "l2reg": 0.00001,
    "num_epoch": 10,
    "batch_size": 16,
    "initializer": "xavier_uniform_",
    "seed": 52,
    "output_dim": 2,
    "log_step": 50,
    "patience": 99999,
    "gradient_accumulation_steps": 1,
    "dynamic_truncate": True,
    "srd_alignment": True,  # for srd_alignment
    "evaluate_begin": 0,
}
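
If setting them manually is the intended fix, I assume it would be plain attribute assignment like the other options; 768 below is a guess matching the hidden size in the error shapes, not a confirmed fix:

# Hypothetical override; the docs only mention this for GloVe-based models.
config.hidden_dim = 768
config.embed_dim = 768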

Note:
The code works if I train a new model from scratch (no checkpoint used, though that needs more time and data), so there must be a mismatch between the multilingual checkpoint model and the config.pretrained_bert and/or config.model options.
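
For comparison, this is the from-scratch call that works: identical to the code above except that from_checkpoint is omitted, so no pretrained ATEPC state_dict has to match the freshly built backbone.

trainer = ATEPC.ATEPCTrainer(
    config=config,
    dataset=my_dataset,
    auto_device=DeviceTypeOption.AUTO,
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,
    load_aug=False,
    path_to_save="/app/path/NEW_ATEPC_MULTILINGUAL_CHECKPOINT",
)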

@yangheng95
Owner

This is a known issue caused by a breaking change in transformers. Which version of pyabsa are you using?
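
A quick way to answer exactly (assuming each package exposes __version__, which all three do):

import pyabsa
import torch
import transformers

print("pyabsa", pyabsa.__version__)
print("torch", torch.__version__)
print("transformers", transformers.__version__)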
