
Weight mismatch error while training ATEPC on my custom data #310

Open
Ibrokhimsadikov opened this issue Apr 20, 2023 · 5 comments

@Ibrokhimsadikov

When training the ATEPC model with both my custom and the predefined datasets, I get the error below.

I followed this notebook:
https://github.com/yangheng95/PyABSA/blob/v2/examples-v2/aspect_term_extraction/Aspect_Term_Extraction.ipynb

RuntimeError: Error(s) in loading state_dict for FAST_LCF_ATEPC:
size mismatch for bert4global.embeddings.word_embeddings.weight: copying a param with shape torch.Size([251000, 768]) from checkpoint, the shape in current model is torch.Size([128100, 768]).
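
For what it is worth, the two row counts in the mismatch line up with known backbone vocabulary sizes, which can be checked with a small transformers snippet (a hypothetical diagnostic, not from the notebook):

from transformers import AutoConfig

# 251000 is the vocab_size of microsoft/mdeberta-v3-base (the checkpoint side);
# 128100 is the vocab_size of microsoft/deberta-v3-base (the current model side).
for name in ("microsoft/mdeberta-v3-base", "microsoft/deberta-v3-base"):
    print(name, AutoConfig.from_pretrained(name).vocab_size)

This suggests the checkpoint was trained with an mDeBERTa backbone while config.pretrained_bert resolved to a DeBERTa-v3 model, so one embedding matrix cannot be copied into the other.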

@yangheng95
Owner

Please provide the information requested in the bug report template: https://github.com/yangheng95/PyABSA/issues/new?assignees=&labels=&template=bug_report.md&title=

@KadriMufti

Version
I installed pyabsa 2.4.1, torch 1.13.1, and transformers 4.27.2.

Describe the bug
Hello, I have the same issue. I am trying to finetune your latest multilingual model on my own Arabic dataset, starting from the multilingual checkpoint. I am sure the problem is not the dataset; I will paste the error log below. I get an error with any of the following options for config.pretrained_bert (their vocabulary sizes are compared in the sketch after this list), and also when I leave config.pretrained_bert unset (see below). The error is always about the state_dict:

  • "yangheng/deberta-v3-base-absa-v1.1"
  • "yangheng/deberta-v3-large-absa-v1.1"
  • "MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"
  • "microsoft/mdeberta-v3-base"
  • "bert-base-multilingual-uncased" (this is the default I think)

Sample data:

بصراحة O -100
أنا O -100
ما O -100
أحب O -100
الكاتب O -100
اللي O -100
يدخل O -100
اللغة B-ASP negative
العامية I-ASP negative
في O -100
كتاباته O -100
مع O -100
اني O -100
أمارس O -100
هذا O -100
الخطأ O -100

روايه B-ASP negative
حزينه O -100
قد O -100
لاتستحق O -100
عناء O -100
القراءه O -100

Code To Reproduce

import warnings
warnings.filterwarnings("ignore")
import json
import os
# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4"
from pyabsa import DatasetItem, ModelSaveOption, DeviceTypeOption
from pyabsa import AspectTermExtraction as ATEPC


my_dataset = DatasetItem(
    "my_dataset",
    [
        "/app/path/CustomDatasetArabic/custom.train.txt.atepc",
        "/app/path/100.CustomDatasetArabic/custom.test.txt.atepc",
    ],
)

config = ATEPC.ATEPCConfigManager.get_atepc_config_multilingual()
config.model = ATEPC.ATEPCModelList.FAST_LCF_ATEPC
config.evaluate_begin = 4
config.max_seq_len = 500
config.num_epoch = 5
config.batch_size = 16
config.patience = 2
config.log_step = -1
config.seed = [1]
config.show_metric = True
config.verbose = False  # if verbose == True, PyABSA outputs the model structure and several processed data examples
config.notice = (
    "This is a finetuned aspect term extraction model, based on ATEPC_MULTILINGUAL_CHECKPOINT, using Arabic data HAAD."  # for memos usage
)
# config.pretrained_bert = "yangheng/deberta-v3-base-absa-v1.1"
# config.pretrained_bert = "yangheng/deberta-v3-large-absa-v1.1"
# config.pretrained_bert = "MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"
# config.pretrained_bert = "microsoft/mdeberta-v3-base"
# config.pretrained_bert = "bert-base-multilingual-uncased"

trainer = ATEPC.ATEPCTrainer(
    config=config,
    dataset=my_dataset,
    from_checkpoint="multilingual",  # if you want to resume training from our pretrained checkpoints, you can pass the checkpoint name here
    auto_device=DeviceTypeOption.AUTO,  # use cuda if available
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,  # save state dict only instead of the whole model
    load_aug=False,  # augmentation datasets exist for the integrated datasets; set load_aug=True to use them and improve performance
    path_to_save="/app/path/NEW_ATEPC_MULTILINGUAL_CHECKPOINT"
)

Expected behavior
I was expecting to see the model being trained and then saved. What should I do?

Screenshots

---------------------------------------------------------------------------
RuntimeError: Error(s) in loading state_dict for FAST_LCF_ATEPC:
	Missing key(s) in state_dict: "bert4global.embeddings.position_embeddings.weight", "bert4global.embeddings.token_type_embeddings.weight", "bert4global.encoder.layer.0.attention.self.query.weight", "bert4global.encoder.layer.0.attention.self.query.bias", "bert4global.encoder.layer.0.attention.self.key.weight", "bert4global.encoder.layer.0.attention.self.key.bias", "bert4global.encoder.layer.0.attention.self.value.weight", "bert4global.encoder.layer.0.attention.self.value.bias", "bert4global.encoder.layer.1.attention.self.query.weight", "bert4global.encoder.layer.1.attention.self.query.bias", "bert4global.encoder.layer.1.attention.self.key.weight", "bert4global.encoder.layer.1.attention.self.key.bias", "bert4global.encoder.layer.1.attention.self.value.weight", "bert4global.encoder.layer.1.attention.self.value.bias", "bert4global.encoder.layer.2.attention.self.query.weight", "bert4global.encoder.layer.2.attention.self.query.bias", "bert4global.encoder.layer.2.attention.self.key.weight", "bert4global.encoder.layer.2.attention.self.key.bias", "bert4global.encoder.layer.2.attention.self.value.weight", "bert4global.encoder.layer.2.attention.self.value.bias", "bert4global.encoder.layer.3.attention.self.query.weight", "bert4global.encoder.layer.3.attention.self.query.bias", "bert4global.encoder.layer.3.attention.self.key.weight", "bert4global.encoder.layer.3.attention.self.key.bias", "bert4global.encoder.layer.3.attention.self.value.weight", "bert4global.encoder.layer.3.attention.self.value.bias", "bert4global.encoder.layer.4.attention.self.query.weight", "bert4global.encoder.layer.4.attention.self.query.bias", "bert4global.encoder.layer.4.attention.self.key.weight", "bert4global.encoder.layer.4.attention.self.key.bias", "bert4global.encoder.layer.4.attention.self.value.weight", "bert4global.encoder.layer.4.attention.self.value.bias", "bert4global.encoder.layer.5.attention.self.query.weight", "bert4global.encoder.layer.5.attention.self.query.bias", "bert4global.encoder.layer.5.attention.self.key.weight", "bert4global.encoder.layer.5.attention.self.key.bias", "bert4global.encoder.layer.5.attention.self.value.weight", "bert4global.encoder.layer.5.attention.self.value.bias", "bert4global.encoder.layer.6.attention.self.query.weight", "bert4global.encoder.layer.6.attention.self.query.bias", "bert4global.encoder.layer.6.attention.self.key.weight", "bert4global.encoder.layer.6.attention.self.key.bias", "bert4global.encoder.layer.6.attention.self.value.weight", "bert4global.encoder.layer.6.attention.self.value.bias", "bert4global.encoder.layer.7.attention.self.query.weight", "bert4global.encoder.layer.7.attention.self.query.bias", "bert4global.encoder.layer.7.attention.self.key.weight", "bert4global.encoder.layer.7.attention.self.key.bias", "bert4global.encoder.layer.7.attention.self.value.weight", "bert4global.encoder.layer.7.attention.self.value.bias", "bert4global.encoder.layer.8.attention.self.query.weight", "bert4global.encoder.layer.8.attention.self.query.bias", "bert4global.encoder.layer.8.attention.self.key.weight", "bert4global.encoder.layer.8.attention.self.key.bias", "bert4global.encoder.layer.8.attention.self.value.weight", "bert4global.encoder.layer.8.attention.self.value.bias", "bert4global.encoder.layer.9.attention.self.query.weight", "bert4global.encoder.layer.9.attention.self.query.bias", "bert4global.encoder.layer.9.attention.self.key.weight", "bert4global.encoder.layer.9.attention.self.key.bias", "bert4global.encoder.layer.9.attention.self.value.weight", 
"bert4global.encoder.layer.9.attention.self.value.bias", "bert4global.encoder.layer.10.attention.self.query.weight", "bert4global.encoder.layer.10.attention.self.query.bias", "bert4global.encoder.layer.10.attention.self.key.weight", "bert4global.encoder.layer.10.attention.self.key.bias", "bert4global.encoder.layer.10.attention.self.value.weight", "bert4global.encoder.layer.10.attention.self.value.bias", "bert4global.encoder.layer.11.attention.self.query.weight", "bert4global.encoder.layer.11.attention.self.query.bias", "bert4global.encoder.layer.11.attention.self.key.weight", "bert4global.encoder.layer.11.attention.self.key.bias", "bert4global.encoder.layer.11.attention.self.value.weight", "bert4global.encoder.layer.11.attention.self.value.bias", "bert4global.pooler.dense.weight", "bert4global.pooler.dense.bias". 
	Unexpected key(s) in state_dict: "bert4global.encoder.rel_embeddings.weight", "bert4global.encoder.LayerNorm.weight", "bert4global.encoder.LayerNorm.bias", "bert4global.encoder.layer.0.attention.self.query_proj.weight", "bert4global.encoder.layer.0.attention.self.query_proj.bias", "bert4global.encoder.layer.0.attention.self.key_proj.weight", "bert4global.encoder.layer.0.attention.self.key_proj.bias", "bert4global.encoder.layer.0.attention.self.value_proj.weight", "bert4global.encoder.layer.0.attention.self.value_proj.bias", "bert4global.encoder.layer.1.attention.self.query_proj.weight", "bert4global.encoder.layer.1.attention.self.query_proj.bias", "bert4global.encoder.layer.1.attention.self.key_proj.weight", "bert4global.encoder.layer.1.attention.self.key_proj.bias", "bert4global.encoder.layer.1.attention.self.value_proj.weight", "bert4global.encoder.layer.1.attention.self.value_proj.bias", "bert4global.encoder.layer.2.attention.self.query_proj.weight", "bert4global.encoder.layer.2.attention.self.query_proj.bias", "bert4global.encoder.layer.2.attention.self.key_proj.weight", "bert4global.encoder.layer.2.attention.self.key_proj.bias", "bert4global.encoder.layer.2.attention.self.value_proj.weight", "bert4global.encoder.layer.2.attention.self.value_proj.bias", "bert4global.encoder.layer.3.attention.self.query_proj.weight", "bert4global.encoder.layer.3.attention.self.query_proj.bias", "bert4global.encoder.layer.3.attention.self.key_proj.weight", "bert4global.encoder.layer.3.attention.self.key_proj.bias", "bert4global.encoder.layer.3.attention.self.value_proj.weight", "bert4global.encoder.layer.3.attention.self.value_proj.bias", "bert4global.encoder.layer.4.attention.self.query_proj.weight", "bert4global.encoder.layer.4.attention.self.query_proj.bias", "bert4global.encoder.layer.4.attention.self.key_proj.weight", "bert4global.encoder.layer.4.attention.self.key_proj.bias", "bert4global.encoder.layer.4.attention.self.value_proj.weight", "bert4global.encoder.layer.4.attention.self.value_proj.bias", "bert4global.encoder.layer.5.attention.self.query_proj.weight", "bert4global.encoder.layer.5.attention.self.query_proj.bias", "bert4global.encoder.layer.5.attention.self.key_proj.weight", "bert4global.encoder.layer.5.attention.self.key_proj.bias", "bert4global.encoder.layer.5.attention.self.value_proj.weight", "bert4global.encoder.layer.5.attention.self.value_proj.bias", "bert4global.encoder.layer.6.attention.self.query_proj.weight", "bert4global.encoder.layer.6.attention.self.query_proj.bias", "bert4global.encoder.layer.6.attention.self.key_proj.weight", "bert4global.encoder.layer.6.attention.self.key_proj.bias", "bert4global.encoder.layer.6.attention.self.value_proj.weight", "bert4global.encoder.layer.6.attention.self.value_proj.bias", "bert4global.encoder.layer.7.attention.self.query_proj.weight", "bert4global.encoder.layer.7.attention.self.query_proj.bias", "bert4global.encoder.layer.7.attention.self.key_proj.weight", "bert4global.encoder.layer.7.attention.self.key_proj.bias", "bert4global.encoder.layer.7.attention.self.value_proj.weight", "bert4global.encoder.layer.7.attention.self.value_proj.bias", "bert4global.encoder.layer.8.attention.self.query_proj.weight", "bert4global.encoder.layer.8.attention.self.query_proj.bias", "bert4global.encoder.layer.8.attention.self.key_proj.weight", "bert4global.encoder.layer.8.attention.self.key_proj.bias", "bert4global.encoder.layer.8.attention.self.value_proj.weight", "bert4global.encoder.layer.8.attention.self.value_proj.bias", 
"bert4global.encoder.layer.9.attention.self.query_proj.weight", "bert4global.encoder.layer.9.attention.self.query_proj.bias", "bert4global.encoder.layer.9.attention.self.key_proj.weight", "bert4global.encoder.layer.9.attention.self.key_proj.bias", "bert4global.encoder.layer.9.attention.self.value_proj.weight", "bert4global.encoder.layer.9.attention.self.value_proj.bias", "bert4global.encoder.layer.10.attention.self.query_proj.weight", "bert4global.encoder.layer.10.attention.self.query_proj.bias", "bert4global.encoder.layer.10.attention.self.key_proj.weight", "bert4global.encoder.layer.10.attention.self.key_proj.bias", "bert4global.encoder.layer.10.attention.self.value_proj.weight", "bert4global.encoder.layer.10.attention.self.value_proj.bias", "bert4global.encoder.layer.11.attention.self.query_proj.weight", "bert4global.encoder.layer.11.attention.self.query_proj.bias", "bert4global.encoder.layer.11.attention.self.key_proj.weight", "bert4global.encoder.layer.11.attention.self.key_proj.bias", "bert4global.encoder.layer.11.attention.self.value_proj.weight", "bert4global.encoder.layer.11.attention.self.value_proj.bias". 
	size mismatch for bert4global.embeddings.word_embeddings.weight: copying a param with shape torch.Size([251000, 768]) from checkpoint, the shape in current model is torch.Size([105879, 768]).
	size mismatch for dense.weight: copying a param with shape torch.Size([3, 768]) from checkpoint, the shape in current model is torch.Size([4, 768]).
	size mismatch for dense.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([4]).
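
The key names tell the same story as the shapes: the "Unexpected key(s)" (query_proj/key_proj/value_proj, encoder.rel_embeddings, encoder.LayerNorm) are DeBERTa-style modules, while the "Missing key(s)" (query/key/value, position_embeddings, pooler) are vanilla BERT modules. A hypothetical way to confirm this is to diff the parameter names of the two backbone families directly:

from transformers import AutoModel

# Load both backbones and compare their parameter names (modulo the
# bert4global. prefix PyABSA adds).
bert_keys = set(AutoModel.from_pretrained("bert-base-multilingual-uncased").state_dict())
deberta_keys = set(AutoModel.from_pretrained("microsoft/mdeberta-v3-base").state_dict())
print(sorted(deberta_keys - bert_keys))  # rel_embeddings, *_proj, ... = "Unexpected"
print(sorted(bert_keys - deberta_keys))  # position_embeddings, query/key/value, ... = "Missing"

So the checkpoint state_dict is mDeBERTa-shaped while the freshly built model is BERT-shaped.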

@yangheng95
Owner

Please run pip install pyabsa -U and see if the issue is fixed.

@KadriMufti

I have reinstalled as you suggested and the result has not changed; I still get the error.

RuntimeError: Error(s) in loading state_dict for FAST_LCF_ATEPC:
	size mismatch for bert4global.embeddings.word_embeddings.weight: copying a param with shape torch.Size([251000, 768]) from checkpoint, the shape in current model is torch.Size([105879, 768]).
	size mismatch for dense.weight: copying a param with shape torch.Size([3, 768]) from checkpoint, the shape in current model is torch.Size([4, 768]).
	size mismatch for dense.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([4]).

Currently config.model is FAST_LCF_ATEPC. Should I change it to something else, like FAST_LCFS_ATEPC or LCFS_ATEPC_LARGE?

Also, you wrote in the documentation here:

There are three types of APC models for aspect term extraction, which are based on the local context focus mechanism. Notice: when you select a model, please make sure to carefully manage the configurations, e.g., for GloVe-based models you need to set hidden_dim and embed_dim manually. We already provide some pre-defined configurations.

Should I set hidden_dim and embed_dim manually if that would solve the problem, and if so, how? For reference, the predefined multilingual config is:

_atepc_config_multilingual = {
    "model": LCF_ATEPC,
    "optimizer": "adamw",
    "learning_rate": 0.00002,
    "pretrained_bert": "bert-base-multilingual-uncased",
    "use_bert_spc": True,
    "cache_dataset": True,
    "warmup_step": -1,
    "show_metric": False,
    "max_seq_len": 80,
    "SRD": 3,
    "use_syntax_based_SRD": False,
    "lcf": "cdw",
    "window": "lr",
    "dropout": 0.5,
    "l2reg": 0.00001,
    "num_epoch": 10,
    "batch_size": 16,
    "initializer": "xavier_uniform_",
    "seed": 52,
    "output_dim": 2,
    "log_step": 50,
    "patience": 99999,
    "gradient_accumulation_steps": 1,
    "dynamic_truncate": True,
    "srd_alignment": True,  # for srd_alignment
    "evaluate_begin": 0,
}
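
If setting them manually is the intended fix, I assume it would be plain attribute assignment like the other options; 768 below is a guess matching the hidden size in the error shapes, not a confirmed fix:

# Hypothetical override; the docs only mention this for GloVe-based models.
config.hidden_dim = 768
config.embed_dim = 768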

Note:
The code works if I train a new model from scratch (no checkpoint used, though that needs more time and data), so there must be a mismatch between the multilingual checkpoint model and the config.pretrained_bert and/or config.model options.
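
For comparison, this is the from-scratch call that works: identical to the code above except that from_checkpoint is omitted, so no pretrained ATEPC state_dict has to match the freshly built backbone.

trainer = ATEPC.ATEPCTrainer(
    config=config,
    dataset=my_dataset,
    auto_device=DeviceTypeOption.AUTO,
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,
    load_aug=False,
    path_to_save="/app/path/NEW_ATEPC_MULTILINGUAL_CHECKPOINT",
)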

@yangheng95
Owner

This is a known issue caused by a breaking change in transformers. Which version of pyabsa are you using?
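
A quick way to answer exactly (assuming each package exposes __version__, which all three do):

import pyabsa
import torch
import transformers

print("pyabsa", pyabsa.__version__)
print("torch", torch.__version__)
print("transformers", transformers.__version__)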
