Model saving does not output state_dict #334

Open
Kensvin28 opened this issue Jul 8, 2023 · 6 comments
Kensvin28 commented Jul 8, 2023

PyABSA Version (Required)

2.3.1

Code To Reproduce (Required)

from pyabsa import AspectSentimentTripletExtraction as ASTE  # needed for ASTE.ASTETrainer below
from pyabsa import ModelSaveOption, DeviceTypeOption
import warnings

warnings.filterwarnings("ignore")

# config and dataset are not defined in the original snippet; reconstructed from
# the log below (the ConfigManager call is an assumption, the dataset name is from the log)
config = ASTE.ASTEConfigManager.get_aste_config_english()
dataset = "407.Shopee"  # custom dataset, searched for in the working directory

config.batch_size = 8
config.patience = 20
config.log_step = -1
config.max_seq_len = 256
config.seed = 1
config.verbose = False # If verbose == True, PyABSA will output the model structure and several processed data examples
config.notice = (
"This is an training example for aspect term extraction" # for memos usage
)

trainer = ASTE.ASTETrainer(
    config=config,
    dataset=dataset,
    # from_checkpoint="english",  # pass a checkpoint name to resume training from a pretrained checkpoint
    auto_device='cuda',  # use cuda if available
    checkpoint_save_mode=ModelSaveOption.SAVE_FULL_MODEL,  # save the whole model (mode 2), not just the state dict
    load_aug=False,  # set load_aug=True to train with the augmentation data shipped with the integrated datasets
)

Full Console Output (Required)

[2023-07-08 02:27:10] (2.3.1) Set Model Device: cuda
[2023-07-08 02:27:10] (2.3.1) Device Name: Tesla T4
2023-07-08 02:27:10,136 INFO: PyABSA version: 2.3.1
2023-07-08 02:27:10,137 INFO: Transformers version: 4.30.2
2023-07-08 02:27:10,138 INFO: Torch version: 2.0.1+cu117+cuda11.7
2023-07-08 02:27:10,138 INFO: Device: Tesla T4
2023-07-08 02:27:10,140 INFO: 407.Shopee in the trainer is not a exact path, will search dataset in current working directory
FindFile Warning --> multiple targets ['integrated_datasets/aste_datasets/407.Shopee', 'integrated_datasets/aste_datasets/407.Shopee/.ipynb_checkpoints'] found, only return the shortest path: <integrated_datasets/aste_datasets/407.Shopee>
2023-07-08 02:27:10,146 INFO: You can set load_aug=True in a trainer to augment your dataset (English only yet) and improve performance.
2023-07-08 02:27:11,753 INFO: Load dataset from integrated_datasets/aste_datasets/407.Shopee/train.txt
preparing dataloader: 2%|▏ | 10/523 [00:00<00:05, 96.70it/s]
EOL while scanning string literal (, line 1)
preparing dataloader: 100%|██████████| 523/523 [00:05<00:00, 98.53it/s]
2023-07-08 02:27:18,110 INFO: Load dataset from integrated_datasets/aste_datasets/407.Shopee/test.txt
preparing dataloader: 51%|█████▏ | 54/105 [00:00<00:00, 97.42it/s]
EOL while scanning string literal (, line 1)
preparing dataloader: 100%|██████████| 105/105 [00:01<00:00, 92.39it/s]
2023-07-08 02:27:19,812 INFO: Load dataset from integrated_datasets/aste_datasets/407.Shopee/dev.txt
preparing dataloader: 100%|██████████| 71/71 [00:00<00:00, 100.00it/s]
building vocab...
converting data to features: 100%|██████████| 522/522 [00:31<00:00, 16.66it/s]
converting data to features: 100%|██████████| 104/104 [00:07<00:00, 14.29it/s]
converting data to features: 100%|██████████| 71/71 [00:03<00:00, 20.75it/s]
2023-07-08 02:28:02,765 INFO: Save cache dataset to emcgcn.407.Shopee.dataset.b58ef8d99282bf35c7523e9d4fe3c00be3acbf79e1c910c9b38732fded1e3432.cache

Some weights of the model checkpoint at yangheng/deberta-v3-base-absa-v1.1 were not used when initializing DebertaV2Model: ['pooler.dense.bias', 'classifier.bias', 'classifier.weight', 'pooler.dense.weight']

  • This IS expected if you are initializing DebertaV2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing DebertaV2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    [2023-07-08 02:28:21] (2.3.1) ABSADatasetsVersion:None --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) MV:<metric_visualizer.metric_visualizer.MetricVisualizer object at 0x7fbe3367b9d0> --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) PyABSAVersion:2.3.1 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) SRD:3 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) TorchVersion:2.0.1+cu117+cuda11.7 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) TransformersVersion:4.30.2 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) adam_epsilon:1e-08 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) auto_device:cuda --> Calling Count:2
    [2023-07-08 02:28:21] (2.3.1) batch_size:8 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) cache_dataset:True --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) checkpoint_save_mode:2 --> Calling Count:4
    [2023-07-08 02:28:21] (2.3.1) cross_validate_fold:-1 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) dataset_file:{'train': ['integrated_datasets/aste_datasets/407.Shopee/train.txt'], 'test': ['integrated_datasets/aste_datasets/407.Shopee/test.txt'], 'valid': ['integrated_datasets/aste_datasets/407.Shopee/dev.txt']} --> Calling Count:17
    [2023-07-08 02:28:21] (2.3.1) dataset_name:407.Shopee --> Calling Count:3
    [2023-07-08 02:28:21] (2.3.1) dca_layer:3 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) dca_p:1 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) deep_ensemble:False --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) deprel_size:47 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) deprel_vocab:<pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.aste_utils.VocabHelp object at 0x7fbe140c6520> --> Calling Count:697
    [2023-07-08 02:28:21] (2.3.1) device:cuda --> Calling Count:2
    [2023-07-08 02:28:21] (2.3.1) device_name:Tesla T4 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) dlcf_a:2 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) dropout:0.5 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) dynamic_truncate:True --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) emb_dropout:0.5 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) embed_dim:768 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) epochs:100 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) eta:1 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) eta_lr:0.1 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) evaluate_begin:0 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) from_checkpoint:None --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) gcn_dim:300 --> Calling Count:6
    [2023-07-08 02:28:21] (2.3.1) hidden_dim:768 --> Calling Count:4
    [2023-07-08 02:28:21] (2.3.1) index_to_label:OrderedDict([(0, 'N'), (1, 'B-A'), (2, 'I-A'), (3, 'A'), (4, 'B-O'), (5, 'I-O'), (6, 'O'), (7, 'Negative'), (8, 'Neutral'), (9, 'Positive')]) --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) inference_model:None --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) initializer:xavier_uniform_ --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) l2reg:1e-06 --> Calling Count:3
    [2023-07-08 02:28:21] (2.3.1) label_to_index:OrderedDict([('N', 0), ('B-A', 1), ('I-A', 2), ('A', 3), ('B-O', 4), ('I-O', 5), ('O', 6), ('Negative', 7), ('Neutral', 8), ('Positive', 9)]) --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) lcf:cdw --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) learning_rate:2e-05 --> Calling Count:3
    [2023-07-08 02:28:21] (2.3.1) load_aug:False --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) log_step:-1 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) logger:<Logger emcgcn (INFO)> --> Calling Count:10
    [2023-07-08 02:28:21] (2.3.1) lsa:False --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) max_seq_len:256 --> Calling Count:34193
    [2023-07-08 02:28:21] (2.3.1) model:<class 'pyabsa.tasks.AspectSentimentTripletExtraction.models.model.EMCGCN'> --> Calling Count:5
    [2023-07-08 02:28:21] (2.3.1) model_name:emcgcn --> Calling Count:2
    [2023-07-08 02:28:21] (2.3.1) model_path_to_save:checkpoints --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) notice:This is an training example for aspect term extraction --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) num_epoch:10 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) num_layers:1 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) optimizer:adamw --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) output_dim:10 --> Calling Count:7
    [2023-07-08 02:28:21] (2.3.1) overwrite_cache:False --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) path_to_save:None --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) patience:20 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) pooling:avg --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) post_size:206 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) post_vocab:<pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.aste_utils.VocabHelp object at 0x7fbe140c6ca0> --> Calling Count:697
    [2023-07-08 02:28:21] (2.3.1) postag_size:155 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) postag_vocab:<pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.aste_utils.VocabHelp object at 0x7fbe140c68b0> --> Calling Count:697
    [2023-07-08 02:28:21] (2.3.1) pretrained_bert:yangheng/deberta-v3-base-absa-v1.1 --> Calling Count:5
    [2023-07-08 02:28:21] (2.3.1) relation_constraint:True --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) save_mode:2 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) seed:1 --> Calling Count:7
    [2023-07-08 02:28:21] (2.3.1) sigma:0.3 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) similarity_threshold:1 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) spacy_model:en_core_web_sm --> Calling Count:5
    [2023-07-08 02:28:21] (2.3.1) srd_alignment:True --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) symmetry_decoding:False --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) syn_post_vocab:<pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.aste_utils.VocabHelp object at 0x7fbe140c62e0> --> Calling Count:699
    [2023-07-08 02:28:21] (2.3.1) synpost_size:7 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) task:triplet --> Calling Count:39406
    [2023-07-08 02:28:21] (2.3.1) task_code:ASTE --> Calling Count:2
    [2023-07-08 02:28:21] (2.3.1) task_name:Aspect Sentiment Triple Extraction --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) token_vocab:<pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.aste_utils.VocabHelp object at 0x7fbe140c62b0> --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) tokenizer:<pyabsa.framework.tokenizer_class.tokenizer_class.PretrainedTokenizer object at 0x7fbe3367bdf0> --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) use_amp:False --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) use_bert_spc:True --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) use_syntax_based_SRD:False --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) verbose:False --> Calling Count:3
    [2023-07-08 02:28:21] (2.3.1) warmup_step:-1 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) weight_decay:0.0 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) window:lr --> Calling Count:0
    2023-07-08 02:28:21,313 INFO: ***** Running training for Aspect Sentiment Triple Extraction *****
    2023-07-08 02:28:21,314 INFO: Training set examples = 522
    2023-07-08 02:28:21,315 INFO: Valid set examples = 71
    2023-07-08 02:28:21,315 INFO: Test set examples = 104
    2023-07-08 02:28:21,316 INFO: Total params = 185533758, Trainable params = 185533758, Non-trainable params = 0
    2023-07-08 02:28:21,317 INFO: Batch size = 8
    2023-07-08 02:28:21,318 INFO: Num steps = 660
    Epoch: 0 | Smooth Loss: 0.6131: 100%|██████████| 66/66 [00:53<00:00, 1.22it/s, Dev F1:0.00(max:0.00)]
    Epoch: 1 | Smooth Loss: 0.5229: 100%|██████████| 66/66 [00:52<00:00, 1.26it/s, Dev F1:30.61(max:30.61)]
    Epoch: 2 | Smooth Loss: 0.4670: 100%|██████████| 66/66 [00:52<00:00, 1.26it/s, Dev F1:38.51(max:38.51)]
    Epoch: 3 | Smooth Loss: 0.4326: 100%|██████████| 66/66 [00:52<00:00, 1.26it/s, Dev F1:39.89(max:39.89)]
    Epoch: 4 | Smooth Loss: 0.4026: 100%|██████████| 66/66 [00:52<00:00, 1.27it/s, Dev F1:42.65(max:42.65)]
    Epoch: 5 | Smooth Loss: 0.3777: 100%|██████████| 66/66 [00:50<00:00, 1.30it/s, Dev F1:40.60(max:42.65)]
    Epoch: 6 | Smooth Loss: 0.3556: 100%|██████████| 66/66 [00:52<00:00, 1.26it/s, Dev F1:45.18(max:45.18)]
    Epoch: 7 | Smooth Loss: 0.3350: 100%|██████████| 66/66 [00:52<00:00, 1.26it/s, Dev F1:47.93(max:47.93)]
    Epoch: 8 | Smooth Loss: 0.3167: 100%|██████████| 66/66 [00:52<00:00, 1.26it/s, Dev F1:51.47(max:51.47)]
    Epoch: 9 | Smooth Loss: 0.3003: 100%|██████████| 66/66 [00:50<00:00, 1.31it/s, Dev F1:50.28(max:51.47)]
    [2023-07-08 02:37:02] (2.3.1) Loading best model: checkpoints/emcgcn_407.Shopee_f1_51.47/ and evaluating on test set

AttributeError Traceback (most recent call last)
~/.conda/envs/default/lib/python3.9/site-packages/torch/serialization.py in _check_seekable(f)
353 try:
--> 354 f.seek(f.tell())
355 return True

AttributeError: 'NoneType' object has no attribute 'seek'

During handling of the above exception, another exception occurred:

AttributeError Traceback (most recent call last)
/tmp/ipykernel_469/2016644588.py in <cell line: 16>()
14 )
15
---> 16 trainer = ASTE.ASTETrainer(
17 config=config,
18 dataset=dataset,

~/.conda/envs/default/lib/python3.9/site-packages/pyabsa/tasks/AspectSentimentTripletExtraction/trainer/trainer.py in __init__(self, config, dataset, from_checkpoint, checkpoint_save_mode, auto_device, path_to_save, load_aug)
65 self.config.task_name = TaskNameOption().get(self.config.task_code)
66
---> 67 self._run()

~/.conda/envs/default/lib/python3.9/site-packages/pyabsa/framework/trainer_class/trainer_template.py in _run(self)
239 self.config.seed = s
240 if self.config.checkpoint_save_mode:
--> 241 model_path.append(self.training_instructor(self.config).run())
242 else:
243 # always return the last trained model if you don't save trained model

~/.conda/envs/default/lib/python3.9/site-packages/pyabsa/tasks/AspectSentimentTripletExtraction/instructor/instructor.py in run(self)
869 # Loss and Optimizer
870 criterion = nn.CrossEntropyLoss(ignore_index=-1)
--> 871 return self._train(criterion)
872
873 def _train(self, criterion):

~/.conda/envs/default/lib/python3.9/site-packages/pyabsa/tasks/AspectSentimentTripletExtraction/instructor/instructor.py in _train(self, criterion)
884 return self._k_fold_train_and_evaluate(criterion)
885 else:
--> 886 return self._train_and_evaluate(criterion)

~/.conda/envs/default/lib/python3.9/site-packages/pyabsa/tasks/AspectSentimentTripletExtraction/instructor/instructor.py in _train_and_evaluate(self, criterion)
441 "Loading best model: {} and evaluating on test set ".format(save_path)
442 )
--> 443 self._reload_model_state_dict(save_path)
444 joint_precision, joint_recall, joint_f1 = self._evaluate_f1(
445 self.test_dataloader

~/.conda/envs/default/lib/python3.9/site-packages/pyabsa/framework/instructor_class/instructor_template.py in _reload_model_state_dict(self, ckpt)
119 else:
120 self.model.load_state_dict(
--> 121 torch.load(find_file(ckpt, or_key=[".bin", "state_dict"]))
122 )
123

~/.conda/envs/default/lib/python3.9/site-packages/torch/serialization.py in load(f, map_location, pickle_module, weights_only, **pickle_load_args)
789 pickle_load_args['encoding'] = 'utf-8'
790
--> 791 with _open_file_like(f, 'rb') as opened_file:
792 if _is_zipfile(opened_file):
793 # The zipfile reader is going to advance the current file position.

~/.conda/envs/default/lib/python3.9/site-packages/torch/serialization.py in _open_file_like(name_or_buffer, mode)
274 return _open_buffer_writer(name_or_buffer)
275 elif 'r' in mode:
--> 276 return _open_buffer_reader(name_or_buffer)
277 else:
278 raise RuntimeError(f"Expected 'r' or 'w' in mode but got {mode}")

~/.conda/envs/default/lib/python3.9/site-packages/torch/serialization.py in __init__(self, buffer)
259 def __init__(self, buffer):
260 super().__init__(buffer)
--> 261 _check_seekable(buffer)
262
263

~/.conda/envs/default/lib/python3.9/site-packages/torch/serialization.py in _check_seekable(f)
355 return True
356 except (io.UnsupportedOperation, AttributeError) as e:
--> 357 raise_err_msg(["seek", "tell"], e)
358 return False
359

~/.conda/envs/default/lib/python3.9/site-packages/torch/serialization.py in raise_err_msg(patterns, e)
348 + " Please pre-load the data into a buffer like io.BytesIO and"
349 + " try to load from it instead.")
--> 350 raise type(e)(msg)
351 raise e
352

AttributeError: 'NoneType' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.
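
The last two frames show the root cause: find_file(ckpt, or_key=[".bin", "state_dict"]) returned None because no state_dict/.bin file was written to the checkpoint directory, and torch.load(None) then fails while checking seekability. A minimal snippet that reproduces the exact error:

import torch

# passing None mimics find_file() finding no ".bin"/"state_dict" file;
# torch.load(None) raises the same AttributeError as in the traceback above
torch.load(None)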

Describe the bug

I don't know why, but training no longer saves the state_dict file, so every time the trainer evaluates on the test set after training, it raises this error.

Expected behavior

The state_dict should be written when the trained model is saved, so that the test-set evaluation after training runs without error.
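
For reference, the failing load in _reload_model_state_dict could guard against the missing file. This is a sketch only, not the library's actual code; save_path and model stand for the variables visible in the traceback, and find_file comes from the findfile package PyABSA already uses:

import torch
from findfile import find_file

# find_file returns None when nothing matches, which is what triggers the crash
ckpt_file = find_file(save_path, or_key=[".bin", "state_dict"])
if ckpt_file is None:
    raise FileNotFoundError(
        f"no '.bin'/'state_dict' file found under {save_path}; "
        "the checkpoint was saved without a state_dict"
    )
model.load_state_dict(torch.load(ckpt_file))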

@yangheng95 (Owner)

Could you please check your torch and transformers versions? And can you check whether a state_dict file exists in your file system?
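
A quick way to check both, assuming the default checkpoints output directory shown in the log above:

import glob
import torch
import transformers

print(torch.__version__, transformers.__version__)
# list any saved weights under the checkpoint directory
print(glob.glob("checkpoints/**/*.bin", recursive=True))
print(glob.glob("checkpoints/**/*state_dict*", recursive=True))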

@Kensvin28 (Author)

torch 2.0.1
transformers 4.30.2
no state_dict in the file system

@yangheng95 (Owner)

Can you try transformers==4.30.0?


Kensvin28 commented Jul 10, 2023

I tried transformers 4.30.0, but it still shows the same error.

[2023-07-10 15:21:36] (2.3.1) PyABSAVersion:2.3.1 --> Calling Count:1
[2023-07-10 15:21:36] (2.3.1) SRD:3 --> Calling Count:0
[2023-07-10 15:21:36] (2.3.1) TorchVersion:2.0.1+cu117+cuda11.7 --> Calling Count:1
[2023-07-10 15:21:36] (2.3.1) TransformersVersion:4.30.0 --> Calling Count:1
.
.
AttributeError: 'NoneType' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

@Kensvin28 (Author)

If I use the SAVE_MODEL_STATE_DICT mode, can I still save the model and run inference later? What is the difference between SAVE_MODEL_STATE_DICT and SAVE_FULL_MODEL?

@yangheng95 (Owner)

Please try saving the state dict, which avoids many compatibility errors across different transformers versions.
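
In general PyTorch terms, SAVE_MODEL_STATE_DICT stores only the parameter tensors (as with torch.save(model.state_dict())), while SAVE_FULL_MODEL pickles the whole nn.Module, which ties the checkpoint to the exact pyabsa/transformers versions used at training time. A minimal sketch of the suggested setup, reusing the config and dataset from the original snippet:

trainer = ASTE.ASTETrainer(
    config=config,
    dataset=dataset,
    auto_device='cuda',
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,  # parameters only
    load_aug=False,
)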
