Model saving does not output state_dict #334

Open
Kensvin28 opened this issue Jul 8, 2023 · 6 comments
Kensvin28 commented Jul 8, 2023

PyABSA Version (Required)

2.3.1

Code To Reproduce (Required)

from pyabsa import AspectSentimentTripletExtraction as ASTE  # needed for ASTE.ASTETrainer below
from pyabsa import ModelSaveOption, DeviceTypeOption
import warnings

warnings.filterwarnings("ignore")

# config and dataset are not defined in the original snippet; reconstructed from
# the log below (the ConfigManager call is an assumption, the dataset name is from the log)
config = ASTE.ASTEConfigManager.get_aste_config_english()
dataset = "407.Shopee"  # custom dataset, searched for in the working directory

config.batch_size = 8
config.patience = 20
config.log_step = -1
config.max_seq_len = 256
config.seed = 1
config.verbose = False # If verbose == True, PyABSA will output the model structure and several processed data examples
config.notice = (
"This is an training example for aspect term extraction" # for memos usage
)

trainer = ASTE.ASTETrainer(
    config=config,
    dataset=dataset,
    # from_checkpoint="english",  # pass a checkpoint name to resume training from a pretrained checkpoint
    auto_device='cuda',  # use cuda if available
    checkpoint_save_mode=ModelSaveOption.SAVE_FULL_MODEL,  # save the whole model (mode 2), not just the state dict
    load_aug=False,  # set load_aug=True to train with the augmentation data shipped with the integrated datasets
)

Full Console Output (Required)

[2023-07-08 02:27:10] (2.3.1) Set Model Device: cuda
[2023-07-08 02:27:10] (2.3.1) Device Name: Tesla T4
2023-07-08 02:27:10,136 INFO: PyABSA version: 2.3.1
2023-07-08 02:27:10,137 INFO: Transformers version: 4.30.2
2023-07-08 02:27:10,138 INFO: Torch version: 2.0.1+cu117+cuda11.7
2023-07-08 02:27:10,138 INFO: Device: Tesla T4
2023-07-08 02:27:10,140 INFO: 407.Shopee in the trainer is not a exact path, will search dataset in current working directory
FindFile Warning --> multiple targets ['integrated_datasets/aste_datasets/407.Shopee', 'integrated_datasets/aste_datasets/407.Shopee/.ipynb_checkpoints'] found, only return the shortest path: <integrated_datasets/aste_datasets/407.Shopee>
2023-07-08 02:27:10,146 INFO: You can set load_aug=True in a trainer to augment your dataset (English only yet) and improve performance.
2023-07-08 02:27:11,753 INFO: Load dataset from integrated_datasets/aste_datasets/407.Shopee/train.txt
preparing dataloader: 2%|▏ | 10/523 [00:00<00:05, 96.70it/s]
EOL while scanning string literal (, line 1)
preparing dataloader: 100%|██████████| 523/523 [00:05<00:00, 98.53it/s]
2023-07-08 02:27:18,110 INFO: Load dataset from integrated_datasets/aste_datasets/407.Shopee/test.txt
preparing dataloader: 51%|█████▏ | 54/105 [00:00<00:00, 97.42it/s]
EOL while scanning string literal (, line 1)
preparing dataloader: 100%|██████████| 105/105 [00:01<00:00, 92.39it/s]
2023-07-08 02:27:19,812 INFO: Load dataset from integrated_datasets/aste_datasets/407.Shopee/dev.txt
preparing dataloader: 100%|██████████| 71/71 [00:00<00:00, 100.00it/s]
building vocab...
converting data to features: 100%|██████████| 522/522 [00:31<00:00, 16.66it/s]
converting data to features: 100%|██████████| 104/104 [00:07<00:00, 14.29it/s]
converting data to features: 100%|██████████| 71/71 [00:03<00:00, 20.75it/s]
2023-07-08 02:28:02,765 INFO: Save cache dataset to emcgcn.407.Shopee.dataset.b58ef8d99282bf35c7523e9d4fe3c00be3acbf79e1c910c9b38732fded1e3432.cache

Some weights of the model checkpoint at yangheng/deberta-v3-base-absa-v1.1 were not used when initializing DebertaV2Model: ['pooler.dense.bias', 'classifier.bias', 'classifier.weight', 'pooler.dense.weight']

  • This IS expected if you are initializing DebertaV2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing DebertaV2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    [2023-07-08 02:28:21] (2.3.1) ABSADatasetsVersion:None --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) MV:<metric_visualizer.metric_visualizer.MetricVisualizer object at 0x7fbe3367b9d0> --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) PyABSAVersion:2.3.1 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) SRD:3 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) TorchVersion:2.0.1+cu117+cuda11.7 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) TransformersVersion:4.30.2 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) adam_epsilon:1e-08 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) auto_device:cuda --> Calling Count:2
    [2023-07-08 02:28:21] (2.3.1) batch_size:8 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) cache_dataset:True --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) checkpoint_save_mode:2 --> Calling Count:4
    [2023-07-08 02:28:21] (2.3.1) cross_validate_fold:-1 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) dataset_file:{'train': ['integrated_datasets/aste_datasets/407.Shopee/train.txt'], 'test': ['integrated_datasets/aste_datasets/407.Shopee/test.txt'], 'valid': ['integrated_datasets/aste_datasets/407.Shopee/dev.txt']} --> Calling Count:17
    [2023-07-08 02:28:21] (2.3.1) dataset_name:407.Shopee --> Calling Count:3
    [2023-07-08 02:28:21] (2.3.1) dca_layer:3 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) dca_p:1 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) deep_ensemble:False --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) deprel_size:47 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) deprel_vocab:<pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.aste_utils.VocabHelp object at 0x7fbe140c6520> --> Calling Count:697
    [2023-07-08 02:28:21] (2.3.1) device:cuda --> Calling Count:2
    [2023-07-08 02:28:21] (2.3.1) device_name:Tesla T4 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) dlcf_a:2 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) dropout:0.5 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) dynamic_truncate:True --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) emb_dropout:0.5 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) embed_dim:768 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) epochs:100 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) eta:1 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) eta_lr:0.1 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) evaluate_begin:0 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) from_checkpoint:None --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) gcn_dim:300 --> Calling Count:6
    [2023-07-08 02:28:21] (2.3.1) hidden_dim:768 --> Calling Count:4
    [2023-07-08 02:28:21] (2.3.1) index_to_label:OrderedDict([(0, 'N'), (1, 'B-A'), (2, 'I-A'), (3, 'A'), (4, 'B-O'), (5, 'I-O'), (6, 'O'), (7, 'Negative'), (8, 'Neutral'), (9, 'Positive')]) --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) inference_model:None --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) initializer:xavier_uniform_ --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) l2reg:1e-06 --> Calling Count:3
    [2023-07-08 02:28:21] (2.3.1) label_to_index:OrderedDict([('N', 0), ('B-A', 1), ('I-A', 2), ('A', 3), ('B-O', 4), ('I-O', 5), ('O', 6), ('Negative', 7), ('Neutral', 8), ('Positive', 9)]) --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) lcf:cdw --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) learning_rate:2e-05 --> Calling Count:3
    [2023-07-08 02:28:21] (2.3.1) load_aug:False --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) log_step:-1 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) logger:<Logger emcgcn (INFO)> --> Calling Count:10
    [2023-07-08 02:28:21] (2.3.1) lsa:False --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) max_seq_len:256 --> Calling Count:34193
    [2023-07-08 02:28:21] (2.3.1) model:<class 'pyabsa.tasks.AspectSentimentTripletExtraction.models.model.EMCGCN'> --> Calling Count:5
    [2023-07-08 02:28:21] (2.3.1) model_name:emcgcn --> Calling Count:2
    [2023-07-08 02:28:21] (2.3.1) model_path_to_save:checkpoints --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) notice:This is an training example for aspect term extraction --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) num_epoch:10 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) num_layers:1 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) optimizer:adamw --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) output_dim:10 --> Calling Count:7
    [2023-07-08 02:28:21] (2.3.1) overwrite_cache:False --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) path_to_save:None --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) patience:20 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) pooling:avg --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) post_size:206 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) post_vocab:<pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.aste_utils.VocabHelp object at 0x7fbe140c6ca0> --> Calling Count:697
    [2023-07-08 02:28:21] (2.3.1) postag_size:155 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) postag_vocab:<pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.aste_utils.VocabHelp object at 0x7fbe140c68b0> --> Calling Count:697
    [2023-07-08 02:28:21] (2.3.1) pretrained_bert:yangheng/deberta-v3-base-absa-v1.1 --> Calling Count:5
    [2023-07-08 02:28:21] (2.3.1) relation_constraint:True --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) save_mode:2 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) seed:1 --> Calling Count:7
    [2023-07-08 02:28:21] (2.3.1) sigma:0.3 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) similarity_threshold:1 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) spacy_model:en_core_web_sm --> Calling Count:5
    [2023-07-08 02:28:21] (2.3.1) srd_alignment:True --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) symmetry_decoding:False --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) syn_post_vocab:<pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.aste_utils.VocabHelp object at 0x7fbe140c62e0> --> Calling Count:699
    [2023-07-08 02:28:21] (2.3.1) synpost_size:7 --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) task:triplet --> Calling Count:39406
    [2023-07-08 02:28:21] (2.3.1) task_code:ASTE --> Calling Count:2
    [2023-07-08 02:28:21] (2.3.1) task_name:Aspect Sentiment Triple Extraction --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) token_vocab:<pyabsa.tasks.AspectSentimentTripletExtraction.dataset_utils.aste_utils.VocabHelp object at 0x7fbe140c62b0> --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) tokenizer:<pyabsa.framework.tokenizer_class.tokenizer_class.PretrainedTokenizer object at 0x7fbe3367bdf0> --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) use_amp:False --> Calling Count:1
    [2023-07-08 02:28:21] (2.3.1) use_bert_spc:True --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) use_syntax_based_SRD:False --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) verbose:False --> Calling Count:3
    [2023-07-08 02:28:21] (2.3.1) warmup_step:-1 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) weight_decay:0.0 --> Calling Count:0
    [2023-07-08 02:28:21] (2.3.1) window:lr --> Calling Count:0
    2023-07-08 02:28:21,313 INFO: ***** Running training for Aspect Sentiment Triple Extraction *****
    2023-07-08 02:28:21,314 INFO: Training set examples = 522
    2023-07-08 02:28:21,315 INFO: Valid set examples = 71
    2023-07-08 02:28:21,315 INFO: Test set examples = 104
    2023-07-08 02:28:21,316 INFO: Total params = 185533758, Trainable params = 185533758, Non-trainable params = 0
    2023-07-08 02:28:21,317 INFO: Batch size = 8
    2023-07-08 02:28:21,318 INFO: Num steps = 660
    Epoch: 0 | Smooth Loss: 0.6131: 100%|██████████| 66/66 [00:53<00:00, 1.22it/s, Dev F1:0.00(max:0.00)]
    Epoch: 1 | Smooth Loss: 0.5229: 100%|██████████| 66/66 [00:52<00:00, 1.26it/s, Dev F1:30.61(max:30.61)]
    Epoch: 2 | Smooth Loss: 0.4670: 100%|██████████| 66/66 [00:52<00:00, 1.26it/s, Dev F1:38.51(max:38.51)]
    Epoch: 3 | Smooth Loss: 0.4326: 100%|██████████| 66/66 [00:52<00:00, 1.26it/s, Dev F1:39.89(max:39.89)]
    Epoch: 4 | Smooth Loss: 0.4026: 100%|██████████| 66/66 [00:52<00:00, 1.27it/s, Dev F1:42.65(max:42.65)]
    Epoch: 5 | Smooth Loss: 0.3777: 100%|██████████| 66/66 [00:50<00:00, 1.30it/s, Dev F1:40.60(max:42.65)]
    Epoch: 6 | Smooth Loss: 0.3556: 100%|██████████| 66/66 [00:52<00:00, 1.26it/s, Dev F1:45.18(max:45.18)]
    Epoch: 7 | Smooth Loss: 0.3350: 100%|██████████| 66/66 [00:52<00:00, 1.26it/s, Dev F1:47.93(max:47.93)]
    Epoch: 8 | Smooth Loss: 0.3167: 100%|██████████| 66/66 [00:52<00:00, 1.26it/s, Dev F1:51.47(max:51.47)]
    Epoch: 9 | Smooth Loss: 0.3003: 100%|██████████| 66/66 [00:50<00:00, 1.31it/s, Dev F1:50.28(max:51.47)]
    [2023-07-08 02:37:02] (2.3.1) Loading best model: checkpoints/emcgcn_407.Shopee_f1_51.47/ and evaluating on test set

AttributeError Traceback (most recent call last)
~/.conda/envs/default/lib/python3.9/site-packages/torch/serialization.py in _check_seekable(f)
353 try:
--> 354 f.seek(f.tell())
355 return True

AttributeError: 'NoneType' object has no attribute 'seek'

During handling of the above exception, another exception occurred:

AttributeError Traceback (most recent call last)
/tmp/ipykernel_469/2016644588.py in <cell line: 16>()
14 )
15
---> 16 trainer = ASTE.ASTETrainer(
17 config=config,
18 dataset=dataset,

~/.conda/envs/default/lib/python3.9/site-packages/pyabsa/tasks/AspectSentimentTripletExtraction/trainer/trainer.py in __init__(self, config, dataset, from_checkpoint, checkpoint_save_mode, auto_device, path_to_save, load_aug)
65 self.config.task_name = TaskNameOption().get(self.config.task_code)
66
---> 67 self._run()

~/.conda/envs/default/lib/python3.9/site-packages/pyabsa/framework/trainer_class/trainer_template.py in _run(self)
239 self.config.seed = s
240 if self.config.checkpoint_save_mode:
--> 241 model_path.append(self.training_instructor(self.config).run())
242 else:
243 # always return the last trained model if you don't save trained model

~/.conda/envs/default/lib/python3.9/site-packages/pyabsa/tasks/AspectSentimentTripletExtraction/instructor/instructor.py in run(self)
869 # Loss and Optimizer
870 criterion = nn.CrossEntropyLoss(ignore_index=-1)
--> 871 return self._train(criterion)
872
873 def _train(self, criterion):

~/.conda/envs/default/lib/python3.9/site-packages/pyabsa/tasks/AspectSentimentTripletExtraction/instructor/instructor.py in _train(self, criterion)
884 return self._k_fold_train_and_evaluate(criterion)
885 else:
--> 886 return self._train_and_evaluate(criterion)

~/.conda/envs/default/lib/python3.9/site-packages/pyabsa/tasks/AspectSentimentTripletExtraction/instructor/instructor.py in _train_and_evaluate(self, criterion)
441 "Loading best model: {} and evaluating on test set ".format(save_path)
442 )
--> 443 self._reload_model_state_dict(save_path)
444 joint_precision, joint_recall, joint_f1 = self._evaluate_f1(
445 self.test_dataloader

~/.conda/envs/default/lib/python3.9/site-packages/pyabsa/framework/instructor_class/instructor_template.py in _reload_model_state_dict(self, ckpt)
119 else:
120 self.model.load_state_dict(
--> 121 torch.load(find_file(ckpt, or_key=[".bin", "state_dict"]))
122 )
123

~/.conda/envs/default/lib/python3.9/site-packages/torch/serialization.py in load(f, map_location, pickle_module, weights_only, **pickle_load_args)
789 pickle_load_args['encoding'] = 'utf-8'
790
--> 791 with _open_file_like(f, 'rb') as opened_file:
792 if _is_zipfile(opened_file):
793 # The zipfile reader is going to advance the current file position.

~/.conda/envs/default/lib/python3.9/site-packages/torch/serialization.py in _open_file_like(name_or_buffer, mode)
274 return _open_buffer_writer(name_or_buffer)
275 elif 'r' in mode:
--> 276 return _open_buffer_reader(name_or_buffer)
277 else:
278 raise RuntimeError(f"Expected 'r' or 'w' in mode but got {mode}")

~/.conda/envs/default/lib/python3.9/site-packages/torch/serialization.py in __init__(self, buffer)
259 def __init__(self, buffer):
260 super().__init__(buffer)
--> 261 _check_seekable(buffer)
262
263

~/.conda/envs/default/lib/python3.9/site-packages/torch/serialization.py in _check_seekable(f)
355 return True
356 except (io.UnsupportedOperation, AttributeError) as e:
--> 357 raise_err_msg(["seek", "tell"], e)
358 return False
359

~/.conda/envs/default/lib/python3.9/site-packages/torch/serialization.py in raise_err_msg(patterns, e)
348 + " Please pre-load the data into a buffer like io.BytesIO and"
349 + " try to load from it instead.")
--> 350 raise type(e)(msg)
351 raise e
352

AttributeError: 'NoneType' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.
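
The last two frames show the root cause: find_file(ckpt, or_key=[".bin", "state_dict"]) returned None because no state_dict/.bin file was written to the checkpoint directory, and torch.load(None) then fails while checking seekability. A minimal snippet that reproduces the exact error:

import torch

# passing None mimics find_file() finding no ".bin"/"state_dict" file;
# torch.load(None) raises the same AttributeError as in the traceback above
torch.load(None)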

Describe the bug

I don't know why, but training no longer saves the state_dict file, so every time the trainer evaluates on the test set after training, it raises this error.

Expected behavior

The state_dict should be written when the trained model is saved, so that the test-set evaluation after training runs without error.
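
For reference, the failing load in _reload_model_state_dict could guard against the missing file. This is a sketch only, not the library's actual code; save_path and model stand for the variables visible in the traceback, and find_file comes from the findfile package PyABSA already uses:

import torch
from findfile import find_file

# find_file returns None when nothing matches, which is what triggers the crash
ckpt_file = find_file(save_path, or_key=[".bin", "state_dict"])
if ckpt_file is None:
    raise FileNotFoundError(
        f"no '.bin'/'state_dict' file found under {save_path}; "
        "the checkpoint was saved without a state_dict"
    )
model.load_state_dict(torch.load(ckpt_file))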

@yangheng95 (Owner)

Could you please check your torch and transformers versions? And can you check whether a state_dict file exists in your file system?
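
A quick way to check both, assuming the default checkpoints output directory shown in the log above:

import glob
import torch
import transformers

print(torch.__version__, transformers.__version__)
# list any saved weights under the checkpoint directory
print(glob.glob("checkpoints/**/*.bin", recursive=True))
print(glob.glob("checkpoints/**/*state_dict*", recursive=True))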

@Kensvin28 (Author)

torch 2.0.1
transformers 4.30.2
no state_dict in the file system

@yangheng95 (Owner)

Can you try transformers==4.30.0?


Kensvin28 commented Jul 10, 2023

I tried transformers 4.30.0, but it still shows the same error.

[2023-07-10 15:21:36] (2.3.1) PyABSAVersion:2.3.1 --> Calling Count:1
[2023-07-10 15:21:36] (2.3.1) SRD:3 --> Calling Count:0
[2023-07-10 15:21:36] (2.3.1) TorchVersion:2.0.1+cu117+cuda11.7 --> Calling Count:1
[2023-07-10 15:21:36] (2.3.1) TransformersVersion:4.30.0 --> Calling Count:1
.
.
AttributeError: 'NoneType' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

@Kensvin28 (Author)

If I use the SAVE_MODEL_STATE_DICT mode, can I still save the model and run inference later? What is the difference between SAVE_MODEL_STATE_DICT and SAVE_FULL_MODEL?

@yangheng95 (Owner)

Please try saving the state dict, which avoids many compatibility errors across different transformers versions.
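
In general PyTorch terms, SAVE_MODEL_STATE_DICT stores only the parameter tensors (as with torch.save(model.state_dict())), while SAVE_FULL_MODEL pickles the whole nn.Module, which ties the checkpoint to the exact pyabsa/transformers versions used at training time. A minimal sketch of the suggested setup, reusing the config and dataset from the original snippet:

trainer = ASTE.ASTETrainer(
    config=config,
    dataset=dataset,
    auto_device='cuda',
    checkpoint_save_mode=ModelSaveOption.SAVE_MODEL_STATE_DICT,  # parameters only
    load_aug=False,
)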
