[Bug]: Training with neural_search/recall/in_batch_negative fails with TypeError: __init__() got an unexpected keyword argument 'enable_recompute' #9026

liuzhipengchd · 2024-08-28T05:53:40Z

Software environment

- paddlepaddle:
- paddlepaddle-gpu: tried both 2.6.1 and 3.0
- paddlenlp: tried both 2.8 and 2.9

Current versions
paddle-bfloat                     0.1.7
paddle2onnx                       1.2.7
paddlefsl                         1.1.0
paddlenlp                         2.8.1
paddleocr                         2.8.0
paddlepaddle-gpu                  2.6.1.post11
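To collect these versions for a report without copying `pip list` output by hand, a short standard-library script can query each installed distribution. This is a minimal sketch using `importlib.metadata`; the package names are simply the ones listed above, and the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Distribution names taken from the "Current versions" list in this issue.
PACKAGES = [
    "paddle-bfloat",
    "paddle2onnx",
    "paddlefsl",
    "paddlenlp",
    "paddleocr",
    "paddlepaddle-gpu",
]


def installed_versions(packages):
    """Return {distribution name: version}, or "not installed" if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Print in the same two-column layout as `pip list`.
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:<33} {version}")
```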

Duplicate issues

I have searched the existing issues

Error description

```
File "train_batch_neg.py", line 348, in do_train
    pretrained_model = AutoModel.from_pretrained(args.model_name_or_path, enable_recompute=args.use_recompute)
  File "/root/wxp/PaddleNLP/paddlenlp/transformers/auto/modeling.py", line 456, in from_pretrained
    return cls._from_pretrained(pretrained_model_name_or_path, task, *model_args, **kwargs)
  File "/root/wxp/PaddleNLP/paddlenlp/transformers/auto/modeling.py", line 320, in _from_pretrained
    return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
  File "/root/wxp/PaddleNLP/paddlenlp/transformers/model_utils.py", line 2306, in from_pretrained
    model = cls(config, *init_args, **model_kwargs)
  File "/root/wxp/PaddleNLP/paddlenlp/transformers/utils.py", line 280, in __impl__
    init_func(self, *args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'enable_recompute'
I0828 13:48:53.111547 13264 process_group_nccl.cc:132] ProcessGroupNCCL destruct 
I0828 13:48:53.161453 13362 tcp_store.cc:289] receive shutdown event and so quit from MasterDaemon run loop
LAUNCH INFO 2024-08-28 13:48:54,184 Exit code 1
```

Steps to reliably reproduce & code

Run command:

```shell
python3 -u -m paddle.distributed.launch --gpus "1,3" \
    train_batch_neg.py \
    --device gpu \
    --save_dir ./checkpoints_medicine/ \
    --batch_size 64 \
    --learning_rate 5E-5 \
    --epochs 3 \
    --output_emb_size 1024 \
    --model_name_or_path ernie-3.0-base-zh \
    --save_steps 10 \
    --max_seq_length 64 \
    --margin 0.2 \
    --train_set_file /root/train_data/medicine/train_supervised.csv \
    --recall_result_dir "recall_result_dir" \
    --recall_result_file "recall_result.txt" \
    --hnsw_m 100 \
    --hnsw_ef 100 \
    --recall_num 50 \
    --similar_text_pair_file "/root/train_data/search/supervised/dev.csv" \
    --corpus_file "/root/train_data/search/supervised/corpus.csv"
```

liuzhipengchd · 2024-08-28T07:05:24Z

PaddleNLP/paddlenlp/transformers/ernie/configuration.py, line 1291 in 34a71c8:

self.use_task_id = use_task_id

self.enable_recompute = enable_recompute needs to be defined here as well, with a default of enable_recompute=False.
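Why would adding the attribute to the config fix the error? A minimal sketch, assuming PaddleNLP's `from_pretrained` routes extra keyword arguments the way Hugging Face Transformers does: keyword arguments that match an existing config attribute are set on the config, and any leftovers are passed to the model constructor (the traceback's `model = cls(config, *init_args, **model_kwargs)`), which does not accept `enable_recompute`. All class and function names below are hypothetical stand-ins, not PaddleNLP APIs.

```python
class ConfigWithoutRecompute:
    """Hypothetical config that never defines enable_recompute."""

    def __init__(self, use_task_id=True):
        self.use_task_id = use_task_id


class ConfigWithRecompute:
    """Hypothetical config with the proposed fix: enable_recompute defaults to False."""

    def __init__(self, use_task_id=True, enable_recompute=False):
        self.use_task_id = use_task_id
        self.enable_recompute = enable_recompute


class ToyModel:
    """Hypothetical model whose __init__ accepts only a config."""

    def __init__(self, config):
        self.config = config


def toy_from_pretrained(model_cls, config_cls, **kwargs):
    """Assumed routing: kwargs matching config attributes go to the config, the rest to the model."""
    config = config_cls()
    model_kwargs = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)  # consumed by the config
        else:
            model_kwargs[key] = value  # left over for the model constructor
    # Mirrors `model = cls(config, *init_args, **model_kwargs)` from the traceback.
    return model_cls(config, **model_kwargs)


if __name__ == "__main__":
    try:
        toy_from_pretrained(ToyModel, ConfigWithoutRecompute, enable_recompute=False)
    except TypeError as err:
        print("without the attribute:", err)

    model = toy_from_pretrained(ToyModel, ConfigWithRecompute, enable_recompute=True)
    print("with the attribute:", model.config.enable_recompute)
```

Note that the failure does not depend on the flag's value: the training script passes enable_recompute=args.use_recompute unconditionally, so the kwarg reaches the model even when --use_recompute is not on the command line.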

wawltor · 2024-08-28T11:43:29Z

We fixed this in commit ae02a3c. Since some models cannot use the recompute strategy, we have disabled recompute.
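For a checkout that predates that commit, one local workaround is to forward enable_recompute only when recompute is actually requested, and to fall back to a plain load if the model rejects it. This is a hedged sketch, not the upstream fix; `load_model` is a hypothetical helper, and `load_fn` stands for any from_pretrained-style callable.

```python
import warnings


def load_model(load_fn, name, use_recompute=False):
    """Load `name`, passing enable_recompute only when requested.

    Hypothetical helper: `load_fn` is any callable with a
    from_pretrained-style signature (name, **kwargs).
    """
    if not use_recompute:
        # Never forward the kwarg, so models/configs that do not
        # know about it are never asked to accept it.
        return load_fn(name)
    try:
        return load_fn(name, enable_recompute=True)
    except TypeError as err:
        if "enable_recompute" not in str(err):
            raise  # unrelated TypeError: surface it unchanged
        warnings.warn(f"{name} does not accept enable_recompute; loading without it.")
        return load_fn(name)
```

In train_batch_neg.py this would replace the call on line 348 with something like `pretrained_model = load_model(AutoModel.from_pretrained, args.model_name_or_path, args.use_recompute)`. A retry after a TypeError reloads the weights, which is wasteful but only happens when recompute was requested for an unsupported model.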

liuzhipengchd · 2024-08-29T06:18:07Z

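Hi, I have one more question. When using ranking/cross_encoder, this single-tower model is rather sensitive to the order of the two texts: swapping the order of the same text pair changes the score quite a lot. Is there any way to fix this? (Would a dual-tower model work?)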

wawltor · 2024-09-05T05:47:53Z

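This comes from the nature of the model itself: during training it treats the two inputs as the query and the document respectively, so the learned parameters handle them differently. You could try dual-tower models such as SimCSE.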
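Why a dual-tower model avoids the order problem can be seen in a toy setting: both texts go through the same encoder and are compared with a symmetric similarity (here cosine), so swapping them cannot change the score, whereas a scorer that assigns the inputs distinct query/document roles can. Everything below is a hypothetical illustration: a character-count vector stands in for a real SimCSE/ERNIE encoder, and `toy_query_doc_score` only mimics the role asymmetry of a cross-encoder, not its actual computation.

```python
import math
from collections import Counter


def encode(text):
    """Hypothetical shared encoder: a character-count vector."""
    return Counter(text)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (symmetric in u, v)."""
    dot = sum(count * v[token] for token, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def dual_tower_score(a, b):
    # Same encoder for both sides + symmetric similarity => order-invariant.
    return cosine(encode(a), encode(b))


def toy_query_doc_score(query, doc):
    """Hypothetical role-aware scorer: share of the query's characters found in doc."""
    q = set(query)
    return len(q & set(doc)) / len(q) if q else 0.0


if __name__ == "__main__":
    print(dual_tower_score("ab", "abcd"), dual_tower_score("abcd", "ab"))
    print(toy_query_doc_score("ab", "abcd"), toy_query_doc_score("abcd", "ab"))
```

The symmetry of the dual-tower score holds for any shared encoder paired with a symmetric similarity function; it does not by itself say anything about ranking quality.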

github-actions · 2024-11-05T00:20:20Z

This issue is stale because it has been open for 60 days with no activity.

liuzhipengchd added the bug Something isn't working label Aug 28, 2024

paddle-bot bot assigned KB-Ding Aug 28, 2024

github-actions bot added the stale label Nov 5, 2024
