
Training in the PaddlePaddle Python environment on the AI Studio platform, then running inference through Paddle's Go API, fails with an error (details in the log). #58652

Closed
hubimaso opened this issue Nov 3, 2023 · 9 comments
Labels
status/close (closed), status/reopen (reopened), type/debug (help user debug)

Comments

hubimaso commented Nov 3, 2023

Describe the Bug

1) Training: PaddlePaddle 2.5 (latest 2.5 release) and PaddleNLP 2.6. On the AI Studio platform I followed the text_classification training example in PaddleNLP, using the bert-base-chinese model. After training finished, the model files were saved as follows:
[screenshot: the exported model files]
2) Inference: a Go program uses the Go API to load the model.pdiparams and model.pdmodel files and runs inference on CPU. After compiling it to a binary and running it, the following error is reported:
terminate called after throwing an instance of 'phi::enforce::EnforceNotMet'
what():

Compile Traceback (most recent call last):
File "/home/aistudio/PaddleNLP/applications/text_classification/multi_class/train.py", line 231, in
main()
File "/home/aistudio/PaddleNLP/applications/text_classification/multi_class/train.py", line 208, in main
export_model(model=trainer.model, input_spec=input_spec, path=model_args.export_model_dir)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/export.py", line 59, in export_model
paddle.jit.save(model, save_path)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in impl
return wrapped_func(*args, **kwargs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/jit/api.py", line 752, in wrapper
func(layer, path, input_spec, **configs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in impl
return wrapped_func(*args, **kwargs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/fluid/dygraph/base.py", line 75, in impl
return func(*args, **kwargs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/jit/api.py", line 1043, in save
static_func.concrete_program_specify_input_spec(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/jit/dy2static/program_translator.py", line 709, in concrete_program_specify_input_spec
concrete_program, _ = self.get_concrete_program(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/jit/dy2static/program_translator.py", line 589, in get_concrete_program
concrete_program, partial_program_layer = self._program_cache[
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/jit/dy2static/program_translator.py", line 1249, in getitem
self._caches[item_id] = self._build_once(item)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/jit/dy2static/program_translator.py", line 1193, in _build_once
concrete_program = ConcreteProgram.from_func_spec(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in impl
return wrapped_func(*args, **kwargs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/fluid/dygraph/base.py", line 75, in impl
return func(*args, **kwargs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/jit/dy2static/program_translator.py", line 1063, in from_func_spec
outputs = static_func(*inputs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/bert/modeling.py", line 706, in forward
outputs = self.bert(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1256, in call
return self._dygraph_call_func(*inputs, **kwargs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1235, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/bert/modeling.py", line 425, in forward
if attention_mask is None:
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/jit/dy2static/convert_operators.py", line 352, in convert_ifelse
out = _run_py_ifelse(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/jit/dy2static/convert_operators.py", line 429, in _run_py_ifelse
py_outs = true_fn() if pred else false_fn()
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/bert/modeling.py", line 426, in forward
attention_mask = paddle.unsqueeze(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/fluid/layers/math_op_patch.py", line 445, in impl
current_block(self).append_op(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/fluid/framework.py", line 4013, in append_op
op = Operator(
File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/fluid/framework.py", line 2781, in init
for frame in traceback.extract_stack():


C++ Traceback (most recent call last):

0 PD_PredictorRun
1 paddle::AnalysisPredictor::ZeroCopyRun()
2 paddle::framework::NaiveExecutor::Run()
3 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, phi::Place const&)
4 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, phi::Place const&) const
5 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, phi::Place const&, paddle::framework::RuntimeContext*) const
6 std::_Function_handler<void (paddle::framework::InferShapeContext*), paddle::framework::details::OpInfoFiller<equal_InferShapeFunctor, (paddle::framework::details::OpInfoFillType)4>::operator()(char const*, paddle::framework::OpInfo*) const::{lambda(paddle::framework::InferShapeContext*)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::InferShapeContext*&&)
7 equal_InferShapeFunctor::operator()(paddle::framework::InferShapeContext*) const
8 phi::CompareInferMeta(phi::MetaTensor const&, phi::MetaTensor const&, int, phi::MetaTensor*)
9 phi::funcs::GetBroadcastDimsArrays(phi::DDim const&, phi::DDim const&, int*, int*, int*, int, int)
10 phi::enforce::EnforceNotMet::EnforceNotMet(phi::ErrorSummary const&, char const*, int)
11 phi::enforce::GetCurrentTraceBackString[abi:cxx11]


Error Message Summary:

InvalidArgumentError: Axis should be less than 2, but received axis is 2.
[Hint: Expected axis < max_dim, but received axis:2 >= max_dim:2.] (at /paddle/paddle/phi/kernels/funcs/common_shape.h:53)
[operator < equal > error]
SIGABRT: abort
PC=0x7f7d19dcc387 m=0 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 1 [syscall]:
runtime.cgocall(0x7b64a0, 0xc000071958)
/opt/go1.18/src/runtime/cgocall.go:157 +0x5c fp=0xc000071918 sp=0xc0000718e0 pc=0x409cfc
ds_nlpcls/paddle._Cfunc_PD_PredictorRun(0x1bb1ac0)
_cgo_gotypes.go:1353 +0x49 fp=0xc000071958 sp=0xc000071918 pc=0x70e369
ds_nlpcls/paddle.(*Predictor).Run.func1(0xc000266410)
/home/zhouzh/ds_nlpcls/paddle/predictor.go:145 +0x6d fp=0xc0000719b0 sp=0xc000071958 pc=0x710ead
ds_nlpcls/paddle.(*Predictor).Run(0xc000266410)
/home/zhouzh/ds_nlpcls/paddle/predictor.go:145 +0x1e fp=0xc0000719d8 sp=0xc0000719b0 pc=0x710e1e
main.predict({0x83f621, 0x22d7})
/home/zhouzh/ds_nlpcls/main.go:249 +0x1c6 fp=0xc000071ba8 sp=0xc0000719d8 pc=0x7b4546
main.main()
/home/zhouzh/ds_nlpcls/main.go:227 +0x5ee fp=0xc000071f80 sp=0xc000071ba8 pc=0x7b414e
runtime.main()
/opt/go1.18/src/runtime/proc.go:250 +0x1d8 fp=0xc000071fe0 sp=0xc000071f80 pc=0x4417f8
runtime.goexit()
/opt/go1.18/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000071fe8 sp=0xc000071fe0 pc=0x46f281

rax 0x0
rbx 0x28fc2d0
rcx 0xffffffffffffffff
rdx 0x6
rdi 0x37f9
rsi 0x37f9
rbp 0x7f7d1a15e868
rsp 0x7ffefca7aa18
r8 0x7f7d20eb8010
r9 0x7f7d20ec1700
r10 0x8
r11 0x206
r12 0x28f6ad0
r13 0x1
r14 0x7ffefca7ad20
r15 0x1
rip 0x7f7d19dcc387
rflags 0x206
cs 0x33
fs 0x0
gs 0x0

3) Question: why does the underlying C++ error reference the Python environment from my training run? How should I go about locating and fixing this error? I couldn't find any related solution. Please take a look, thanks a lot!
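One low-cost way to narrow this down on the Go side is to check, before calling Run(), that each input tensor receives data whose element count matches the shape passed to Reshape and whose element type matches the dtype used at export. A minimal guard sketch, using only the Reshape/CopyFromCpu calls that already appear in this thread (the helper name is made up):

func checkAndSetInput(t *pd.Tensor, shape []int32, data []int64) {
	// Fail fast if the data length cannot fill the requested shape.
	n := int32(1)
	for _, d := range shape {
		n *= d
	}
	if int32(len(data)) != n {
		log.Fatalf("input has %d elements, but shape %v needs %d", len(data), shape, n)
	}
	t.Reshape(shape)
	t.CopyFromCpu(data)
}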

Additional Supplementary Information

No response

hubimaso (Author) commented Nov 3, 2023

I updated the paddle_inference_c prediction library to the latest version and still hit the same error. Is something wrong with my usage? Could anyone advise? @jzhang533 @wanglun @lileding

6clc (Contributor) commented Nov 4, 2023

Could you share your reproduction steps and code? The pipeline involves quite a few pieces, and it is hard to locate the problem from the log alone.

hubimaso (Author) commented Nov 4, 2023

> Could you share your reproduction steps and code? The pipeline involves quite a few pieces, and it is hard to locate the problem from the log alone.

1) The training platform is AI Studio. The Python code is the train script from the text_classification application in PaddleNLP, with fairly small changes, as follows:
def main():
    """
    Training a binary or multi classification model
    """

    parser = PdArgumentParser((ModelArguments, DataArguments, CompressionArguments))
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
    if training_args.do_compress:
        training_args.strategy = "dynabert"
    if training_args.do_train or training_args.do_compress:
        training_args.print_config(model_args, "Model")
        training_args.print_config(data_args, "Data")
    paddle.set_device(training_args.device)

    # Define id2label
    id2label = {}
    label2id = {}
    with open(data_args.label_path, "r", encoding="utf-8") as f:
        for i, line in enumerate(f):
            l = line.strip()
            id2label[i] = l
            label2id[l] = i

    # Define model & tokenizer
    if os.path.isdir(model_args.model_name_or_path):
        model = AutoModelForSequenceClassification.from_pretrained(
            model_args.model_name_or_path, label2id=label2id, id2label=id2label
        )
    elif model_args.model_name_or_path in SUPPORTED_MODELS:
        model = AutoModelForSequenceClassification.from_pretrained(
            model_args.model_name_or_path, num_classes=len(label2id), label2id=label2id, id2label=id2label
        )
    else:
        raise ValueError(
            f"{model_args.model_name_or_path} is not a supported model type. Either use a local model path or select a model from {SUPPORTED_MODELS}"
        )
    tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path)

    # load and preprocess dataset
    train_ds = load_dataset(read_local_dataset2, path=data_args.train_path, label2id=label2id, lazy=False)
    dev_ds = load_dataset(read_local_dataset2, path=data_args.dev_path, label2id=label2id, lazy=False)
    trans_func = functools.partial(preprocess_function, tokenizer=tokenizer, max_length=data_args.max_length)
    trans_func_test = functools.partial(preprocess_function_test, tokenizer=tokenizer, max_length=data_args.max_length)
    train_ds = train_ds.map(trans_func)
    dev_ds = dev_ds.map(trans_func)

    if data_args.debug:
        test_ds = load_dataset(read_local_dataset_test, path=data_args.test_path, label2id=label2id, lazy=False)
        test_ds = test_ds.map(trans_func_test)

    # Define the metric function.
    def compute_metrics(eval_preds):
        pred_ids = np.argmax(eval_preds.predictions, axis=-1)
        metrics = {}
        metrics["accuracy"] = accuracy_score(y_true=eval_preds.label_ids, y_pred=pred_ids)
        for average in ["micro", "macro"]:
            precision, recall, f1, _ = precision_recall_fscore_support(
                y_true=eval_preds.label_ids, y_pred=pred_ids, average=average
            )
            metrics[f"{average}_precision"] = precision
            metrics[f"{average}_recall"] = recall
            metrics[f"{average}_f1"] = f1
        return metrics

    def compute_metrics_debug(eval_preds):
        pred_ids = np.argmax(eval_preds.predictions, axis=-1)
        metrics = classification_report(eval_preds.label_ids, pred_ids, output_dict=True)
        return metrics

    # Define the early-stopping callback.
    if data_args.early_stopping:
        callbacks = [EarlyStoppingCallback(early_stopping_patience=data_args.early_stopping_patience)]
    else:
        callbacks = None

    # Define Trainer
    trainer = Trainer(
        model=model,
        tokenizer=tokenizer,
        args=training_args,
        criterion=paddle.nn.loss.CrossEntropyLoss(),
        train_dataset=train_ds,
        eval_dataset=dev_ds,
        callbacks=callbacks,
        data_collator=DataCollatorWithPadding(tokenizer),
        compute_metrics=compute_metrics_debug if data_args.debug else compute_metrics,
    )

    # Training
    if training_args.do_train:
        train_result = trainer.train()
        metrics = train_result.metrics
        trainer.save_model()
        trainer.log_metrics("train", metrics)
        for checkpoint_path in Path(training_args.output_dir).glob("checkpoint-*"):
            shutil.rmtree(checkpoint_path)

    # Evaluate and tests model
    if training_args.do_eval:
        if data_args.debug:
            output = trainer.predict(test_ds)
            log_metrics_debug(output, id2label, test_ds, data_args.bad_case_path)
        else:
            eval_metrics = trainer.evaluate()
            trainer.log_metrics("eval", eval_metrics)

    # export inference model
    if training_args.do_export:
        if model.init_config["init_class"] in ["ErnieMForSequenceClassification"]:
            # input_spec = [paddle.static.InputSpec(shape=[None, None], dtype="int64", name="input_ids")]
            input_spec = [paddle.static.InputSpec(shape=[1, data_args.max_length], dtype="int64", name="input_ids")]
        else:
            input_spec = [
                # paddle.static.InputSpec(shape=[None, None], dtype="int64", name="input_ids"),
                # paddle.static.InputSpec(shape=[None, None], dtype="int64", name="token_type_ids"),
                paddle.static.InputSpec(shape=[1, data_args.max_length], dtype="int64", name="input_ids"),
                paddle.static.InputSpec(shape=[1, data_args.max_length], dtype="int64", name="token_type_ids"),
            ]
        if model_args.export_model_dir is None:
            model_args.export_model_dir = os.path.join(training_args.output_dir, "export")
        export_model(model=trainer.model, input_spec=input_spec, path=model_args.export_model_dir)
        tokenizer.save_pretrained(model_args.export_model_dir)
        id2label_file = os.path.join(model_args.export_model_dir, "id2label.json")
        with open(id2label_file, "w", encoding="utf-8") as f:
            json.dump(id2label, f, ensure_ascii=False)
            logger.info(f"id2label file saved in {id2label_file}")

    # compress
    if training_args.do_compress:
        trainer.compress()
        for width_mult in training_args.width_mult_list:
            pruned_infer_model_dir = os.path.join(training_args.output_dir, "width_mult_" + str(round(width_mult, 2)))
            tokenizer.save_pretrained(pruned_infer_model_dir)
            id2label_file = os.path.join(pruned_infer_model_dir, "id2label.json")
            with open(id2label_file, "w", encoding="utf-8") as f:
                json.dump(id2label, f, ensure_ascii=False)
                logger.info(f"id2label file saved in {id2label_file}")

    for path in Path(training_args.output_dir).glob("runs"):
        shutil.rmtree(path)

2) The main flow of the Go inference code is as follows:
func getBert(vocabFile string) (retVal *tokenizer.Tokenizer) {
	// vocabFile, err := util.CachedPath("bert-base-uncased", "vocab.txt")
	// if err != nil {
	// 	panic(err)
	// }

	model, err := wordpiece.NewWordPieceFromFile(vocabFile, "[UNK]")
	if err != nil {
		log.Fatal(err)
	}

	tk := tokenizer.NewTokenizer(model)
	fmt.Printf("Vocab size: %v\n", tk.GetVocabSize(false))

	bertNormalizer := normalizer.NewBertNormalizer(true, true, true, true)
	tk.WithNormalizer(bertNormalizer)

	bertPreTokenizer := pretokenizer.NewBertPreTokenizer()
	tk.WithPreTokenizer(bertPreTokenizer)

	var specialTokens []tokenizer.AddedToken
	specialTokens = append(specialTokens, tokenizer.NewAddedToken("[MASK]", true))

	tk.AddSpecialTokens(specialTokens)

	maxLen := *maxLen

	truncParams := tokenizer.TruncationParams{
		MaxLength: maxLen,
		Strategy:  tokenizer.OnlyFirst,
		Stride:    128,
	}
	tk.WithTruncation(&truncParams)

	sepId, ok := tk.TokenToId("[SEP]")
	if !ok {
		log.Fatalf("Cannot find ID for [SEP] token.\n")
	}
	sep := processor.PostToken{Id: sepId, Value: "[SEP]"}

	clsId, ok := tk.TokenToId("[CLS]")
	if !ok {
		log.Fatalf("Cannot find ID for [CLS] token.\n")
	}
	cls := processor.PostToken{Id: clsId, Value: "[CLS]"}

	postProcess := processor.NewBertProcessing(sep, cls)
	tk.WithPostProcessor(postProcess)

	return tk
}

func load_content(content string, tk *tokenizer.Tokenizer) []float32 {
	// Define input
	var input []tokenizer.EncodeInput
	input = append(input, tokenizer.NewSingleEncodeInput(tokenizer.NewInputSequence(content)))
	encodings, err := tk.EncodeBatch(input, true)
	if err != nil {
		log.Fatal(err)
	}

	// Find max length of token Ids from slice of encodings
	var maxLen int = 0
	for _, en := range encodings {
		if len(en.Ids) > maxLen {
			maxLen = len(en.Ids)
		}
	}

	fmt.Printf("encodings: %v\n", encodings)
	var tokInput []float32 = make([]float32, maxLen)
	for _, en := range encodings {
		for i := 0; i < len(en.Ids); i++ {
			tokInput[i] = float32(en.Ids[i])
		}
	}
	return tokInput
}
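A note on the helper above: the export step declares the model inputs as dtype="int64" with the fixed shape [1, max_length], while load_content returns []float32 whose length is the actual encoding length rather than *maxLen. A dtype- and length-consistent variant might look like this (a sketch only; it assumes CopyFromCpu accepts []int64, as in the official goapi bindings, and the helper name loadContentInt64 is made up):

func loadContentInt64(content string, tk *tokenizer.Tokenizer, seqLen int) []int64 {
	// Encode one input sequence, exactly as load_content does.
	input := []tokenizer.EncodeInput{
		tokenizer.NewSingleEncodeInput(tokenizer.NewInputSequence(content)),
	}
	encodings, err := tk.EncodeBatch(input, true)
	if err != nil {
		log.Fatal(err)
	}

	// Zero-pad (or truncate) to the fixed length the model was exported with.
	ids := make([]int64, seqLen)
	for i, id := range encodings[0].Ids {
		if i >= seqLen {
			break
		}
		ids[i] = int64(id)
	}
	return ids
}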

func initLableMap() {
	// Open the id2label.json file
	file, err := os.Open(*lableMapFile)
	if err != nil {
		fmt.Println("Error opening file:", err)
		return
	}
	defer file.Close()

	// Read the file contents
	content, err := ioutil.ReadAll(file)
	if err != nil {
		fmt.Println("Error reading file:", err)
		return
	}

	// Create the map[int32]int32
	lableMap := make(map[int32]int32)

	// Decode the JSON data
	jsonData := make(map[string]string)
	err = json.Unmarshal(content, &jsonData)
	if err != nil {
		fmt.Println("Error decoding JSON:", err)
		return
	}

	// Convert the string keys and values to ints and store them in the new map
	for keyStr, valueStr := range jsonData {
		key, err := strconv.Atoi(keyStr)
		if err != nil {
			fmt.Println("Error converting key to int:", err)
			return
		}

		value, err := strconv.Atoi(valueStr)
		if err != nil {
			fmt.Println("Error converting value to int:", err)
			return
		}

		lableMap[int32(key)] = int32(value)
	}

	// Print the loaded map
	for key, value := range lableMap {
		fmt.Println(key, ":", value)
	}
}
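Also worth checking: lableMap is declared with := inside initLableMap, so it is a new function-local map, and the lableMap[result] lookup in predict further down only works if a package-level map is the one being filled. A sketch of that assumption:

// Package-level map read by predict(); initLableMap should assign into this
// map instead of declaring a fresh local one with ":=".
var lableMap = make(map[int32]int32)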

var inTensor *pd.Tensor
var outTensor *pd.Tensor
var mainPredictor *pd.Predictor
var wg sync.WaitGroup
var tk *tokenizer.Tokenizer

func main() {
	flag.Parse()
	tk = getBert(*vocabFile)
	initLableMap()
	config := pd.NewConfig()
	config.SetModel(*modelName, *paramsName)
	if *useGpu {
		config.EnableUseGpu(100, int32(*gpuId))
		// if *useTrt {
		// 	config.EnableTensorRtEngine(1<<30, 16, 3, pd.PrecisionFloat32, false, false)
		// 	if *useTrtDynamicShape {
		// 		minInputShape := make(map[string][]int32)
		// 		minInputShape["inputs"] = []int32{int32(*batchSize), 3, 100, 100}
		// 		maxInputShape := make(map[string][]int32)
		// 		maxInputShape["inputs"] = []int32{int32(*batchSize), 3, 608, 608}
		// 		optInputShape := make(map[string][]int32)
		// 		optInputShape["inputs"] = []int32{int32(*batchSize), 3, 224, 224}
		// 		config.SetTRTDynamicShapeInfo(minInputShape, maxInputShape, optInputShape, false)
		// 	}
		// }
	} else {
		config.SetCpuMathLibraryNumThreads(*cpuMath)
	}
	mainPredictor = pd.NewPredictor(config)
	inNames := mainPredictor.GetInputNames()
	outNames := mainPredictor.GetOutputNames()

	// log.Println("inNames:", inNames)
	// log.Println("outNames:", outNames)
	println("input num: ", mainPredictor.GetInputNum())
	println("input name: ", mainPredictor.GetInputNames()[0])
	println("output num: ", mainPredictor.GetOutputNum())
	println("output name: ", mainPredictor.GetOutputNames()[0])

	var inHandles = make(map[string]*pd.Tensor)
	var outHandles = make(map[string]*pd.Tensor)
	for _, n := range inNames {
		inHandles[n] = mainPredictor.GetInputHandle(n)
	}
	for _, n := range outNames {
		outHandles[n] = mainPredictor.GetOutputHandle(n)
	}

	inTensor = inHandles[inNames[0]]
	outTensor = outHandles[outNames[0]]

	// test code
	predict(testString)

	startRpcServer()

	log.Println("exit")
	wg.Wait()
}

func predict(content string) int32 {
	wg.Add(1)
	defer wg.Done()

	start := time.Now()

	data := load_content(content, tk)
	if data == nil {
		return -1
	}

	inTensor.Reshape([]int32{1, int32(*maxLen)})
	inTensor.CopyFromCpu(data)

	mainPredictor.Run()

	outData := make([]float32, numElements(outTensor.Shape()))
	outTensor.CopyToCpu(outData)
	tim := time.Now().Sub(start)

	result := maxValue(outData)
	// result_str := ""
	// if int(result) < len(labels) {
	// 	result_str = labels[result]
	// }

	log.Printf("out max val: %d(%d) ,time: %v\n", result, lableMap[result], tim)
	return result
}

func numElements(shape []int32) int32 {
	n := int32(1)
	for _, v := range shape {
		n *= v
	}
	return n
}

func maxValue(vals []float32) (max_index int32) {
	var max float32 = 0
	max_index = 0
	for index, v := range vals {
		if v > max {
			max = v
			max_index = int32(index)
		}
	}
	return max_index
}
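A side note on maxValue: it seeds max with 0, so if every output value is negative (for example raw logits rather than probabilities) it always returns index 0. A safer argmax sketch:

func argmax(vals []float32) int32 {
	if len(vals) == 0 {
		return -1
	}
	// Seed from the first element so all-negative outputs are handled too.
	maxIdx := 0
	for i, v := range vals {
		if v > vals[maxIdx] {
			maxIdx = i
		}
	}
	return int32(maxIdx)
}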

Question: in the Go API's Reshape call I pass [1, 512] (the error probably isn't related to this). My training setup is simple: the input content string is truncated at 512, and the labels are multiple classes such as 1, 2, 3, 4. The official Go inference demo is image-related, and it works as given; a PaddleNLP model is what fails. Under the hood, I believe NLP still goes through the transformer stack. Has anyone here run PaddleNLP inference through Go?
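For what it's worth, the export step in the training script declares two inputs, input_ids and token_type_ids, but main() above only fills inNames[0]. A minimal sketch that feeds both declared inputs with int64 data of the exported shape (an unverified guess at the fix; loadContentInt64 is the hypothetical helper sketched earlier, and the names follow the InputSpec in the training script):

seqLen := int32(*maxLen)
inputIds := loadContentInt64(content, tk, int(seqLen))
tokenTypeIds := make([]int64, seqLen) // all zeros: a single text segment

for _, name := range mainPredictor.GetInputNames() {
	t := mainPredictor.GetInputHandle(name)
	t.Reshape([]int32{1, seqLen})
	if name == "token_type_ids" {
		t.CopyFromCpu(tokenTypeIds)
	} else { // "input_ids"
		t.CopyFromCpu(inputIds)
	}
}
mainPredictor.Run()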

6clc (Contributor) commented Nov 4, 2023

> (quoting the previous comment in full)

From your description, the trained model itself should be fine. Could you tidy the Go inference code into a new repo and upload your parameter files, ideally as a standalone project that I can run directly? I haven't run PaddleNLP inference through Go myself, but I need to find the concrete cause of this bug before I can bring in the right people to fix it.

hubimaso (Author) commented Nov 4, 2023

@6clc Link: https://pan.baidu.com/s/1fGZiXEBW9lTesqGRGHynKg?pwd=i9tg
Extraction code: i9tg
It contains the binary program and the code, and runs on CentOS 7; you can debug it with dlv. The model and parameters are in the archive as well.

hubimaso (Author) commented Nov 7, 2023

It still feels like Paddle does not support NLP models such as bert-base-chinese. Following https://github.com/sugarme/transformer I also put together a Go transformer test; that library likewise fails with bert-base-chinese, although it works with bert-base-uncased. It would help a lot if you could provide NLP-related Go/C/C++ inference demos.

MARD1NO (Contributor) commented Nov 7, 2023

> (quoting the previous comment)

Could you first try running inference through the C++ API?

hubimaso (Author) commented Nov 7, 2023

> (quoting the previous exchange)

hubimaso closed this as completed Nov 7, 2023
hubimaso (Author) commented Nov 7, 2023

> (quoting the previous exchange)

Mainly it comes down to how familiar I am with the preprocessing side.

hubimaso reopened this Nov 7, 2023
paddle-bot added the type/debug (help user debug) and status/following-up (following up) labels Nov 7, 2023
paddle-bot added the status/close (closed) and status/reopen (reopened) labels Nov 10, 2023
paddle-bot closed this as completed Nov 10, 2023
paddle-bot removed the status/following-up (following up) label Nov 10, 2023