trainer.train() raises KeyError: 'eval_loss' #1823

Open
chenxinxi opened this issue Nov 16, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@chenxinxi

Describe the bug (Mandatory)
Calling trainer.train() raises KeyError: 'eval_loss', while the equivalent PyTorch code runs without error.

  • Hardware Environment (Ascend/GPU/CPU):
    Ascend

  • Software Environment (Mandatory):
    -- MindSpore version (e.g., 1.7.0.Bxxx): 2.3.1
    -- mindnlp version: 0.4.1
    -- Python version (e.g., Python 3.7.5): not provided
    -- OS platform and distribution (e.g., Linux Ubuntu 16.04): not provided
    -- GCC/Compiler version (if compiled from source): not provided

  • Execute Mode (Mandatory) (PyNative/Graph):
    graph

To Reproduce (Mandatory)

import numpy as np
import evaluate
from mindnlp.engine import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./vit-base-food101",
    per_device_train_batch_size=16,
    evaluation_strategy="steps",
    num_train_epochs=4,
    fp16=True,
    save_steps=100,
    eval_steps=100,
    logging_steps=10,
    learning_rate=2e-4,
    save_total_limit=2,
    remove_unused_columns=True,
    load_best_model_at_end=True,
)

metric = evaluate.load("accuracy")
# the compute_metrics function takes a Named Tuple as input:
# predictions, which are the logits of the model as Numpy arrays,
# and label_ids, which are the ground-truth labels as Numpy arrays.
def compute_metrics(eval_pred):
    """Computes accuracy on a batch of predictions"""
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return metric.compute(predictions=predictions, references=eval_pred.label_ids)

trainer = Trainer(
    model=lora_model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    tokenizer=image_processor,
)

Running train_results = trainer.train() then raises the error.

Expected behavior (Mandatory)
Training should run to completion, but it stopped at epoch 0.36.

Screenshots / Logs (Mandatory)
[two screenshots attached to the original issue; images not reproduced here]


@chenxinxi chenxinxi added the bug Something isn't working label Nov 16, 2024
@lvyufeng
Collaborator

Please upload the complete code as an attachment.

@chenxinxi
Author

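111601.zip
I changed the setting to load_best_model_at_end=False and training now runs to completion. However, eval_accuracy is still missing from the evaluation results, and I am trying to work out why.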
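
A possible reading of the two symptoms, offered as a sketch rather than a confirmed diagnosis: in the Hugging Face Trainer, which the mindnlp Trainer appears to mirror (an assumption here), load_best_model_at_end=True with no metric_for_best_model falls back to "loss" and looks up "eval_loss" in the evaluation metrics. If the labels never reach the evaluation loop, neither the loss nor compute_metrics can be evaluated, so both eval_loss and eval_accuracy would be absent. The code below is a simplified pure-Python model of that behaviour, not the mindnlp source; build_eval_metrics, best_metric_key and accuracy are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Optional, Sequence

MetricFn = Callable[[Sequence[int], Sequence[int]], Dict[str, float]]


def build_eval_metrics(
    loss: Optional[float],
    predictions: Sequence[int],
    label_ids: Optional[Sequence[int]],
    compute_metrics: MetricFn,
) -> Dict[str, float]:
    """Simplified model of an evaluation loop (assumption, not library code)."""
    metrics: Dict[str, float] = {}
    # Label-dependent metrics can only be computed if labels were collected.
    if label_ids is not None:
        metrics.update(compute_metrics(predictions, label_ids))
    # The loss is only available when the model received labels.
    if loss is not None:
        metrics["loss"] = loss
    # Every reported key carries an "eval_" prefix.
    return {f"eval_{name}": value for name, value in metrics.items()}


def best_metric_key(metric_for_best_model: str = "loss") -> str:
    """Key consulted when choosing the best checkpoint (default assumed: loss)."""
    if metric_for_best_model.startswith("eval_"):
        return metric_for_best_model
    return f"eval_{metric_for_best_model}"


def accuracy(predictions: Sequence[int], label_ids: Sequence[int]) -> Dict[str, float]:
    """Dependency-free stand-in for the accuracy metric used in the issue."""
    correct = sum(int(p == y) for p, y in zip(predictions, label_ids))
    return {"accuracy": correct / len(label_ids)}


# Labels present: both the loss and the accuracy are reported.
with_labels = build_eval_metrics(0.7, [1, 0, 2, 2], [1, 0, 1, 2], accuracy)

# Labels missing: nothing label-dependent can be reported.
without_labels = build_eval_metrics(None, [1, 0, 2, 2], None, accuracy)

# The best-checkpoint lookup then fails on the missing key.
try:
    without_labels[best_metric_key()]
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]

print(with_labels)
print(without_labels)
print(missing_key)
```

If this reading applies, disabling load_best_model_at_end only skips the lookup and leaves the underlying gap in place, which would match eval_accuracy still being absent. Two settings that may be worth trying, assuming mindnlp's TrainingArguments accepts them as the Hugging Face one does: label_names=["labels"], so the Trainer knows which batch field holds the labels when the model is wrapped by LoRA, and remove_unused_columns=False, so the label column is not dropped from val_ds. The crash at epoch 0.36 would also be consistent with the first evaluation and save at step 100.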
