
[Bug] qwen3-max model produces no evaluation results; the cause appears to be that the data passed in is None #2359

@lqyisy100

Description

Prerequisites

Issue type

I am evaluating with an officially supported task/model/dataset.

Environment

{'CUDA available': False,
'GCC': 'n/a',
'MMEngine': '0.10.7',
'MSVC': 'Microsoft (R) C/C++ Optimizing Compiler Version 19.50.35720 for x64',
'MUSA available': False,
'OpenCV': '4.11.0',
'PyTorch': '2.9.1+cpu',
'PyTorch compiling details': 'PyTorch built with:\n'
' - C++ Version: 201703\n'
' - MSVC 194234444\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2025.3-Product Build 20251007 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.7.1 (Git Hash '
'8d263e693366ef8db40acc569cc7d8edf644556d)\n'
' - OpenMP 2019\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - CPU capability usage: AVX2\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, '
'COMMIT_SHA=5811a8d7da873dd699ff6687092c225caffcf1bb, '
'CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/pytorch/.ci/pytorch/windows/tmp_bin/sccache-cl.exe, '
'CXX_FLAGS=/DWIN32 /D_WINDOWS /EHsc '
'/Zc:__cplusplus /bigobj /FS /utf-8 '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER '
'-DLIBKINETO_NOXPUPTI=ON -DUSE_XNNPACK '
'-DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 '
'/wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 '
'/wd4804 /wd4273, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'TORCH_VERSION=2.9.1, USE_CUDA=0, USE_CUDNN=OFF, '
'USE_CUSPARSELT=OFF, USE_GFLAGS=OFF, '
'USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, '
'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, '
'USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, '
'USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, '
'USE_XPU=OFF, \n',
'Python': '3.12.3 (tags/v3.12.3:f6650f9, Apr 9 2024, 14:05:25) [MSC v.1938 '
'64 bit (AMD64)]',
'lmdeploy': "not installed:No module named 'lmdeploy'",
'numpy_random_seed': 2147483648,
'opencompass': '0.5.1+unknown',
'sys.platform': 'win32',
'transformers': '4.57.3'}

Reproducing the problem - code/configuration example

`from opencompass.models import Qwen

api_meta_template = dict(round=[
    dict(role='HUMAN', api_role='HUMAN'),
    dict(role='BOT', api_role='BOT', generate=True),
], )

models = [
    dict(
        type=Qwen,
        abbr="qwen3-max",
        path='qwen3-max',
        key='sk-xxxxxxxxxxxxxxx',  # The key will be obtained from $OPENAI_API_KEY, but you can write down your key here as well
        meta_template=api_meta_template,
        query_per_second=1,
        max_out_len=1024,
        max_seq_len=2048,
        batch_size=8),
]`

`from mmengine.config import read_base

with read_base():
    from opencompass.configs.datasets.demo.demo_gsm8k_chat_gen import gsm8k_datasets
    from opencompass.configs.datasets.demo.demo_math_chat_gen import math_datasets
    from opencompass.configs.models.qwen2_5.qwen_test_opencompass_custom import models as qwen_model

datasets = gsm8k_datasets
models = qwen_model`

Reproducing the problem - command or script

python run.py examples/eval_api_demo.py -w outputs/qwen_max_test --debug

Reproducing the problem - error message

12/12 09:56:48 - OpenCompass - INFO - Task [qwen3-max/demo_gsm8k]
C:\Project\opencompass-plus-main\.venv\Lib\site-packages\jieba\_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
signal.SIGALRM is not available on this platform
signal.SIGALRM is not available on this platform
12/12 09:57:06 - OpenCompass - INFO - Try to load the data from C:\Users\maxweli\.cache/opencompass/./data/gsm8k/

Map: 0%| | 0/7473 [00:00<?, ? examples/s]
Map: 33%|███▎ | 2496/7473 [00:00<00:00, 24178.01 examples/s]
Map: 73%|███████▎ | 5472/7473 [00:00<00:00, 25674.55 examples/s]
Map: 100%|██████████| 7473/7473 [00:00<00:00, 25481.09 examples/s]

Map: 0%| | 0/1319 [00:00<?, ? examples/s]
Map: 100%|██████████| 1319/1319 [00:00<00:00, 27596.31 examples/s]
12/12 09:57:06 - OpenCompass - INFO - Start inferencing [qwen3-max/demo_gsm8k]
[2025-12-12 09:57:06,588] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2025-12-12 09:57:06,588] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...

0%| | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:19<00:00, 19.64s/it]
100%|██████████| 1/1 [00:19<00:00, 19.64s/it]
12/12 09:57:26 - OpenCompass - INFO - time elapsed: 37.89s
C:\Project\opencompass-plus-main\.venv\Lib\site-packages\jieba\_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
import pkg_resources
signal.SIGALRM is not available on this platform
signal.SIGALRM is not available on this platform
12/12 09:57:52 - OpenCompass - INFO - Try to load the data from C:\Users\maxweli\.cache/opencompass/./data/gsm8k/

Map: 0%| | 0/7473 [00:00<?, ? examples/s]
Map: 35%|███▌ | 2652/7473 [00:00<00:00, 25592.68 examples/s]
Map: 72%|███████▏ | 5360/7473 [00:00<00:00, 24882.88 examples/s]
Map: 100%|██████████| 7473/7473 [00:00<00:00, 24828.75 examples/s]

Map: 0%| | 0/1319 [00:00<?, ? examples/s]
Map: 100%|██████████| 1319/1319 [00:00<00:00, 23369.41 examples/s]
Parameter 'function'=<function OpenICLEvalTask._load_and_preprocess_test_data.<locals>.postprocess at 0x000001B5179956C0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.

Map: 0%| | 0/8 [00:00<?, ? examples/s]
Map: 100%|██████████| 8/8 [00:00<00:00, 511.68 examples/s]
text None
Traceback (most recent call last):
  File "C:\Project\opencompass-plus-main\opencompass\tasks\openicl_eval.py", line 561, in <module>
    inferencer.run()
  File "C:\Project\opencompass-plus-main\opencompass\tasks\openicl_eval.py", line 93, in run
    self._score()
  File "C:\Project\opencompass-plus-main\opencompass\tasks\openicl_eval.py", line 116, in _score
    pred_strs = self._process_predictions(pred_strs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Project\opencompass-plus-main\opencompass\tasks\openicl_eval.py", line 244, in _process_predictions
    pred_strs = [proc(s, **kwargs) for s in pred_strs]
                 ^^^^^^^^^^^^^^^^^
  File "C:\Project\opencompass-plus-main\opencompass\datasets\gsm8k.py", line 46, in gsm8k_postprocess
    text = text.split('Question:')[0]
           ^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
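
For reference, the failing step can be reproduced outside OpenCompass. The sketch below only mirrors the single line shown in the traceback (`text.split('Question:')[0]`); it is not the actual `gsm8k_postprocess` implementation. The helper names `split_before_question` and `reproduce_error` are hypothetical, and returning an empty string for a `None` prediction is just one possible choice that would let scoring continue.

```python
from typing import Optional


def split_before_question(text: Optional[str]) -> str:
    """Hypothetical None-safe variant of the line that fails in the traceback."""
    if text is None:
        # Assumption: treat a missing prediction as an empty answer
        # instead of raising, so the rest of the scoring can proceed.
        return ''
    return text.split('Question:')[0]


def reproduce_error() -> str:
    """Trigger the same AttributeError as in the log and return its message."""
    text = None
    try:
        text.split('Question:')
    except AttributeError as exc:
        return str(exc)
    return ''


if __name__ == '__main__':
    print(reproduce_error())
    print(repr(split_before_question(None)))
    print(repr(split_before_question('The answer is 18.\nQuestion: next one')))
```

A guard like this only hides the symptom. Why the prediction is `None` for this model in the first place would still need to be found.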

Additional information

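Because the error says that text is NoneType, I added print(text) to the code, and the value is indeed None. This only happens in the Qwen3-max evaluation task; for other models such as qwen2.5 and qwen-max, text is never None.
One more question: judging from the log messages, the evaluation seems to run twice, and this seems to be the case for every model. Is this normal? The only difference is that for Qwen3-max the first pass raises no error and the second one does, while for the other models neither pass raises an error.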
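
To narrow down which samples come back empty, the saved predictions could be scanned for `None` values before the evaluation step runs. This is a minimal sketch under an assumption: that the prediction file is a JSON object mapping a sample index to a record with a `prediction` field. The layout, the `sample` data, and the helper names `find_none_predictions` and `load_records` are illustrative and should be checked against the real files under the work directory.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def find_none_predictions(records: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the keys of records whose 'prediction' is missing or None."""
    return [key for key, rec in records.items()
            if rec.get('prediction') is None]


def load_records(path: str) -> Dict[str, Dict[str, Any]]:
    """Load a prediction file; the JSON layout is an assumption (see above)."""
    return json.loads(Path(path).read_text(encoding='utf-8'))


# Illustrative in-memory data, not taken from a real run.
sample = {
    '0': {'origin_prompt': 'Q0', 'prediction': 'The answer is 18.', 'gold': '18'},
    '1': {'origin_prompt': 'Q1', 'prediction': None, 'gold': '3'},
    '2': {'origin_prompt': 'Q2', 'gold': '7'},
}

if __name__ == '__main__':
    print(find_none_predictions(sample))
```

If every record for qwen3-max turns out to be `None`, the problem is more likely in how the API response is parsed during inference than in the gsm8k postprocessor.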
