[CI] Remind re-run when auto_parallel CI exit -6 #69212

Merged
merged 4 commits into from
Dec 4, 2024

waliwali777
Contributor

@waliwali777 waliwali777 commented Nov 6, 2024

PR Category

Auto Parallel

PR Types

Others

Description

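  1. The auto-parallel CI (PR-CI-Auto-Parallel) occasionally terminates abnormally with exit -6. This is an intermittent CI problem that cannot be reproduced.
    When it happens, the tests after the current case are no longer executed and the case is reported as passed, so the CI may fail to block some commits.
    Fix: when a test exits with -6, automatically re-run it once. If exit -6 occurs again, skip the test, record its name in global_exit_250_arr, and remind the user through the log after the CI finishes. This does not make the CI fail.

  2. The track_case_status function, which the CI uses to count test results, has a bug that lets failed tests go undetected: some tests do not produce a log file such as func_name_FAIL.log when they fail, so counting failed tests by log file name is unreliable. The result of each case is now recorded through a global variable in execute_func_list, the name of every failed test is stored in an array, and the results are summarized and printed when the CI finishes.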
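
The two changes described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: only `global_exit_250_arr` and `execute_func_list` are names taken from the description, while `global_failed_arr`, `run_case`, `print_summary`, and the three demo cases are hypothetical. It relies on the fact that bash reports `exit -6` as status 250, which is presumably where the array name comes from.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the re-run and failure-tracking logic.
# Only global_exit_250_arr and execute_func_list are names from the PR text.

global_exit_250_arr=()   # cases that exited 250 (-6) twice and were skipped
global_failed_arr=()     # hypothetical: cases that failed with any other status

# Run one case in a subshell so an abnormal exit cannot end the driver script.
run_case() {
    ( "$1" )
}

execute_func_list() {
    local case_name status
    for case_name in "$@"; do
        run_case "$case_name"
        status=$?
        if [ "$status" -eq 250 ]; then
            echo "[re-run] $case_name exited 250 (-6), retrying once"
            run_case "$case_name"
            status=$?
            if [ "$status" -eq 250 ]; then
                # Reproduced: skip the case and remember it for the final reminder.
                global_exit_250_arr+=("$case_name")
                continue
            fi
        fi
        # Record failures from the exit status instead of looking for *_FAIL.log files.
        if [ "$status" -ne 0 ]; then
            global_failed_arr+=("$case_name")
        fi
    done
}

print_summary() {
    if [ "${#global_exit_250_arr[@]}" -gt 0 ]; then
        echo "Skipped after repeated exit -6, please re-run: ${global_exit_250_arr[*]}"
    fi
    if [ "${#global_failed_arr[@]}" -gt 0 ]; then
        echo "Failed cases: ${global_failed_arr[*]}"
        return 1
    fi
    return 0
}

# Demo cases (assumptions for illustration only).
case_ok()    { return 0; }
case_abort() { exit 250; }   # same status bash reports for `exit -6`
case_fail()  { exit 1; }

execute_func_list case_ok case_abort case_fail
print_summary || echo "CI would be marked as failed here"
```

In this demo run, `case_abort` ends up in `global_exit_250_arr` without affecting the final status, while `case_fail` is recorded in `global_failed_arr` and makes `print_summary` return non-zero.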

PCard-87521

paddle-bot bot commented Nov 6, 2024

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

Contributor

@liym27 liym27 left a comment

LGTM

@liym27 liym27 requested a review from tianshuo78520a December 4, 2024 02:54
Collaborator

@tianshuo78520a tianshuo78520a left a comment

Please confirm whether the variables exported on lines 177 and 178 are actually needed.

@liym27 liym27 merged commit 5313be9 into PaddlePaddle:develop Dec 4, 2024
28 checks passed
@waliwali777
Contributor Author

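Please confirm whether the variables exported on lines 177 and 178 are actually needed.

FLAGS_install_deps defaults to 0, which means requirements.txt must be installed before the PaddleNLP tests run. Setting it to 1 afterwards means that later runs do not install requirements.txt again.
When FLAGS_download_data contains llama, the llama test data has already been downloaded, so later tests do not need to download it again.
Comments explaining this will be added in a follow-up PR.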
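
As a rough illustration of the behaviour described in this reply, the sketch below shows how two such flags could gate one-time setup. It is an assumption-laden example and not the CI script itself: `prepare_case`, `install_requirements`, `download_dataset`, the counters, and the space-separated format of `FLAGS_download_data` are all hypothetical. Only the two flag names and their described meaning come from the comment.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: only FLAGS_install_deps and FLAGS_download_data are real names.

export FLAGS_install_deps=0     # 0: requirements.txt has not been installed yet
export FLAGS_download_data=""   # assumed format: space-separated names of downloaded datasets

install_count=0
download_count=0

# Placeholders standing in for the real install and download steps.
install_requirements() { install_count=$((install_count + 1)); echo "installing requirements.txt"; }
download_dataset()     { download_count=$((download_count + 1)); echo "downloading data for $1"; }

prepare_case() {
    local model="$1"
    if [ "$FLAGS_install_deps" -eq 0 ]; then
        install_requirements
        export FLAGS_install_deps=1   # later cases skip the install step
    fi
    case " $FLAGS_download_data " in
        *" $model "*)
            ;;                        # data for this model is already present
        *)
            download_dataset "$model"
            export FLAGS_download_data="$FLAGS_download_data $model"
            ;;
    esac
}

prepare_case llama   # installs dependencies and downloads data
prepare_case llama   # both steps are skipped the second time
```

Exporting the flags lets child processes started later in the same job see that the setup has already been done, which is one plausible reason for the `export` statements the reviewer asked about.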
