
[Typing] Add unit tests for Paddle/tools under Paddle/test #65905


Merged: 21 commits, Jul 22, 2024


megemini
Contributor

PR Category

User Experience

PR Types

Improvements

Description

Add unit tests for Paddle/tools under Paddle/test

@SigureMo


paddle-bot bot commented Jul 10, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@megemini
Contributor Author

Update 20240710

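Revised the test cases in test/tools/test_sampcd_processor.py:

  • For the CI environment, use @unittest.skipIf to skip cases such as cpu -> gpu when running in a CPU-only environment
  • Removed the tests related to doctester = Xdoctester(use_multiprocessing=False)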
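The skip pattern in the first bullet can be sketched with the standard library alone. This is only an illustration: HAS_GPU is a hypothetical stand-in for a real device check (an actual Paddle test would more likely ask something like paddle.is_compiled_with_cuda()), and the class and test names are invented here, not taken from test_sampcd_processor.py.

```python
import unittest

# Hypothetical flag standing in for a real device check; assumed False
# here to mimic a CPU-only CI machine.
HAS_GPU = False


class TestDeviceSwitch(unittest.TestCase):
    @unittest.skipIf(not HAS_GPU, "cpu -> gpu cases need a GPU")
    def test_cpu_to_gpu(self):
        # Would exercise a device switch; never runs when HAS_GPU is False.
        self.fail("must not run on a CPU-only machine")

    def test_cpu_only(self):
        # Device-independent case that always runs.
        self.assertEqual(1 + 1, 2)


# Run the suite programmatically and collect the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDeviceSwitch)
result = unittest.TestResult()
suite.run(result)
```

A skipped case is recorded in result.skipped rather than as a failure, so the suite still counts as successful on CPU-only runners.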

The second point deserves a detailed explanation. Consider the following code:

import queue
import threading
import functools

import paddle

from paddle.base.framework import global_var

print('-'*3, 'main thread in ...', global_var._dygraph_tracer_, global_var._functional_dygraph_context_manager)

paddle.disable_static()
# paddle.enable_static()

print('paddle.in_dynamic_mode', paddle.in_dynamic_mode())
print('global_var._dygraph_tracer_', global_var._dygraph_tracer_, global_var._functional_dygraph_context_manager)

print('-'*3, 'main thread out ...', global_var._dygraph_tracer_, global_var._functional_dygraph_context_manager)

def test():
    import paddle
    from paddle.base.framework import global_var

    print('-'*3, 'sub thread in ...', global_var._dygraph_tracer_, global_var._functional_dygraph_context_manager)

    paddle.enable_static()
    print('paddle.in_dynamic_mode', paddle.in_dynamic_mode()) # should be `False` here, but is actually `True`
    print('global_var._dygraph_tracer_', global_var._dygraph_tracer_, global_var._functional_dygraph_context_manager)

    print('-'*3, 'sub thread out ...', global_var._dygraph_tracer_, global_var._functional_dygraph_context_manager)

    data = paddle.static.data(name='X', shape=[None, 2, 28, 28], dtype='float32')

    return data


result_queue = queue.Queue()
exec_processer = functools.partial(threading.Thread, daemon=True)

def _execute_with_queue(queue):
    queue.put(test())


processer = exec_processer(
    target=_execute_with_queue,
    args=(result_queue,)
)

processer.start()
result = result_queue.get(timeout=100)
processer.join()

print(result)

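The code above simulates doctester = Xdoctester(use_multiprocessing=False): when neither the SOLO directive (which runs the example alone in the main process) nor multiprocessing (which runs it in a child process) is used, the check is executed in a child thread.

Running the code fails: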

--- main thread in ... <paddle.base.dygraph.tracer.Tracer object at 0x7f1351bb8040> <contextlib._GeneratorContextManager object at 0x7f13631091c0>
paddle.in_dynamic_mode True
global_var._dygraph_tracer_ <paddle.base.dygraph.tracer.Tracer object at 0x7f1351bb8040> <contextlib._GeneratorContextManager object at 0x7f13631091c0>
--- main thread out ... <paddle.base.dygraph.tracer.Tracer object at 0x7f1351bb8040> <contextlib._GeneratorContextManager object at 0x7f13631091c0>
--- sub thread in ... <paddle.base.dygraph.tracer.Tracer object at 0x7f1351bb8040> None
paddle.in_dynamic_mode True
global_var._dygraph_tracer_ <paddle.base.dygraph.tracer.Tracer object at 0x7f1351bb8040> None
--- sub thread out ... <paddle.base.dygraph.tracer.Tracer object at 0x7f1351bb8040> None
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "test_tmp.py", line 40, in _execute_with_queue
    queue.put(test())
  File "test_tmp.py", line 31, in test
    data = paddle.static.data(name='X', shape=[None, 2, 28, 28], dtype='float32')
  File "/home/shun/venv38dev/lib/python3.8/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/shun/venv38dev/lib/python3.8/site-packages/paddle/base/wrapped_decorator.py", line 40, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/home/shun/venv38dev/lib/python3.8/site-packages/paddle/base/framework.py", line 672, in __impl__
    assert (
AssertionError: In PaddlePaddle 2.x, we turn on dynamic graph mode by default, and 'data()' is only supported in static graph mode. So if you want to use this api, please call 'paddle.enable_static()' before this api to enter static graph mode.

The main problem: a child thread cannot set the main thread's global_var._functional_dygraph_context_manager, so the mode state ends up wrong.

paddle.base.framework.py

# use thread local to create thread save global variables.
class GlobalThreadLocal(threading.local):
    def __init__(self):
        """
        init the thread local data.
        TODO(xiongkun): how to access another thread local data ?
        """
        global _dygraph_tracer_
        self._in_to_static_mode_ = False
        self._functional_dygraph_context_manager = None
        self._dygraph_tracer_ = _dygraph_tracer_
        self._use_pir_api_ = get_flags("FLAGS_enable_pir_api")[
            "FLAGS_enable_pir_api"
        ]

    def __str__(self):
        strings = []
        strings.append("_in_to_static_mode_:" + str(self._in_to_static_mode_))
        strings.append(
            "_functional_dygraph_context_manager:"
            + str(self._functional_dygraph_context_manager)
        )
        strings.append("_dygraph_tracer_:" + str(self._dygraph_tracer_))
        return "\n".join(strings)

    def __setattr__(self, name, val):
        if name == "_dygraph_tracer_":
            global _dygraph_tracer_
            _dygraph_tracer_ = val
            core._switch_tracer(val)
        self.__dict__[name] = val


_dygraph_tracer_ = None
global_var = GlobalThreadLocal()

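_dygraph_tracer_ is kept globally unique through the Tracer pointer, so passing it between threads works fine, but _functional_dygraph_context_manager only takes effect in the current thread, and that is where things go wrong.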
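The difference between the two attributes can be reproduced without Paddle. The sketch below mirrors only the shape of GlobalThreadLocal; ThreadLocalState, _shared_tracer, and the string values are hypothetical names made up for the illustration. It relies on the documented behaviour that a threading.local subclass has its __init__ called again in every thread that uses the object.

```python
import threading

# Module-level global: visible to every thread (plays the role of the tracer).
_shared_tracer = "tracer-0"


class ThreadLocalState(threading.local):
    def __init__(self):
        # Runs once per thread that touches the instance, so each new
        # thread starts with context_manager reset to None.
        self.context_manager = None
        self.tracer = _shared_tracer

    def __setattr__(self, name, value):
        # Writes to `tracer` are mirrored into the module-level global,
        # which is how the value reaches other threads.
        if name == "tracer":
            global _shared_tracer
            _shared_tracer = value
        self.__dict__[name] = value


state = ThreadLocalState()
state.context_manager = "set-in-main"  # thread-local only
state.tracer = "tracer-1"              # also updates the global

seen = {}


def worker():
    # First access in this thread re-runs __init__ for this thread.
    seen["context_manager"] = state.context_manager
    seen["tracer"] = state.tracer


thread = threading.Thread(target=worker)
thread.start()
thread.join()

print(seen)
```

The worker sees the updated tracer because it is re-read from the global, but its context_manager is None regardless of what the main thread assigned, which matches the "sub thread in ..." line of the output above.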

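I opened PR #65930 for this.

I am not sure whether the original design intended this data to be shared across threads, so for now I am removing this part of the test cases. It does not actually affect how Xdoctest is used today.

If #65930 is accepted, those cases can be added back. In local testing with the #65930 change applied, this part of the tests passes, though I do not know whether the change has other side effects.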

@megemini
Contributor Author

Update 20240713

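  • Run the multiprocessing-related cases only on Linux

    pickle behaves differently on Windows than on Linux, and because sample-code checking only runs in a Linux environment, I did not add separate test cases for Windows.

  • Skip the PR-CI-Coverage check

    PR-CI-Coverage is really slow, and the tests added here also take a fairly long time, so PR-CI-Coverage is skipped.

  • Lengthened the time limits in the timeout-related tests in test_sampcd_processor.py

    I spent the last two days debugging the test_timeout failures. The conclusion: the CI servers are very slow, and starting a single process takes 1 to 2 seconds or even longer, so the only option is to lengthen the timeout.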
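Why slow worker startup forces a longer timeout can be shown with a small stand-in. This sketch does not reproduce Xdoctester's logic: run_with_timeout, startup_delay, and the delay values are hypothetical, and a thread with an artificial sleep replaces the real process start.

```python
import queue
import threading
import time


def run_with_timeout(target, timeout, startup_delay=0.0):
    """Run target in a worker and wait at most `timeout` seconds for its result."""
    results = queue.Queue()

    def worker():
        # Simulated cost of starting the worker (slow CI machine).
        time.sleep(startup_delay)
        results.put(target())

    threading.Thread(target=worker, daemon=True).start()
    try:
        return results.get(timeout=timeout)
    except queue.Empty:
        # The time budget was used up before any result arrived.
        return "timeout"


def fast_job():
    return "ok"


# The job itself is instant, but startup alone exceeds the tight budget.
tight = run_with_timeout(fast_job, timeout=0.05, startup_delay=0.3)
generous = run_with_timeout(fast_job, timeout=2.0, startup_delay=0.3)
print(tight, generous)
```

The timeout covers startup as well as the job, so a test that expects a fast example to finish in time needs a budget larger than the worst-case startup cost of the machine it runs on.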

CI has mostly passed now. Only the Windows-related CI jobs seem to have been unstable lately and keep failing, and I do not know why.

@SigureMo please review.

Comment on lines 13 to 14
set_tests_properties(test_sampcd_processor PROPERTIES TIMEOUT 1200)
set_tests_properties(test_type_checking PROPERTIES TIMEOUT 1200)
Member

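20 minutes? 40 minutes for the two combined? That duration is not acceptable. What makes these two unit tests so slow?

If the tests use compute-heavy paddle APIs, consider replacing paddle with the standard library so the tests are decoupled from paddle itself. I do not think these unit tests need to run this long.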

Contributor Author

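It does not take that long. Looking at CI, mac is the slowest: test_sampcd_processor takes about 290 seconds and test_type_checking about 170 seconds. I raised the limit because PR-CI-Coverage was slow. Now that PR-CI-Coverage is no longer used, I will just lower it.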

Member

Let's not run it on mac either.

raise NotImplementedError


class Test_get_api_md5(TestSpecFile):
Member

Suggested change
- class Test_get_api_md5(TestSpecFile):
+ class TestGetApiMd5(TestSpecFile):

Use CamelCase for class names.


paddle-ci-bot bot commented Jul 21, 2024

Sorry to inform you that b80b81c's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

Member

@SigureMo SigureMo left a comment

LGTMeow 🐾

Contributor

@XieYunshen XieYunshen left a comment

LGTM for:
set_tests_properties(test_sampcd_processor PROPERTIES TIMEOUT 300)
set_tests_properties(test_type_checking PROPERTIES TIMEOUT 200)

@luotao1 luotao1 merged commit d2d4e83 into PaddlePaddle:develop Jul 22, 2024
31 checks passed
lixcli pushed a commit to lixcli/Paddle that referenced this pull request Jul 22, 2024
* [Add] test for Paddle/tools

* [Fix] CMakeLists.txt ENVS

* [Change] remove test no multiprocessing

* [Change] skip gpu test if need

* [Fix] example on win

* [Fix] example on win/mac with __main__

* [Fix] skip multiprocessing checking on win/mac and opt timeout test time

* [Fix] use while instead of time.sleep

* [Fix] close queue

* [tmp] skip time_out

* [Change] timeout more

* [Change] cmake timeout more

* [Change] test without coverage

* [Change] test without coverage

* [Change] test timeout

* [Change] only on linux

---------

Co-authored-by: SigureMo <sigure.qaq@gmail.com>