[torchbench] Eager model failed to run and not reported as passed/failed #549

Open
pbchekin opened this issue Feb 22, 2024 · 1 comment

@pbchekin
Contributor

Some models in torchbench raise the following exception and are not reported in the CSV report:

NotImplementedError: Eager model failed to run

Models:

  • basic_gnn_edgecnn

    RuntimeError: scatter_reduce_dpcpp does not have a deterministic implementation, but you set 
    'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation, or you can use the 'warn_only=True' 
    option, if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us 
    prioritize adding deterministic support for this operation.
    
  • basic_gnn_gcn

    RuntimeError: scatter_add_dpcpp_kernel does not have a deterministic implementation, but you set 
    'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation, or you can use the 'warn_only=True' 
    option, if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us 
    prioritize adding deterministic support for this operation.
    
  • basic_gnn_gin

  • basic_gnn_sage

  • hf_T5_base

    RuntimeError: Allocation is out of device memory on current platform.
    
  • hf_clip

    File "/opt/hostedtoolcache/Python/3.10.13/x64/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 180, in forward
     batch_size = pixel_values.shape[0]
    AttributeError: 'str' object has no attribute 'shape'
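
To illustrate the reporting gap described above, here is a minimal sketch of how a harness could record a row for every model even when the eager run fails. It is not the torchbench runner's actual code: `run_and_record`, `fake_load_model`, the `name,status` columns, the `eager_fail` status string, and the `passing_model` entry are all hypothetical names chosen for this example. Only the exception type and messages are taken from this issue.

```python
import csv
import io


def run_and_record(model_names, load_model, writer):
    """Try each model and write one CSV row per model, including failures."""
    for name in model_names:
        try:
            load_model(name)
            status = "pass"
        except NotImplementedError as exc:
            # Keep the model in the report instead of dropping it silently.
            # If the exception was chained, report the underlying cause type.
            cause = exc.__cause__ or exc
            status = f"eager_fail: {type(cause).__name__}"
        writer.writerow([name, status])


def fake_load_model(name):
    """Hypothetical stand-in for the real loader; fails for one model only."""
    if name == "hf_T5_base":
        try:
            raise RuntimeError(
                "Allocation is out of device memory on current platform."
            )
        except RuntimeError as err:
            raise NotImplementedError("Eager model failed to run") from err


buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "status"])  # assumed column layout
run_and_record(["hf_T5_base", "passing_model"], fake_load_model, writer)
report = buffer.getvalue()
print(report)
```

With something along these lines, a model that hits `NotImplementedError: Eager model failed to run` would still appear in the report with a failing status rather than being absent from it.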
    
@vlad-penkin
Contributor

June 5th update: All workloads, except for hf_clip, run without the RuntimeError: Eager run failed exception

Env:

  • pytorch is built from source, top of the main trunk, commit_id - 9a8ab778d34bd24c5caceb340837483decc4c311
  • triton xpu is built from source, top of the main trunk, commit_id - fe93a00ffe438e9ba8c8392c0b051b1662c810de
  • benchmark is built from source, top of the main trunk, commit_id - d54ca9f80ead108c8797441681e219becaf963d8
  • torchaudio is built from source, top of the main trunk, commit_id - 1980f8af5bcd0bb2ce51965cf79d8d4c25dad8a0
  • torchvision is built from source, top of the main trunk, commit_id - 10239873229e527f8b7e7b3340c40ee38bb1cfc4
  • PyTorch Dependency Bundle 0.5.0
  • Latest Rolling Driver
