[ROCFFT] RocFFT fails tests when using ROCm 6.0 or later #559

Open
hjabird opened this issue Aug 19, 2024 · 4 comments
Labels
bug A request to fix an issue

Comments

@hjabird
Contributor

hjabird commented Aug 19, 2024

Summary

The current tip of oneMKL interfaces fails unit tests when using ROCm 6.0 or later. The same tests pass with ROCm 5.4.3.

The failed tests are real-to-complex multi-dimensional tests.

Version

The issue was introduced with #528 and can be reproduced with the oneMKL interfaces develop branch.

Environment

  • AMD MI210
  • Reproducible with rocFFT from ROCm 6.0 or 6.1
  • Ubuntu 22.04
  • ICPX 2024.2 with Codeplay's plugin for AMD (which supports ROCm 6.0.2 for MI210).

Steps to reproduce

Build with RocFFT enabled. Test with

./bin/test_main_dft_ct --gtest_filter='*REAL_SINGLE_in_place_USM*batches_1*'

to observe failures.

Observed behavior

Tests produce wrong results or trigger memory faults.

Expected behavior

Tests should pass.

@Rbiessy added the bug label Aug 20, 2024
@hjabird
Contributor Author

hjabird commented Aug 20, 2024

  • It seems to be the backward DFT that fails.
  • If the tests from before the regression-causing PR are used, they still pass (i.e. tests using the INPUT/OUTPUT API with recommit, instead of the FWD/BWD API).
  • This problem can also be reproduced on an AMD W6800.

@hjabird
Contributor Author

hjabird commented Aug 21, 2024

This turned out to be a rocFFT bug, reported as ROCm/rocFFT#504.

@hjabird
Contributor Author

hjabird commented Aug 22, 2024

There is a work-around for the above at https://github.com/hjabird/oneMKL/tree/hjab/fix_rocfft6_issue

Unfortunately there are still failing tests: out-of-place complex 4x4x4_fwd_strides_2_4_1_16_bwd_strides_1_4_16_1_batches_2. These tests pass with ROCm 5.4.3, but fail with ROCm 5.7.1 and later.

@hjabird
Contributor Author

hjabird commented Aug 22, 2024

This looks like a bug in rocFFT to me. I've described the issue at ROCm/rocFFT#507. I think oneMKL Interfaces will have to throw unsupported on these transposing DFTs.
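
A rough sketch of how such a transposing DFT might be detected, assuming "transposing" means the forward and backward strides order the dimensions differently. The helper names (stride_order, is_transposing, check_supported) are hypothetical and not part of oneMKL Interfaces, and std::runtime_error stands in for whatever exception the backend would actually throw. Strides are taken as {offset, s1, ..., sn}, matching the failing test name above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns the dimension indices (1..n) ordered from the
// smallest stride to the largest. strides[0] is the offset and is ignored.
std::vector<std::size_t> stride_order(const std::vector<std::int64_t>& strides) {
    if (strides.empty()) {
        return {};
    }
    std::vector<std::size_t> order(strides.size() - 1);
    std::iota(order.begin(), order.end(), std::size_t{ 1 });
    // stable_sort keeps the original dimension order when strides are equal.
    std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return strides[a] < strides[b];
    });
    return order;
}

// Hypothetical helper: a DFT is treated as transposing when the forward and
// backward domains do not share the same fastest-to-slowest dimension order.
bool is_transposing(const std::vector<std::int64_t>& fwd_strides,
                    const std::vector<std::int64_t>& bwd_strides) {
    return stride_order(fwd_strides) != stride_order(bwd_strides);
}

// Hypothetical guard: rejects transposing configurations up front.
// std::runtime_error is a placeholder for the library's own exception type.
void check_supported(const std::vector<std::int64_t>& fwd_strides,
                     const std::vector<std::int64_t>& bwd_strides) {
    if (is_transposing(fwd_strides, bwd_strides)) {
        throw std::runtime_error("transposing DFTs are unsupported with this rocFFT version");
    }
}
```

With the strides from the failing test, fwd {2, 4, 1, 16} and bwd {1, 4, 16, 1}, the fastest-varying dimension is the second in the forward domain and the third in the backward domain, so is_transposing returns true and check_supported throws.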
