Skip to content

Comments

[SPARSE] Add support for sparse TRSV with MKLCPU backend#407

Merged
Rbiessy merged 16 commits intouxlfoundation:developfrom
Rbiessy:romain/sparse_trsv
Nov 30, 2023
Merged

[SPARSE] Add support for sparse TRSV with MKLCPU backend#407
Rbiessy merged 16 commits intouxlfoundation:developfrom
Rbiessy:romain/sparse_trsv

Conversation

@Rbiessy
Copy link
Contributor

@Rbiessy Rbiessy commented Nov 10, 2023

Description

Add support for sparse TRSV using MKLCPU.
Relies on #403

All Submissions

  • Do all unit tests pass locally? Attach a log.
  • Have you formatted the code using clang-format?

New features

  • Have you provided motivation for adding a new feature?
  • Have you added relevant tests?

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 10, 2023

Tests using a recent MKL and DPCPP build: sparse_trsv_log.txt

@gajanan-choudhary gajanan-choudhary changed the title [SPARSE] Add support for sparse trsv with MKLCPU backend [SPARSE] Add support for sparse TRSV with MKLCPU backend Nov 13, 2023
Copy link
Contributor

@gajanan-choudhary gajanan-choudhary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TRSV parts look fine to me. Depending on how the optimize_gemm discussions go in #374, this PR may or may not need further changes.

Copy link
Contributor

@gajanan-choudhary gajanan-choudhary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All files other than the unit tests are fine. I have a few suggestions for the tests.

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 21, 2023

Note for the reviewers, I have made a change that affects the other sparse_blas tests to introduce EXPECT_TRUE_OR_FUTURE_SKIP. I was concerned about the number of tests that weren't properly run due to GTEST_SKIP that would interrupt the test on the first configuration that is not supported.
Log running all the tests on CPU using the 2024.0 package: log_sparse_blas-cpu.txt

@gajanan-choudhary
Copy link
Contributor

gajanan-choudhary commented Nov 22, 2023

Note for the reviewers, I have made a change that affects the other sparse_blas tests to introduce EXPECT_TRUE_OR_FUTURE_SKIP. I was concerned about the number of tests that weren't properly run due to GTEST_SKIP that would interrupt the test on the first configuration that is not supported. Log running all the tests on CPU using the 2024.0 package: log_sparse_blas-cpu.txt

@Rbiessy, it looks from your log_sparse_blas-cpu.txt file that the only tests that ran were GEMV. All GEMM and TRSV tests appear to have been skipped. Shouldn't all of them be running and passing?

Copy link
Contributor

@gajanan-choudhary gajanan-choudhary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for the EXPECT_TRUE_OR_FUTURE_SKIP related changes which make the test logs misleading because:

    bool skip = test_helper_transpose<fpType>(GetParam());
    if (skip) {
        // Mark that some tests were skipped
        GTEST_SKIP();
    }

If there are 10 input combinations that are run and 9 pass, 1 is skipped, then the entire test is marked as skipped. There's a way to also pass a string to GTEST_SKIP(). We should use that to clearly indicate how many tests passed, how many tests failed, and how many were skipped.

@gajanan-choudhary
Copy link
Contributor

Note for the reviewers, I have made a change that affects the other sparse_blas tests to introduce EXPECT_TRUE_OR_FUTURE_SKIP. I was concerned about the number of tests that weren't properly run due to GTEST_SKIP that would interrupt the test on the first configuration that is not supported. Log running all the tests on CPU using the 2024.0 package: log_sparse_blas-cpu.txt

@Rbiessy, it looks from your log_sparse_blas-cpu.txt file that the only tests that ran were GEMV. All GEMM and TRSV tests appear to have been skipped. Shouldn't all of them be running and passing?

I see what happened there. Multiple tests are run, but some may throw some acceptable exception such as unimplemented and is therefore skipped, but we mark the entire set in the test as "Skipped" with a GTEST_SKIP() call in the end. This is misleading, case in point being my own comment above and confusion before reviewing the changes. Can we come up with a better long-term solution for this? Each gtest run could report how many subtests were run, how many passed, how many were skipped, and how many failed. Maybe 3 variables could be used to track that along to summarize in the end?

For that matter, I don't think a throwing an acceptable exception such as unimplemented should be considered a skipped test, we should mark it as passed. Any opinions here, @spencerpatty?

@gajanan-choudhary
Copy link
Contributor

For that matter, I don't think a throwing an acceptable exception such as unimplemented should be considered a skipped test, we should mark it as passed. Any opinions here, @spencerpatty?

On second thought, I take that back. I myself don't know what's the appropriate thing to do here. It is a skipped test, but if we mark that set as skipped, then it's misleading. If we mark it as passed, then it may appear to some as if there's no exception being thrown and that the input combination is supported. Maybe counting passed, skipped, failed tests is the way to go? Is there a better way?

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 22, 2023

@gajanan-choudhary regarding the issue that you mention, I agree that this is not ideal but this is the current way the tests are done for other domains as well.
The proper solution would be to split the tests so that each test runs only 1 configuration. I considered improving this but I didn't want to spend more time than needed.
I was able to confirm that none of the tests are skipped with a recent MKL build.

@gajanan-choudhary
Copy link
Contributor

I was able to confirm that none of the tests are skipped with a recent MKL build.

If none of the tests are skipped, then why do the log files such as log_sparse_blas-cpu.txt you posted earlier have "Skipped" written in them?

I understand that it is not that easy to have each test run just 1 configuration; that's fine, but at least can we have to logs print how many configurations were run, how many passed/failed/skipped? There's an option to pass a custom print to gtest_skip(), for instance. We could cascade the count of test statuses to the final return point.

@Rbiessy
Copy link
Contributor Author

Rbiessy commented Nov 28, 2023

@gajanan-choudhary, the log I attached was using oneapi 2024.0. This one has skipped configurations for gemm when transpose_B != nontrans and for trsv if the matrix is transposed because the backend is throwing unsupported exceptions. If I use a more recent nightly build of MKL none of the tests are skipped on CPU.
I've applied your suggestion but it's not working well with ctest. I don't know why the skipped messages are not printed even with the extra-verbose flag. The skipped messages are printed if we run a binary ./bin/test_main_sparse_blas_ct directly and use the --terse_output flag.
Here are the logs using ctest log_cpu_2024_0.txt and another one with the the terse_output log_cpu_2024_0_terse_output.txt.
I was not able to run this last change on a nightly build anymore due to permissions issue. I don't think this is blocking this PR.

Copy link
Contributor

@gajanan-choudhary gajanan-choudhary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I do have a final minor suggestion of adding a comment above the gtest_skip() call indicating that the printing may not work well with ctest, in which case running the binary with the --terse-output flag should print things. Just so that this info doesn't get lost in history if someone tries digging in!

Thanks for going above and beyond in seeing through my change requests on this PR. I really appreciate your effort in making the product better! Great work on the PR!

@Rbiessy Rbiessy merged commit d56a0f1 into uxlfoundation:develop Nov 30, 2023
@Rbiessy Rbiessy deleted the romain/sparse_trsv branch November 30, 2023 11:41
normallytangent pushed a commit to normallytangent/oneMKL that referenced this pull request Aug 6, 2024
…on#407)

Co-authored-by: Gajanan Choudhary <gajanan@utexas.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants