Don't run failed unit tests by default #5248


Merged
merged 7 commits into diffblue:develop on Mar 4, 2020

Conversation

@thk123 (Contributor) commented Feb 27, 2020

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • [na] The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • [na] Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • [na] My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

This replaces #2256. Tagging these tests with [.] means they are not run by default, so a failure in another unit test no longer produces useless output from the known-failing ones. Changing [!mayfail] to [!shouldfail] means these tests fail if they pass, which stops them being inadvertently fixed without anyone noticing. I used @owen-mc-diffblue's suggestion of xfail for an expected failure. Finally, I added a separate CI run for these expected-failure tests.
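
As a sketch of what a test looks like under this scheme, assuming a plain single-header Catch build (the test name and assertion here are hypothetical, purely for illustration):

#define CATCH_CONFIG_MAIN
#include <catch.hpp>

// [.] hides the test from the default run, so it only executes when its
// tag is requested explicitly; [!shouldfail] inverts the result, so the
// run goes red if this test ever starts passing.
TEST_CASE("documents a known bug", "[.][!shouldfail]")
{
  REQUIRE(1 + 1 == 3); // stands in for the behaviour the known bug breaks
}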

@thk123 thk123 force-pushed the dont-run-failed-tests branch from 000a799 to a665490 on February 27, 2020 11:24
Contributor

If you don't want to run [!shouldfail] tests you can just provide ~[!shouldfail] as a tag filter.
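
For illustration, assuming the test binary is called unit_tests (a stand-in name), that filter would be passed like this, running everything except the expected-failure tests:

./unit_tests "~[!shouldfail]"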

@@ -36,4 +36,7 @@ Author: Michael Tautschnig
#include <util/pragma_pop.def>
#endif

/// Add to the end of test tags to mark a test that is expected to fail
#define XFAIL "[.][shouldfail]"


Why?

Contributor Author

From the commit message:

These tests won't be run by default (de-cluttering the output with expected failures). When run, they will fail if they pass

In the previous PR, Owen and Michael complained that shouldfail was misleading: in a perfect world these tests shouldn't fail, but they do because of a known bug. We felt expected fail (or XFAIL, as Python calls it) was clearer. I can change it to EXPECTED_FAIL if you'd prefer?
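
(For what it's worth, since XFAIL expands to a string literal, appending it to a tag list relies on ordinary C++ adjacent-string-literal concatenation: with hypothetical tags, SCENARIO("some scenario", "[core]" XFAIL) produces the tag string "[core][.][shouldfail]".)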

@@ -12,7 +12,7 @@ Author: Diffblue Limited.

SCENARIO(
"Lazy load lambda methods",
"[core][java_bytecode][ci_lazy_methods][lambdas][!mayfail]")


mayfail doesn't stop a test from running; it just means a failure isn't counted.
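
(For comparison, a minimal hypothetical example: a test declared as TEST_CASE("flaky", "[!mayfail]") still runs in the default set; a failure is reported but doesn't count against the run.)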

Contributor Author

A test that can either pass or fail is bad, since nobody will notice if it changes state. I believe this one was erroneously marked [!mayfail], since it passes.

@@ -188,7 +188,7 @@ void validate_local_variable_lambda_assignment(
 SCENARIO(
   "Converting invokedynamic with a local lambda",
   "[core]"
-  "[lambdas][java_bytecode][java_bytecode_convert_method][!mayfail]")
+  "[lambdas][java_bytecode][java_bytecode_convert_method]" XFAIL)


This should be [!shouldfail] (it's shouldfail in the macro), and I still don't think this should be a macro.

Contributor Author

Fixed in a later commit.

.travis.yml Outdated
- env UBSAN_OPTIONS=print_stacktrace=1 make -C jbmc/regression test-parallel "CXX=${COMPILER} ${EXTRA_CXXFLAGS}" -j2 JOBS=2
- make -C jbmc/unit "CXX=${COMPILER} ${EXTRA_CXXFLAGS}" -j2
- make -C jbmc/unit test
- echo "Running expected failure tests"
- make TAGS="[shouldfail]" -C jbmc/unit test xfail


??? Why not just use the built-in !shouldfail mechanism?

Contributor Author

Not sure what you mean? The trailing xfail was a typo, but otherwise this is me passing the appropriate tags to the unit runner.


This should be [!shouldfail], not [shouldfail]. I believe it might be fixed by a later commit?
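
Presumably the corrected line, mirroring the cbmc unit job added elsewhere in this diff, would be:

- make TAGS="[!shouldfail]" -C jbmc/unit test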

-if ! ./$(CATCH_TEST) -l | grep -q "^$(N_CATCH_TESTS) test cases" ; then \
+if ! ./$(CATCH_TEST) *,[.] -l | grep -q "^$(N_CATCH_TESTS) matching test cases" ; then \
 ./$(CATCH_TEST) -l ; fi
 ./$(CATCH_TEST)


I'm not sure what this is supposed to mean.

Contributor Author

It lists all tests, even hidden ones, to check that the number of tests matches the number of test cases in the source. (The check is there to catch people who forget to add their unit tests to the Makefile.)
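
In Catch's command-line syntax, * matches every non-hidden test and the ,[.] alternative pulls in the hidden ones as well, while -l lists the matching tests (ending with an "N matching test cases" summary line) instead of running them. For example, with a hypothetical binary name:

./unit_tests "*,[.]" -l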

@thk123 thk123 marked this pull request as ready for review March 3, 2020 10:22
@thk123 (Contributor Author) commented Mar 3, 2020

If you don't want to run [!shouldfail] tests you can just provide ~[!shouldfail] as a tag filter.

This is also an option.

I prefer my approach, as it makes the common case easier: to run the unit tests, you just run the unit-test binary. You neither need to provide additional flags, nor do you get misleading output about failing tests.

Happy to change to that though, if that is preferred.

@@ -355,9 +355,13 @@ script:
- env PATH=$PATH:`pwd`/src/solvers UBSAN_OPTIONS=print_stacktrace=1 make -C regression/cbmc test-cprover-smt2
- make -C unit "CXX=${COMPILER} ${EXTRA_CXXFLAGS}" -j2
- make -C unit test
- echo "Running expected failure tests"
- make TAGS="[!shouldfail]" -C unit test
Contributor

What about CMake configurations? Does anything need to be done for those?

Contributor Author

Good spot, fixed

@thk123 thk123 force-pushed the dont-run-failed-tests branch from b1df747 to 5d92f1b on March 3, 2020 12:07
@chrisr-diffblue (Contributor) left a comment


Looks good, thanks for checking/fixing the CMake stuff.

@thk123 thk123 changed the title Dont run failed tests Don't run failed unit tests by default Mar 3, 2020
-if ! ./$(CATCH_TEST) -l | grep -q "^$(N_CATCH_TESTS) test cases" ; then \
+if ! ./$(CATCH_TEST) *,[.] -l | grep -q "^$(N_CATCH_TESTS) matching test cases" ; then \
 ./$(CATCH_TEST) -l ; fi
 ./$(CATCH_TEST)
Contributor Author

@hannes-steffenhagen-diffblue how about this?

Suggested change
-if ! ./$(CATCH_TEST) *,[.] -l | grep -q "^$(N_CATCH_TESTS) matching test cases" ; then \
+# Include hidden tests by specifying "*,[.]" for tests to count
+if ! ./$(CATCH_TEST) *,[.] -l | grep -q "^$(N_CATCH_TESTS) matching test cases" ; then \


Oh, OK. I thought this was using a glob pattern somehow. Would you mind putting quotes around it, to clarify that it's supposed to be passed to Catch as an argument as-is?
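
The quoted form of the line would then read:

if ! ./$(CATCH_TEST) "*,[.]" -l | grep -q "^$(N_CATCH_TESTS) matching test cases" ; then \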

Contributor Author

Done; will merge on green CI.

Contributor

I still don’t really like using a macro for this, but otherwise LGTM now.

thk123 added 4 commits March 4, 2020 10:08
  • These tests won't be run by default (de-cluttering the output with expected failures). When run, they will fail if they pass.
  • These don't fail, so no reason not to have them running.
  • This ensures that failed tests don't get inadvertently fixed, without cluttering the normal output.
@thk123 thk123 force-pushed the dont-run-failed-tests branch from 5d92f1b to 92dac11 on March 4, 2020 10:10
@thk123 thk123 force-pushed the dont-run-failed-tests branch from 92dac11 to 6176871 on March 4, 2020 10:12
@thk123 thk123 merged commit fb07ce0 into diffblue:develop Mar 4, 2020
@thk123 thk123 deleted the dont-run-failed-tests branch March 4, 2020 14:56