Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Future directions for the test API #204

Closed
Snaipe opened this issue May 3, 2017 · 29 comments
Closed

Future directions for the test API #204

Snaipe opened this issue May 3, 2017 · 29 comments

Comments

@Snaipe
Copy link
Owner

Snaipe commented May 3, 2017

I am a bit dissatisfied by the current state of the test API. When I started this project, I came from a Java/C++ background where we had annotation-driven (JUnit) & macro-driven (GTest) test frameworks, and was pretty frustrated by how painful to use C frameworks were. This was two and a half years ago, and since then I've seen the design mistakes that I made limit and impact the project in (mostly) negative ways.

I think it's time to address that. I've talked about it a bit in #168, but I think that we need to address a few problems with the current design of the test API

Consistency

We currently have three flavour of tests: regular, parameterized, and theories. It doesn't take much to see that all three are pretty much the same; the only thing that differs is the way parameters are generated. The Test, ParameterizedTest, and Theory macros pretty much roll their own semantics at the moment, and I really hate this.

Hierarchy

The current API defines a two-level hierarchy for tests: on top, test suites, and right below, tests. Criterion defines TestSuite as a mean to define test suites properties, although this can feel quite hack-ish. I always told myself that this hierarchy was good enough for anyone, but now, I think that this might be complexifying the API by always requiring a test suite. Thoughts? It might be better to optionally provide a parent in the test definition.

Compiler-dependency

This one is a bit minor compared to the consistency problem, but as I mentionned in #184, using sections to register tests makes the API compiler-dependent. While nowadays this is less of a problem with the market-share dominance of GCC, LLVM, and MSVC, this is something I'd like to push out of the API and into the implementation.

Proposals

Macro-based

This doesn't change much from the current API, but that doesn't mean we can't improve the semantics of the definition macros.

Turning parents into properties

We could turn the parent of a test into a property: instead of doing:

TestSuite(suite, properties...);
Test(suite, test) { ... }

We could have:

// no parent suite
Test(test) { ... }

// test is in a suite
TestSuite(suite, properties...);
Test(test, .parent = suite) { ... }

// test is nested in a suite, which is itself in a suite
TestSuite(root);
TestSuite(nested, .parent = root);
Test(test, .parent = nested) { ... }

However, this would make the usage of TestSuite to declare a suite mandatory, while its definition is currently inferred by the runner in 2.3.x.

Merging ParameterizedTest with Theory

The semantics of ParameterizedTest and Theory could be merged into one parameterized test macro. The usage of such a macro could be: ParameterizedTest(name, (args), properties...), e.g:

// parameterized test
ParameterizedTest(square, (int bar), .params.generator = my_generator) {
    cr_assert(eq(int, square(bar), bar * bar));
}

// theory
ParameterizedTest(division, (int bar, int baz), .params.matrix = my_theory_args) {
    cr_assume(ne(int, baz, 0));
    cr_assert(eq(int, divide(bar, baz), bar / baz));
}

Unify all the definition macros

We could additionaly go one step further and just make Test follow the above semantics. Regular tests would be declared with (void):

// regular test
Test(multiply, (void)) {
    cr_assert(eq(int, mult(1, 2), 2));
}

// parameterized test
cr_param_list generator_square(void) {
    static int i = 0;
    if (i > 2) {
        return NULL;
    }
    return cr_alloc_params((int, i++));
}

Test(square, (int bar), .params.generator = generator_square) {
    cr_assert(eq(int, square(bar), bar * bar));
}

// theory
cr_param_values division_matrix[] = {
    cr_values(int, INT_MIN, -2, -1, 0, 1, 2, INT_MAX),
    cr_values(int, INT_MIN, -2, -1, 0, 1, 2, INT_MAX),
};

Test(division, (int bar, int baz), .params.matrix = division_matrix) {
    cr_assume(ne(int, baz, 0));
    cr_assert(eq(int, divide(bar, baz), bar / baz));
}

Symbol-lookup

I believe this approach is already done by NovaProva. Basically, instead of relying on a macro to define our test as if it was something special, we just define a function with a specific name:

// regular test
void test_multiply(void) {
    cr_assert(eq(int, mult(1, 2), 2));
}

// using setup/teardown
void setup_multiply(void) {
}
void teardown_multiply(void) {
}

// parameterized w/ generator
cr_param_list generator_square(void) {
    static int i = 0;
    if (i > 2) {
        return NULL;
    }
    return cr_alloc_params((int, i++));
}

void test_square(int n) {
    cr_assert(eq(int, square(n), n * n));
}

// parameterized w/ static list
cr_param_list param_list_square_alt[] = {
    cr_define_params((int, 0)),
    cr_define_params((int, 1)),
    cr_define_params((int, 2)),
};

void test_square_alt(int n) {
    cr_assert(eq(int, square(n), n * n));
}

// theory
cr_param_values param_matrix_divide[] = {
    cr_values(int, INT_MIN, -2, -1, 0, 1, 2, INT_MAX),
    cr_values(int, INT_MIN, -2, -1, 0, 1, 2, INT_MAX),
};

void test_divide(int n, int d) {
    cr_assume(ne(int, d, 0));
    cr_assert(eq(int, divide(n, d), n / d));
}

This makes the API extremely simple; tests are just functions, and you can define other symbols to alter slightly the calling semantics of the test.

As for additional test properties, we have a few ways:

// simple but not convenient for C++
struct cr_test_properties properties_foo = {
    .timeout = 10,
}

// simple, consistent, but maybe more confusing 
double timeout_foo = 10;

// convenience macro
cr_properties(foo, .timeout = 10);

This approach, however, has a drawback: it makes it impossible to do some runtime initialization or exception handling for non-C languages (while declaring through a macro lets us do that since we can wrap the user function in a try/catch for instance); or rather than "impossible", it moves the responsibility back to the runner, so we have to compile some C++ on our side.

Another issue is that in practice, because we need to support C++ and Windows, we'd need to add a cr_register attribute to all of those symbols, which would expand to __declspec(dllexport) and/or extern "C" (the former is needed to actually find the symbol on windows, and the latter is needed to prevent mangling on the symbol).

Discussion

I'd really like some inputs on the matter. Preferably, the final choice will have to compromise between a few issues, but I'd really like to improve simplicity and consistency without shaving too much features off.

Finally, if anyone has other proposals, I'm all ears. There's no rush on the matter since we have some heavy refactor to do on the internals before changing the API, but we need to start thinking about it now.

@DoumanAsh
Copy link

Considering how difficult C can be, i think it would be difficult to get rid of macros.
While i find it nice to have just declare normal function, i think it maybe better to let Criterion to go with macros for test definitions.
It, after all, gives more control to Criterion and user do not need to worry about function signature and etc.

It would be though interesting to have possibility to define Suite as structure which could be used as context for running tests(similarly to have Google test uses its fixtures).

The semantics of ParameterizedTest and Theory could be merged into one parameterized test macro.

I'm not really sure it would easy to use as they serve quite different purposes.
It might get confusing.
Will user be able to have both semantics at once?

@Snaipe
Copy link
Owner Author

Snaipe commented May 4, 2017

Considering how difficult C can be, i think it would be difficult to get rid of macros.
While i find it nice to have just declare normal function, i think it maybe better to let Criterion to go with macros for test definitions.
It, after all, gives more control to Criterion and user do not need to worry about function signature and etc.

I think that's a fair point, but then I also find having the parameter list in the test macro (i.e. the (int foo, float bar) in Test(name, (int foo, float bar), ...) { ... }) weird on a language standpoint. Making it a simple function makes the API braindead simple, which is a really nice things to have when testing.

On the other hand, as you implied, this means that we lock ourselves in an ABI contract. This discussion will hopefully shine some light on whether this is a bad thing for us considering that I will feature-lock criterion 3.0.

One thing I dislike with the symbol proposal is that I feel that other symbols (setup_, teardown_, properties_, ...) are pretty big gotchas and might not be that convenient once you want to reuse some of the functions. Perhaps only test_* should count as a candidate for registration, and further properties should all go through cr_properties(name, .setup = setup_name, etc...)

I'm not really sure it would easy to use as they serve quite different purposes.
It might get confusing.
Will user be able to have both semantics at once?

That's not really a problem in my opinion once you realize that both are pretty much doing the same thing: calling a function with parameters.
The only thing that differs is how exactly parameters are generated. For the ParameterizedTest macro, it's a sequence of parameter lists that gets called sequentially (sum operation), and for the Theory macro, it's a matrix of parameter values that produces a parameter list for each permutation (product operation). And to be honest, cr_assume() is pretty much a if (!condition) cr_skip_test(), so there is no real reason why we're keeping both interfaces separate, except for the parameter generation as we described above.

@DoumanAsh
Copy link

DoumanAsh commented May 4, 2017

One thing I dislike with the symbol proposal is that I feel that other symbols (setup_, teardown_, properties_, ...) are pretty big gotchas and might not be that convenient once you want to reuse some of the functions. Perhaps only test_* should count as a candidate for registration, and further properties should all go through cr_properties(name, .setup = setup_name, etc...)

Something like setup/teardown would definitely better to go as property of Suite structure (if Criterion would go for explicit suite definition).

Btw how easily symbol look-up can be implemented?
Is there compiler-independent way?

That's not really a problem in my opinion once you realize that both are pretty much doing the same thing: calling a function with parameters.

Hm... well i suppose i may over-think it.
I suppose it would make sense when you can easily distinguish between type of parameterized test.

@Snaipe
Copy link
Owner Author

Snaipe commented May 4, 2017

Btw how easily symbol look-up can be implemented?
Is there compiler-independent way?

It's pretty easy. In fact, I've already been doing it for BoxFort and Mimick, so I can probably refactor that logic in a separate library and reuse it in the three projects.
This approach is, in fact, compiler independent, and is rather ABI (or more specifically executable-format) dependent. This isn't really a problem though, since criterion is already executable-format-dependent (because it uses BoxFort), and the market-share is overwhelmingly dominated by PE (Windows), ELF (Linux/BSD), and Mach-O (OS X/macOS/iOS)

@e74b254a966b1c
Copy link

e74b254a966b1c commented May 4, 2017

If you are going to merge Theories with Parametrised Tests there should still be a way (maybe a parameter) to generate all parameter combinations.
Also, I'd like to have a way to test against random input. I was thinking about requiring the user to provide some random stream generator.

struct stream_state {
    void *state; /* Steam state data. It will be up to the users to choose what they do with it. */
    void *head_value; /* The head value of the stream. */
};

/* A stream generator. This will be up to the user to implement.
 * You could use the current->state to make this function reentrant, or it could be ignored
 * (tt's up to the user how they want it to be)
 */
struct stream_state *next_state(struct stream_state current);

@Snaipe
Copy link
Owner Author

Snaipe commented May 4, 2017

If you are going to merge Theories with Parametrised Tests there should still be a way (maybe a parameter) to generate all parameter combinations.
Also, I'd like to have a way to test against random input. I was thinking about requiring the user to provide some random stream generator.

This can probably be done through the proposal to support generators. For instance, to generate the int parameters 0 through 2:

cr_param_list generator_square(void) {
    static int i = 0;
    if (i > 2) {
        return NULL;
    }
    return cr_alloc_params((int, i++));
}

(which pretty much is a more atomic version of the current ParameterizedTest interface)

And for random number generation (e.g., generating 10 random ints):

cr_param_list random_int_generator(void) {
    static int i = 0;
    if (i++ > 10) {
        return NULL;
    }
    return cr_alloc_params((int, random_int()));
}

However I have a few gripes with this. I don't like having to do memory allocation, type safety is missing, and the control flow of a regular function isn't really convenient for a generator.

struct stream_state {
   void *state; /* Steam state data. It will be up to the users to choose what they do with it. */
   void *head_value; /* The head value of the stream. */
};

I'd probably like the generation to be somewhat typesafe. The current Theory implementation claims UB when parameter types doesn't match the function prototype, without any compile-time warning. Plus, the way it infers which type is which when printing the parameters on a failure is broken.

I may have a proposal to address all these:

struct context {
    int32_t seed;
};

int32_t random_int32(int32_t *seed);

// declare a generator for the foo Test
Generator(foo, (i32 bar)) {
    cr_gen_enter(struct context, {
        .seed = ...
    });
    // generate 10 random numbers
    for (size_t i = 0; i < 10; ++i) {
        cr_gen_yield(random_int32(&seed));
    }
    cr_gen_leave();
}

Test(foo, (int32_t bar)) { ... }

Basically, Generator would declare a coroutine that gets called repeatedly until no more values are yielded. The i32 here would be a tag as described in the new assertion API, which translates to int32_t, so we can use this information to get some pseudo type introspection on the parameters, and statically assert at compile-time that both signatures match.

By the way, I think that criterion could also provide a simple implementation of a mersenne twister for convenience since rand() is garbage as a PRNG. Thoughts?

@e74b254a966b1c
Copy link

Starting from your proposal, I would like to change generators to make them able to create infinite streams.
So, instead of having a generator that gives us N values, we should have a generator that, when called, gives a single value and a new state.

/* Generator receives the types of the state and generated value.
 * It has 3 pre-defined variables:
 *     state      - const pointer to the current state
 *     next_state - location to store next state (generator should fill this)
 *     value      - location to store generated value (generator should fill this)
 * In the case below, state type = int32_t, value type = char.
 */
Generator((int32_t, char), char_generator) {
    int32_t number = random_int32(state);
    *next_state = number;
    *value = number % 0xFF;
}

int32_t seed1 = 42, seed2 = 7, seed3 = 13;

/* Similar to a theory, but instead of providing theory data points for c1, c2, c3,
 * you give them generators + seeds.
 * The last parameter is the number of iterations.
 */
StreamTheory((char_generator &seed1 c1, char_generator &seed2 c2, char_generator &seed3 c3),
             suite_name, test_name, 1232445) {
    test_something_with(c1, c2, c3);
    /* Please note that nowhere, do we need to explicitly call char_generator.
     * It will be up to Criterion to call char_generator for each iteration.
     * In the first iteration, the generators will receive seed1, seed2 & seed3, but
     * in the next iterations they will receive the states generated at the previous step.
     * Passing the state as function parameters ensures that generating c1 will not affect
     * generating c2 and c3.
     */
}

By the way, I think that criterion could also provide a simple implementation of a mersenne twister for convenience since rand() is garbage as a PRNG. Thoughts?

Don't know about rand()'s RNG qualities, but if it's bad, sure! Having this + stream generators will make the tool great for testing code against bad formatted input..
Also, would love some parallelized theories :)

@Snaipe
Copy link
Owner Author

Snaipe commented May 6, 2017

Starting from your proposal, I would like to change generators to make them able to create infinite streams.

Note that the proposal I made defined a generator to be a coroutine, so an infinite stream can simply be represented as follows:

Generator(foo, (i32 bar)) {
    cr_gen_enter(void);
    // infinite generator of 1
    while (true) {
        cr_gen_yield(1);
    }
    cr_gen_leave();
}

The idea of making generators reusable is interesting. If that's the case, maybe the best course of action would be to embrace the fact that it's a function and get rid of the Generator macro?

out_type foo_generator(cr_gen_ctx ctx, state_type *state) {
    cr_gen_enter(ctx);
    // do something with the state, yield some values, ...
    cr_gen_leave(ctx);
}
StreamTheory((char_generator &seed1 c1, char_generator &seed2 c2, char_generator &seed3 c3),
             suite_name, test_name, 1232445) {

If the generators can be specified orthogonally from each others, this raises a few questions that are going to be hard to solve. For instance, what happens if you pass an infinite generator for c1, but a finite generator for c2? what happens if the iteration number is greater than the number of times the finite generator can be called?

Also the fact that you base your example as an extension of the Theory semantics makes the orthogonality of generators difficult. I don't even think we can have generators at all for Theories, because by nature Theories are called with the n-dimensional matrix products of all of its possible parameters. For instance:

If the possibles values of c1 are (1, 2); c2 (3, 4); and c3 (5, 6), then the theory will be called with the following parameters: (1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6), (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6). More generally, for N parameters with M values each, you have M^N possible combinations.

You see what kind of problem we're going to have for generators: for each new value produced by a generator, we'll have to test it against every previous values ever produced, which becomes impractical for very large values as we have to store things as we go and we can only guess the upper bound; and while you may be able to be warned for ressource exhaustion at compile time with a static array, this is not possible with a generator.

I realize that this can be solved by doing some careful dynamic allocation, but it's hard to make this fast, and it might blow up in the developer's face when not used with care. Plus, Theories really aren't made for random testing, but careful testing with hand-picked "interesting" values (or ranges of values, which should definitely be something to support in the future). Testing against random values isn't theory testing, but fuzzy testing, and should be used in parallel to theories.

On another note, maybe we're taking an approach to the parameterization problem that is too complicated. How about not parameterizing tests, but introducing subtests:

Test(foo) {
    int32_t state = 42;
    Subtest("testing against 10 random values", .repeat = 10) {
        int32_t val = random_int32(&state);
        cr_assert(eq(do_thing(val), expected_val)); // failing the assert fails the subtest, but the test continues
    }
}

However, I have two issues with this:

The first issue I see there is that this syntax doesn't let us define nested functions (and while it might be accepted as a GNU extension, MSVC doesn't allow it) so we need to put this in a for loop. This means that we can't really spawn threads ourselves to parallelize this, but this can be worked around by the usage of OpenMP pragmas, which all of the compilers we target support.

The second issue is that if we parallelize this, then we'll be throwing multithreading into subtests, which might confuse developers (since the function they call might be racy) unless they explicitely want it. We could add a .parallel = true option for Subtest, though. A big plus of this approach however is that threads are significantly cheaper than spawning a new process like we do for parameterized tests, so this is going to be very fast.

@a1lu
Copy link
Contributor

a1lu commented May 7, 2017

Turning parents into properties

I like this approach. It would be nice to be able do define the Test - TestSuite relation like this:

Test(foo) {...}

TestSuite(bar, ...)
{ // or another separation
    Test(foo2){...}
}

I'm afraid that's not possible in C though.

Unify all the definition macros

We should do this do simplify the API. But I don't like the signature ( Test(multiply, (void)) ) for regular tests.

@e74b254a966b1c
Copy link

If the generators can be specified orthogonally from each others, this raises a few questions that are going to be hard to solve. For instance, what happens if you pass an infinite generator for c1, but a finite generator for c2? what happens if the iteration number is greater than the number of times the finite generator can be called?

I was thinking about cycling through that finite amount of data points so that c2 acts as an infinite generator. On an abstract level, all that these infinite generators need to do is take a state and return a pair of state and value (State -> (State, Value)). This condition alone is enough for them to act like infinite streams.

Also the fact that you base your example as an extension of the Theory semantics makes the orthogonality of generators difficult. I don't even think we can have generators at all for Theories, because by nature Theories are called with the n-dimensional matrix products of all of its possible parameters.

You are right, it was a mistake to associate them with theories.

Testing against random values isn't theory testing, but fuzzy testing, and should be used in parallel to theories.

Yes, fuzzing is what I really want to do with generators. Also the fact that generators are independent and they pass the states as arguments makes them perfect to track the cause of errors and also make it possible to resume from a known state, after fixing a bug (this is mostly useful when the generators aren't random, but instead they simulate some kind of communication protocol that has states).

Maybe a good approach is to make a Fuzzing function, different from the Theory function.

@Snaipe
Copy link
Owner Author

Snaipe commented May 9, 2017

@a1lu

I'm afraid that's not possible in C though.

Right, it's not. However, we can have subtests that follow the same hierarchy (as mentionned a few posts up).

We should do this do simplify the API. But I don't like the signature ( Test(multiply, (void)) ) for regular tests.

I was about to propose something like this:

Test(foo) {
}

Test(bar, params(i32 i, u64 u)) {
}

But I realize that this is also impossible to do as the macro concatenation of some identifier with . (as in .foo = bar) is not valid. Do you perhaps have something in mind?


@mirror3000

I was thinking about cycling through that finite amount of data points so that c2 acts as an infinite generator.

Interesting, I was thinking of stopping altogether with a finite generator finished, but this is a sensible option too. This also means that we effectively have to store all of the previous values, which becomes a problem if the generated set is large (and it usually is considering generators are precisely used when the context may become too large).
And we can't really not store previous values either since some generators may include one-time context and may not be callable more than once. In fact we can't assume anything from the contents of generator functions since they are technically user code.

I'm still not sure whether having separate generators is simpler to implement and use though. One could still use one generator for a particular test, that calls itself sub-generators (like a PRNG) in the manner they want. This might let users have more control over what values go through the generator, as some of them will definitely use them for values that aren't random.

make it possible to resume from a known state

That's an interesting idea. I think this could be nice to have especially with the feature request described in #202.

Maybe a good approach is to make a Fuzzing function, different from the Theory function.

I wonder if introducing yet another macro is a good idea on the long term.

I've been thinking a bit about how theories & parameterized tests are being implemented, and I definitely think that we could leverage some of the techniques I used in Mimick to run special tests like theories (or even fuzzing if we do implement this). Maybe something like the following:

int div_ints(int a, int b) {
    return a / b;
}

CR_CONTRACT_PREPARE(div_ints);

bool nonzero(int val) {
    return !!val;
}

Test(theory) {
    // create a contract over div_ints, called with any int as a first parameter, and any nonzero int as a second parameter
    cr_contract c = cr_contract_from(div_ints(cr_any(int), cr_that(int, nonzero)));

    // check for the theory using the contract with the specified datapoints
    int int_dp[] = {INT_MIN, -1, 0, 1, INT_MAX};
    bool ok = cr_theory_check(c, .datapoints = {int_dp, int_dp});

    // assert that the check passed
    cr_assert(ok, "checking theory for div_ints");
}

Test(fuzz) {
    // create a contract over div_ints, called with any int as a first parameter, and any nonzero int as a second parameter
    cr_contract c = cr_contract_from(div_ints(cr_any(int), cr_that(int, nonzero)));

    // fuzz the contract over 100 iterations
    bool ok = cr_fuzz(c, 100);

    // assert that the check passed
    cr_assert(ok, "checking fuzzing for div_ints");
}

@am11
Copy link
Contributor

am11 commented Jun 25, 2017

An extra +1 if Theory and Fact terminologies are preferred over ParameterizedTest and [Regular]Test :)

@Snaipe
Copy link
Owner Author

Snaipe commented Jul 9, 2017

Another point that was brought up today on IRC: setup & teardown fixtures could take the same parameters as the test itself, so you could have reusable fixtures:

static int file;

void openfile(const char *path)
{
    file = open(path, O_RDONLY, 0);
}

void closefile(const char *path)
{
    (void) path; // unused
    close(fd);
}

Test(foo, (str path), .init = openfile, .fini = closefile)
{
    // test file
}

Though I'm not sure how that would work for suite-specific fixtures. I'll have to think more about this.

@Snaipe
Copy link
Owner Author

Snaipe commented Jul 21, 2017

Another possible approach for test suite structuring:

Instead of defining the hierarchy through macros/properties, simply use the filesystem.

That is, for the following tree:

tests
 |-- foo
 |    |-- foo.c
 |    `-- test.c
 `-- bar.c

We would have two root test suites, foo and bar, and a test test suite as a child of foo. foo.c would be considered to be in the foo suite, to allow the definition of test suite properties for instance.

Now, this would obviously force users to split their tests in multiple files if they initially had all their suites in one source file, but it has a few advantages:

  1. It's dead simple. You don't need to worry too much about properties, and it naturally comes with organizing the source tree.
  2. Inheritance of test properties is a breeze. You can just have a TestSuite defined somewhere in a <testsuite>.c file, and all of its children will apply those properties.
  3. It favors code reuse, composability, and conditional compilation. You can pretty much have the same test file symlinked at few positions, but with different test properties coming from different suites (would address RFE: add capability to add tests to multiple suites #210 perfectly); or have an entire test suite disabled due to requirements not being met on a specific platform.

Thoughts on that?

@DoumanAsh
Copy link

While I do find it a pretty simple, I don't really like how you'd force user's hand here.
And on other hand aside from API you'd also need to remember how to organize your tests as file structure becomes a part of API almost.

I'm not strongly opposed to it though, just my thoughts on forcing user's hand.

@cbatten
Copy link

cbatten commented Oct 2, 2017

We are using Criterion in a sophomore-level course on computer systems programming using C/C++ at Cornell University and we really like it. More info here: https://web.csl.cornell.edu/courses/ece2400 Also glad to see that the framework is continuing to evolve and improve. Personally, I would love if Criterion moved to the "symbol-lookup" proposal mentioned above which uses standard functions to define tests. The Test macros are pretty confusing to new users. I also really like the parameterized with static list example in the "symbol-lookup" proposal. We use py.test for all of our Python unit testing, and the "symbol-lookup" idea is very much in that direction.

@Snaipe
Copy link
Owner Author

Snaipe commented Oct 5, 2017

Great to hear your inputs on the matter.

I am a bit puzzled by a few consequences of the symbol lookup proposal, and I feel we're trading off safety for simplicity, especially with parameterized tests.

For instance, consider the following setup:

// parameterized w/ static list
cr_param_list param_list_square_alt[] = {
    cr_define_params((double, 3.14)),
};

void test_square_alt(int n) {
    // undefined behaviour
}

This, while a horrible idea, would compile without warning or errors because we have no way to enforce type safety on two seemingly-unrelated symbols (the parameter list, and the function).

Now, we could still somewhat achieve that by inspecting debugging information and panic when the types mismatch, but this would break the expectation that Criterion would compile without debugging symbols.
That wouldn't technically be a problem since we're in 3.0 territory and breakage is fair game, but I feel like Criterion not working simply because you forgot to add -g is a huge gotcha that is bound to bite us back.

Plus, as I mentionned earlier, we'd have to stick a cr_test annotation to the test function to stick a __declspec(dllexport) when needed otherwise that wouldn't work either.

What we could probably do in this case is go with something similar to novaprova:

cr_param(n, int, 0, 1, 2);

cr_test void test_square(void) {
    cr_assert(eq(int, square(n), n * n));
}

Though, I'm not sure how we would go about detecting which function a parameter is for. Maybe use the line number through __LINE__?

@cbatten
Copy link

cbatten commented Oct 6, 2017

Hmmm ... I guess I will have to give it some more thought. Based on your post I took a closer look at novaprova. It seems like both Criterion and novaprova are modern frameworks that enable best practices for test-driven design in C and C++ ... I would be interested to know your thoughts on comparing the two frameworks? Criterion is more portable (novaprova only works on Linux) and seems to be more actively developed ... is there any other important differences between the two frameworks?

@Snaipe
Copy link
Owner Author

Snaipe commented Oct 6, 2017

I can't speak for its author, but I think I can summarize some of the differences pretty well:

Novaprova takes the stance that to achieve perfect isolation and allow for in-depth instrumentation of error conditions on tests, the program has to run under valgrind. It does this by re-exec'ing itself with memcheck, and using various hooks to catch and report various error conditions, including memory leaks, addressing errors, and other crashes. An added bonus, which is huge and was something we didn't even support until new/assert.h was introduced, is that novaprova leverages debugging information, added to the instrumentation capabilities of valgrind, to pretty-print assertions operands, so that users get the most out of assertion failures.

While I find that method brilliant in itself, I do not agree with it. Because it runs under Valgrind/memcheck, novaprova can only work on platforms that run them, and worse, the test code is never going to run on the hardware, as valgrind is essentially a virtual machine, which means that you're going to have behavioural differences on some edge cases. Finally, code running under valgrind is orders of magnitude slower than it normally is.

With Criterion, I took the stance of sandboxing the test code into a separate process to monitor it from another process. In this case, the code can run wherever the user wants -- be it on the host processor, or a qemu process. I also didn't want to include any fluff that would lock users into using a specific technology -- which is why there is no memory leak check facilities, coverage tools, or other instrumentation integrated by default with criterion. Instead, users can choose whatever they want, and run it however they want. I think this was a good decision in hindsight, given the rising popularity of address-sanitizer over memcheck.

Other (though older) frameworks also offered tests to run in a sandboxed process (though usually only on systems supporting fork()), but made it optional, which is I think a terrible decision and why I've been vehemently trying to prevent the addition of a --no-fork switch. Things just can't work for specific kinds of tests when you make address-space isolation optional, and worse that that, this breaks behavioural consistency and makes parallel testing impossible as suddenly single-threaded code modifying global state may start conflicting with itself. Now I could blame that on horrible user code, and that testable code should not rely on hidden global state, but that's just a poor excuse for a terrible design decision -- which is why I tried my hardest to make Criterion indulge the worst kind of user-code by minimizing interaction with it. What good would a testing framework be if it required perfect code, after all?

Aside from that, Criterion currently follows a rigid test structure, while novaprova is a bit more modular. This is kind of funny now, since the criterion runner does support custom test trees, but the C API doesn't let us do that -- which is precisely the reason why I opened this issue with a bunch of proposals to address this.

I'm sure there are other differences, but I haven't dug enough in novaprova to be able to accurately compare them.

@cbatten
Copy link

cbatten commented Oct 6, 2017

This is a really great comparison! It might be interesting to see if it is possible to bring some of the nicer features from Novaprova into Criterion. We are going to play around with both in my course and see what we can learn.

From just a very cursory look at Novaprova some things that seem nice to me are: (1) tests are just functions with automatic test discovery (the common case of non-parameterized tests is very clean with absolutely no boiler plate just like in py.test, not even a cr_test in front), (2) coding convention of using all caps for macros (e.g., NP_ASSERT_EQUAL) -- seems minor but it helps new users understand what is a macro and what is a function, (3) pretty printing values that are used in an assertion (super useful), (4) getting a stack trace on an assertion error (super, super useful!), (4) the filesystem/tree-based test organization (super similar to py.test, really nice), and (5) the ability to use a single testrunner for all tests (maybe Criterion can do this too? we will see how that works in practice).

The new Criterion assertions will pretty print the values in an assertion, but even so Criterion might want to keep the cr_criterion_eq( a, b ); style macros as opposed to forcing users to use cr_criterion( eq( a, b ) ); I imagine the first can just use the latter so you would still get the benefit of printing the values of a and b? To a new user cr_criterion_eq( a, b ) is more intuitive than cr_criterion( eq( a, b ) ).

The integration with valgrind seems super useful -- in my course students are constantly writing buggy code with segfaults, memory leaks, etc. Having an automated way where common valgrind errors just turn into normal test errors is pretty slick. I wonder for Criterion if there might be a way to better integrate/document using Criterion with address sanitizer? I don't know enough about address sanitizer but it is on my list to experiment with it ... having a very clean way to catch these bugs as part of the standard testing infrastructure without having to have another tool (and another make target, etc) is pretty nice in my opinion.

Using GDB in Criterion is a bit of a struggle, but I definitely understand your motivation for not having a --no-fork. However, students have to learn about inferiors, we have been having trouble setting a breakpoint once we are at a breakpoint in the test, and we are having trouble getting reverse debugging working ... but this is probably just because we are still figuring out best practices for using GDB with Criterion. Novaprova might have the same kind of issues.

The portability issue of novaprova might be a show stopper since at least myself and many of my TAs use Mac OS X for development ...

Regardless, it is really exciting to see such modern and sophisticated testing frameworks for C. The last time I looked around for C unit testing frameworks was 15 years ago and the options were pretty dismal. When I started preparing for teaching this new course, I was just really pleased to find a framework like Criterion that we could use -- so thank you for your hard work on this great project!

@Snaipe
Copy link
Owner Author

Snaipe commented Oct 7, 2017

This is a really great comparison! It might be interesting to see if it is possible to bring some of the nicer features from Novaprova into Criterion. We are going to play around with both in my course and see what we can learn.

That's great! While I don't agree with the vision of Novaprova, I still think it's a step in the right direction compared to Check/CUnit, and I definitely won't claim that my own design is better, or more "right" -- so please use what you think is best for your students.

tests are just functions with automatic test discovery (the common case of non-parameterized tests is very clean with absolutely no boiler plate just like in py.test, not even a cr_test in front)

Note that in this case cr_test would "technically" be optional on platforms other that Windows. The reason we need to stuff in a dllexport is that MSVC doesn't ever export any symbol (it doesn't even populate the COFF symbol table), so we have to rely on exported DLL symbols.

coding convention of using all caps for macros (e.g., NP_ASSERT_EQUAL) -- seems minor but it helps new users understand what is a macro and what is a function

Ah, yes, I can see why that would be an issue. The original reason why cr_assert is lowercased is because Criterion used to re-define assert(), which turned out to be one of the worst ideas I've ever had, and when I renamed the macro, the matching convention sticked. I think that it being somewhat similar to assert in looks and usage isn't too bad, but I can also understand why uppercasing it has its own merits.

Criterion also doesn't have the best of consistencies, as this was started as a spare-time school project, and my own conventions shifted around a year into development. This is mostly why I've been refactoring a lot of the internals and tried correcting API design mistakes in the past few months.

(4) getting a stack trace on an assertion error (super, super useful!),

This is something I've been meaning to explore. I started an experiment a year ago with Tizzy to see how I could add backtraces to Criterion, but so far the challenges are that it's pretty hard to from the runner process (I could do it from the worker process, but that would involve either doing that for assertions only, or install a signal handler for sigsegv, which unfortunately can be replaced by user code). Getting it right is hard, but I'm not giving up yet.

(5) the ability to use a single testrunner for all tests (maybe Criterion can do this too? we will see how that works in practice).

Not sure what you mean by that. Unless you explicitely compile different test source files into their own executable, then all the tests are going to be run by one test runner.

The new Criterion assertions will pretty print the values in an assertion, but even so Criterion might want to keep the cr_criterion_eq( a, b ); style macros as opposed to forcing users to use cr_criterion( eq( a, b ) ); I imagine the first can just use the latter so you would still get the benefit of printing the values of a and b? To a new user cr_criterion_eq( a, b ) is more intuitive than cr_criterion( eq( a, b ) ).

I'll have to think about it. You do have a point that it's a bit less confusing, but on the other hand it seems a bit redundant to have two ways to express the same thing. Plus, I think that confusion needs to be prevented (or at least aleviated) by having outstanding documentation -- and the docs are in need of improvement.

The integration with valgrind seems super useful -- in my course students are constantly writing buggy code with segfaults, memory leaks, etc. Having an automated way where common valgrind errors just turn into normal test errors is pretty slick. I wonder for Criterion if there might be a way to better integrate/document using Criterion with address sanitizer? I don't know enough about address sanitizer but it is on my list to experiment with it ... having a very clean way to catch these bugs as part of the standard testing infrastructure without having to have another tool (and another make target, etc) is pretty nice in my opinion.

I don't think that forcing memcheck to run has too many advantages, as you can still benefit from its diagnostics by running the test executable itself with valgrind --leak-check=full --trace-children=yes --error-exitcode=1, and if you force memcheck, you can't actually run other tools like callgrind. Plus, criterion still detects when it's run under valgrind and makes itself "friendlier" for diagnostics.

As for address sanitizer, I'd say the support is even better. You just need to compile your test executable with -fsanitize=address, and you're done! Address sanitizer should make the test exit with a nonzero exit status on any kind of addressing error, which should fail the test. Bonus points, you can also benefit from -fsanitize=undefined and -fsanitize=thread to catch undefined behaviours and data races, and it's still natively supported by the runner since the test workers are aborted abnormally.

Using GDB in Criterion is a bit of a struggle, but I definitely understand your motivation for not having a --no-fork. However, students have to learn about inferiors, we have been having trouble setting a breakpoint once we are at a breakpoint in the test, and we are having trouble getting reverse debugging working ... but this is probably just because we are still figuring out best practices for using GDB with Criterion. Novaprova might have the same kind of issues.

I'm guessing you've seen the criterion.gdb script? It should help a little bit with multi-inferior debugging, but you still need to know what an inferior is and having a little bit of baggage with gdb helps, so I get that it might not be ideal. Plus, I don't think it worked on OS X, because getting gdb to work was a bit special.

I'd recommend remote debugging, or if you can't get that working, just pass --debug=idle and then attach to the PID. This should be fairly working on pretty much all platforms.

@cbatten
Copy link

cbatten commented Oct 8, 2017

tests are just functions with automatic test discovery (the common case of non-parameterized tests is very clean with absolutely no boiler plate just like in py.test, not even a cr_test in front)

Note that in this case cr_test would "technically" be optional on platforms other that Windows. The reason we need to stuff in a dllexport is that MSVC doesn't ever export any symbol (it doesn't even populate the COFF symbol table), so we have to rely on exported DLL symbols.

Ah, interesting. That means since we only use Linux and Mac OS X we could probably get away without the cr_test, but it might be good practice to just always include it anyways.

coding convention of using all caps for macros (e.g., NP_ASSERT_EQUAL) -- seems minor but it helps new users understand what is a macro and what is a function

Criterion also doesn't have the best of consistencies, as this was started as a spare-time school project, and my own conventions shifted around a year into development. This is mostly why I've been refactoring a lot of the internals and tried correcting API design mistakes in the past few months.

Maybe using all caps for macros might be something to consider for the new API then? Especially with the new composable assertions, I think using EQ, LT, etc would be important to (1) avoid namespace collisions with small helper functions, and (2) to emphasize that these are macros. Just a thought ... it might also help improve consistency and better match many other coding conventions to use all caps for macros and lowercase with underscores for functions.

(4) getting a stack trace on an assertion error (super, super useful!),

This is something I've been meaning to explore. I started an experiment a year ago with Tizzy to see how I could add backtraces to Criterion, but so far the challenges are that it's pretty hard to from the runner process (I could do it from the worker process, but that would involve either doing that for assertions only, or install a signal handler for sigsegv, which unfortunately can be replaced by user code). Getting it right is hard, but I'm not giving up yet.

Tizzy looks neat! Ideally, we would be able to see a backtrace for any failing CR_ASSERT. This is a huge help in frameworks like py.test. Often we want to write helper functions to do test stuff. For example, a helper function to compare all of the elements in a container to the elements in a reference array. But without the stack trace if we get an error we only know that the error is in the helper function. We might call that helper function 10 times in a test and we won't easily know which one triggered the error. Glad to hear you are not giving up! Automatically printing out both the evaluated values used in an assertion failure and the backtrace to get to that assertion failure would be a killer feature. The new composable assertions means Criterion is half way there!

(5) the ability to use a single testrunner for all tests (maybe Criterion can do this too? we will see how that works in practice).

Not sure what you mean by that. Unless you explicitly compile different test source files into their own executable, then all the tests are going to be run by one test runner.

I think this is just because I was getting confused. Right now in our framework every foo.c and foo.h has a corresponding foo-test.c. The makefile compiles every foo-test.c into its own foo-test test runner. Now that I think about it we could of course just compile each foo-test.c into a foo-test.o and link them all together to create a single test runner. I wonder what most people do? The "many test runners approach" or the "single test runner" approach?

Regardless, one of your questions in the original post was about hierarchy. You mentioned that requiring a test suite complicates the API a bit. I definitely agree. I think the test tree approach based on the files and the directory hierarchy (i.e., what is used by py.test and novaprova) is perfect. It means you don't need to specify a test suite, but you still get an easy way to run subsets of tests. You could combine this with the ability to add "marks" (or "tags") to tests. py.test has a way to do this. This makes it easy to run subsets of tests with just a given tag (e.g., long running tests, performance tests). The filesystem approach + tags seems like it makes the common case very simple (i.e., I just want to write some tests without worrying about suites; I just want the way I mange tests to mimic the way I already manage the test source code via files and directories) but still enables more sophisticated test "hierarchy" (i.e., filtering) with marks/tags.

The more I think about it, the more I like this:

cr_param(n, int, 0, 1, 2);

cr_test void test_square(void) {
    cr_assert(eq(int, square(n), n * n));
}

Although wouldn't test_square take a single int as a parameter? Here is maybe some syntax:

CR_TEST_PARAM( int, n, (0, 1, 2) );
CR_TEST
void test_square( int n )
{
  CR_ASSERT_EQ( square(n), n*n );
}

So the idea is that you could add attributes to a test by declaring macros right before that test. Your idea of using line numbers to figure out which of those attributes goes with which test sounds like it might work very well. This is very much analogous to the way py.test uses function decorators to add attributes to tests. Other attributes like timeouts and tags would be similar:

CR_TEST_TAG( "foobar" );
CR_TEST_TIMEOUT( 10 );
CR_TEST_PARAM( int, n, (0, 1, 2) );
CR_TEST
void test_square( int n )
{
  CR_ASSERT_EQ( square(n), n*n );
}

Or expected signals:

CR_TEST_EXPECTED_SIGNAL( SIGABRT );
CR_TEST
void test_assert()
{
  assert( (1 == 2) && "this assert is supposed to happen" );
}

This is nice because the annotation system is composable. For example, you could easily add multiple tags. It makes the common case super simple because the common case would not have any annotations. But it still enables more sophisticated usage scenarios by adding annotations.

The new Criterion assertions will pretty print the values in an assertion, but even so Criterion might want to keep the cr_criterion_eq( a, b ); style macros as opposed to forcing users to use cr_criterion( eq( a, b ) ); I imagine the first can just use the latter so you would still get the benefit of printing the values of a and b? To a new user cr_criterion_eq( a, b ) is more intuitive than cr_criterion( eq( a, b ) ).

I'll have to think about it. You do have a point that it's a bit less confusing, but on the other hand it seems a bit redundant to have two ways to express the same thing. Plus, I think that confusion needs to be prevented (or at least aleviated) by having outstanding documentation -- and the docs are in need of improvement.

I agree that it is important to avoid redundancy. But it might be nice to have the simpler macros for the most common cases. Then a user could move to using the composable API for more complicated cases. I imagine CR_ASSERT_EQ is by far the most commonly used assertion macro. I also think the strongest frameworks are the ones that make the common case super simple, but then also have ways to still do more sophisticated stuff.

The integration with valgrind seems super useful -- in my course students are constantly writing buggy code with segfaults, memory leaks, etc. Having an automated way where common valgrind errors just turn into normal test errors is pretty slick. I wonder for Criterion if there might be a way to better integrate/document using Criterion with address sanitizer? I don't know enough about address sanitizer but it is on my list to experiment with it ... having a very clean way to catch these bugs as part of the standard testing infrastructure without having to have another tool (and another make target, etc) is pretty nice in my opinion.

I don't think that forcing memcheck to run has too many advantages, as you can still benefit from its diagnostics by running the test executable itself with valgrind --leak-check=full --trace-children=yes --error-exitcode=1, and if you force memcheck, you can't actually run other tools like callgrind. Plus, criterion still detects when it's run under valgrind and makes itself "friendlier" for diagnostics.

I guess my point is that we end up having to run all of our tests twice. Once without valgrind and then once with valgrind, but I 100% get your point about the problems with tightly coupling a test framework so that it absolutely must use valgrind.

As for address sanitizer, I'd say the support is even better. You just need to compile your test executable with -fsanitize=address, and you're done! Address sanitizer should make the test exit with a nonzero exit status on any kind of addressing error, which should fail the test. Bonus points, you can also benefit from -fsanitize=undefined and -fsanitize=thread to catch undefined behaviours and data races, and it's still natively supported by the runner since the test workers are aborted abnormally.

I think the take-away here is I need to look more seriously into address sanitizer! We might get exactly what we want with Criterion just by always compiling with address sanitizer enabled.

Using GDB in Criterion is a bit of a struggle, but I definitely understand your motivation for not having a --no-fork. However, students have to learn about inferiors, we have been having trouble setting a breakpoint once we are at a breakpoint in the test, and we are having trouble getting reverse debugging working ... but this is probably just because we are still figuring out best practices for using GDB with Criterion. Novaprova might have the same kind of issues.

I'm guessing you've seen the criterion.gdb script? It should help a little bit with multi-inferior debugging, but you still need to know what an inferior is and having a little bit of baggage with gdb helps, so I get that it might not be ideal. Plus, I don't think it worked on OS X, because getting gdb to work was a bit special.

Yes. We are using our own helper script which takes as input the testrunner binary and which test you want to debug. It then creates a GDB script similar in spirit to criterion.gdb but also puts a breakpoint at suitename_testname_impl and then includes the run and inferior 2 commands. This script definitely helps since it enables students to easily drop into GDB right in the test they want to debug.

I'd recommend remote debugging, or if you can't get that working, just pass --debug=idle and then attach to the PID. This should be fairly working on pretty much all platforms.

I started off trying to use/teach remote debugging but that got students pretty confused really quickly. One challenge is many of these students are brand new to even using the Linux command line. So on the one hand, I want to use a robust "real" unit testing framework, but on the other hand I want to keep things simple.

I will open another issue a little later with some of the challenges we have been facing using GDB and maybe someone can help us figure out what we are doing wrong.

Regardless, using Criterion is making a big difference. In the last programming assignment we were able to show students how they can use Criterion to verify their code causes an assertion when it supposed to (using .signal=SIGABRT) and we were also able to use timeouts. Both of these would have been very difficult with a DIY single-process/no-fork testing framework.

@Snaipe
Copy link
Owner Author

Snaipe commented Jan 3, 2020

So, it's been about 2 years since I've left this issue off. I've also been mostly programming in Go for the past 3 years, and I think that the Go test API is a perfect example of what I don't want Criterion to be (too much boilerplate and generally poor testing UX, which in turn does not encourage people to write tests). Still, there are some parts that are worth considering regarding "using the language" vs providing an implementation on the topic of parameterized tests.

Given the above discussion, here are the bits that I think stand-out:

  • Specifying test parameters in a macro is weird and outlandish. I think I would rather have a parameter-less Test() macro and provide a way to run parameterized subtests somehow.
  • Using plain old functions is the simplest API I can think of, I really like that, and I think I'm not alone.
  • Running parameterized tests can be done from within a test function, which means we could forget about automatically calling parameterized tests and simply have "top-level" tests take no parameters.

Narrowing that down to two proposals:

P1: Plain old functions:

cr_export void test_foobar(void)
{
    cr_assert(true);
}

static void some_subtest(int arg, int arg2)
{
    cr_assert(lt(int, arg, arg2));
}

cr_export void test_with_subtests(void)
{
    for (int i = 0; i < 10; i++) {
        // spawns in same process
        cr_run(some_subtest(i, i+1));
    }

    for (int i = 0; i < 10; i++) {
        // spawns each test in new process
        cr_run(some_subtest(i, i+1), .isolate = true, .timeout = 10);
    }
}

A nice consequence of having a cr_run macro means that we can also use it for the implementation of the default main.

P2: Parameter-less Test macro:

Test(foobar)
{
    cr_assert(true);
}

static void some_subtest(int arg, int arg2)
{
    cr_assert(lt(int, arg, arg2));
}

Test(with_subtests)
{
    for (int i = 0; i < 10; i++) {
        // spawns in same process
        cr_run(some_subtest(i, i+1));
    }

    for (int i = 0; i < 10; i++) {
        // spawns each test in new process
        cr_run(some_subtest(i, i+1), .isolate = true, .timeout = 10);
    }
}

In both proposals, ParameterizedTest and Theory are gone, and are replaced with a more explicit cr_run. I think I actually like that form better, as it becomes more consistent and feels less like magic (i.e. rather than defining some argument array somewhere or a generator or some other thingamajig).

You'll note that in both cases, cr_run takes in test properties rather than requiring that the subtest declares their own properties. I indeed think that the former makes more sense, because in that model Criterion isn't aware of the subtests themselves until the moment where cr_run is called.

This leaves the question of defining test properties for the top-level tests. We can probably do something like this:

P1

cr_export struct cr_testconfig testconfig_foobar = {
    .timeout = 10,
};

cr_export void test_foobar(void)
{
    cr_assert(true);
}

P2

Test(foobar, .timeout = 10)
{
    cr_assert(true);
}

Here P1 suffers the most as you need to have a weird definition above. We can somewhat make the problem simpler by introducing another macro:

cr_testconfig(foobar, .timeout = 10);

cr_export void test_foobar(void)
{
    cr_assert(true);
}

but it still feels weird that we have two separate statements that can be specified in any order.

Another possibility for P1 is to have the test itself set its own properties:

cr_export void test_foobar(void)
{
    cr_testconfig(.timeout = 10);

    cr_assert(true);
}

I think this one makes sense, but means that some properties can't be implemented here: .description, .init, .fini.

I don't think it's that big of a problem, since .description is cosmetic, and .init and .fini can be replaced with an appropriate function call before and after the test.

@ghost
Copy link

ghost commented Jan 3, 2020

FWIW I prefer functions (so P1) and the very last configuration option. What could be done for .init and .fini (not sure about desc) is something like this:

cr_export void test_foobar(void) FIXTURE(some_init, some_fini)
{
    cr_testconfig(.timeout = 10);
    cr_assert(true);
}

with FIXTURE doing the following:

  1. define unique identifier (I'll use foobar_id1234 in the example)
  2. provide test_foobar implementation as {some_init(); foobar_id1234(); some_fini();}
  3. use block that comes after fixture as implementation of foobar_id1234().

Perhaps fixture could be as simple as a single "keyword" with some assumed init/fini based on the function name but I'm not sure if __FUNCTION__ name inside the test_foobar can be used for that purpose.

@kugel-
Copy link

kugel- commented Jan 6, 2020

I actually like the Test() macros more, that makes it easier to discover top-level test routines within a source file that also uses static functions. Plain old functions work but opens up possibilities for doing this wrong (like forgetting cr_export).

But maybe we can keep our own Test() macro locally defined by a wrapper header.

I also like the specification of test parameters inside the Test() macro, but I'm not too attached to that specifically.

I do like the cr_run() idea. I appreciate the possibility to run ParemterizedTest in the same process (and still optionally with isolation) but what does cr_run() add to just run the test routine directly inside a loop? Does it still record each parameter set as an individual test?

@Snaipe
Copy link
Owner Author

Snaipe commented Jan 6, 2020

Plain old functions work but opens up possibilities for doing this wrong (like forgetting cr_export).

Note that the possibility of forgetting cr_export is only going to be a problem on Windows, because Windows never ever exports symbols for anything unless explicitly asked for. For Linux, BSDs and MacOS, it will work out of the box, regardless of putting cr_export on the function or not (and in fact, on these platforms cr_export will be empty).

I agree that this is a still a gotcha. I'm not sure I'm happy to envision people writing tests on Linux, only to realize later down the line that they don't run on Windows because they forgot some annotation.

what does cr_run() add to just run the test routine directly inside a loop? Does it still record each parameter set as an individual test?

Yes. It also prevents cr_assert failures in the inner test from aborting the whole function.

Another problem with cr_run is that I'll have to figure out a way to somehow distinguish two distinct runs in a loop. I'm not sure that printing each parameter is feasable, but perhaps using a printf-like string ought to do it:

struct cr_testconfig config = {
    .timeout = 10,
};
for (int i = 0; i < 10; ++i) {
    cr_run(some_subtest(i), &config, "subtest(%d)", i);
}

That means we can no longer use the in-call .key = value syntax, but I don't think we're losing too much here.

@kugel-
Copy link

kugel- commented Jan 6, 2020

FWIW, on Linux you can also use -fvisibility=hidden to not export anything by default. Lots of people do that and have probably adapted their build system to do that by default for any build artifact.

@Snaipe
Copy link
Owner Author

Snaipe commented Jan 6, 2020

The visibility knob only controls whether a symbol gets into the dynamic symbol list or not; it doesn't control the non-dynamic one, so it should be fine. Plus, it's still fairly rare to use -fvisibility on executables rather than shared objects.

This obviously means that if you strip your test executable, tests will no longer be detected. I think this is ok, but thinking back it might be better to experiment with only using the dynamic symbol list, assuming that GCC lets me do that for both PIEs and non-PIEs.

@MrAnno
Copy link
Collaborator

MrAnno commented Nov 23, 2021

I'd also prefer the P2 proposal.

For me, this is what makes Criterion Criterion: the simplicity, the elegance, no boilerplates.

Test(foobar, .timeout = 10, ...)
{
    cr_assert(true);
}

I know about the nasty tricks in their implementation and how painful macros can be, but from the user's perspective, the above example is just purely beautiful in my opinion.

It's not a valid C/C++ syntax, but it is clean as it does not require "annotations" like cr_export, nor a separate config structure that would be connected to the test based on its name testconfig_foobar. cr_testconfig() looks a bit better, but I think it's far from the original Criterion elegance.

I really like cr_run() as a ParameterizedTest replacement, it would be much easier to use (especially in C, ParameterizedTest needed some boilerplate with cr_make_param_array(), and the cr_malloc() trickery for non-trivial use-cases).

Repository owner locked and limited conversation to collaborators Jan 3, 2022
@Snaipe Snaipe converted this issue into discussion #412 Jan 3, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Projects
None yet
Development

No branches or pull requests

8 participants