This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Future directions for the test API #204
Comments
Considering how difficult C can be, I think it would be difficult to get rid of macros. It would be interesting, though, to have the possibility of defining a Suite as a structure that could be used as the context for running tests (similar to how Google Test uses its fixtures).
I'm not really sure it would be easy to use, as they serve quite different purposes. |
I think that's a fair point, but then I also find having the parameter list in the test macro (i.e. the On the other hand, as you implied, this means that we lock ourselves in an ABI contract. This discussion will hopefully shine some light on whether this is a bad thing for us considering that I will feature-lock criterion 3.0. One thing I dislike with the symbol proposal is that I feel that other symbols (setup_, teardown_, properties_, ...) are pretty big gotchas and might not be that convenient once you want to reuse some of the functions. Perhaps only
That's not really a problem in my opinion once you realize that both are pretty much doing the same thing: calling a function with parameters. |
Something like setup/teardown would definitely be better as a property of the Suite structure (if Criterion were to go for explicit suite definitions). Btw, how easily can symbol look-up be implemented?
Hm... well, I suppose I may be over-thinking it. |
It's pretty easy. In fact, I've already been doing it for BoxFort and Mimick, so I can probably refactor that logic in a separate library and reuse it in the three projects. |
If you are going to merge Theories with Parametrised Tests there should still be a way (maybe a parameter) to generate all parameter combinations.
struct stream_state {
    void *state; /* Stream state data. It will be up to the users to choose what they do with it. */
    void *head_value; /* The head value of the stream. */
};
/* A stream generator. This will be up to the user to implement.
 * You could use current->state to make this function reentrant, or it could be ignored
 * (it's up to the user how they want it to be)
 */
struct stream_state *next_state(struct stream_state *current); |
This can probably be done through the proposal to support generators. For instance, to generate the
cr_param_list generator_square(void) {
    static int i = 0;
    if (i > 2) {
        return NULL;
    }
    return cr_alloc_params((int, i++));
}
(which pretty much is a more atomic version of the current ParameterizedTest interface)
And for random number generation (e.g., generating 10 random ints):
cr_param_list random_int_generator(void) {
    static int i = 0;
    /* stop once 10 values have been produced */
    if (i++ >= 10) {
        return NULL;
    }
    return cr_alloc_params((int, random_int()));
}
However, I have a few gripes with this. I don't like having to do memory allocation, type safety is missing, and the control flow of a regular function isn't really convenient for a generator.
I'd probably like the generation to be somewhat typesafe. The current I may have a proposal to address all these:
struct context {
int32_t seed;
};
int32_t random_int32(int32_t *seed);
// declare a generator for the foo Test
Generator(foo, (i32 bar)) {
cr_gen_enter(struct context, {
.seed = ...
});
// generate 10 random numbers
for (size_t i = 0; i < 10; ++i) {
cr_gen_yield(random_int32(&seed));
}
cr_gen_leave();
}
Test(foo, (int32_t bar)) { ... }
Basically, By the way, I think that criterion could also provide a simple implementation of a Mersenne Twister for convenience since |
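For readers wondering how an enter/yield/leave style generator could work in plain C, here is a minimal sketch using the well-known switch-on-resume-point coroutine trick. Everything in it is hypothetical: the GEN_ENTER, GEN_YIELD and GEN_LEAVE macros, struct gen_ctx, and next_random are illustration names, not Criterion's API, and the proposal above may well be implemented differently. Note that any variable that must survive a yield has to live in the context, which is presumably what the context struct in the proposal is for.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical generator context: none of these names exist in Criterion. */
struct gen_ctx {
    int resume_point; /* 0 = not started, -1 = finished, else line to resume at */
    int32_t seed;     /* PRNG state that must survive between calls */
    int i;            /* loop counter; locals would not survive a yield */
};

/* Jump to wherever the previous call stopped. */
#define GEN_ENTER(ctx) switch ((ctx)->resume_point) { case 0:

/* Store a value, remember this line, and return; the next call resumes here.
 * Only one GEN_YIELD per source line, since the label is __LINE__. */
#define GEN_YIELD(ctx, out, val)            \
    do {                                    \
        (ctx)->resume_point = __LINE__;     \
        *(out) = (val);                     \
        return true;                        \
        case __LINE__:;                     \
    } while (0)

/* Mark the generator as exhausted; later calls keep returning false. */
#define GEN_LEAVE(ctx) } (ctx)->resume_point = -1; return false

/* Small LCG standing in for random_int32(); unsigned math avoids overflow. */
static int32_t next_random(int32_t *seed)
{
    uint32_t s = (uint32_t)*seed;
    s = s * 1103515245u + 12345u;
    *seed = (int32_t)(s & 0x7fffffff);
    return *seed;
}

/* Yields 10 pseudo-random numbers, then reports exhaustion. */
static bool random_gen(struct gen_ctx *ctx, int32_t *out)
{
    GEN_ENTER(ctx);
    for (ctx->i = 0; ctx->i < 10; ++ctx->i) {
        GEN_YIELD(ctx, out, next_random(&ctx->seed));
    }
    GEN_LEAVE(ctx);
}
```

A caller would initialise a struct gen_ctx with resume_point set to 0 and a seed, then call random_gen in a loop until it returns false.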
Starting from your proposal, I would like to change generators to make them able to create infinite streams.
/* Generator receives the types of the state and generated value.
* It has 3 pre-defined variables:
* state - const pointer to the current state
* next_state - location to store next state (generator should fill this)
* value - location to store generated value (generator should fill this)
* In the case below, state type = int32_t, value type = char.
*/
Generator((int32_t, char), char_generator) {
int32_t number = random_int32(state);
*next_state = number;
*value = number % 0xFF;
}
int32_t seed1 = 42, seed2 = 7, seed3 = 13;
/* Similar to a theory, but instead of providing theory data points for c1, c2, c3,
* you give them generators + seeds.
* The last parameter is the number of iterations.
*/
StreamTheory((char_generator &seed1 c1, char_generator &seed2 c2, char_generator &seed3 c3),
suite_name, test_name, 1232445) {
test_something_with(c1, c2, c3);
/* Please note that nowhere, do we need to explicitly call char_generator.
* It will be up to Criterion to call char_generator for each iteration.
* In the first iteration, the generators will receive seed1, seed2 & seed3, but
* in the next iterations they will receive the states generated at the previous step.
* Passing the state as function parameters ensures that generating c1 will not affect
* generating c2 and c3.
*/
}
Don't know about rand()'s RNG qualities, but if it's bad, sure! Having this + stream generators will make the tool great for testing code against badly formatted input. |
Note that the proposal I made defined a generator to be a coroutine, so an infinite stream can simply be represented as follows:
Generator(foo, (i32 bar)) {
cr_gen_enter(void);
// infinite generator of 1
while (true) {
cr_gen_yield(1);
}
cr_gen_leave();
}
The idea of making generators reusable is interesting. If that's the case, maybe the best course of action would be to embrace the fact that it's a function and get rid of the Generator macro?
out_type foo_generator(cr_gen_ctx ctx, state_type *state) {
cr_gen_enter(ctx);
// do something with the state, yield some values, ...
cr_gen_leave(ctx);
}
If the generators can be specified orthogonally from each other, this raises a few questions that are going to be hard to solve. For instance, what happens if you pass an infinite generator for c1, but a finite generator for c2? What happens if the iteration number is greater than the number of times the finite generator can be called?
Also, the fact that you base your example on an extension of the Theory semantics makes the orthogonality of generators difficult. I don't even think we can have generators at all for Theories, because by nature Theories are called with the n-dimensional matrix product (the Cartesian product) of all of their possible parameters. For instance: if the possible values of c1 are (1, 2); c2 (3, 4); and c3 (5, 6), then the theory will be called with the following parameters: (1, 3, 5), (1, 3, 6), (1, 4, 5), (1, 4, 6), (2, 3, 5), (2, 3, 6), (2, 4, 5), (2, 4, 6). More generally, for N parameters with M values each, you have M^N possible combinations.
You see what kind of problem we're going to have for generators: for each new value produced by a generator, we'll have to test it against every previous value ever produced, which becomes impractical for very large numbers of values as we have to store things as we go and we can only guess the upper bound; and while you can be warned about resource exhaustion at compile time with a static array, this is not possible with a generator. I realize that this can be solved by doing some careful dynamic allocation, but it's hard to make this fast, and it might blow up in the developer's face when not used with care.
Plus, Theories really aren't made for random testing, but careful testing with hand-picked "interesting" values (or ranges of values, which should definitely be something to support in the future). Testing against random values isn't theory testing, but fuzz testing, and should be used in parallel to theories.
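To make the combination count concrete, here is a small sketch of how the parameter tuples of a theory can be enumerated when every parameter has a fixed, known set of datapoints. The function names combination_count and combination_at are made up for this example and are not part of Criterion. The point it illustrates: with static arrays, a single running index is enough to reach every tuple, so nothing has to be stored, whereas values coming out of a generator would have to be remembered.

```c
#include <stddef.h>

/* Number of tuples for n parameters with counts[k] datapoints each.
 * When every parameter has m datapoints this is m^n. */
static size_t combination_count(const size_t *counts, size_t n)
{
    size_t total = 1;
    for (size_t k = 0; k < n; ++k)
        total *= counts[k];
    return total;
}

/* Decode tuple number `index` into one datapoint index per parameter.
 * Works like a mixed-radix counter: the last parameter varies fastest. */
static void combination_at(const size_t *counts, size_t n, size_t index,
                           size_t *out)
{
    for (size_t k = n; k-- > 0;) {
        out[k] = index % counts[k];
        index /= counts[k];
    }
}
```

With counts {2, 2, 2} and the datapoints from the example above, tuple 0 decodes to (1, 3, 5), tuple 2 to (1, 4, 5), and tuple 7 to (2, 4, 6), which matches the order listed in the comment.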
On another note, maybe we're taking an approach to the parameterization problem that is too complicated. How about not parameterizing tests, but introducing subtests:
Test(foo) {
int32_t state = 42;
Subtest("testing against 10 random values", .repeat = 10) {
int32_t val = random_int32(&state);
cr_assert(eq(do_thing(val), expected_val)); // failing the assert fails the subtest, but the test continues
}
}
However, I have two issues with this:
The first issue is that this syntax doesn't let us define nested functions (and while it might be accepted as a GNU extension, MSVC doesn't allow it), so we need to put this in a for loop. This means that we can't really spawn threads ourselves to parallelize this, but this can be worked around with OpenMP pragmas, which all of the compilers we target support.
The second issue is that if we parallelize this, then we'll be throwing multithreading into subtests, which might confuse developers (since the function they call might be racy) unless they explicitly want it. We could add a |
I like this approach. It would be nice to be able to define the Test - TestSuite relation like this:
I'm afraid that's not possible in C though.
We should do this to simplify the API. But I don't like the signature ( |
I was thinking about cycling through that finite amount of data points so that c2 acts as an infinite generator. On an abstract level, all that these infinite generators need to do is take a state and return a pair of state and value (
You are right, it was a mistake to associate them with theories.
Yes, fuzzing is what I really want to do with generators. Also the fact that generators are independent and they pass the states as arguments makes them perfect to track the cause of errors and also make it possible to resume from a known state, after fixing a bug (this is mostly useful when the generators aren't random, but instead they simulate some kind of communication protocol that has states). Maybe a good approach is to make a Fuzzing function, different from the Theory function. |
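As an illustration of the "take a state, return a pair of state and value" idea and of why it makes failures resumable, here is a minimal sketch. The names struct gen_step and char_step are hypothetical, and the linear congruential constants are the commonly cited Numerical Recipes ones, chosen only to keep the example short; they are not a recommendation for fuzzing quality.

```c
#include <stdint.h>

/* Result of one generator step: the next state plus the generated value. */
struct gen_step {
    uint32_t next_state;
    unsigned char value;
};

/* Pure function: the same input state always gives the same step, and
 * nothing is stored globally, so several streams cannot disturb each other. */
static struct gen_step char_step(uint32_t state)
{
    uint32_t next = state * 1664525u + 1013904223u;
    struct gen_step step = { next, (unsigned char)(next >> 24) };
    return step;
}
```

Because the state is an explicit value, a runner could log it on failure and later restart the stream from that exact point by passing the logged state back into char_step.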
Right, it's not. However, we can have subtests that follow the same hierarchy (as mentioned a few posts up).
I was about to propose something like this:
Test(foo) {
}
Test(bar, params(i32 i, u64 u)) {
}
But I realize that this is also impossible to do as the macro concatenation of some identifier with @mirror3000
Interesting, I was thinking of stopping altogether when a finite generator finished, but this is a sensible option too. This also means that we effectively have to store all of the previous values, which becomes a problem if the generated set is large (and it usually is considering generators are precisely used when the context may become too large). I'm still not sure whether having separate generators is simpler to implement and use though. One could still use one generator for a particular test, that calls itself sub-generators (like a PRNG) in the manner they want. This might let users have more control over what values go through the generator, as some of them will definitely use them for values that aren't random.
That's an interesting idea. I think this could be nice to have especially with the feature request described in #202.
I wonder if introducing yet another macro is a good idea in the long term. I've been thinking a bit about how theories & parameterized tests are being implemented, and I definitely think that we could leverage some of the techniques I used in Mimick to run special tests like theories (or even fuzzing if we do implement this). Maybe something like the following:
int div_ints(int a, int b) {
return a / b;
}
CR_CONTRACT_PREPARE(div_ints);
bool nonzero(int val) {
return !!val;
}
Test(theory) {
// create a contract over div_ints, called with any int as a first parameter, and any nonzero int as a second parameter
cr_contract c = cr_contract_from(div_ints(cr_any(int), cr_that(int, nonzero)));
// check for the theory using the contract with the specified datapoints
int int_dp[] = {INT_MIN, -1, 0, 1, INT_MAX};
bool ok = cr_theory_check(c, .datapoints = {int_dp, int_dp});
// assert that the check passed
cr_assert(ok, "checking theory for div_ints");
}
Test(fuzz) {
// create a contract over div_ints, called with any int as a first parameter, and any nonzero int as a second parameter
cr_contract c = cr_contract_from(div_ints(cr_any(int), cr_that(int, nonzero)));
// fuzz the contract over 100 iterations
bool ok = cr_fuzz(c, 100);
// assert that the check passed
cr_assert(ok, "checking fuzzing for div_ints");
} |
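To show what checking a theory over datapoints with a precondition amounts to, here is a plain C sketch that does by hand what the contract example above describes. It does not use or imitate the proposed cr_contract API; div_defined, div_property and check_div_theory are names invented for this sketch. One assumption is added on top of the nonzero check from the example: the pair (INT_MIN, -1) is also filtered out, because that quotient is not representable in an int and the division is undefined behaviour in C.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

static int div_ints(int a, int b)
{
    return a / b;
}

/* Precondition: the quotient must be defined for this pair. */
static bool div_defined(int a, int b)
{
    return b != 0 && !(a == INT_MIN && b == -1);
}

/* Property under test: quotient and remainder reconstruct the dividend. */
static bool div_property(int a, int b)
{
    return div_ints(a, b) * b + a % b == a;
}

/* Runs the property over every pair of datapoints that satisfies the
 * precondition. Returns the number of failures and stores in *checked
 * how many pairs were actually exercised. */
static size_t check_div_theory(const int *dp, size_t n, size_t *checked)
{
    size_t failures = 0;
    *checked = 0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            if (!div_defined(dp[i], dp[j]))
                continue; /* skipped, not failed */
            ++*checked;
            if (!div_property(dp[i], dp[j]))
                ++failures;
        }
    }
    return failures;
}
```

With the five datapoints from the example (INT_MIN, -1, 0, 1, INT_MAX) there are 25 pairs; 5 are skipped because the divisor is zero and 1 because of the overflow case, leaving 19 that are checked.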
An extra +1 if |
Another point that was brought up today on IRC: setup & teardown fixtures could take the same parameters as the test itself, so you could have reusable fixtures:
static int file;
void openfile(const char *path)
{
    file = open(path, O_RDONLY, 0);
}
void closefile(const char *path)
{
    (void) path; // unused
    close(file); // close the descriptor opened by openfile
}
Test(foo, (str path), .init = openfile, .fini = closefile)
{
    // test file
}
Though I'm not sure how that would work for suite-specific fixtures. I'll have to think more about this. |
Another possible approach for test suite structuring: Instead of defining the hierarchy through macros/properties, simply use the filesystem. That is, for the following tree:
We would have two root test suites, Now, this would obviously force users to split their tests in multiple files if they initially had all their suites in one source file, but it has a few advantages:
Thoughts on that? |
While I do find it pretty simple, I don't really like how you'd force the user's hand here. I'm not strongly opposed to it though; these are just my thoughts on forcing the user's hand. |
We are using Criterion in a sophomore-level course on computer systems programming using C/C++ at Cornell University and we really like it. More info here: https://web.csl.cornell.edu/courses/ece2400 Also glad to see that the framework is continuing to evolve and improve. Personally, I would love if Criterion moved to the "symbol-lookup" proposal mentioned above which uses standard functions to define tests. The Test macros are pretty confusing to new users. I also really like the parameterized with static list example in the "symbol-lookup" proposal. We use py.test for all of our Python unit testing, and the "symbol-lookup" idea is very much in that direction. |
Great to hear your input on the matter. I am a bit puzzled by a few consequences of the symbol lookup proposal, and I feel we're trading off safety for simplicity, especially with parameterized tests. For instance, consider the following setup:
// parameterized w/ static list
cr_param_list param_list_square_alt[] = {
    cr_define_params((double, 3.14)),
};
void test_square_alt(int n) {
    // undefined behaviour
}
This, while a horrible idea, would compile without warnings or errors because we have no way to enforce type safety on two seemingly-unrelated symbols (the parameter list, and the function). Now, we could still somewhat achieve that by inspecting debugging information and panicking when the types mismatch, but this would break the expectation that Criterion would compile without debugging symbols. Plus, as I mentioned earlier, we'd have to stick a What we could probably do in this case is go with something similar to novaprova:
cr_param(n, int, 0, 1, 2);
cr_test void test_square(void) {
    cr_assert(eq(int, square(n), n * n));
}
Though, I'm not sure how we would go about detecting which function a parameter is for. Maybe use the line number through |
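One alternative to inspecting debug information, sketched here purely as an illustration and not something proposed in this thread, is to tag each parameter with its type at definition time so that a mismatch becomes a detectable runtime error instead of undefined behaviour. All names (struct param, PARAM_OF_INT, run_int_params, count_call) are hypothetical. It does not give compile-time safety, which is the stronger property being asked for above.

```c
#include <stdbool.h>
#include <stddef.h>

enum param_type { PARAM_INT, PARAM_DOUBLE };

/* A parameter that remembers which type it was defined with. */
struct param {
    enum param_type type;
    union { int i; double d; } as;
};

#define PARAM_OF_INT(v)    { PARAM_INT,    { .i = (v) } }
#define PARAM_OF_DOUBLE(v) { PARAM_DOUBLE, { .d = (v) } }

/* Extracts an int only if the parameter was defined as one. */
static bool param_as_int(const struct param *p, int *out)
{
    if (p->type != PARAM_INT)
        return false;
    *out = p->as.i;
    return true;
}

/* Calls `test` once per parameter. Returns n on success, or the index of
 * the first parameter whose type tag does not match. */
static size_t run_int_params(const struct param *list, size_t n,
                             void (*test)(int))
{
    for (size_t k = 0; k < n; ++k) {
        int value;
        if (!param_as_int(&list[k], &value))
            return k;
        test(value);
    }
    return n;
}

/* Example test body: just counts how often it was invoked. */
static int calls;
static void count_call(int n)
{
    (void) n;
    ++calls;
}
```

A runner built this way could report "parameter 1 has type double, test expects int" instead of silently reinterpreting the bytes.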
Hmmm ... I guess I will have to give it some more thought. Based on your post I took a closer look at novaprova. It seems like both Criterion and novaprova are modern frameworks that enable best practices for test-driven design in C and C++ ... I would be interested to know your thoughts on comparing the two frameworks? Criterion is more portable (novaprova only works on Linux) and seems to be more actively developed ... are there any other important differences between the two frameworks? |
I can't speak for its author, but I think I can summarize some of the differences pretty well: Novaprova takes the stance that to achieve perfect isolation and allow for in-depth instrumentation of error conditions on tests, the program has to run under valgrind. It does this by re-exec'ing itself with memcheck, and using various hooks to catch and report various error conditions, including memory leaks, addressing errors, and other crashes. An added bonus, which is huge and was something we didn't even support until While I find that method brilliant in itself, I do not agree with it. Because it runs under Valgrind/memcheck, novaprova can only work on platforms that run them, and worse, the test code is never going to run on the hardware, as valgrind is essentially a virtual machine, which means that you're going to have behavioural differences on some edge cases. Finally, code running under valgrind is orders of magnitude slower than it normally is. With Criterion, I took the stance of sandboxing the test code into a separate process to monitor it from another process. In this case, the code can run wherever the user wants -- be it on the host processor, or a qemu process. I also didn't want to include any fluff that would lock users into using a specific technology -- which is why there is no memory leak check facilities, coverage tools, or other instrumentation integrated by default with criterion. Instead, users can choose whatever they want, and run it however they want. I think this was a good decision in hindsight, given the rising popularity of address-sanitizer over memcheck. Other (though older) frameworks also offered tests to run in a sandboxed process (though usually only on systems supporting fork()), but made it optional, which is I think a terrible decision and why I've been vehemently trying to prevent the addition of a Aside from that, Criterion currently follows a rigid test structure, while novaprova is a bit more modular. 
This is kind of funny now, since the criterion runner does support custom test trees, but the C API doesn't let us do that -- which is precisely the reason why I opened this issue with a bunch of proposals to address this. I'm sure there are other differences, but I haven't dug enough in novaprova to be able to accurately compare them. |
This is a really great comparison! It might be interesting to see if it is possible to bring some of the nicer features from Novaprova into Criterion. We are going to play around with both in my course and see what we can learn.
From just a very cursory look at Novaprova some things that seem nice to me are: (1) tests are just functions with automatic test discovery (the common case of non-parameterized tests is very clean with absolutely no boilerplate just like in py.test, not even a cr_test in front), (2) coding convention of using all caps for macros (e.g., NP_ASSERT_EQUAL) -- seems minor but it helps new users understand what is a macro and what is a function, (3) pretty printing values that are used in an assertion (super useful), (4) getting a stack trace on an assertion error (super, super useful!), (5) the filesystem/tree-based test organization (super similar to py.test, really nice), and (6) the ability to use a single testrunner for all tests (maybe Criterion can do this too? we will see how that works in practice). The new Criterion assertions will pretty print the values in an assertion, but even so Criterion might want to keep the
The integration with valgrind seems super useful -- in my course students are constantly writing buggy code with segfaults, memory leaks, etc. Having an automated way where common valgrind errors just turn into normal test errors is pretty slick. I wonder for Criterion if there might be a way to better integrate/document using Criterion with address sanitizer? I don't know enough about address sanitizer but it is on my list to experiment with it ... having a very clean way to catch these bugs as part of the standard testing infrastructure without having to have another tool (and another make target, etc) is pretty nice in my opinion.
Using GDB in Criterion is a bit of a struggle, but I definitely understand your motivation for not having a --no-fork. 
However, students have to learn about inferiors, we have been having trouble setting a breakpoint once we are at a breakpoint in the test, and we are having trouble getting reverse debugging working ... but this is probably just because we are still figuring out best practices for using GDB with Criterion. Novaprova might have the same kind of issues. The portability issue of novaprova might be a show stopper since at least myself and many of my TAs use Mac OS X for development ... Regardless, it is really exciting to see such modern and sophisticated testing frameworks for C. The last time I looked around for C unit testing frameworks was 15 years ago and the options were pretty dismal. When I started preparing for teaching this new course, I was just really pleased to find a framework like Criterion that we could use -- so thank you for your hard work on this great project! |
That's great! While I don't agree with the vision of Novaprova, I still think it's a step in the right direction compared to Check/CUnit, and I definitely won't claim that my own design is better, or more "right" -- so please use what you think is best for your students.
Note that in this case cr_test would "technically" be optional on platforms other than Windows. The reason we need to stuff in a dllexport is that MSVC doesn't ever export any symbol (it doesn't even populate the COFF symbol table), so we have to rely on exported DLL symbols.
Ah, yes, I can see why that would be an issue. The original reason why cr_assert is lowercased is because Criterion used to re-define assert(), which turned out to be one of the worst ideas I've ever had, and when I renamed the macro, the matching convention stuck. I think that it being somewhat similar to Criterion also doesn't have the best of consistencies, as this was started as a spare-time school project, and my own conventions shifted around a year into development. This is mostly why I've been refactoring a lot of the internals and tried correcting API design mistakes in the past few months.
This is something I've been meaning to explore. I started an experiment a year ago with Tizzy to see how I could add backtraces to Criterion, but so far the challenge is that it's pretty hard to do from the runner process (I could do it from the worker process, but that would involve either doing that for assertions only, or installing a signal handler for SIGSEGV, which unfortunately can be replaced by user code). Getting it right is hard, but I'm not giving up yet.
Not sure what you mean by that. Unless you explicitly compile different test source files into their own executable, then all the tests are going to be run by one test runner.
I'll have to think about it. You do have a point that it's a bit less confusing, but on the other hand it seems a bit redundant to have two ways to express the same thing. Plus, I think that confusion needs to be prevented (or at least alleviated) by having outstanding documentation -- and the docs are in need of improvement.
I don't think that forcing memcheck to run has too many advantages, as you can still benefit from its diagnostics by running the test executable itself with As for address sanitizer, I'd say the support is even better. You just need to compile your test executable with
I'm guessing you've seen the I'd recommend remote debugging, or if you can't get that working, just pass |
Ah, interesting. That means since we only use Linux and Mac OS X we could probably get away without the
Maybe using all caps for macros might be something to consider for the new API then? Especially with the new composable assertions, I think using EQ, LT, etc would be important to (1) avoid namespace collisions with small helper functions, and (2) to emphasize that these are macros. Just a thought ... it might also help improve consistency and better match many other coding conventions to use all caps for macros and lowercase with underscores for functions.
Tizzy looks neat! Ideally, we would be able to see a backtrace for any failing
I think this is just because I was getting confused. Right now in our framework every Regardless, one of your questions in the original post was about hierarchy. You mentioned that requiring a test suite complicates the API a bit. I definitely agree. I think the test tree approach based on the files and the directory hierarchy (i.e., what is used by The more I think about it, the more I like this:
Although wouldn't test_square take a single
So the idea is that you could add attributes to a test by declaring macros right before that test. Your idea of using line numbers to figure out which of those attributes goes with which test sounds like it might work very well. This is very much analogous to the way
Or expected signals:
This is nice because the annotation system is composable. For example, you could easily add multiple tags. It makes the common case super simple because the common case would not have any annotations. But it still enables more sophisticated usage scenarios by adding annotations.
I agree that it is important to avoid redundancy. But it might be nice to have the simpler macros for the most common cases. Then a user could move to using the composable API for more complicated cases. I imagine
I guess my point is that we end up having to run all of our tests twice. Once without valgrind and then once with valgrind, but I 100% get your point about the problems with tightly coupling a test framework so that it absolutely must use valgrind.
I think the take-away here is I need to look more seriously into address sanitizer! We might get exactly what we want with Criterion just by always compiling with address sanitizer enabled.
Yes. We are using our own helper script which takes as input the testrunner binary and which test you want to debug. It then creates a GDB script similar in spirit to criterion.gdb but also puts a breakpoint at
I started off trying to use/teach remote debugging but that got students pretty confused really quickly. One challenge is many of these students are brand new to even using the Linux command line. So on the one hand, I want to use a robust "real" unit testing framework, but on the other hand I want to keep things simple. I will open another issue a little later with some of the challenges we have been facing using GDB and maybe someone can help us figure out what we are doing wrong. Regardless, using Criterion is making a big difference. In the last programming assignment we were able to show students how they can use Criterion to verify their code causes an assertion when it is supposed to (using |
So, it's been about 2 years since I've left this issue off. I've also been mostly programming in Go for the past 3 years, and I think that the Go test API is a perfect example of what I don't want Criterion to be (too much boilerplate and generally poor testing UX, which in turn does not encourage people to write tests). Still, there are some parts that are worth considering regarding "using the language" vs providing an implementation on the topic of parameterized tests. Given the above discussion, here are the bits that I think stand-out:
Narrowing that down to two proposals:
P1: Plain old functions:
cr_export void test_foobar(void)
{
cr_assert(true);
}
static void some_subtest(int arg, int arg2)
{
cr_assert(lt(int, arg, arg2));
}
cr_export void test_with_subtests(void)
{
for (int i = 0; i < 10; i++) {
// spawns in same process
cr_run(some_subtest(i, i+1));
}
for (int i = 0; i < 10; i++) {
// spawns each test in new process
cr_run(some_subtest(i, i+1), .isolate = true, .timeout = 10);
}
}
A nice consequence of having a
P2: Parameter-less Test macro:
Test(foobar)
{
cr_assert(true);
}
static void some_subtest(int arg, int arg2)
{
cr_assert(lt(int, arg, arg2));
}
Test(with_subtests)
{
for (int i = 0; i < 10; i++) {
// spawns in same process
cr_run(some_subtest(i, i+1));
}
for (int i = 0; i < 10; i++) {
// spawns each test in new process
cr_run(some_subtest(i, i+1), .isolate = true, .timeout = 10);
}
}
In both proposals, ParameterizedTest and Theory are gone, and are replaced with a more explicit
You'll note that in both cases, cr_run takes in test properties rather than requiring that the subtest declares their own properties. I indeed think that the former makes more sense, because in that model Criterion isn't aware of the subtests themselves until the moment where cr_run is called.
This leaves the question of defining test properties for the top-level tests. We can probably do something like this:
P1
cr_export struct cr_testconfig testconfig_foobar = {
.timeout = 10,
};
cr_export void test_foobar(void)
{
cr_assert(true);
}
P2
Test(foobar, .timeout = 10)
{
cr_assert(true);
}
Here P1 suffers the most as you need to have a weird definition above. We can somewhat make the problem simpler by introducing another macro:
cr_testconfig(foobar, .timeout = 10);
cr_export void test_foobar(void)
{
cr_assert(true);
}
but it still feels weird that we have two separate statements that can be specified in any order. Another possibility for P1 is to have the test itself set its own properties:
cr_export void test_foobar(void)
{
cr_testconfig(.timeout = 10);
cr_assert(true);
}
I think this one makes sense, but it means that some properties can't be implemented here: .description, .init, .fini. I don't think it's that big of a problem, since .description is cosmetic, and .init and .fini can be replaced with an appropriate function call before and after the test. |
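For a rough idea of what running a subtest with .isolate = true could involve on POSIX systems, here is a sketch that forks a child, runs the subtest there, and reports whether it exited cleanly. This is not how cr_run is implemented (it is only a proposal at this point in the thread); run_isolated is a hypothetical stand-in, abort() stands in for a failing cr_assert, and the .timeout property is left out entirely.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void (*subtest_fn)(int arg, int arg2);

/* Hypothetical stand-in for cr_run(fn(arg, arg2), .isolate = true).
 * Returns true only if the child ran the subtest and exited normally. */
static bool run_isolated(subtest_fn fn, int arg, int arg2)
{
    pid_t pid = fork();
    if (pid < 0)
        return false; /* could not spawn a child */
    if (pid == 0) {
        fn(arg, arg2);        /* a crash here only kills the child */
        _exit(EXIT_SUCCESS);  /* skip atexit handlers and stdio flushing */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS;
}

/* Mirrors some_subtest from the proposals; abort() plays the role of a
 * failing assertion. */
static void some_subtest(int arg, int arg2)
{
    if (!(arg < arg2))
        abort();
}
```

The parent keeps running after a failed subtest, which matches the behaviour described above: a failing inner assertion fails that run without aborting the enclosing test function.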
FWIW I prefer functions (so P1) and the very last configuration option. What could be done for .init and .fini (not sure about desc) is something like this:
with FIXTURE doing the following:
Perhaps fixture could be as simple as a single "keyword" with some assumed init/fini based on the function name but I'm not sure if |
I actually like the Test() macros more; they make it easier to discover top-level test routines within a source file that also uses static functions. Plain old functions work but open up possibilities for doing this wrong (like forgetting cr_export). But maybe we can keep our own Test() macro locally defined by a wrapper header. I also like the specification of test parameters inside the Test() macro, but I'm not too attached to that specifically. I do like the cr_run() idea. I appreciate the possibility to run ParameterizedTest in the same process (and still optionally with isolation) but what does cr_run() add over just running the test routine directly inside a loop? Does it still record each parameter set as an individual test? |
Note that the possibility of forgetting cr_export is only going to be a problem on Windows, because Windows never ever exports symbols for anything unless explicitly asked for. For Linux, BSDs and MacOS, it will work out of the box, regardless of putting cr_export on the function or not (and in fact, on these platforms cr_export will be empty). I agree that this is still a gotcha. I'm not sure I'm happy to envision people writing tests on Linux, only to realize later down the line that they don't run on Windows because they forgot some annotation.
Yes. It also prevents cr_assert failures in the inner test from aborting the whole function. Another problem with cr_run is that I'll have to figure out a way to somehow distinguish two distinct runs in a loop. I'm not sure that printing each parameter is feasible, but perhaps using a printf-like string ought to do it:
struct cr_testconfig config = {
.timeout = 10,
};
for (int i = 0; i < 10; ++i) {
cr_run(some_subtest(i), &config, "subtest(%d)", i);
}
That means we can no longer use the in-call |
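The printf-like naming of individual runs can be sketched with the standard variadic formatting functions. format_run_name below is a hypothetical helper, not part of Criterion; it only shows how a label such as "subtest(3)" could be built from the format string and arguments passed alongside the subtest call.

```c
#include <stdarg.h>
#include <stdio.h>

/* Formats a run label into buf. Returns the length the full label needs,
 * as vsnprintf does, so callers can detect truncation. */
static int format_run_name(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int len = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return len;
}
```

A runner could call this once per loop iteration and use the resulting string as the subtest's name in its report.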
FWIW, on Linux you can also use -fvisibility=hidden to not export anything by default. Lots of people do that and have probably adapted their build system to do that by default for any build artifact. |
The visibility knob only controls whether a symbol gets into the dynamic symbol list or not; it doesn't control the non-dynamic one, so it should be fine. Plus, it's still fairly rare to use -fvisibility on executables rather than shared objects. This obviously means that if you strip your test executable, tests will no longer be detected. I think this is ok, but thinking back it might be better to experiment with only using the dynamic symbol list, assuming that GCC lets me do that for both PIEs and non-PIEs. |
I'd also prefer the P2 proposal. For me, this is what makes Criterion Criterion: the simplicity, the elegance, no boilerplate.
I know about the nasty tricks in their implementation and how painful macros can be, but from the user's perspective, the above example is just purely beautiful in my opinion. It's not valid C/C++ syntax, but it is clean as it does not require "annotations" like I really like |
I am a bit dissatisfied with the current state of the test API. When I started this project, I came from a Java/C++ background where we had annotation-driven (JUnit) & macro-driven (GTest) test frameworks, and was pretty frustrated by how painful C frameworks were to use. This was two and a half years ago, and since then I've seen the design mistakes that I made limit and impact the project in (mostly) negative ways.
I think it's time to address that. I've talked about it a bit in #168, but I think that we need to address a few problems with the current design of the test API.
Consistency
We currently have three flavours of tests: regular, parameterized, and theories. It doesn't take much to see that all three are pretty much the same; the only thing that differs is the way parameters are generated. The Test, ParameterizedTest, and Theory macros pretty much roll their own semantics at the moment, and I really hate this.
Hierarchy
The current API defines a two-level hierarchy for tests: on top, test suites, and right below, tests. Criterion defines TestSuite as a means to define test suite properties, although this can feel quite hack-ish. I always told myself that this hierarchy was good enough for anyone, but now I think that this might be complicating the API by always requiring a test suite. Thoughts? It might be better to optionally provide a parent in the test definition.
Compiler-dependency
This one is a bit minor compared to the consistency problem, but as I mentioned in #184, using sections to register tests makes the API compiler-dependent. While nowadays this is less of a problem with the market-share dominance of GCC, LLVM, and MSVC, this is something I'd like to push out of the API and into the implementation.
Proposals
Macro-based
This doesn't change much from the current API, but that doesn't mean we can't improve the semantics of the definition macros.
Turning parents into properties
We could turn the parent of a test into a property: instead of doing:
We could have:
However, this would make the usage of TestSuite to declare a suite mandatory, while its definition is currently inferred by the runner in 2.3.x.
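The snippets that accompanied this proposal are not reproduced here. As a hypothetical illustration only, a definition macro that treats the parent as one property among others could look roughly like this. The names sketch_props and SketchTest and the .parent field are invented for this sketch and are not Criterion's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical property record: the suite is just another property. */
struct sketch_props {
    const char *parent;   /* name of the owning suite, or NULL for none */
    double timeout;       /* in seconds; 0 means no limit */
};

/* Defines `<name>_props` from the designated initializers passed after
 * the name, then opens the body of the test function. At least one
 * property must be given in this simplified version. */
#define SketchTest(name, ...)                                          \
    static const struct sketch_props name##_props = { __VA_ARGS__ };  \
    static void name(void)

/* The suite is named through a property rather than a leading argument. */
SketchTest(addition, .parent = "arithmetic", .timeout = 2.0)
{
    assert(1 + 1 == 2);
}
```

Properties left out of the list are zero-initialized, so a test that names no parent simply ends up with a NULL one in this sketch.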
Merging ParameterizedTest with Theory
The semantics of ParameterizedTest and Theory could be merged into one parameterized test macro. The usage of such a macro could be: ParameterizedTest(name, (args), properties...), e.g.:
Unify all the definition macros
We could additionally go one step further and just make Test follow the above semantics. Regular tests would be declared with (void):
Symbol-lookup
I believe this approach is already used by NovaProva. Basically, instead of relying on a macro to define our test as if it was something special, we just define a function with a specific name:
This makes the API extremely simple; tests are just functions, and you can define other symbols to slightly alter the calling semantics of the test.
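To make the idea concrete, here is a minimal sketch of selecting tests by name. It assumes a "test_" prefix convention and replaces the real symbol table with a hand-written array, since actual symbol lookup is platform-specific; sketch_symbol and sketch_run_matching are hypothetical names, and none of this is taken from Criterion or NovaProva.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for one entry of an executable's symbol table. */
struct sketch_symbol {
    const char *name;
    void (*fn)(void);
};

/* Counts every function that gets called through the table. */
static int calls = 0;

/* Tests are plain functions; only the assumed "test_" prefix marks them. */
static void test_addition(void) { assert(1 + 1 == 2); ++calls; }
static void test_strings(void)  { assert(strlen("abc") == 3); ++calls; }
static void helper(void)        { ++calls; }

/* Hand-written table standing in for what a real runner would read
 * from the binary itself. */
static const struct sketch_symbol sketch_symbols[] = {
    { "test_addition", test_addition },
    { "helper",        helper },
    { "test_strings",  test_strings },
};

/* Calls every symbol whose name starts with `prefix`; returns how many ran. */
static size_t sketch_run_matching(const char *prefix)
{
    size_t ran = 0;
    size_t len = strlen(prefix);
    size_t count = sizeof sketch_symbols / sizeof sketch_symbols[0];

    for (size_t i = 0; i < count; ++i) {
        if (strncmp(sketch_symbols[i].name, prefix, len) == 0) {
            sketch_symbols[i].fn();
            ++ran;
        }
    }
    return ran;
}
```

Companion symbols that alter how a test is called could be matched the same way under other prefixes, which is where the naming gotchas raised in the comments come from.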
As for additional test properties, we have a few ways:
This approach, however, has a drawback: it makes it impossible to do some runtime initialization or exception handling for non-C languages (while declaring through a macro lets us do that since we can wrap the user function in a try/catch for instance); or rather than "impossible", it moves the responsibility back to the runner, so we have to compile some C++ on our side.
Another issue is that in practice, because we need to support C++ and Windows, we'd need to add a cr_register attribute to all of those symbols, which would expand to __declspec(dllexport) and/or extern "C" (the former is needed to actually find the symbol on Windows, and the latter is needed to prevent mangling of the symbol).
Discussion
I'd really like some input on the matter. The final choice will have to be a compromise between a few issues, but I'd really like to improve simplicity and consistency without shaving too many features off.
Finally, if anyone has other proposals, I'm all ears. There's no rush on the matter since we have some heavy refactoring to do on the internals before changing the API, but we need to start thinking about it now.