Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Testing] Introduce proptest property testing library and add type alises for hashbrown types #4477

Draft
wants to merge 1 commit into
base: next
Choose a base branch
from

Conversation

cylewitruk
Copy link
Member

@cylewitruk cylewitruk commented Mar 4, 2024

This PR introduces the proptest crate with some example property testing tests and wrapper types in stacks-common for hashbrown's HashMap and HashSet types.

Note: This is a break-out from the larger draft PR #4437.

I have provided a number of proptest strategies for common Clarity types, including Contract, TypeSignature, SymbolicExpression and Value (some courtesy of @Acaccia from clarity-wasm).

The hashbrown wrapper types are needed as we cannot implement external crates for external types and I have provided proptest strategies for these new types as well. To use the new types, simply replace std::collections::HashMap/Set or hashbrown::HashMap/Set with clarity_common::types::StacksHashMap/Set as HashMap/Set
This caused problems in the signerlib which uses a combination of hashbrown and std, so given that this isn't strictly needed for this PR I have removed these new concrete types in favor of type aliases.

Note that I have gone through and replaced use::hashbrown::xxx with the above, hence the large number of touched files.

I have also added additional tests for ClarityDatabase which 1) were missing, and 2) demonstrate the usage of proptest.

Resolves Issues

@cylewitruk cylewitruk added optimization Update speed, efficiency, or quality of a feature. testing and removed optimization Update speed, efficiency, or quality of a feature. labels Mar 4, 2024
@jcnelson jcnelson self-requested a review March 4, 2024 15:47
Copy link
Member

@jcnelson jcnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not opposed to adding a way to create mocked data structures, but I think this is fundamentally the wrong way to do it.

First, this should not be a dependency at all. It's purely for testing; it should be only a dev-dependency.

Second, it doesn't look like fake is actually doing that much work for us. You're still creating all these constructors for all these data structures, and populating them with Rng-synthesized data. Why have fake at all then?

Third, every data type in Clarity has a bound size by design. But, none of these constructors exercise it. For example, it would be very useful to be able to do things like construct data types of maximum size, or maximum depth, in order to e.g. benchmark the code or iron out resource usage overflows, but that doesn't appear to be something fake does for us.

I'm not convinced that this PR is necessary.

@cylewitruk
Copy link
Member Author

cylewitruk commented Mar 4, 2024

First, this should not be a dependency at all. It's purely for testing; it should be only a dev-dependency.

As I wrote in the description, this needs to be included as an optional dependency because of cargo's limitation that it doesn't propagate the test profile to dependencies, even in the same workspace, so #[cfg(test)] in stacks-common will not be visible to clarity or stackslib etc. when built in the test profile -- so that's why it's been conditionalized on the testing feature. This is the recommended workaround according to here and here, and as the testing feature already exists in most crates it seems that this has been a problem previously.

Second, it doesn't look like fake is actually doing that much work for us. You're still creating all these constructors for all these data structures, and populating them with Rng-synthesized data. Why have fake at all then?

I've provided specific implementations to manage recursion and to handle types which require that i.e. two separate Vecs which need to have the same len(). Without this, the recursion is unpredictable and can sometimes cause the test thread to go OOM. Fake provides a lot of boilerplate that frankly would be a waste of time to re-implement for for this project "just because" -- I encourage you to check out their examples.

Third, every data type in Clarity has a bound size by design. But, none of these constructors exercise it. For example, it would be very useful to be able to do things like construct data types of maximum size, or maximum depth, in order to e.g. benchmark the code or iron out resource usage overflows, but that doesn't appear to be something fake does for us.

With a bit more effort this could easily be implemented, very similar to their built-in "locales" support. This isn't a use-case for the moment with basic testing that instances can be serialized, inserted in the database, read from the database, deserialized and be equal.

I'd just like to add that this is intended to be a "first introduction" of the crate -- while this happens to be rather complex and recursive types, I don't think there's too many other types in the codebase which have the same recursive nature, so for the majority of remaining types the simple #[cfg_attr(feature = "testing", derive(::fake::Dummy))] should be enough. If specific constraints apply to fields, they can be attributed individually.

Copy link

codecov bot commented Mar 4, 2024

Codecov Report

Attention: Patch coverage is 81.11111% with 153 lines in your changes are missing coverage. Please review.

Project coverage is 83.20%. Comparing base (cab1a11) to head (a53aeed).

Additional details and impacted files
@@            Coverage Diff             @@
##             next    #4477      +/-   ##
==========================================
- Coverage   83.26%   83.20%   -0.07%     
==========================================
  Files         451      464      +13     
  Lines      325765   326533     +768     
  Branches      323      323              
==========================================
+ Hits       271240   271678     +438     
- Misses      54517    54847     +330     
  Partials        8        8              
Files Coverage Δ
clarity/src/libclarity.rs 39.28% <ø> (ø)
clarity/src/proptesting/callables.rs 100.00% <100.00%> (ø)
clarity/src/proptesting/mod.rs 100.00% <100.00%> (ø)
clarity/src/proptesting/representations.rs 100.00% <100.00%> (ø)
clarity/src/proptesting/types.rs 100.00% <100.00%> (ø)
clarity/src/vm/analysis/arithmetic_checker/mod.rs 92.75% <ø> (ø)
clarity/src/vm/analysis/read_only_checker/mod.rs 87.16% <ø> (ø)
clarity/src/vm/analysis/trait_checker/mod.rs 100.00% <ø> (ø)
clarity/src/vm/analysis/type_checker/contexts.rs 93.50% <ø> (ø)
...ity/src/vm/analysis/type_checker/v2_05/contexts.rs 98.25% <ø> (ø)
... and 53 more

... and 30 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cab1a11...a53aeed. Read the comment docs.

@cylewitruk cylewitruk force-pushed the testing/introduce-fake branch from 0dcae6d to f458050 Compare March 4, 2024 17:09
@jcnelson
Copy link
Member

jcnelson commented Mar 4, 2024

You deleted this, but I think it's a valid question:

And the fundamentally correct way would be...?

If I were doing this, I would just create a trait called Mocked, similar to Default, which provided the function fn mocked() -> Self, which when called would instantiate the struct with random data from the domain of possible inputs. Then, you could have things like ClarityName::mocked() which evaluate to a ClarityName instance with plausible but synthesized data.

I've provided specific implementations to manage recursion and to handle types which require that i.e. two separate Vecs which need to have the same len(). Without this, the recursion is unpredictable and can sometimes cause the test thread to go OOM.

Right -- you're doing this with our without fake.

Fake provides a lot of boilerplate that frankly would be a waste of time to re-implement for for this project "just because"

It seems it provides a lot of boilerplate that this PR doesn't use? Also, to contrast this with a hypothetical Mocked trait above, it's not like it's hard to implement Mocked for primitive types. If we took that route, we'd likely just define a macro that would let us synthesize implementations for one-liners (kinda like how we do with the impl_stacks_message_codec_for_int! macro to auto-generate code to implement StacksMessageCodec for all the integer types we use).

With a bit more effort this could easily be implemented, very similar to their built-in "locales" support. This isn't a use-case for the moment with basic testing that instances can be serialized, inserted in the database, read from the database, deserialized and be equal.

With a bit more effort we can also just avoid fake altogether ;) More seriously, we're writing this code with or without fake, meaning fake isn't providing value here.


Circling back to your first (deleted) question, I want to explore the line of questioning from @moodmosaic more. A value-add that fake or something like it could bring is easy integration with an existing property-testing framework (or more generally, other widely-used automated testing systems like fuzzers, model checkers, and so on). An argument against my hypothetical Mocked trait would be that we'd have to do all the integration work ourselves, whereas a more standardized system like fake might do this for us. That would be a "fundamentally correct" way to do this IMO -- add the means of synthesizing data structures to the codebase such that we can trivially plug them into an existing automated testing system.

@cylewitruk
Copy link
Member Author

cylewitruk commented Mar 4, 2024

You deleted this, but I think it's a valid question:

I deleted it simply because I thought it could come across as unfriendly :)

If I were doing this, I would just create a trait called Mocked, similar to Default, which provided the function fn mocked() -> Self, which when called would instantiate the struct with random data from the domain of possible inputs. Then, you could have things like ClarityName::mocked() which evaluate to a ClarityName instance with plausible but synthesized data.

Just to ensure correct terminology and expectations, this PR introduces faking and not mocking.

There's nothing wrong with the approach you suggest - it was the first thing that came to mind and that which I started with -- but very quickly remembered that there are libraries to do most of this for me. So I'm personally of the opinion that it is a waste of cycles to re-implement, both on the development and maintenance fronts, when there are crates which give us that boilerplate. I would understand the scrutiny much more if this were a production crate, but this is a pretty popular crate which in probably 95% of cases can be used as a dev-dependency as intended; it's just unfortunate that cargo doesn't give us the option of propagating the current build profile to dependencies so that it would be a little bit cleaner instead of having to use the "feature"-workaround.

It seems it provides a lot of boilerplate that this PR doesn't use? Also, to contrast this with a hypothetical Mocked trait above, it's not like it's hard to implement Mocked for primitive types. If we took that route, we'd likely just define a macro that would let us synthesize implementations for one-liners (kinda like how we do with the impl_stacks_message_codec_for_int! macro to auto-generate code to implement StacksMessageCodec for all the integer types we use).

At the moment, sure, but this is just an introduction PR with minimalistic examples. My hope would be that it would be used much more broadly. As you say, it's not hard to implement, but do we really want to muddy-up the Stacks code-base with even more macros etc.? I'd much rather version-pin workspace dependencies and carefully upgrade than re-write code that already exists, or even worse.. principally copying it without those oss projects getting credit.

With a bit more effort we can also just avoid fake altogether ;) More seriously, we're writing this code with or without fake, meaning fake isn't providing value here.

If the idea was that it would be limited to the small number of examples provided here, then I would totally agree. But I think there are a lot of missing tests due to laziness of not wanting to manually fake such types, not at the least the example tests which I added in this PR. And on the other end of the spectrum, probably a lot of unnecessary initialization code which just adds to LoC and maybe could benefit from some randomization for fuzzing/prop-testing.

The real power isn't in the ability to implement custom generators for types, it's in the Dummy-derive which can be used for almost all types in the code-base now that these "more difficult" types have been provided -- including unit tests for a lot of the nested types which don't exist currently.

Circling back to your first (deleted) question, I want to explore the line of questioning from @moodmosaic more. A value-add that fake or something like it could bring is easy integration with an existing property-testing framework (or more generally, other widely-used automated testing systems like fuzzers, model checkers, and so on). An argument against my hypothetical Mocked trait would be that we'd have to do all the integration work ourselves, whereas a more standardized system like fake might do this for us. That would be a "fundamentally correct" way to do this IMO -- add the means of synthesizing data structures to the codebase such that we can trivially plug them into an existing automated testing system.

@moodmosaic and I discussed this briefly prior to this PR -- it would be great if we could get them to integrate, absolutely. It's a giant plus that they both use rand and that fake allows the Rng instance to be provided. I can't imagine that this would be difficult to achieve, and frankly would rather put effort on that than re-inventing wheels.

And last but not least, I used this crate heavily in my #4437 PR draft, and it both saved me a lot of time and increased my confidence in the tests.

@cylewitruk cylewitruk changed the title [Testing] Introduce fake faking library, and wrapper types for hashbrown's HashMap and HashSet [Testing] Introduce proptest property testing library, and wrapper types for hashbrown's HashMap and HashSet Mar 7, 2024
@cylewitruk cylewitruk force-pushed the testing/introduce-fake branch from fa78a9b to e7eb3d3 Compare March 7, 2024 08:19
@cylewitruk
Copy link
Member Author

This PR has been repurposed to introduce proptest instead of fake which achieves the same goal, but with the benefit of the type generators being usable for property testing 🚀

Note that most touched files are simply replacing hashbrown::HashMap/Set with the new wrapper types StacksHashMap/Set, which makes it much easier to swap implementations/hashers in the future and allows us to implement external traits on them.

@moodmosaic @jcnelson please re-review.

clarity/src/proptesting/types.rs Show resolved Hide resolved
clarity/src/proptesting/values.rs Show resolved Hide resolved
clarity/src/proptesting/values.rs Show resolved Hide resolved
@cylewitruk cylewitruk force-pushed the testing/introduce-fake branch from 9697ec5 to a53aeed Compare March 7, 2024 18:33
@cylewitruk cylewitruk changed the title [Testing] Introduce proptest property testing library, and wrapper types for hashbrown's HashMap and HashSet [Testing] Introduce proptest property testing library and add type alises for hashbrown types Mar 7, 2024
@cylewitruk
Copy link
Member Author

@jcnelson would love a re-review of this so we can clear your requested changes, when you have time.

@cylewitruk cylewitruk force-pushed the testing/introduce-fake branch from a53aeed to 65b85d2 Compare July 31, 2024 10:41
@cylewitruk cylewitruk marked this pull request as draft July 31, 2024 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants