At the moment, there is no automated testing procedure across different versions of the Cosmos SDK. Despite best efforts in rigorous code reviews, critical defects can occasionally slip in, in two ways:
introducing non-determinism in migration logic across state-machine-breaking major versions;
introducing state-machine-breaking changes in minor patches deemed non-breaking.
In this thread, I would like to brainstorm some ideas for the latter.
Extending simulation scripts
The existing simulations operate within a single version, but it should be feasible to extend them to test across minor versions.
I imagine it could be scripted in this way:
Run the simulations on the latest head of the release branch or PR, and export the state at the end.
Run the simulations with the same seed on the last released version, and export the state at the end.
Compare the exported states (a minimal comparison helper is sketched below).
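To make the comparison step concrete, here is a minimal sketch, assuming both runs export their final state as JSON; the file names state_head.json and state_released.json are hypothetical placeholders for the two exports:

```go
// A minimal sketch of step 3: compare two exported states.
// Assumes the simulations exported their final state as JSON files;
// the file names below are hypothetical placeholders.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// canonicalize reads a JSON file and re-marshals it, so the comparison is
// insensitive to map-key ordering in the export (encoding/json sorts keys).
func canonicalize(path string) ([]byte, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var v interface{}
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, err
	}
	return json.Marshal(v)
}

func main() {
	head, err := canonicalize("state_head.json") // export from the release branch / PR
	if err != nil {
		log.Fatal(err)
	}
	released, err := canonicalize("state_released.json") // export from the last released version
	if err != nil {
		log.Fatal(err)
	}
	if string(head) != string(released) {
		fmt.Println("exported states differ: possible state machine breaking change")
		os.Exit(1)
	}
	fmt.Println("exported states match")
}
```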
This could likely be done with the existing codebase without much difficulty. There are two areas that could potentially be improved:
Diagnosing the breaking changes: for example, simulations could be extended with “record and replay” functionality (e.g. recording and checking expected app hashes), so that one could see where the simulations diverged across two versions (a rough sketch of this idea follows the list).
Improving the simulation speed: given this could double the pipeline execution time, it is worth exploring if two simulations with different versions could be run in parallel.
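As a rough illustration of the “record and replay” idea, the sketch below records the app hash at every simulated block and, on a later run with a different version, reports the first height at which the hashes diverge. The per-block hook and file format are assumptions, not the existing simulation API:

```go
// A rough sketch of "record and replay" for simulations. The per-block hook
// (ObserveBlock) is an assumed integration point, not an existing simulation API.
package simdiff

import (
	"encoding/json"
	"fmt"
	"os"
)

// HashRecorder either records the app hash at every simulated block height,
// or replays a previous run and checks each hash against the recorded one.
type HashRecorder struct {
	replay   bool
	expected map[int64]string // height -> app hash from the recorded run
	seen     map[int64]string
}

func NewRecorder() *HashRecorder {
	return &HashRecorder{seen: map[int64]string{}}
}

// NewReplayer loads the hashes recorded by a previous run (e.g. on the last
// released version) from a JSON file.
func NewReplayer(path string) (*HashRecorder, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	r := &HashRecorder{replay: true, seen: map[int64]string{}}
	if err := json.Unmarshal(raw, &r.expected); err != nil {
		return nil, err
	}
	return r, nil
}

// ObserveBlock is the assumed per-block hook: in replay mode it reports the
// first height at which the app hash diverges from the recorded run.
func (r *HashRecorder) ObserveBlock(height int64, appHash string) error {
	r.seen[height] = appHash
	if want, ok := r.expected[height]; r.replay && ok && want != appHash {
		return fmt.Errorf("app hash diverged at height %d: recorded %s, got %s", height, want, appHash)
	}
	return nil
}

// Save persists the observed hashes so a later run on another version can
// replay against them.
func (r *HashRecorder) Save(path string) error {
	raw, err := json.MarshalIndent(r.seen, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, raw, 0o644)
}
```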
Targeted randomized testing
The simulations could be a good and easy start, but they may not cover all edge cases, given that some operation generators do not fully exercise a module’s functionality.
One possible alternative would be to add a separate testing infrastructure where one can take code from both versions (most likely the consensus-critical parts, such as keepers), feed functions from each version the same input, and assert that their output or resulting state is the same. The input could be generated randomly, either in a black-box manner (e.g. with testing/quick) or in a gray-box manner (e.g. with Go 1.18's coverage-guided fuzz testing). There are, however, a few technical challenges:
It is not possible to import two minor versions of the same module in Go: one possible (albeit hacky) workaround would be for the testing infrastructure scripts to “vendor” the released version’s code and rename its module path. The code for these tests would also need to live separately or under a build tag, because its compilation would depend on the testing infrastructure’s scripts.
Visibility: hopefully, this targeted testing could work with exported (public) definitions only.
Valid input type generation: for black-box testing, this could be done by implementing a structure generator, but this may also need to be duplicated for the identical types in the other version. For gray-box testing, the most straightforward way may be to generate []byte and attempt to decode that into valid types via protobuf.
Execution: especially for gray-box testing, it would be worth running this continuously, and collecting and reusing previously generated “interesting” samples (so that, e.g., the execution can get past protobuf deserialization quickly). It is an open question whether this custom testing process could easily be hooked into existing infrastructure such as OSS-Fuzz. A rough differential fuzz target is sketched below.
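As a sketch of how the gray-box variant could look with Go 1.18's fuzzing, here is a hypothetical differential fuzz target. The import paths, message type, and handler functions are all placeholders: keeperold / typesold stand for the released version’s code after it has been vendored and its module path renamed as described above, and the handlers are assumed to return their resulting state in a comparable (serialized) form.

```go
//go:build crossversion

// A rough sketch of a differential fuzz target using Go 1.18's fuzzing.
// The packages "keeperhead"/"keeperold" and the MsgExample/HandleMsgExample
// names are hypothetical placeholders: "old" stands for the released
// version's code after it has been vendored and its module path renamed.
package crossversion

import (
	"bytes"
	"testing"

	"google.golang.org/protobuf/proto"

	// hypothetical import paths, produced by the vendoring/renaming script
	keeperhead "example.com/crossversion/head/keeper"
	typeshead "example.com/crossversion/head/types"
	keeperold "example.com/crossversion/old/keeper"
	typesold "example.com/crossversion/old/types"
)

func FuzzMsgHandlerParity(f *testing.F) {
	// Seed the corpus with previously collected "interesting" samples so the
	// fuzzer gets past protobuf deserialization quickly.
	f.Add([]byte{})

	f.Fuzz(func(t *testing.T, raw []byte) {
		// Gray-box input generation: try to decode the random bytes into a
		// valid message for each version; skip inputs that do not decode.
		var msgHead typeshead.MsgExample
		var msgOld typesold.MsgExample
		if proto.Unmarshal(raw, &msgHead) != nil || proto.Unmarshal(raw, &msgOld) != nil {
			t.Skip()
		}

		// Feed the same input to both versions and compare the resulting
		// state (assumed to be returned in serialized form for simplicity).
		stateHead := keeperhead.HandleMsgExample(&msgHead)
		stateOld := keeperold.HandleMsgExample(&msgOld)
		if !bytes.Equal(stateHead, stateOld) {
			t.Fatalf("state diverged between versions for input %x", raw)
		}
	})
}
```

Running a target like this continuously under go test -fuzz and checking the generated corpus into the repository would cover the corpus-reuse point; whether OSS-Fuzz can drive such a two-version build remains the open question noted above.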