Description
Is your feature request related to a problem? Please describe.
Currently, the tests for libcudf, pylibcudf, and cuDF Python are a large set of manually written cases. While we endeavor to achieve high coverage of the APIs, we inevitably miss data-dependent edge cases, particularly around inputs such as empty data sets.
Describe the solution you'd like
We should consider using Hypothesis or another fuzz testing library to add more systematic verification of different inputs. I recommend doing this at the Python layer since there is better and simpler tooling available, and because pylibcudf testing can be treated as a superset of libcudf testing in this respect to ensure good coverage of the C++.
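To make the proposal concrete, here is a minimal sketch of what a Hypothesis-based test could look like. It is illustrative only: `null_aware_sum` is a hypothetical pure-Python stand-in for whatever pylibcudf or cuDF API would actually be under test, and the strategy bounds and example counts are arbitrary assumptions. The point is that the strategy generates empty and all-null inputs without anyone having to remember to write those cases by hand.

```python
from hypothesis import given, settings, strategies as st


def null_aware_sum(values):
    """Hypothetical stand-in for the API under test (not a real cudf call).

    Sums the non-null entries and returns None when no valid entry exists.
    A real suite would call into pylibcudf or cudf here instead.
    """
    valid = [v for v in values if v is not None]
    if not valid:
        return None
    return sum(valid)


# Strategy: columns of nullable 64-bit integers. Lists may be empty or
# consist entirely of nulls, which are the edge cases that are easy to miss.
int64s = st.integers(min_value=-(2**63), max_value=2**63 - 1)
nullable_columns = st.lists(st.one_of(st.none(), int64s), max_size=50)


@settings(max_examples=200, deadline=None)
@given(nullable_columns)
def test_sum_matches_reference(values):
    # Independent reference implementation to compare against.
    expected = None
    for v in values:
        if v is not None:
            expected = v if expected is None else expected + v
    assert null_aware_sum(values) == expected


@settings(max_examples=200, deadline=None)
@given(nullable_columns, nullable_columns)
def test_sum_of_concatenation(left, right):
    # Property: summing a concatenated column agrees with combining the
    # sums of its parts, treating None as "no valid rows".
    parts = [s for s in (null_aware_sum(left), null_aware_sum(right)) if s is not None]
    expected = sum(parts) if parts else None
    assert null_aware_sum(left + right) == expected
```

Under pytest these functions are collected like any other test. When a property fails, Hypothesis shrinks the input to a minimal failing example, which tends to surface exactly the empty or all-null cases described above.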
Describe alternatives you've considered
We could also implement fuzz testing directly in C++, e.g. with Google's fuzztest, but that would be more cumbersome.