Open
Description
It would be convenient to have a canonical set of dataframes for use in testing and/or benchmarking. Ideally this would be a set of named dataframes that represent common forms of data, such as the following:
- Random floating point data
- Random integer data
- Strings with low entropy
- Strings with high entropy
- Mostly sorted datetimes
- ...
These could then be used either within pandas or in other libraries for benchmarks. Running benchmarks against the same named inputs would make results easier to compare across libraries and over time.
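To make the idea concrete, here is a rough sketch of what such a set could look like. Everything in it is an assumption for illustration: the function name `make_canonical_frames`, the dataframe names, the single column `x`, the default size, and the choice to shuffle about 1% of pairs to get "mostly sorted" datetimes. None of this exists in pandas today.

```python
import numpy as np
import pandas as pd


def make_canonical_frames(n_rows=1_000, seed=0):
    """Return a dict of named dataframes (hypothetical names and shapes)."""
    rng = np.random.default_rng(seed)  # seeded so every caller gets the same data

    # Low entropy: many rows drawn from a handful of distinct strings.
    low = rng.choice(["apple", "banana", "cherry"], size=n_rows)

    # High entropy: 8 random bytes per row, rendered as 16 hex characters.
    high = [rng.bytes(8).hex() for _ in range(n_rows)]

    # Mostly sorted: start from a sorted range, then swap a small number of pairs.
    stamps = pd.date_range("2000-01-01", periods=n_rows, freq="min").to_numpy().copy()
    n_swaps = n_rows // 100
    idx = rng.choice(n_rows, size=2 * n_swaps, replace=False)
    a, b = idx[:n_swaps], idx[n_swaps:]
    stamps[a], stamps[b] = stamps[b].copy(), stamps[a].copy()

    return {
        "random_floats": pd.DataFrame({"x": rng.standard_normal(n_rows)}),
        "random_ints": pd.DataFrame({"x": rng.integers(0, 1_000, size=n_rows)}),
        "low_entropy_strings": pd.DataFrame({"x": low}),
        "high_entropy_strings": pd.DataFrame({"x": high}),
        "mostly_sorted_datetimes": pd.DataFrame({"x": stamps}),
    }
```

A benchmark could then look frames up by name, e.g. `make_canonical_frames()["low_entropy_strings"]`, and the fixed seed keeps the inputs identical between runs.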
Additionally, if these were then separately arranged into pytest fixtures, we could imagine setting things up and tearing things down in a way that makes benchmarking more consistent (such as controlling garbage collection), though this may be a separate endeavor. It would be nice to have access to the dataframes outside of the context of pytest as well.
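One possible arrangement, sketched under assumptions: keep the dataframe builder as a plain function so it works without pytest, and layer a thin fixture on top that pauses garbage collection for the duration of the test. The names `gc_paused`, `random_floats`, and `random_floats_frame` are hypothetical; only `gc`, `contextlib.contextmanager`, and `pytest.fixture` are existing APIs.

```python
import gc
from contextlib import contextmanager

import numpy as np
import pandas as pd
import pytest


@contextmanager
def gc_paused():
    """Collect once, then keep the garbage collector off inside the block."""
    was_enabled = gc.isenabled()
    gc.collect()
    gc.disable()
    try:
        yield
    finally:
        # Only re-enable if the collector was on before we touched it.
        if was_enabled:
            gc.enable()


def random_floats(n_rows=1_000, seed=0):
    """Plain builder (hypothetical), usable with or without pytest."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({"x": rng.standard_normal(n_rows)})


@pytest.fixture
def random_floats_frame():
    # Setup: build the frame with GC paused; teardown restores the GC state.
    with gc_paused():
        yield random_floats()
```

Outside pytest, the same pieces can be used directly, e.g. `with gc_paused(): df = random_floats()`.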