Produce representative dataframes for benchmarking

It would be convenient to have a canonical set of dataframes for use in testing and/or benchmarking. Ideally this would be a set of named dataframes representing common forms of data, such as the following:

1. Random floating-point data
2. Random integer data
3. Strings with low entropy
4.  Strings with high entropy
5.  Mostly sorted datetimes
6. ...
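As a rough illustration of the list above, here is a minimal sketch of seeded generator functions, one per category. The function names, default sizes, distributions, string alphabet and shuffle fraction are all hypothetical choices, not an agreed pandas API; seeding each generator is what would make the frames reproducible across runs and libraries.

```python
import numpy as np
import pandas as pd

# Hypothetical generators; names, sizes and distributions are assumptions.


def random_floats(n=100_000, ncols=4, seed=0):
    """1. Standard-normal floating-point columns."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.standard_normal((n, ncols)),
                        columns=[f"f{i}" for i in range(ncols)])


def random_ints(n=100_000, ncols=4, seed=0):
    """2. Uniform integers in [0, 1_000_000)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.integers(0, 1_000_000, size=(n, ncols)),
                        columns=[f"i{i}" for i in range(ncols)])


def low_entropy_strings(n=100_000, seed=0):
    """3. Few distinct values, so most rows are repeats."""
    rng = np.random.default_rng(seed)
    values = np.array(["apple", "banana", "cherry", "durian", "elderberry"])
    return pd.DataFrame({"s": rng.choice(values, size=n)})


def high_entropy_strings(n=100_000, length=12, seed=0):
    """4. Random fixed-length strings; almost every row is unique."""
    rng = np.random.default_rng(seed)
    alphabet = np.array(list("abcdefghijklmnopqrstuvwxyz0123456789"))
    chars = rng.choice(alphabet, size=(n, length))
    return pd.DataFrame({"s": ["".join(row) for row in chars]})


def mostly_sorted_datetimes(n=100_000, frac_shuffled=0.01, seed=0):
    """5. Minute-spaced timestamps with a small fraction of positions shuffled."""
    rng = np.random.default_rng(seed)
    values = pd.date_range("2000-01-01", periods=n, freq="min").to_numpy(copy=True)
    k = int(n * frac_shuffled)
    idx = rng.choice(n, size=k, replace=False)  # positions to disturb
    values[idx] = values[rng.permutation(idx)]  # permute values among those positions
    return pd.DataFrame({"t": values})
```

Keeping each category as a plain function, rather than a precomputed file, would let callers pick the size that suits a given benchmark.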

These could then be used for benchmarks, either within pandas or in other libraries. Having a consistent set of dataframes would likely make benchmark results more consistent and easier to compare.
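One way to make the frames usable from any benchmark harness is to look them up by name. The sketch below assumes a hypothetical `FRAMES` registry plus `get_frame` and `bench` helpers (none of these exist in pandas); it uses only the standard `timeit.repeat` and reports the best per-call time, which the `timeit` documentation suggests as the least noisy summary.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical registry: name -> factory(n, seed). Names are illustrative only.
FRAMES = {
    "floats": lambda n, seed: pd.DataFrame(
        np.random.default_rng(seed).standard_normal((n, 4)), columns=list("abcd")),
    "ints": lambda n, seed: pd.DataFrame(
        np.random.default_rng(seed).integers(0, 1000, size=(n, 4)), columns=list("abcd")),
}


def get_frame(name, n=10_000, seed=42):
    """Return a freshly generated, deterministic frame for ``name``."""
    return FRAMES[name](n, seed)


def bench(name, op, n=10_000, repeat=5, number=10):
    """Time ``op(df)`` on the named frame; return the best per-call time in seconds."""
    df = get_frame(name, n)
    timings = timeit.repeat(lambda: op(df), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    print(bench("floats", lambda df: df.sum()))
```

Because the data depends only on the name, size and seed, two libraries benchmarking the same operation would be working on identical inputs.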

Additionally, if these were also exposed as pytest fixtures, we could set things up and tear things down in a way that makes benchmarking more consistent (for example, by controlling garbage collection), though this may be a separate endeavor. It would be nice to have access to the dataframes outside of pytest as well.
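A minimal sketch of both ideas, assuming hypothetical names throughout (`make_float_frame`, `gc_disabled`, `bench_float_frame`, `no_gc` are not existing pandas fixtures): the data lives in a plain function so it stays usable without pytest, and garbage-collection control is a standard-library context manager that a fixture merely wraps.

```python
import gc
from contextlib import contextmanager

import numpy as np
import pandas as pd
import pytest


def make_float_frame(n=100_000, seed=0):
    """Plain factory: usable from scripts or other libraries without pytest."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(rng.standard_normal((n, 4)), columns=list("abcd"))


@contextmanager
def gc_disabled():
    """Collect garbage up front, then keep the collector off inside the block."""
    was_enabled = gc.isenabled()
    gc.collect()
    gc.disable()
    try:
        yield
    finally:
        # Restore the collector only if it was on when we started.
        if was_enabled:
            gc.enable()


@pytest.fixture(scope="session")
def bench_float_frame():
    # Built once per test session; tests should treat it as read-only.
    return make_float_frame()


@pytest.fixture
def no_gc():
    # Setup before and teardown after each test that requests this fixture.
    with gc_disabled():
        yield


def test_sum_is_finite(bench_float_frame, no_gc):
    assert np.isfinite(bench_float_frame.sum().sum())
```

Keeping the context manager separate from the fixture means non-pytest benchmark runners could apply the same garbage-collection policy.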

cc @jreback @wesm @cpcloud 

Produce representative dataframes for benchmarking #15911

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Produce representative dataframes for benchmarking #15911

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions