Performance Testing
pandas uses vbench to monitor performance across revisions.
vbench is a tool for benchmarking your code over time, so that performance improvements and regressions between revisions become visible.
WARNING: vbench is not yet compatible with Python 3.
Also note that you need to have sqlite3 working with Python.
A set of related benchmarks goes together in a module (a .py file).
See vb_suite/indexing.py for an example.
There's typically some boilerplate common to all the tests, which can be placed in a string common_setup.
Now we can write our specific benchmark.
There are up to three items in a single benchmark:
- setup specific to that benchmark (typically a string concatenated to common_setup)
- a statement to be executed, which is the first argument to the vbench.Benchmark class
- instantiation of the vbench.Benchmark class
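The three items above could be laid out in a benchmark module roughly as follows. This is a minimal sketch, not a copy of any file in vb_suite: the pandas calls inside the setup strings, the benchmark name, and the variable names are illustrative assumptions, and the fallback Benchmark class is a stand-in that only exists so the sketch can be executed where vbench is not importable.

```python
from datetime import datetime

try:
    # The real class, if a working vbench install is available.
    from vbench.benchmark import Benchmark
except Exception:
    # Minimal stand-in (an assumption, not vbench's implementation): it only
    # records its arguments so the sketch can run without vbench.
    class Benchmark(object):
        def __init__(self, code, setup, name=None, start_date=None):
            self.code = code
            self.setup = setup
            self.name = name
            self.start_date = start_date

# Boilerplate shared by every benchmark in this module.
common_setup = """
import numpy as np
import pandas as pd
"""

# Setup specific to this benchmark, concatenated to common_setup.
setup = common_setup + """
ts = pd.Series(np.random.randn(1000),
               index=pd.date_range('1/1/2000', periods=1000))
dt = ts.index[500]
"""

# The single statement to be timed.
statement = "ts[dt]"

# Instantiating Benchmark ties the statement and its setup together.
bm_getitem = Benchmark(statement, setup,
                       name='series_getitem_scalar',
                       start_date=datetime(2012, 1, 1))
```

Both the setup and the statement are plain strings, so nothing in them is executed when the module is imported; they only run when the benchmark itself is run.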
It's important to separate the setup from the statement we're interested in profiling. The statement ought to be concise and should profile only one thing. If you mix setup in with the statement to be profiled, then changes affecting the performance of the setup (which might even take place outside your library) will pollute the test.
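The same separation of setup and statement exists in the standard library's timeit module, which makes it a convenient way to see the effect without vbench. The sketch below uses timeit rather than vbench, and the data sizes and variable names are arbitrary choices for illustration.

```python
import timeit

# Setup runs once and is excluded from the measurement.
setup = """
data = list(range(10000))
lookup = set(data)
"""

# The statement profiles exactly one thing: a set membership test.
statement = "9999 in lookup"

elapsed_clean = timeit.timeit(stmt=statement, setup=setup, number=200)

# Anti-pattern: building the set inside the timed statement means the
# measurement is dominated by construction cost, not by the lookup.
mixed_statement = "lookup = set(range(10000)); 9999 in lookup"
elapsed_mixed = timeit.timeit(stmt=mixed_statement, number=200)

print("lookup only:    %.6f s" % elapsed_clean)
print("setup + lookup: %.6f s" % elapsed_mixed)
```

In the second measurement, any change in how quickly a set is constructed would show up as a change in the "lookup" benchmark, which is exactly the kind of pollution described above.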
Each module must be listed in the modules list in the suite.py file.
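One plausible reason for this requirement is that the suite discovers benchmarks by importing each listed module and collecting the Benchmark objects defined in it, so an unlisted module is never seen. The sketch below illustrates that idea only; it is an assumption about the mechanism, not the contents of suite.py. The Benchmark namedtuple, the collect_benchmarks helper, and the demo_indexing module are all hypothetical stand-ins.

```python
import importlib
import sys
import types
from collections import namedtuple

# Hypothetical stand-in for vbench's Benchmark class.
Benchmark = namedtuple("Benchmark", ["code", "setup", "name"])


def collect_benchmarks(module_names):
    """Import each named module and gather the Benchmark objects it defines."""
    found = []
    for modname in module_names:
        module = importlib.import_module(modname)
        found.extend(value for value in vars(module).values()
                     if isinstance(value, Benchmark))
    return found


# Register a fake benchmark module so the sketch runs on its own.
demo = types.ModuleType("demo_indexing")
demo.N = 1000  # non-Benchmark attributes are ignored by the collector
demo.bm_getitem = Benchmark("ts[dt]", "ts = None", "series_getitem_scalar")
sys.modules["demo_indexing"] = demo

# Only modules named here are imported and searched.
modules = ["demo_indexing"]
benchmarks = collect_benchmarks(modules)
```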
Not all tests can be run against the entire history of the project (since the API has changed).
For newer features, each Benchmark object takes an optional start_date parameter.
For example:
start_date=datetime(2012, 1, 1)
If no start_date is specified for a particular benchmark, the global setting from vb_suite.py is used.
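The fallback rule can be pictured as a small helper. This is a sketch of the rule as described, not vbench's code: the function name effective_start_date and the GLOBAL_START_DATE value are hypothetical.

```python
from datetime import datetime

# Hypothetical project-wide default; the real value is whatever the
# suite configuration defines.
GLOBAL_START_DATE = datetime(2010, 6, 1)


def effective_start_date(benchmark_start_date=None,
                         global_start_date=GLOBAL_START_DATE):
    """Return the benchmark's own start date, or the global one if unset."""
    if benchmark_start_date is not None:
        return benchmark_start_date
    return global_start_date


# A benchmark for a newer feature overrides the global setting...
newer_feature = effective_start_date(datetime(2012, 1, 1))
# ...while a benchmark without start_date inherits it.
older_feature = effective_start_date()
```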
Another reason that a benchmark can't be run against the project's entire history is that APIs sometimes have to change in ways that are not backwards compatible. For these cases, the easiest way to compare performance before and after the API change is probably the try-except idiom:
try:
    rng = date_range('1/1/2000', periods=N, freq='min')
except NameError:
    rng = DateRange('1/1/2000', periods=N, offset=datetools.Minute())
    date_range = DateRange
Most contributors don't need to worry about writing a vbench or running the full suite against the project's entire history. If you are ever asked to run a vbench, change your directory to the root pandas directory and run
./test_perf.sh -b master -t HEAD
You can optionally restrict the run to certain files with the -r parameter.