Performance Testing

Tom Augspurger edited this page Mar 16, 2014 · 4 revisions

pandas uses vbench to monitor performance across revisions.

vbench

vbench is a tool for benchmarking your code over time, showing performance improvements or regressions across revisions.

WARNING: vbench is not yet compatible with Python 3.

New Dependencies

vbench also requires a working sqlite3 module in your Python installation.

Writing a good vbench

A set of related benchmarks goes together in a module (a .py file). See vb_suite/indexing.py for an example.

There's typically some boilerplate common to all the tests, which can be placed in a string named common_setup.

Now we can write our specific benchmark.

There are up to three items in a single benchmark:

  • setup specific to that benchmark (typically a string concatenated to common_setup)
  • a statement to be executed, which is the first argument to the vbench.Benchmark class
  • instantiation of the vbench.Benchmark class

It's important to separate the setup from the statement we're interested in profiling. The statement ought to be concise and should profile only one thing. If you mix setup in with the statement to be profiled, then changes affecting the performance of the setup (which might even take place outside your library) will pollute the test.
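A minimal sketch of that separation, using the standard library's timeit as a stand-in for vbench, since timeit takes the same statement/setup pair of strings. The data, sizes, and benchmark name are illustrative, and the Benchmark call shown in the comment assumes vbench's constructor accepts the statement, the setup, and a name keyword.

```python
import timeit

# Boilerplate shared by every benchmark in the module.
common_setup = """
import random
random.seed(1234)
"""

# Setup specific to this benchmark, concatenated to common_setup.
# Building the list happens here, so it is not timed.
setup = common_setup + """
N = 10000
values = [random.random() for _ in range(N)]
"""

# The statement to profile: one concise operation.
stmt = "sorted(values)"

# Assumption: with vbench installed, the benchmark would be declared roughly as
#     sort_random_floats = Benchmark(stmt, setup, name="sort_random_floats")
# timeit stands in here so the sketch runs without vbench.
elapsed = timeit.timeit(stmt, setup=setup, number=10)
print("10 runs took %.4f s" % elapsed)
```

Because the list is built in the setup string, a change in the speed of random.random() would not show up in the timing of sorted(values).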

Each module must be listed in the modules list in suite.py.
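One way a suite file could turn a modules list into a flat list of benchmarks is sketched below. The Benchmark class here is a hypothetical stand-in for vbench's, the indexing module is fabricated in memory so the example runs on its own, and the collect helper is illustrative rather than the actual suite.py code.

```python
import importlib
import sys
import types


class Benchmark:
    """Hypothetical stand-in for vbench's Benchmark class."""

    def __init__(self, stmt, setup, name=None):
        self.stmt = stmt
        self.setup = setup
        self.name = name


# Fabricate one benchmark module in memory so the sketch is self-contained.
indexing = types.ModuleType("indexing")
indexing.list_getitem = Benchmark(
    "s[100]", "s = list(range(1000))", name="list_getitem"
)
sys.modules["indexing"] = indexing

# The list that each new benchmark module has to be added to.
modules = ["indexing"]


def collect(module_names):
    """Import each listed module and gather its Benchmark instances."""
    found = []
    for modname in module_names:
        mod = importlib.import_module(modname)
        found.extend(v for v in vars(mod).values() if isinstance(v, Benchmark))
    return found


benchmarks = collect(modules)
print([b.name for b in benchmarks])
```

A module that is missing from the list is never imported, so its benchmarks would silently be left out of the run.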

Not all tests can be run against the entire history of the project (since the API has changed). For newer features, each Benchmark object takes an optional start_date parameter. For example:

start_date=datetime(2012, 1, 1)

If a start_date is not specified for a benchmark, the global setting from suite.py is used.

Another reason that a benchmark can't be run against the project's entire history is that APIs sometimes have to change in ways that are not backwards compatible. For these cases, the easiest way to compare performance before and after the API change is probably the try-except idiom:

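try:
    # Current API
    rng = date_range('1/1/2000', periods=N, freq='min')
except NameError:
    # Older revisions only provide DateRange; alias it so the statement can use date_range
    rng = DateRange('1/1/2000', periods=N, offset=datetools.Minute())
    date_range = DateRange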
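To see why this idiom lets one setup string run on both sides of a rename, the following self-contained sketch executes the same snippet in two namespaces that simulate an old and a new revision. The make_range function and both namespaces are hypothetical stand-ins, not pandas code.

```python
# Setup snippet written against the newer name, with a fallback for the older one.
setup = """
try:
    rng = date_range(0, periods=N)
except NameError:
    rng = DateRange(0, periods=N)
    date_range = DateRange
"""


def make_range(start, periods):
    """Hypothetical stand-in for the library's range constructor."""
    return list(range(start, start + periods))


# Two namespaces simulating what each revision of the library exposes.
new_revision = {"N": 5, "date_range": make_range}
old_revision = {"N": 5, "DateRange": make_range}

for namespace in (new_revision, old_revision):
    exec(setup, namespace)

# Both revisions end up with the same data and a usable date_range name.
print(new_revision["rng"] == old_revision["rng"])
```

After the fallback runs, the old namespace also has date_range defined, so the profiled statement can use the new name on every revision.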

Pre-Pull Request

Most contributors don't need to worry about writing a vbench or running the full suite against the project's entire history. If you are ever asked to run a vbench, change to the root pandas directory and run:

./test_perf.sh -b master -t HEAD

You can optionally restrict the run to certain files with the -r parameter.
