Formalise Benchmarks #308

Open
@pedromxavier

Description

Rationale

Create a formal benchmark pipeline to compare

  • Python
  • PythonCall (dev)
  • PythonCall (stable)
  • PyCall

Originally posted by @cjdoris in #300 (comment)

Requirements

  1. Match benchmark cases across suites
  2. Use the same Python executable across all interfaces
  3. Store multiple results or condensed statistics
  4. Track memory usage

Comments

Julia Side

Most benchmarking tools in Julia build atop BenchmarkTools.jl[^1], and using its interface to define test suites and store results is the way to go. Both PkgBenchmark.jl[^2] and AirspeedVelocity.jl[^3] provide functionality to compare multiple versions of a single package, yet neither supports comparison across multiple packages out of the box. There will be some homework for us in building the right tools for this slightly more general setting.
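To make requirement 1 concrete, each interface would define a suite with matching case names. Below is a minimal sketch assuming PythonCall; the case names and snippets are placeholders, not a proposed final set:

```julia
using BenchmarkTools
using PythonCall

# Shared suite skeleton: every interface (PythonCall dev/stable, PyCall, and
# the pure-Python reference) would define a group with the *same* case names,
# so that results can be matched one-to-one across suites.
const SUITE = BenchmarkGroup()

SUITE["import"]  = @benchmarkable pyimport("math")
SUITE["getattr"] = @benchmarkable m.sqrt setup = (m = pyimport("math"))
SUITE["call"]    = @benchmarkable m.sqrt(2.0) setup = (m = pyimport("math"))

# tune!(SUITE); results = run(SUITE)
```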

It is worth noting that PkgBenchmark.jl exposes useful methods in its public API that we could leverage to build what we need, including methods for comparing suites and for exporting the results to Markdown. AirspeedVelocity.jl, by contrast, is only made available through its CLI.
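For example, given two PkgBenchmark.BenchmarkResults values (the names below are placeholders, assumed to exist already), the comparison and Markdown export could be wired up roughly like this:

```julia
using PkgBenchmark

# `target` and `baseline` stand for two PkgBenchmark.BenchmarkResults values,
# e.g. one per interface.
judgement = judge(target, baseline)

# Export a human-readable comparison report via the public API.
open("report.md", "w") do io
    export_markdown(io, judgement)
end
```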

Python Side

To enjoy the same level of detail provided by BenchmarkTools.jl, we should adopt pyperf[^4]. There are many ways to use it, but a few experiments showed that the CLI + JSON interface is probably the desired option.

For each test case, stored in the PY_CODE variable, we would then create a temporary path JSON_PATH and run

```julia
run(`$(PY_EXE) -m pyperf timeit "$(PY_CODE)" --append="$(JSON_PATH)" --tracemalloc`)
```
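Putting the pieces together, one possible driver loop follows; PY_EXE, CASES, and run_python_suite are illustrative assumptions, not existing code, and the PY_CODE from above is split here into a setup string and a timed statement:

```julia
# Hypothetical driver loop; PY_EXE and CASES are placeholders for illustration.
const PY_EXE = "python3"  # requirement 2: same executable for every interface

const CASES = Dict(
    # name => (setup, statement); both strings are passed straight to pyperf
    "sqrt" => ("import math", "math.sqrt(2.0)"),
)

function run_python_suite(cases::AbstractDict; dir::AbstractString = mktempdir())
    paths = Dict{String,String}()
    for (name, (setup, stmt)) in cases
        json_path = joinpath(dir, "$(name).json")
        run(`$(PY_EXE) -m pyperf timeit -s $(setup) $(stmt) --append=$(json_path) --tracemalloc`)
        paths[name] = json_path
    end
    return paths
end
```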

After that, we should be able to parse the output JSON and convert it into a PkgBenchmark.BenchmarkResults object. This makes it easier to integrate those results into the overall machinery, reducing the problem to setting the Python result as the reference value.
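As a starting point for that translator, the sketch below only pulls the raw timing values (in seconds) out of pyperf's JSON; wrapping them into a full PkgBenchmark.BenchmarkResults (with metadata such as commit and Julia version) would still be our homework. The function name is hypothetical:

```julia
using JSON  # assumption: any JSON parser would do here

# Hypothetical translator front end: extract `name => [timings in seconds]`
# from a pyperf JSON file. pyperf stores a top-level "benchmarks" list; each
# run carries its measurements under "values" (calibration/warmup runs may
# have no "values" at all, hence the `get` fallback).
function pyperf_timings(json_path::AbstractString)
    doc = JSON.parsefile(json_path)
    timings = Dict{String,Vector{Float64}}()
    for bench in doc["benchmarks"]
        name = get(get(bench, "metadata", Dict()), "name", basename(json_path))
        values = Float64[]
        for r in bench["runs"]
            append!(values, get(r, "values", Float64[]))
        end
        timings[name] = values
    end
    return timings
end
```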

Tasks

  • Implement the reference Python benchmark cases
  • Implement the corresponding versions in the other suites
    • PythonCall (dev)
    • PythonCall (stable)
    • PyCall
  • Write a translator for pyperf JSON into BenchmarkResults
  • Write comparison tools
  • Write report generator
  • Set up GitHub Actions


Footnotes

[^1]: [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl)

[^2]: [PkgBenchmark.jl](https://github.com/JuliaCI/PkgBenchmark.jl)

[^3]: [AirspeedVelocity.jl](https://github.com/MilesCranmer/AirspeedVelocity.jl)

[^4]: [pyperf](https://github.com/psf/pyperf)
