Formalise Benchmarks #308

Open
@pedromxavier

Description

Rationale

Create a formal benchmark pipeline to compare

  • Python
  • PythonCall (dev)
  • PythonCall (stable)
  • PyCall

Originally posted by @cjdoris in #300 (comment)

Requirements

  1. Match benchmark cases across suites
  2. Use the same Python executable across all interfaces
  3. Store multiple results or condensed statistics
  4. Track memory usage

Comments

Julia Side

Most benchmarking tools in Julia build atop BenchmarkTools.jl[^1], and using its interface to define test suites and store results is the way to go. Both PkgBenchmark.jl[^2] and AirspeedVelocity.jl[^3] provide functionality to compare multiple versions of a single package, yet neither supports comparison across multiple packages out of the box. There will be some homework for us in building the right tools for this slightly more general setting.
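To make requirement 1 concrete, each interface would define a suite with matching case names. Below is a minimal sketch assuming PythonCall; the case names and snippets are placeholders, not a proposed final set:

```julia
using BenchmarkTools
using PythonCall

# Shared suite skeleton: every interface (PythonCall dev/stable, PyCall, and
# the pure-Python reference) would define a group with the *same* case names,
# so that results can be matched one-to-one across suites.
const SUITE = BenchmarkGroup()

SUITE["import"]  = @benchmarkable pyimport("math")
SUITE["getattr"] = @benchmarkable m.sqrt setup = (m = pyimport("math"))
SUITE["call"]    = @benchmarkable m.sqrt(2.0) setup = (m = pyimport("math"))

# tune!(SUITE); results = run(SUITE)
```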

It is worth noting that PkgBenchmark.jl exposes useful methods in its public API that we could leverage to build what we need, including methods for comparing suites and for exporting the results to Markdown. AirspeedVelocity.jl, by contrast, is only made available through its CLI.
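For example, given two PkgBenchmark.BenchmarkResults values (the names below are placeholders, assumed to exist already), the comparison and Markdown export could be wired up roughly like this:

```julia
using PkgBenchmark

# `target` and `baseline` stand for two PkgBenchmark.BenchmarkResults values,
# e.g. one per interface.
judgement = judge(target, baseline)

# Export a human-readable comparison report via the public API.
open("report.md", "w") do io
    export_markdown(io, judgement)
end
```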

Python Side

To enjoy the same level of detail provided by BenchmarkTools.jl, we should adopt pyperf[^4]. There are many ways to use it, but a few experiments showed that the CLI + JSON interface is probably the desired option.

For each test case, stored in the PY_CODE variable, we would then create a temporary path JSON_PATH and run

```julia
run(`$(PY_EXE) -m pyperf timeit "$(PY_CODE)" --append="$(JSON_PATH)" --tracemalloc`)
```
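Putting the pieces together, one possible driver loop follows; PY_EXE, CASES, and run_python_suite are illustrative assumptions, not existing code, and the PY_CODE from above is split here into a setup string and a timed statement:

```julia
# Hypothetical driver loop; PY_EXE and CASES are placeholders for illustration.
const PY_EXE = "python3"  # requirement 2: same executable for every interface

const CASES = Dict(
    # name => (setup, statement); both strings are passed straight to pyperf
    "sqrt" => ("import math", "math.sqrt(2.0)"),
)

function run_python_suite(cases::AbstractDict; dir::AbstractString = mktempdir())
    paths = Dict{String,String}()
    for (name, (setup, stmt)) in cases
        json_path = joinpath(dir, "$(name).json")
        run(`$(PY_EXE) -m pyperf timeit -s $(setup) $(stmt) --append=$(json_path) --tracemalloc`)
        paths[name] = json_path
    end
    return paths
end
```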

After that, we should be able to parse the output JSON and convert it into a PkgBenchmark.BenchmarkResults object. This makes it easier to integrate those results into the overall machinery, reducing the problem to setting the Python result as the reference value.
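As a starting point for that translator, the sketch below only pulls the raw timing values (in seconds) out of pyperf's JSON; wrapping them into a full PkgBenchmark.BenchmarkResults (with metadata such as commit and Julia version) would still be our homework. The function name is hypothetical:

```julia
using JSON  # assumption: any JSON parser would do here

# Hypothetical translator front end: extract `name => [timings in seconds]`
# from a pyperf JSON file. pyperf stores a top-level "benchmarks" list; each
# run carries its measurements under "values" (calibration/warmup runs may
# have no "values" at all, hence the `get` fallback).
function pyperf_timings(json_path::AbstractString)
    doc = JSON.parsefile(json_path)
    timings = Dict{String,Vector{Float64}}()
    for bench in doc["benchmarks"]
        name = get(get(bench, "metadata", Dict()), "name", basename(json_path))
        values = Float64[]
        for r in bench["runs"]
            append!(values, get(r, "values", Float64[]))
        end
        timings[name] = values
    end
    return timings
end
```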

Tasks

  • Implement the reference Python benchmark cases
  • Implement the corresponding versions in the other suites
    • PythonCall (dev)
    • PythonCall (stable)
    • PyCall
  • Write a translator for pyperf JSON into BenchmarkResults
  • Write comparison tools
  • Write report generator
  • Set up GitHub Actions


Footnotes

[^1]: [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl)

[^2]: [PkgBenchmark.jl](https://github.com/JuliaCI/PkgBenchmark.jl)

[^3]: [AirspeedVelocity.jl](https://github.com/MilesCranmer/AirspeedVelocity.jl)

[^4]: [pyperf](https://github.com/psf/pyperf)
