Profile-Guided Optimization (PGO) benchmark results #1426

Closed as not planned
@zamazan4ik

Hi!

Writing this for the record. These results may be interesting to anyone who is trying to get better performance out of tokenizers, since the project cares about performance.

I test Profile-Guided Optimization (PGO) on different kinds of software - the current results are available here (with a lot of other PGO-related information). That's why I tried to optimize tokenizers with PGO too.

Test environment

I performed tests on my Linux-based machine.

Linux:

  • Fedora 39
  • Linux kernel 6.6.9
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler - Rustc 1.75
  • Tokenizers version: the latest for now from the main branch on commit f1c23b868006ee27acdd31796677f82fa10d6bd7
  • Disabled Turbo boost (for more stable results across runs)

Benchmarks

As a benchmark, I use the built-in benchmarks, run with the cargo bench -- --verbose command from the Makefile (if you want to reproduce my results, please check #1425 first). For the PGO training phase, I run the same benchmarks through cargo-pgo with cargo pgo bench -- --verbose. For the PGO optimization phase, I use cargo pgo optimize bench -- --verbose.
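For anyone who wants to reproduce this, the three runs can be sketched as a small script. This is only a sketch under some assumptions: it is meant to be run from the Rust crate directory of a tokenizers checkout, the setup commands follow the cargo-pgo README, and the `run` and `pgo_workflow` helpers and the `DRY_RUN` variable are hypothetical names introduced here for illustration. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch of the baseline / PGO training / PGO optimized benchmark runs.
# Assumption: the current directory is the Rust crate of a tokenizers checkout.
# DRY_RUN=1 (the default in this sketch) prints each command without running it.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

pgo_workflow() {
    # One-time setup: cargo-pgo and the LLVM tools it needs to merge profiles.
    run cargo install cargo-pgo
    run rustup component add llvm-tools-preview

    # 1. Baseline: the regular release benchmarks (same command as the Makefile).
    run cargo bench -- --verbose

    # 2. Training: instrumented benchmarks that collect the PGO profiles.
    run cargo pgo bench -- --verbose

    # 3. Optimization: rebuild with the collected profiles and benchmark again.
    run cargo pgo optimize bench -- --verbose
}

pgo_workflow
```

Comparing the output of step 1 with the output of step 3 gives the PGO effect. Note that here the training workload and the measured workload are the same benchmarks, so the numbers show a best case for PGO.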

Results

I got the following results:

As you can see, in general, Tokenizers' performance can be improved with PGO. I think this information could be added somewhere in the documentation, so that users are aware of the effect PGO has on Tokenizers' performance and can decide whether to apply PGO to their own Tokenizers builds.

I already see some PGO mentions in the CI scripts, but it's not clear whether the Tokenizers packages are PGO-optimized. As far as I can tell from the build scripts, they are not (but I could be wrong - please correct me if so).

Please treat the issue just as a benchmark report - it's not an actual error, crash, or something like that.
