
PGO applicability to Vector #15631

Open

Description

@zamazan4ik
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

TL;DR: With PGO, Vector got a boost from 300-310k events/s to 350-370k events/s!

Hi!

I am a big fan of PGO, so I tried using PGO with Vector, and I want to share my current results. My hypothesis is the following: even for programs built with LTO, PGO can bring HUGE benefits. So I decided to test it. From my experience, PGO works especially well on large codebases with some CPU-hot parts. Vector looks like a really good fit.

Test scenario

  1. Read a huge file with some logs
  2. Parse them
  3. Pass them to the blackhole.

This test scenario is completely real-life (except the blackhole, of course :) ), and the log format and parse function are almost copied from our current production environment. We patched the flog tool to generate our log format (the patch is closed-source, sorry; I could publish it later if there is a need for it).

Example of one log entry:
<E 2296456 point.server.session 18.12 19:17:36:361298178 processCall We need to generate the solid state GB interface! (from session.cpp +713)

So the Vector config is the following (TOML):

[sources.in]
type = "file"
include = [ "/Users/zamazan4ik/open_source/test_vector_logs/data/*" ]
read_from = "beginning"
file_key = "file"
data_dir = "/Users/zamazan4ik/open_source/test_vector_logs"
 
[transforms.parser]
type = "remap"
inputs = [ "in" ]
source = """
.message = parse_regex!(.message, r'<(?P<level>[EWD]) (?P<thread>.+?) (?P<tag>[a-z.]+) (?P<datetime>[\\d.]+ [\\d:]*) (?P<function>[\\S]+) (?P<mess>.*) \\(from (?P<file>[\\S.]*) \\+(?P<line>\\d+)\\)')
"""
 
[sinks.out]
type = "blackhole"
inputs = [ "parser" ]

[api]
  enabled = true
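As a quick sanity check (outside Vector, purely illustrative), the same regex from the remap transform can be applied to the sample log entry above with Python's re module; note that TOML's doubled backslashes become single backslashes in the actual pattern:

```python
import re

# Same pattern as in the `remap` transform above (TOML's `\\` unescaped to `\`).
PATTERN = re.compile(
    r'<(?P<level>[EWD]) (?P<thread>.+?) (?P<tag>[a-z.]+) '
    r'(?P<datetime>[\d.]+ [\d:]*) (?P<function>[\S]+) (?P<mess>.*) '
    r'\(from (?P<file>[\S.]*) \+(?P<line>\d+)\)'
)

entry = ('<E 2296456 point.server.session 18.12 19:17:36:361298178 '
         'processCall We need to generate the solid state GB interface! '
         '(from session.cpp +713)')

m = PATTERN.match(entry)
print(m.groupdict())
```

The named groups map directly onto the fields that parse_regex! produces in the pipeline.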

You could say: "Test scenario is too simple", but:

  • I deliberately wanted to start with a minimal example to reduce noise from unknown factors.
  • As I said before, for us it is a completely real-life example (just replace the blackhole with something like the elasticsearch sink).

Test setup

MacBook M1 Pro with macOS Ventura 13.1, 6+2 CPU cores on ARM (AFAIK), 16 GiB RAM, 512 GiB SSD. Sorry, I have no Linux machine near me right now, nor a desire to test on a Linux VM or an Asahi Linux setup. However, I am completely sure that the results will be reproducible on a "usual" Linux-based x86-64 setup.

How to build

Vector already uses fat LTO for the release build. However, a local release build and the CI release build are different, since the local release build does not use fat LTO (it is waaaaaay too time-consuming). So do not forget to add the following flags to your release build (taken from scripts/environment/release-flags.sh):

codegen-units = 1
lto = "fat"
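One way to apply these locally (a sketch; CI injects them via release-flags.sh, so check that script for the authoritative mechanism) is to put them under the release profile in Cargo.toml:

```toml
# Sketch: release profile with fat LTO, matching Vector's CI release flags.
[profile.release]
codegen-units = 1
lto = "fat"
```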

For the PGO build of Vector I used this nice wrapper: https://github.com/Kobzol/cargo-pgo . You can do it manually if you want; I am just a little bit lazy :)

The guide is simple:

  • Install cargo-pgo.
  • Run cargo pgo build. It will build an instrumented Vector binary.
  • Run Vector under a test load, e.g. cargo pgo run -- -- -c /Users/zamazan4ik/open_source/test_vector_logs/vector.toml .
  • Wait for the test load to finish. In my case I generated a roughly 2 GiB log file, so the test plan completed in about a minute, AFAIR.
  • Press Ctrl+C to stop Vector. The profile data will be written somewhere under the target directory.
  • Run cargo pgo optimize. It will rebuild Vector with the collected profile data.
  • Congratulations! After a successful build you have an LTO + PGO release build of Vector.

Is it worth it?

Yes! At least in my case I got a huge boost: from 300-310k events/s (according to vector top) with the default Vector release build with the LTO flags from CI, to 350-370k events/s with the same build plus PGO enabled.
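For context, those ranges translate into roughly a 13-23% throughput improvement. A quick back-of-the-envelope calculation (numbers are the vector top readings reported above):

```python
# Throughput ranges reported above, in events/s.
lto_only = (300_000, 310_000)
lto_pgo = (350_000, 370_000)

# Worst case compares the slowest PGO run against the fastest LTO-only run,
# best case the other way around.
worst = lto_pgo[0] / lto_only[1] - 1
best = lto_pgo[1] / lto_only[0] - 1
print(f"PGO speedup: {worst:.0%} to {best:.0%}")
```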

The comparison strategy is simple: run the LTO-only Vector binary, then the LTO + PGO binary (resetting the file checkpoint between runs, of course), measure the total time until the whole file is processed, and track metrics via vector top during execution.

Results are stable and reproducible. I have performed multiple runs in different execution orders with the same results.

So what?

So what could we do with it?

  • At least consider adding PGO to CI. Yes, it has a LOT of caveats: a big increase in build time, preparing a good profile, profile stability between releases, and much more. But in my opinion, at least in some cases, it is definitely worth it.
  • For users who want to "cheaply" boost their Vector performance: they could find this mini-guide and try it on their log pipelines. Maybe it would be a good idea to leave a note about this "advanced" option somewhere in the Vector documentation?

Possible future steps

Possible future steps for improving:

  • Perform more "mature" benchmarking, based on the current Vector benchmark infrastructure.
  • Try playing with BOLT. BOLT could squeeze out more performance even from an LTO + PGO build (though it's not guaranteed). This approach has drawbacks: BOLT is quite unstable on some platforms, does not support some architectures, etc. But it is definitely a good tool to think about :)
  • Somehow reduce the LTO build time. More advanced linkers like lld or mold could help here, but I am not sure. AFAIK, mold has (or had, since this awesome linker evolves quickly) some caveats with LTO builds.

I hope this long read is at least interesting to someone :) If you have any questions about it, just ask me here or on the official Vector Discord server (nickname zamazan4ik there as well).

Metadata


Labels

domain: performance (Anything related to Vector's performance); type: enhancement (A value-adding code change that enhances its existing functionality.)
