Upgrade protobuf from v4 to v5: 5x Upsert throughput #393

Merged
merged 1 commit into from
Oct 7, 2024

daverigby
Contributor

@daverigby daverigby commented Sep 26, 2024

Upgrade the protobuf dependency from v4 (4.25) to v5 (5.28). This
appears to have significantly faster protobuf encoding - I see a 4.5x - 5x
improvement in Upsert throughput on a given EC2 machine (i3.xlarge)
for large batches (~300) of high dimensionality vectors (1536):

Before:

Performing Populate phase               1675770/138364198 ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   1% 0:35:34 44:17:17
  Records/sec: 785.2

After:

Performing Populate phase               1531830/138364198 ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   1% 0:07:07 10:35:48
  Records/sec: 3584.4
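
As a quick sanity check on the quoted speedup, the ratio of the two `Records/sec` figures above can be computed directly. This is only arithmetic on the numbers reported in this description; the helper name `speedup` is made up for illustration.

```python
def speedup(before_rps: float, after_rps: float) -> float:
    """Return how many times faster the 'after' rate is than the 'before' rate."""
    return after_rps / before_rps


# Records/sec figures copied from the Populate phase output above.
before_rps = 785.2   # protobuf v4 (4.25)
after_rps = 3584.4   # protobuf v5 (5.28)

ratio = speedup(before_rps, after_rps)
print(f"Upsert throughput speedup: {ratio:.2f}x")
```

The ratio comes out a little above 4.5x, at the low end of the range quoted above.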

I haven't dug into the exact details, but the profile is quite
different - the Python frames performing type checking are no longer
present, so I assume they have been optimised, perhaps pushed to
native code?
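
One way to start investigating the native-code guess is to check which protobuf version and backend a given environment actually loads. The sketch below is an assumption about how one might do that, not something that was run for this PR: it relies on `google.protobuf.internal.api_implementation`, which is an internal module and may change between releases, and the helper name `protobuf_backend` is hypothetical.

```python
def protobuf_backend():
    """Return (version, backend) for the installed protobuf, or None if absent.

    The backend string comes from an internal protobuf module; it is
    typically 'upb', 'cpp' or 'python'.
    """
    try:
        import google.protobuf
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None
    return google.protobuf.__version__, api_implementation.Type()


info = protobuf_backend()
if info is None:
    print("protobuf is not installed")
else:
    version, backend = info
    print(f"protobuf {version} using the {backend!r} backend")
```

This only reports which backend is active. If both versions report the same backend, the difference in the profiles would have to come from changes inside that backend rather than from a switch between implementations.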

Before profile:

[image: protobuf_v4]

After profile:

[image: protobuf_v5]

Type of Change

  • None of the above: Dependency upgrade.

Test Plan

Describe specific steps for validating this change.

Upgrade the protobuf dependency from v4 (4.25) to v5 (5.28). This
appears to have significantly faster protobuf encoding - I see a 4.5x - 5x
improvement in Upsert throughput on a given EC2 machine (i3.xlarge)
for large batches (~300) of high dimensionality vectors (1536):

Before:

    Performing Populate phase               1675770/138364198 ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   1% 0:35:34 44:17:17
      Records/sec: 785.2

After:

    Performing Populate phase               1531830/138364198 ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   1% 0:07:07 10:35:48
      Records/sec: 3584.4

I haven't dug into the exact details, but the profile is quite
different - the Python frames performing type checking are no longer
present, so I assume they have been optimised, perhaps pushed to
native code?
@daverigby daverigby requested review from jhamon and aulorbe September 26, 2024 20:03
@jhamon
Collaborator

jhamon commented Oct 7, 2024

Looks good, thanks! Changing dependencies is a breaking change, but it should go out soon when we release new SDKs for the Oct 15 API version.

@jhamon jhamon merged commit 1d0f046 into main Oct 7, 2024
84 checks passed
@jhamon jhamon deleted the daver/protobuf_5 branch October 7, 2024 20:26