
Significant performance boost: only write to the writer schema cache once. #724


Closed

Conversation

fimmtiu (Contributor) commented Nov 25, 2019

We noticed a performance problem in confluent-kafka-python while profiling our app: _get_decoder_func was called only a handful of times and took a negligible amount of time, but _get_encoder_func appeared to be called once per message and accounted for about 10% of our total runtime. This one-line patch gave us a 9% speed boost.

What's happening is that we have a cache for the encoder functions, but we aren't actually using it in the encode_record_with_schema method:

    self.id_to_writers[schema_id] = self._get_encoder_func(schema)

Every time the method is called, it sets a new entry in the id_to_writers cache even if an identical one already exists. The fix is trivial: actually consult the cache and skip setting a new entry if one is already there.
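For reference, a minimal sketch of the one-line change, assuming the method shape quoted above (only the membership check is new):

    # Before: the encoder is rebuilt and re-cached on every call.
    self.id_to_writers[schema_id] = self._get_encoder_func(schema)

    # After: only build the (expensive) encoder function on a cache miss.
    if schema_id not in self.id_to_writers:
        self.id_to_writers[schema_id] = self._get_encoder_func(schema)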

ghost commented Nov 25, 2019

It looks like @fimmtiu hasn't signed our Contributor License Agreement yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence. (Wikipedia)

You can read and sign our full Contributor License Agreement here.

Once you've signed, reply with [clabot:check] to prove it.

Appreciation of efforts,

clabot

fimmtiu (Contributor) commented Nov 25, 2019

[clabot:check]

ghost commented Nov 25, 2019

@confluentinc It looks like @fimmtiu just signed our Contributor License Agreement. 👍

Always at your service,

clabot

fimmtiu (Contributor) commented Dec 10, 2019

@edenhill / @mhowlett: Any thoughts on this? It's a pretty trivial change with a nice benefit, and we've been using it in production for a few weeks now. (The CI failures seem unrelated to the actual code in the PR.)

edenhill requested a review from rnpridgeon on Dec 10, 2019
rnpridgeon (Contributor) left a comment


Good find; that is rather unfortunate.

Do you mind adding a quick test demonstrating that the cache actually works? This will help catch regressions in the future.

fimmtiu (Contributor) commented Dec 17, 2019

@rnpridgeon Done! Thanks for the suggestion.
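The added test isn't quoted in this thread; a sketch of what a cache-reuse test might look like (pytest-style, with hypothetical message_serializer and avro_schema fixtures) is:

    from unittest import mock

    def test_writer_cache_is_reused(message_serializer, avro_schema):
        # message_serializer and avro_schema are hypothetical fixtures; the point
        # is that a second encode with the same schema must not rebuild the encoder.
        with mock.patch.object(message_serializer, "_get_encoder_func",
                               wraps=message_serializer._get_encoder_func) as spy:
            message_serializer.encode_record_with_schema("topic", avro_schema, {"name": "a"})
            message_serializer.encode_record_with_schema("topic", avro_schema, {"name": "b"})
        assert spy.call_count == 1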

AndreiLieb commented

It seems to me that this can be fixed simply by removing line 113: it's not needed, as encode_record_with_schema delegates the work to encode_record_with_schema_id, where this logic is already present.
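For context, a simplified sketch of the delegation being described, with method bodies abbreviated and error handling omitted (the "line 113" reference is to the serializer source at the time):

    def encode_record_with_schema(self, topic, schema, record, is_key=False):
        subject = topic + ("-key" if is_key else "-value")
        schema_id = self.registry_client.register(subject, schema)
        # The eager cache write ("line 113") is redundant: the delegate below
        # already populates id_to_writers on a cache miss.
        self.id_to_writers[schema_id] = self._get_encoder_func(schema)
        return self.encode_record_with_schema_id(schema_id, record, is_key=is_key)

    def encode_record_with_schema_id(self, schema_id, record, is_key=False):
        if schema_id not in self.id_to_writers:
            schema = self.registry_client.get_by_id(schema_id)
            self.id_to_writers[schema_id] = self._get_encoder_func(schema)
        # ... encode the record with the cached writer ...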

edenhill added the serdes (Avro, JSON, Protobuf, Schema-registry) label on Mar 9, 2021
mhowlett (Contributor) commented

Well pointed out, @AndreiLieb. Since the code does the job as-is and it'll be deprecated soon, we're just going with what's there to avoid mucking about. Merged via #1057

mhowlett closed this Mar 11, 2021
fimmtiu deleted the only-write-schema-cache-once branch Mar 16, 2021