
Significant performance boost: only write to the writer schema cache once. #724


Closed

Conversation

fimmtiu (Contributor) commented Nov 25, 2019

We noticed a performance problem in confluent-kafka-python while profiling our app: _get_decoder_func was called only a handful of times and took a negligible amount of time, but _get_encoder_func appeared to be called once per message and accounted for about 10% of our total runtime. This one-line patch gave us a 9% speed boost.

What's happening is that we have a cache for the encoder functions, but we aren't actually using it in the encode_record_with_schema method:

    self.id_to_writers[schema_id] = self._get_encoder_func(schema)

Every time the method is called, it sets a new entry in the id_to_writers cache even if an identical one already exists. The fix is trivial: actually consult the cache and skip setting a new entry if one is already there.
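For reference, a minimal sketch of the one-line change, assuming the method shape quoted above (only the membership check is new):

    # Before: the encoder is rebuilt and re-cached on every call.
    self.id_to_writers[schema_id] = self._get_encoder_func(schema)

    # After: only build the (expensive) encoder function on a cache miss.
    if schema_id not in self.id_to_writers:
        self.id_to_writers[schema_id] = self._get_encoder_func(schema)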

ghost commented Nov 25, 2019

It looks like @fimmtiu hasn't signed our Contributor License Agreement yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence. (Wikipedia)

You can read and sign our full Contributor License Agreement here.

Once you've signed, reply with [clabot:check] to prove it.

Appreciation of efforts,

clabot

fimmtiu (Contributor) commented Nov 25, 2019

[clabot:check]

ghost commented Nov 25, 2019

@confluentinc It looks like @fimmtiu just signed our Contributor License Agreement. 👍

Always at your service,

clabot

fimmtiu (Contributor) commented Dec 10, 2019

@edenhill / @mhowlett: Any thoughts on this? It's a pretty trivial change with a nice benefit, and we've been using it in production for a few weeks now. (The CI failures seem unrelated to the actual code in the PR.)

edenhill requested a review from rnpridgeon on Dec 10, 2019
rnpridgeon (Contributor) left a comment


Good find; that is rather unfortunate.

Do you mind adding a quick test demonstrating that the cache actually works? This will help catch regressions in the future.

fimmtiu (Contributor) commented Dec 17, 2019

@rnpridgeon Done! Thanks for the suggestion.
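The added test isn't quoted in this thread; a sketch of what a cache-reuse test might look like (pytest-style, with hypothetical message_serializer and avro_schema fixtures) is:

    from unittest import mock

    def test_writer_cache_is_reused(message_serializer, avro_schema):
        # message_serializer and avro_schema are hypothetical fixtures; the point
        # is that a second encode with the same schema must not rebuild the encoder.
        with mock.patch.object(message_serializer, "_get_encoder_func",
                               wraps=message_serializer._get_encoder_func) as spy:
            message_serializer.encode_record_with_schema("topic", avro_schema, {"name": "a"})
            message_serializer.encode_record_with_schema("topic", avro_schema, {"name": "b"})
        assert spy.call_count == 1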

AndreiLieb commented

It seems to me that this can be fixed simply by removing line 113: it's not needed, as encode_record_with_schema delegates the work to encode_record_with_schema_id, where this logic is already present.
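For context, a simplified sketch of the delegation being described, with method bodies abbreviated and error handling omitted (the "line 113" reference is to the serializer source at the time):

    def encode_record_with_schema(self, topic, schema, record, is_key=False):
        subject = topic + ("-key" if is_key else "-value")
        schema_id = self.registry_client.register(subject, schema)
        # The eager cache write ("line 113") is redundant: the delegate below
        # already populates id_to_writers on a cache miss.
        self.id_to_writers[schema_id] = self._get_encoder_func(schema)
        return self.encode_record_with_schema_id(schema_id, record, is_key=is_key)

    def encode_record_with_schema_id(self, schema_id, record, is_key=False):
        if schema_id not in self.id_to_writers:
            schema = self.registry_client.get_by_id(schema_id)
            self.id_to_writers[schema_id] = self._get_encoder_func(schema)
        # ... encode the record with the cached writer ...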

edenhill added the serdes (Avro, JSON, Protobuf, Schema-registry) label on Mar 9, 2021
mhowlett (Contributor) commented

Well pointed out, @AndreiLieb. Since the code does the job as-is and it'll be deprecated soon, we're just going with what's there to avoid mucking about. Merged via #1057

mhowlett closed this Mar 11, 2021
fimmtiu deleted the only-write-schema-cache-once branch Mar 16, 2021