-
Notifications
You must be signed in to change notification settings - Fork 915
Significant performance boost: only write to the writer schema cache once. #724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It looks like @fimmtiu hasn't signed our Contributor License Agreement, yet.
You can read and sign our full Contributor License Agreement here. Once you've signed reply with Appreciation of efforts, clabot |
[clabot:check] |
@confluentinc It looks like @fimmtiu just signed our Contributor License Agreement. 👍 Always at your service, clabot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find that is rather unfortunate.
Do you mind adding a quick test demonstrating the cache actually working. This will help to catch regressions in the future.
@rnpridgeon Done! Thanks for the suggestion. |
It seems to me that this can be fixed simply by removing line 113. Line 113 is not needed as |
well pointed out @AndreiLieb . since the code does the job as is, and it'll be depreciated soon, just going with what's there to avoid mucking about. merged via #1057 |
We noticed a performance problem with confluent-kafka-python while profiling our app:
_get_decoder_func
was being called a small number of times and took up a tiny amount of time, but_get_encoder_func
seemed as if it was being called once per message and was taking up about 10% of our total runtime. This one-line patch gave us a 9% speed boost.What's happening is that we have a cache for the encoder functions, but we're not actually using it in the
encode_record_with_schema
method:confluent-kafka-python/confluent_kafka/avro/serializer/message_serializer.py
Line 113 in 50e09cf
id_to_writers
cache, even if an identical one already exists. The fix is trivial: we should actually use the cache and not set new entries if there's already one in there.