-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for zstd
compression
#2021
Conversation
zstd
compression
Also, in https://kafka-python.readthedocs.io/en/master/#compression can you mention support for A clean way to do it would be to make it a bullet pointed list...
|
aa6b827
to
7ca9851
Compare
Now that #2020 is merged, can you please rebase this on top of that? |
356bd3a
to
95b06ce
Compare
done, though it seems github diff algorithm or needs to refresh master the branch tip. I could reopen it, but that would mean losing the conversation history. Any thoughts ? https://github.com/dpkp/kafka-python/compare/f9e0264..29d812f |
The diff is appearing fine for me. Typically what you'd do in this case is |
Actually strike that, in this diff I'm suddenly seeing the Are you sure you pulled latest |
i rebased on top of the merged branch, due to both PR's being made from a forked version of the repo. That is to say, that repo's master is not in sync with this repo's master, for various reasons. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of questions... I tried to only review the zstd
stuff
Sorry, I wasn't clear... you don't want to rebase on top of the merged PR branch, you want to rebase on top of We'll need to see the diff so we can see exactly what is getting merged. |
For future reference, that would imply updating the origin first, then doing the actual rebase, or just adding a new remote, correct? I'll try to open a new PR with an identical but differently named branch. It seems the diff looks the same ... I can't say i understand this behaviour. The rebase left the merged branch commits intact, then calculated new hashes for this pr's commits. The diff should not contain the already merged changes |
Can you just pull latest And then I'll cleanup the history when I squash merge. If this truly doesn't work, then it's a weird github bug, but I expect it should work since we often do that workflow at my day job with no issues. |
That seems to have worked |
IMO this is mostly ready to go, except for #2021 (comment)... fixing the root issue of that is outside the scope of this particular PR, but in the meantime I'd rather not add that workaround hack. It looks like #2024 takes a stab at fixing the root cause, let's see if we can land that and then land this... |
Our team is looking for exactly this. When can we merge to master. :D |
@gabriel-tincu Sorry, but could you rebase this over master now? Thanks =) |
…all but the zstd related stuff)
…broker running when building producer
Harden zstd decompression for missing frame size information (can happen when the sender is not under our control)
Update readme zstd decompress will transform memoryview object to bytes before decompressing, to keep the same behavior as other decompression strategies Decrease fallback max message size due to OOM concerns Rewrite the 1MB message limit Test producer constructor makes use of broker version inference again, also add logic for zstd decompression
3372b10
to
5002d67
Compare
@tvoinarovskyi Done. I noticed that tox fails specifically for python3.7, but with an error related to tox itself rather than anything related to this PR |
👍 I can't wait for my consumer to not crash on zstd encoded messages. |
Is there any update on this? Our team is looking forward to using this library with our zstd encoded topics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeffwidman I reviewed this once more time, seems good to go.
Is this ready for merge? |
ping? |
I'm a little worried kafka-python is no longer maintained. For anyone else experiencing this issue; I've created a workaround. I created this workaround in the hopes that kafka-python would be supporting zstd in the near future. While not ideal, this solution pipes the data from the zstd topic into a new topic which has been configured with gzip compression. It requires command-line access on the broker and can be monitored in the Kafka Manager via group ID. Step 1: Create a new topic and be sure to set compression.type=gzip Step 2: Create / Reset a consumer group (to track the mirror) Step 3: Begin the mirror threads I hope this helps someone else. |
@Green-Angry-Bird you could just use the source branch HEAD of this PR in your requirements file (assuming that's how you handle dependencies) until this gets merged. Taking the current global situation into account i think some delay in response on an open source project is perfectly understandable |
Forgive my ignorance @gabriel-tincu but how would I do that? I've tried these but they all give different errors:
|
What error does the third option throw? Also, have you tried using it without the |
The error was
However, it seems the issue is I needed to prepend it with
^ works :) |
getting below error when trying to consume from a ZSTD compressed record from kafka. raise UnsupportedCodecError( error log: Traceback (most recent call last): Process finished with exit code 1 My Code for consuming a kafka topic: from kafka import KafkaConsumer
if name == "main": |
Did you install the zstandard package from pypi? |
The ZSTD relevant code, in a separate PR, @jeffwidman
This change is