Fix: Handle non-UTF-8 headers in Kafka/Confluent message parsing by Bazarovinc · Pull Request #2459 · ag2ai/faststream

Bazarovinc · 2025-08-15T13:53:12Z

This PR fixes the UnicodeDecodeError raised when processing Kafka messages whose headers contain non-UTF-8 byte sequences.
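A minimal illustration of the failure mode in plain Python, with no Kafka client involved (the header value is a made-up Latin-1 string, not taken from the linked issues): a plain .decode() uses errors="strict" and raises UnicodeDecodeError, while errors="replace" substitutes U+FFFD for the undecodable byte.

```python
# Hypothetical header value: "café" encoded as Latin-1, which is not valid UTF-8.
raw_header = "café".encode("latin-1")  # b"caf\xe9"


def decode_strict(value: bytes) -> str:
    # Same as value.decode(): UTF-8 with errors="strict", raises on bad bytes.
    return value.decode()


def decode_lenient(value: bytes) -> str:
    # errors="replace" inserts U+FFFD for invalid bytes instead of raising.
    return value.decode(errors="replace")


try:
    decode_strict(raw_header)
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)

print("lenient decoding:", decode_lenient(raw_header))
```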

Fixes #2458, fixes #2214

Type of change

Bug fix (a non-breaking change that resolves an issue)

Checklist

My code adheres to the style guidelines of this project (just lint shows no errors)
I have conducted a self-review of my own code
I have made the necessary changes to the documentation
My changes do not generate any new warnings
I have added tests to validate the effectiveness of my fix or the functionality of my new feature
Both new and existing unit tests pass successfully on my local environment by running just test-coverage
I have ensured that static analysis tests are passing by running just static-analysis
I have included code examples to illustrate the modifications

CLAassistant · 2025-08-15T13:53:19Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Nikita Veselenko seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

draincoder · 2025-08-15T15:09:23Z

@Bazarovinc Please also look at issue #2214 and add comprehensive tests.

Bazarovinc · 2025-08-16T08:59:28Z

@draincoder Here’s a preliminary implementation for handling Kafka message headers. Please review and let me know if this approach looks good or if any adjustments are needed.


Sehat1137 · 2025-09-06T20:29:56Z

faststream/confluent/parser.py

+                headers.get("reply_to"),
+                headers.get("content-type"),
+                headers.get("correlation_id"),


What does headers.get return? Can you show an example or add a type annotation?
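For context on this question: confluent-kafka's Message.headers() returns None or a list of (str, bytes) tuples, and an individual value may be None. Assuming the parser flattens that list into a dict (the build_headers function below is a hypothetical stand-in, not FastStream's code), headers.get(...) returns bytes or None, never str, which is why each value needs an explicit decode.

```python
from typing import Optional

# Shape of confluent_kafka.Message.headers(): a list of (key, value) pairs,
# or None when the message carries no headers at all.
RawHeaders = Optional[list[tuple[str, Optional[bytes]]]]


def build_headers(raw: RawHeaders) -> dict[str, Optional[bytes]]:
    # Hypothetical helper: flatten the pairs into a dict (a repeated key keeps its last value).
    return dict(raw or ())


raw = [("reply_to", b"responses"), ("correlation_id", b"id-\xff"), ("x-null", None)]
headers = build_headers(raw)

reply_to = headers.get("reply_to")          # b"responses": raw bytes, not yet decoded
content_type = headers.get("content-type")  # None: the header is absent
null_value = headers.get("x-null")          # None: the header is present but has no value
```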

A small note: the comprehension iterates over the values (bytes) rather than over (key, value) pairs, so unpacking with for header, value in (...) will typically raise a ValueError (or a TypeError when a header is missing and its value is None).
It is better to iterate over headers.items() or to decode the values with a separate helper.

    decoded_headers = {
        k: v.decode(errors="replace")
        for k, v in headers.items()
        if k in ("reply_to", "content-type", "correlation_id") and v
    }

Or, even better and clearer, decode each header explicitly: iterating here seems excessive to me and makes the code harder to understand. See https://github.com/ag2ai/faststream/pull/2459/files/d0ee2c6911627934e78d958536e428ebb2cb1ec6#r2427545938

    reply_to = self._decode_header(headers, "reply_to")
    content_type = self._decode_header(headers, "content-type")
    correlation_id = self._decode_header(headers, "correlation_id")
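The iteration note above can be checked with a small self-contained sketch on a hand-built headers dict (the values and the names buggy_decode and fixed_decode are hypothetical): pulling out only the values and unpacking each bytes object into two names fails, while iterating over headers.items() as suggested works and skips absent or empty headers.

```python
from typing import Optional

# Hypothetical header dict; values are raw bytes or None.
headers: dict[str, Optional[bytes]] = {
    "reply_to": b"responses",
    "content-type": b"application/json",
    "correlation_id": b"id-\xff",  # contains a non-UTF-8 byte
    "x-trace": b"not-requested",
}

KEYS = ("reply_to", "content-type", "correlation_id")


def buggy_decode(headers: dict[str, Optional[bytes]]) -> dict[str, str]:
    # Iterates over values only: each bytes object is unpacked into (header, value),
    # which raises ValueError unless the value is exactly two bytes long.
    return {
        header: value.decode(errors="replace")
        for header, value in (headers.get(k) for k in KEYS)
    }


def fixed_decode(headers: dict[str, Optional[bytes]]) -> dict[str, str]:
    # Iterates over (key, value) pairs and keeps only the wanted, non-empty headers.
    return {
        k: v.decode(errors="replace")
        for k, v in headers.items()
        if k in KEYS and v
    }


try:
    buggy_decode(headers)
except ValueError as exc:
    print("buggy version failed:", exc)

print(fixed_decode(headers))
```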

Sehat1137 · 2025-09-06T20:32:05Z

faststream/confluent/parser.py

-            reply_to=headers.get("reply_to", ""),
-            content_type=headers.get("content-type"),
+            reply_to=headers.get("reply_to").decode()
+            if "content-type" in headers


content-type -> reply_to: the guard for reply_to checks "content-type" in headers; it should check "reply_to" in headers.
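To make the mismatch concrete, here is a sketch that mirrors the reviewed expression, wrapped in hypothetical functions so it can be exercised in isolation: guarding reply_to with "content-type" in headers drops a reply_to that is sent on its own, and raises AttributeError when only content-type is present.

```python
from typing import Optional


def reply_to_as_reviewed(headers: dict) -> Optional[str]:
    # Mirrors the diff: the guard checks a different key than the one being read.
    return headers.get("reply_to").decode() if "content-type" in headers else None


def reply_to_fixed(headers: dict) -> Optional[str]:
    # Guard on the same key that is read. A present-but-None value would still
    # raise AttributeError here; the helper suggested later in the review covers that.
    return headers.get("reply_to").decode() if "reply_to" in headers else None


only_reply_to = {"reply_to": b"responses"}
only_content_type = {"content-type": b"application/json"}

print(reply_to_as_reviewed(only_reply_to), reply_to_fixed(only_reply_to))
try:
    reply_to_as_reviewed(only_content_type)
except AttributeError as exc:
    print("reviewed version failed:", exc)
```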

Sehat1137 · 2025-09-06T20:34:55Z

faststream/confluent/parser.py

+            reply_to=headers.get("reply_to").decode()
+            if "content-type" in headers
+            else None,
+            content_type=headers.get("content-type").decode()
+            if "content-type" in headers
+            else None,
            message_id=f"{first.offset()}-{last.offset()}-{first_timestamp}",
-            correlation_id=headers.get("correlation_id"),
+            correlation_id=headers.get("correlation_id").decode()
+            if "correlation_id" in headers
+            else None,


What do you think about this approach, for better readability:

    reply_to=headers.get("reply_to", b"").decode() or None,
    content_type=headers.get("content-type", b"").decode() or None,
    correlation_id=headers.get("correlation_id", b"").decode() or None,
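A quick check of this one-liner pattern against a few hypothetical header dicts (one_liner is an illustrative name): a missing key and an empty value both become None, but because .decode() defaults to errors="strict" it still raises on non-UTF-8 bytes, and a header stored with a None value (a case the dict[str, bytes | None] hint in the _decode_header suggestion accounts for) bypasses the b"" default and raises AttributeError.

```python
from typing import Optional


def one_liner(headers: dict, key: str) -> Optional[str]:
    # The proposed pattern: default to b"" so .decode() has bytes to work on,
    # then collapse the resulting empty string to None.
    return headers.get(key, b"").decode() or None


# Cases the pattern handles.
for headers in ({}, {"reply_to": b""}, {"reply_to": b"responses"}):
    print(repr(one_liner(headers, "reply_to")))

# Cases it does not: invalid UTF-8, and a key that is present with a None value.
for headers in ({"reply_to": b"\xff"}, {"reply_to": None}):
    try:
        one_liner(headers, "reply_to")
    except (UnicodeDecodeError, AttributeError) as exc:
        print(type(exc).__name__)
```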

I suggest doing it this way:

    @staticmethod
    def _decode_header(headers: dict[str, bytes | None], key: str) -> str | None:
        val = headers.get(key)
        return val.decode(errors="replace") if val else None

and then, wherever we need the value, we do this:

    reply_to=self._decode_header(headers, "reply_to"),
    content_type=self._decode_header(headers, "content-type"),
    correlation_id=self._decode_header(headers, "correlation_id"),

From what I've noticed, this applies to both parse_message and parse_message_batch.
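The suggested _decode_header helper can be exercised on its own. The sketch below wraps it in a hypothetical HeaderParser class (a stand-in, not FastStream's actual parser; the parse_headers method is likewise illustrative) and runs the three lookups on a message whose reply_to is None and whose correlation_id contains non-UTF-8 bytes.

```python
from typing import Optional


class HeaderParser:
    """Hypothetical stand-in for the Kafka/Confluent parser class."""

    @staticmethod
    def _decode_header(headers: dict[str, Optional[bytes]], key: str) -> Optional[str]:
        # A missing key, a None value, and b"" all map to None.
        val = headers.get(key)
        # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
        return val.decode(errors="replace") if val else None

    def parse_headers(self, headers: dict[str, Optional[bytes]]) -> dict[str, Optional[str]]:
        # The same three lookups the review proposes for parse_message / parse_message_batch.
        return {
            "reply_to": self._decode_header(headers, "reply_to"),
            "content_type": self._decode_header(headers, "content-type"),
            "correlation_id": self._decode_header(headers, "correlation_id"),
        }


parser = HeaderParser()
parsed = parser.parse_headers(
    {"reply_to": None, "content-type": b"application/json", "correlation_id": b"\xfe\xff42"}
)
print(parsed)
```

One consequence of the truthiness check is that an empty header value is treated exactly like a missing one, which matches the "or None" behaviour of the one-liner discussed earlier.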

fix: add replacing non-utf-8 bytes on decoding headers

5dab07b

Bazarovinc requested a review from Lancetnik as a code owner August 15, 2025 13:53

github-actions bot added the Confluent (issues related to `faststream.confluent` module) and AioKafka (issues related to `faststream.kafka` module) labels Aug 15, 2025

Bazarovinc changed the base branch from dev to main August 15, 2025 14:09

Lancetnik changed the base branch from main to dev August 15, 2025 18:26

Bazarovinc marked this pull request as draft August 16, 2025 08:30

draft: add support for binary Kafka headers (confluent only)

1d30220

Nikita Veselenko added 3 commits August 20, 2025 18:50

refactor: Bytes decoding has been changed to 'replace' undecodable characters to avoid errors

0014459

draft: remove headers decoding for confluent

fe0f8bd

feat: refactored the decoding of key headers

d0ee2c6

Sehat1137 requested review from Sehat1137 and draincoder August 26, 2025 10:21

Lancetnik changed the base branch from dev to main September 4, 2025 20:51

Sehat1137 requested changes Sep 6, 2025

View reviewed changes

Lancetnik assigned draincoder Nov 2, 2025


Bazarovinc wants to merge 5 commits into ag2ai:main from Bazarovinc:fix/non-utf8-kafka-message-headers

Bazarovinc commented Aug 15, 2025 (edited by Lancetnik)
CLAassistant commented Aug 15, 2025
draincoder commented Aug 15, 2025
Bazarovinc commented Aug 16, 2025
Sehat1137 Sep 6, 2025
ozeranskii Oct 13, 2025 (edited)
Sehat1137 Sep 6, 2025
Sehat1137 Sep 6, 2025
ozeranskii Oct 13, 2025

5 participants
