Feat(duckdb): handle transpilation into DuckDB from ByteString type #6329

fivetran-felixhuang · 2025-11-14T22:01:06Z

At the moment, the string in a ByteString is transpiled to a string with the escape syntax e'...'. However, DuckDB has limited support for e'...'.

We need to handle the escape sequences in the ByteString input correctly, while also making sure the resulting DuckDB query produces the same result as the original BigQuery query.

To handle escape sequences, we can use the ::BLOB cast, and to handle other possible UTF-8 characters, we can use the ENCODE() function in DuckDB. Neither works for every input on its own, so we have to use ::BLOB and ENCODE() for different segments of the input.

For one, ::BLOB doesn't handle characters beyond the first 256 code points (such as 数).
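To make the "first 256" boundary concrete, here is a small Python sketch (not DuckDB code; `describe` is a hypothetical helper) showing that ÿ has a code point that fits in one byte while 数 does not, and what each looks like as UTF-8. The DuckDB ::BLOB behavior itself is as described above and is not exercised here.

```python
def describe(ch: str) -> tuple[int, bool, bytes]:
    """Return the character's code point, whether that code point fits in a
    single byte (0-255), and the character's UTF-8 encoding."""
    cp = ord(ch)
    return cp, cp <= 0xFF, ch.encode("utf-8")


for ch in ["ÿ", "数"]:
    # ÿ is U+00FF (fits in one byte); 数 is U+6570 (needs three UTF-8 bytes)
    print(ch, describe(ch))
```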

Also, while ENCODE() accepts text containing escape sequences, it treats them as literal characters instead of the bytes they denote, so the resulting query can produce different values. For example, MD5(b"Mixed\x00Texÿt") in BigQuery and base64(UNHEX(MD5(ENCODE('Mixed\x00Texÿt')))) in DuckDB produce different outputs.
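The mismatch can be reproduced outside either engine with Python's `hashlib`. This sketch assumes the BigQuery literal's non-ASCII character ÿ is stored as UTF-8, and that DuckDB's plain string literal keeps `\x00` as four characters (backslash, x, 0, 0) rather than a NUL byte:

```python
import hashlib

# Bytes denoted by b"Mixed\x00Texÿt": \x00 is a single NUL byte, and ÿ is
# assumed to be UTF-8 encoded (0xC3 0xBF).
intended = b"Mixed\x00" + "Texÿt".encode("utf-8")

# Bytes produced when "\x00" is kept as literal text and the whole string is
# UTF-8 encoded, which is what ENCODE('Mixed\x00Texÿt') effectively does.
literal = "Mixed\\x00Texÿt".encode("utf-8")

print(len(intended), len(literal))  # 12 15
print(hashlib.md5(intended).hexdigest())
print(hashlib.md5(literal).hexdigest())  # differs from the line above
```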

The strategy here is to split the input into escape-sequence segments and other segments, handle each kind differently, and concatenate the results as the output.
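A minimal sketch of the splitting step in Python. `split_byte_string` is a hypothetical helper, not the code in this PR, and it only recognizes `\xHH` hex escapes; BigQuery bytes literals support other escapes (e.g. `\n`, octal) that a full implementation would also need to cover. Consecutive escapes are grouped into one segment, matching the examples below.

```python
import re

# Hypothetical: one or more consecutive \xHH escapes form a single run.
_HEX_RUN = re.compile(r"(?:\\x[0-9A-Fa-f]{2})+")


def split_byte_string(body: str) -> list[tuple[str, str]]:
    """Split the raw body of a bytes literal into ("escape", ...) runs of
    \\xHH escapes and ("text", ...) runs of everything else, in order."""
    segments: list[tuple[str, str]] = []
    pos = 0
    for match in _HEX_RUN.finditer(body):
        if match.start() > pos:
            segments.append(("text", body[pos:match.start()]))
        segments.append(("escape", match.group()))
        pos = match.end()
    if pos < len(body):
        segments.append(("text", body[pos:]))
    return segments


print(split_byte_string(r"Mixed\x00\x00Texÿt"))
```

Grouping adjacent escapes keeps the generated SQL short: `'\x00\x00'::BLOB` instead of two separate casts.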

Examples (BigQuery -> DuckDB):

MD5(b"Mixed\x00\x00Texÿt") -> UNHEX(MD5(ENCODE('Mixed') || '\x00\x00'::BLOB || ENCODE('Texÿt')))
Here "Mixed\x00\x00Texÿt" is split into ['Mixed', '\x00\x00', 'Texÿt'].

MD5(b'\x00ÿ\x00') -> UNHEX(MD5('\x00'::BLOB || ENCODE('ÿ') || '\x00'::BLOB))
MD5(b'ÿ数据') -> UNHEX(MD5(ENCODE('ÿ数据')))
MD5(B"Hello World") -> UNHEX(MD5(ENCODE('Hello World')))
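The rendering step that produces the DuckDB expressions above can be sketched in Python as follows. The function names are hypothetical and the segments are passed in already split; escape segments become `'...'::BLOB`, text segments become `ENCODE('...')`, and the parts are joined with `||`. The `UNHEX(MD5(...))` wrapper mirrors the examples: BigQuery's `MD5` returns BYTES, while DuckDB's `md5` returns a hex string, so `UNHEX` is assumed to convert it back.

```python
def quote(text: str) -> str:
    # Standard SQL string quoting: double any embedded single quote.
    return "'" + text.replace("'", "''") + "'"


def render_duckdb_blob(segments: list[tuple[str, str]]) -> str:
    """Render ("escape" | "text", value) segments as one DuckDB BLOB expression."""
    parts = []
    for kind, text in segments:
        if kind == "escape":
            parts.append(f"{quote(text)}::BLOB")  # escapes interpreted as bytes
        else:
            parts.append(f"ENCODE({quote(text)})")  # text encoded as UTF-8
    # An empty bytes literal still needs a BLOB-typed value.
    return " || ".join(parts) if parts else "''::BLOB"


def render_md5(segments: list[tuple[str, str]]) -> str:
    return f"UNHEX(MD5({render_duckdb_blob(segments)}))"


segments = [("text", "Mixed"), ("escape", r"\x00\x00"), ("text", "Texÿt")]
print(render_md5(segments))
# UNHEX(MD5(ENCODE('Mixed') || '\x00\x00'::BLOB || ENCODE('Texÿt')))
```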

georgesittas · 2025-11-15T11:00:21Z

@fivetran-felixhuang let's discuss this on Monday. This approach feels too complicated.

Commit 9532440: handle transpilation into DuckDB from ByteString type

fivetran-felixhuang requested a review from georgesittas November 14, 2025 22:01

fivetran-felixhuang self-assigned this Nov 14, 2025

Commit dc7385b: fix format
