Support quoted column identifiers for scan row_filter string argument #1863
Merged
Fokko merged 1 commit into apache:main on Apr 4, 2025
Fokko
approved these changes
Apr 4, 2025
Contributor
Fokko
left a comment
Thanks @norton120 for adding this. This PR makes sense to me. Sorry for the late reply, I wanted to double check if double quotes are the standard, and it looks like that's the case.
gabeiglio
pushed a commit
to Netflix/iceberg-python
that referenced
this pull request
Aug 13, 2025
# Rationale for this change
Our data lake uses old-school Kimball style quoted column names ("User
ID", "Customer Name" etc). The string parser for `row_filter` was unable
to parse this. Now it is.
Example:
```python
# before
>>> parser.parse('"User Name" = \'ted\'')
ParseException: Expected '"', found ' '
# after
>>> parser.parse('"User Name" = \'ted\'')
EqualTo("User Name", "ted")
```
# Are these changes tested?
Yes, a new test was added.
> [!NOTE]
> The `quoted_column_with_dots` case previously failed with
> `Expected '"', found '.'` _when using **double quotes only**_. It now
> raises an error whose text expects an `'or'` value instead. I didn't
> track down where the exception message is clobbered, because the
> messages for single- and double-quoted identifiers were already
> inconsistent and I don't consider this a polished, first-class error
> message. If this change is an issue, I can dig further and try to
> revert the wording; IMO raising the same exception type is more than
> enough to consider the change non-breaking.
# Are there any user-facing changes?
Yes, quoted identifiers are now supported in `row_filter` strings.
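
To make the user-facing behavior concrete, here is a minimal, stdlib-only sketch of the idea: the left-hand side of a filter may be either a bare identifier or a double-quoted one containing spaces. This is an illustration under stated assumptions, not the PyIceberg parser (which is built on pyparsing); `EqualTo` here is a toy stand-in, and `parse_equality` is a hypothetical helper that handles only `<column> = '<value>'`.

```python
import re
from typing import NamedTuple


class EqualTo(NamedTuple):
    """Toy stand-in for an equality predicate (hypothetical, not PyIceberg's class)."""

    column: str
    value: str


# A column is either a double-quoted name (spaces allowed) or a bare word.
# The literal is a single-quoted string, as in the example above.
_FILTER = re.compile(
    r"""\s*(?:"(?P<quoted>[^"]+)"|(?P<bare>\w+))\s*=\s*'(?P<value>[^']*)'\s*$"""
)


def parse_equality(expr: str) -> EqualTo:
    """Parse `<column> = '<value>'`, accepting quoted or bare column names."""
    match = _FILTER.match(expr)
    if match is None:
        # Unbalanced quotes or unsupported syntax end up here.
        raise ValueError(f"cannot parse filter: {expr!r}")
    column = match.group("quoted") or match.group("bare")
    return EqualTo(column, match.group("value"))


print(parse_equality('"User Name" = \'ted\''))
print(parse_equality("user_name = 'ted'"))
```

The quotes are stripped from the column name, so the quoted and bare forms produce the same kind of predicate; only the set of characters allowed in the name differs.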