Nit fixes to URL-encoding of partition field names#1499
Nit fixes to URL-encoding of partition field names#1499kevinjqliu merged 6 commits intoapache:mainfrom
Conversation
| from pyiceberg.catalog import Catalog | ||
| from pyiceberg.partitioning import PartitionField, PartitionFieldValue, PartitionKey, PartitionSpec | ||
| from pyiceberg.schema import Schema | ||
| from pyiceberg.schema import Schema, make_compatible_name |
There was a problem hiding this comment.
The changes in this file address #1457 (comment)
pyiceberg/partitioning.py
Outdated
| @@ -237,8 +237,7 @@ def partition_to_path(self, data: Record, schema: Schema) -> str: | |||
| value_str = quote_plus(value_str, safe="") | |||
There was a problem hiding this comment.
we can collapse this too
| if make_compatible_name | ||
| else expected_partition_record | ||
| ) | ||
| sanitized_record = Record(**{make_compatible_name(k): v for k, v in vars(expected_partition_record).items()}) |
There was a problem hiding this comment.
we can either do this and run make_compatible_name for every test case
or we can set make_compatible_name as optional and only run for the specific use case
make_compatible_name: Optional[Callable[[str], str]] = None,
....
if make_compatible_name:
....
which one do you prefer?
There was a problem hiding this comment.
Sorry, not sure I understand. If it's always going to be pyiceberg.schema.make_compatible_name, doesn't it make more sense to be enabled by a boolean argument that we only set to True for the special field test case? I'm probably misunderstanding here again
There was a problem hiding this comment.
(Aside: it's not easy to have default value for a pytest.mark.parametrize arg I think which is why I specified None each time before - and probably why there was no e.g. spark_create_table_sql_for_justification: str = None before that too)
There was a problem hiding this comment.
(Aside: it's not easy to have default value for a pytest.mark.parametrize arg I think which is why I specified None each time before - and probably why there was no e.g. spark_create_table_sql_for_justification: str = None before that too)
oh i didn't know that! My comment was based on the fact that we can set a default None value.
doesn't it make more sense to be enabled by a boolean argument that we only set to True for the special field test case?
yea thats also the point i want to make. make it clear that only certain test cases requires make_compatible_name
but also, this is a nit comment, we dont necessary have to do this.
There was a problem hiding this comment.
Yeah not sure. I marginally prefer leaving it as it is now - it reads more nicely, prevents Nones/Falses mostly everywhere, sanitisation is fast so I don't think it'll causes a cumulative slowdown even when done for several test cases.
kevinjqliu
left a comment
There was a problem hiding this comment.
i think we'd need to run make lint, due to #1507
| if make_compatible_name | ||
| else expected_partition_record | ||
| ) | ||
| sanitized_record = Record(**{make_compatible_name(k): v for k, v in vars(expected_partition_record).items()}) |
There was a problem hiding this comment.
(Aside: it's not easy to have default value for a pytest.mark.parametrize arg I think which is why I specified None each time before - and probably why there was no e.g. spark_create_table_sql_for_justification: str = None before that too)
oh i didn't know that! My comment was based on the fact that we can set a default None value.
doesn't it make more sense to be enabled by a boolean argument that we only set to True for the special field test case?
yea thats also the point i want to make. make it clear that only certain test cases requires make_compatible_name
but also, this is a nit comment, we dont necessary have to do this.
| with table.update_schema() as update: | ||
| update.add_column(("details", "confirmed_by"), StringType(), "Name of the exchange") | ||
| ``` | ||
|
|
|
Thanks for following up on this @smaheshwar-pltr |
Follow-up to #1457 that addresses nits on that PR.