Error with partitioned dataset: Cannot infer common argument type for comparison operation Binary != Utf8 #7039
Comments
The schemas of these two datasets in Parquet format are different:
$ parquet-schema hits.parquet | grep MobilePhoneModel
REQUIRED BYTE_ARRAY MobilePhoneModel (STRING);
$ parquet-schema hits_partitioned/hits_0.parquet | grep MobilePhoneModel
OPTIONAL BYTE_ARRAY MobilePhoneModel;
The single-file dataset specifies the STRING logical type for MobilePhoneModel, while the partitioned dataset specifies no logical type, so MobilePhoneModel is inferred as Binary there. I think we may need to support type coercion from Binary to Utf8.
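Here is a simplified sketch of why the two files produce different column types: a reader can only treat a BYTE_ARRAY column as text when it carries the STRING (legacy UTF8) annotation. The `ArrowType` enum and the `infer_arrow_type` function are hypothetical and cover only this case. They are not the Arrow or DataFusion implementation.

```python
from enum import Enum
from typing import Optional


class ArrowType(Enum):
    BINARY = "Binary"
    UTF8 = "Utf8"


def infer_arrow_type(physical_type: str, logical_type: Optional[str]) -> ArrowType:
    """Map a Parquet BYTE_ARRAY column to an Arrow type (hypothetical, simplified).

    Only the BYTE_ARRAY case from this issue is modelled; real readers
    handle many more physical and logical types.
    """
    if physical_type != "BYTE_ARRAY":
        raise NotImplementedError("only BYTE_ARRAY is modelled in this sketch")
    # A STRING (legacy UTF8) annotation marks the bytes as UTF-8 text.
    if logical_type in ("STRING", "UTF8"):
        return ArrowType.UTF8
    # Without an annotation the reader cannot assume the bytes are text.
    return ArrowType.BINARY


# hits.parquet: REQUIRED BYTE_ARRAY MobilePhoneModel (STRING);
single_file = infer_arrow_type("BYTE_ARRAY", "STRING")
# hits_partitioned/hits_0.parquet: OPTIONAL BYTE_ARRAY MobilePhoneModel;
partitioned = infer_arrow_type("BYTE_ARRAY", None)
print(single_file.value, partitioned.value)  # Utf8 Binary
```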
Thank you for the investigation @jonahgao. I think coercion from binary to UTF8 would make sense. I will try to find some time to work on this if no one beats me to it. 🤔 Or maybe I will file a ticket and see if someone else wants to work on it.
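A minimal sketch of the proposed coercion rule, assuming the common type for a Binary/Utf8 comparison is Utf8. `DataType`, `comparison_coercion` and `check_comparison` are hypothetical names, not DataFusion's actual coercion code; the error text simply mirrors the one in this issue's title.

```python
from enum import Enum
from typing import Optional


class DataType(Enum):
    BINARY = "Binary"
    UTF8 = "Utf8"
    INT64 = "Int64"


def comparison_coercion(lhs: DataType, rhs: DataType) -> Optional[DataType]:
    """Return a common type for `lhs <op> rhs`, or None if there is none (hypothetical)."""
    if lhs == rhs:
        return lhs
    # Proposed rule: Binary compared with Utf8 (in either order) is evaluated as Utf8.
    if {lhs, rhs} == {DataType.BINARY, DataType.UTF8}:
        return DataType.UTF8
    return None


def check_comparison(lhs: DataType, op: str, rhs: DataType) -> DataType:
    """Raise the issue's error when no common type exists, else return that type."""
    common = comparison_coercion(lhs, rhs)
    if common is None:
        raise TypeError(
            "Cannot infer common argument type for comparison operation "
            f"{lhs.value} {op} {rhs.value}"
        )
    return common


print(check_comparison(DataType.BINARY, "!=", DataType.UTF8).value)  # Utf8
```

Choosing Utf8 assumes the binary values are valid UTF-8, which seems reasonable here since the same column is annotated STRING in hits.parquet.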
#7080 should make the query execution successful.
It would be better if we could manually specify the data types of the columns, which is not currently supported for Parquet. For example:
CREATE EXTERNAL TABLE hits_partitioned (
"UserID" bigint,
"MobilePhoneModel" text
)
STORED AS PARQUET
LOCATION 'hits_partitioned';
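A rough sketch of what honoring a declared schema could look like: the types from the DDL override the inferred file types, and values are cast where the two differ. It assumes `bigint` maps to Int64 and `text` to Utf8; `CASTS` and `apply_declared_schema` are hypothetical helpers, and the sample rows are made up.

```python
from typing import Callable, Dict, List

# Hypothetical cast table: (file type, declared type) -> per-value conversion.
CASTS: Dict[tuple, Callable] = {
    ("Binary", "Utf8"): lambda v: None if v is None else v.decode("utf-8"),
}


def apply_declared_schema(
    rows: List[dict], file_schema: Dict[str, str], declared_schema: Dict[str, str]
) -> List[dict]:
    """Cast each column from its inferred file type to the type declared in the DDL."""
    converters = {}
    for name, declared in declared_schema.items():
        actual = file_schema[name]
        if actual == declared:
            converters[name] = lambda v: v  # types already match: pass through
        elif (actual, declared) in CASTS:
            converters[name] = CASTS[(actual, declared)]
        else:
            raise TypeError(f"cannot cast column {name} from {actual} to {declared}")
    return [{name: converters[name](row[name]) for name in declared_schema} for row in rows]


file_schema = {"UserID": "Int64", "MobilePhoneModel": "Binary"}  # as inferred from the files
declared = {"UserID": "Int64", "MobilePhoneModel": "Utf8"}  # as written in the DDL
rows = [{"UserID": 1, "MobilePhoneModel": b"iPad"}, {"UserID": 2, "MobilePhoneModel": None}]
result = apply_declared_schema(rows, file_schema, declared)
print(result)  # [{'UserID': 1, 'MobilePhoneModel': 'iPad'}, {'UserID': 2, 'MobilePhoneModel': None}]
```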
I agree. I think it would be good to cast the column to a string type. For this particular query, we could add an explicit cast like:
SELECT
"MobilePhoneModel"::varchar,
COUNT(DISTINCT "UserID") AS u
FROM hits_partitioned
WHERE "MobilePhoneModel"::varchar <> ''
GROUP BY "MobilePhoneModel"
ORDER BY u DESC LIMIT 10;
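For reference, here is a Python sketch of what the query computes once MobilePhoneModel is treated as text, whether through an explicit cast or through coercion. `top_models` is a hypothetical helper and the sample rows are made up.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple


def top_models(
    rows: Iterable[Tuple[int, Optional[bytes]]], limit: int = 10
) -> List[Tuple[str, int]]:
    """Emulate the query on (UserID, MobilePhoneModel) rows with Binary model values."""
    users_per_model = defaultdict(set)
    for user_id, model in rows:
        if model is None:
            continue  # NULL <> '' is NULL, so the WHERE clause drops the row
        text = model.decode("utf-8")  # "MobilePhoneModel"::varchar
        if text != "":  # WHERE ... <> ''
            users_per_model[text].add(user_id)  # GROUP BY + COUNT(DISTINCT "UserID")
    counts = [(m, len(users)) for m, users in users_per_model.items()]
    counts.sort(key=lambda mu: mu[1], reverse=True)  # ORDER BY u DESC
    return counts[:limit]  # LIMIT 10


rows = [(1, b"iPad"), (2, b"iPad"), (1, b"iPad"), (3, b""), (4, None), (3, b"A500")]
print(top_models(rows))  # [('iPad', 2), ('A500', 1)]
```

Ties in `u` keep their input order here because Python's sort is stable; SQL leaves the order of ties unspecified.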
This must be the best way 👍.
Describe the bug
When running the following query (from ClickBench) on the partitioned dataset (100 Parquet files), I get the following error:
To Reproduce
Get the data using bench.sh (after #7005 is merged).
Expected behavior
The query works fine with the single-file dataset. I expect the same result from the partitioned dataset:
+------------------------------+---------+
| hits_single.MobilePhoneModel | u |
+------------------------------+---------+
| iPad | 1090347 |
| iPhone | 45758 |
| A500 | 16046 |
| N8-00 | 5565 |
| iPho | 3300 |
| ONE TOUCH 6030A | 2759 |
| GT-P7300B | 1907 |
| 3110000 | 1871 |
| GT-I9500 | 1598 |
| eagle75 | 1492 |
+------------------------------+---------+
Additional context
I found this while working on some benchmark results for #6988