-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Change default SQL mapping for VARCAHR
from Utf8
to Utf8View
#16142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Interesting, sqllogic testing successfully for my local mac but CI failed... |
6ee857a
to
0780263
Compare
Resolved all problems now including the slt testing. I think the PR is ready for benchmark progress to see if switching to Utf8View by default slows some queries down. cc @alamb Thanks a lot. |
🤖 |
THanks @zhuqi-lucas -- I have started it up |
🤖: Benchmark completed Details
|
Thank you @alamb , it seems no regression from the benchmark result! |
5d9c537
to
e54e432
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @zhuqi-lucas -- this is great
FYI @jayzhan211 , @timsaucer @xudong963 and @comphead
I have a few suggestions on how to simplify the examples which would be nice to get in prior to merge but I don't think it is needed
// Assuming the input columns are | ||
// column[0]: customer_id / UTF8 | ||
// column[0]: customer_id / UTF8 or UTF8View |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the example would be simpler if it just used Utf8View
. Could we do that and avoid adding the logic to handle both types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion! Thank you @alamb , addressed in latest PR.
@@ -15,6 +15,10 @@ | |||
# specific language governing permissions and limitations | |||
# under the License. | |||
|
|||
# Currently, the avro not support Utf8View type, so we disable the map_varchar_to_utf8view |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
parquet-testing
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There appears to be an update of parquet-testing with unrelated content, but I think it is ok to update it too
@@ -182,7 +182,11 @@ async fn write_out(ctx: &SessionContext) -> std::result::Result<(), DataFusionEr | |||
let mut df = ctx.sql("values ('a'), ('b'), ('c')").await.unwrap(); | |||
|
|||
// Ensure the column names and types match the target table | |||
df = df.with_column_renamed("column1", "tablecol1").unwrap(); | |||
df = df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better if we could change the example to maybe just use a StringViewArray
in the first place (rather than have to cast it here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion! Thank you @alamb , addressed in latest PR.
VARCAHR
from Utf8
to Utf8View
Thank you @alamb for review, i also addressed the suggestions about example in latest PR. |
I'll plan to merge this tomorrow unless I hear anything different |
e7243d4
to
9195b6a
Compare
I'm reviewing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thank you @xudong963 for review! |
Let's go, thank you all |
Thanks again @zhuqi-lucas and @xudong963 I have also made a PR to add a mention of this change to the upgrade guide: |
Which issue does this PR close?
Rationale for this change
Setting Varchar default mapping to utf8view, and change testing result.
What changes are included in this PR?
Setting Varchar default mapping to utf8view, and change testing result.
Are these changes tested?
Yes, new testing/slt testing result generated.
Are there any user-facing changes?
No, the internal datatype will change to utf8view.