Skip to content

Conversation

@WinkerDu
Copy link
Contributor

@WinkerDu WinkerDu commented Apr 3, 2022

Which issue does this PR close?

Closes #2141 .

Rationale for this change

df now has not implemented StringConcat operator, like

❯ select 'aa' || 'b';
NotImplemented("Unsupported SQL binary operator StringConcat")

But Postgres SQL will come out results like

select 'aa' || 'b';
 ?column? 
----------
 aab
(1 row)

What changes are included in this PR?

Reuse df build-in concat string expression to do it.

Are there any user-facing changes?

No.

@WinkerDu
Copy link
Contributor Author

WinkerDu commented Apr 3, 2022

cc @Dandandan @alamb @xudong963 @yjshen
Please have a review, thank you.

@jackwener
Copy link
Member

Great job! ❤

@WinkerDu
Copy link
Contributor Author

WinkerDu commented Apr 3, 2022

@jackwener thank you for your comment.
I'm not sure what PG behaves like is reasonable, PG build-in expression concat can take NULL like

select concat('aa', NULL, 'b');
 concat 
--------
 aab
(1 row)

I prefer unify the outputs of || and concat, what other contributors think of?

@WinkerDu
Copy link
Contributor Author

WinkerDu commented Apr 4, 2022

@jackwener thank you for your comment. I'm not sure what PG behaves like is reasonable, PG build-in expression concat can take NULL like

select concat('aa', NULL, 'b');
 concat 
--------
 aab
(1 row)

I prefer unify the outputs of || and concat, what other contributors think of?

@jackwener @renato2099
I've check the behaviors of PG of SparkSQL, they all come out results with NULL

  • Spark 3.2
spark-sql> select 'a' || NULL || 'b';
NULL
  • PG
# select 'a' || NULL || 'b';
 ?column? 
----------
 
(1 row)

Seems here is a convention, I'll comply with it in later commit, thank you both.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @WinkerDu -- this is a nice addition

@wesm
Copy link
Member

wesm commented Apr 4, 2022

@alamb do you know why this PR generated e-mails to the dev@ mailing list?

@alamb
Copy link
Contributor

alamb commented Apr 4, 2022

@wesm I am sorry but do not know why there were some emails from this PR to dev e. g. https://lists.apache.org/thread/tqv2x57xmzzrppl3zgox5rt2rhfyg8jz

I vaguely remember getting an email from apache infra about "remailer" but I didn't read the details and now I can't seem to find it :(

There appears to be at least one C++ github conversation that got forwarded too: https://lists.apache.org/thread/vr3yjrcrb01b8zh64dclfl4d2gcxfw0j

@wesm
Copy link
Member

wesm commented Apr 4, 2022

It seems to have been a temporary INFRA hiccup (Iceberg had some issues, too https://lists.apache.org/thread/8f7xdfttk4xhfo7o66lwkpc4xzvx2vjz). If it starts happening again, we should open an INFRA Jira ticket.

@WinkerDu
Copy link
Contributor Author

WinkerDu commented Apr 5, 2022

@alamb @jackwener @renato2099 PTAL, thank you.

  1. support concat string and values that can be cast to string, add some UT, like select 'aa' || 42.
  2. output NULL when encounter NULL in concat pipe.
  3. keep StringConcat operator as BinaryExpr to make column physical name correct.

@WinkerDu
Copy link
Contributor Author

WinkerDu commented Apr 6, 2022

@alamb PTAL, thank you.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- thanks @WinkerDu

Comment on lines +501 to +510
let result = (0..ignore_null_array.len())
.into_iter()
.map(|index| {
if left.is_null(index) || right.is_null(index) {
None
} else {
Some(ignore_null_array.value(index))
}
})
.collect::<StringArray>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a future optimization it might be worth using the take kernel https://docs.rs/arrow/11.1.0/arrow/compute/kernels/take/fn.take.html which is fairly well optimized / tries to avoid copying.

This is good for now though

@alamb alamb merged commit ddf29f1 into apache:master Apr 7, 2022
ovr pushed a commit to cube-js/arrow-datafusion that referenced this pull request May 19, 2022
… 'b' " (apache#2142)

* implement stringconcat operator

* snake case fix

* support non-string concat & handle NULL

* value -> array

* string concat internal coercion

* get NULL in right index of vec

Co-authored-by: duripeng <duripeng@baidu.com>
ovr pushed a commit to cube-js/arrow-datafusion that referenced this pull request May 19, 2022
… 'b' " (apache#2142)

* implement stringconcat operator

* snake case fix

* support non-string concat & handle NULL

* value -> array

* string concat internal coercion

* get NULL in right index of vec

Co-authored-by: duripeng <duripeng@baidu.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add 'StringConcat' operator to df

6 participants