-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature]Support array_contains_seq functions like trino contains_sequence and ck hasSubstr function #33929
Conversation
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
be/src/exprs/array_functions.cpp
Outdated
elements, element_offsets_ptr[i], element_offsets_ptr[i + 1], targets, target_offsets_ptr[0], | ||
target_offsets_ptr[1], null_map_elements, null_map_targets); | ||
} else { | ||
if (ContainsSeq) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (ContainsSeq) { | |
if constexpr (ContainsSeq) { |
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -825,3 +825,19 @@ select array_intersect([row(1,2,3), row(3,4,5)], [row(3,4,5)]); | |||
select array_contains_all([row(1,2,3), row(3,4,5)], [row(3,4,5)]); | |||
select array_filter([row(1,2,3), row(3,4,5), row(4,5,6)], [0,1,0]); | |||
select cardinality([row(1,2,3), row(3,4,5)]); | |||
|
|||
select array_contains_seq([1,2,3,4], [2,3]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if SELECT array_contains_seq([1, 2, NULL, 3, 4], ['a'])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SELECT array_contains_seq([1, 2, NULL, 3, 4], [2,3]);
SELECT array_contains_seq([1, 2, NULL, 3, 4], null);
SELECT array_contains_seq(null, [2,3])
SELECT array_contains_seq([1, 2, NULL, 3, 4], [null,null])
SELECT array_contains_seq([1, 2, NULL], [null,2])
SELECT array_contains_seq(null, null)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we prefer keep the same behavior with trino
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if SELECT array_contains_seq([1, 2, NULL, 3, 4], ['a'])
MySQL [(none)]> select array_contains_all([1, 2, NULL, 3, 4], ['a']);
+-----------------------------------------------+
| array_contains_all([1, 2, NULL, 3, 4], ['a']) |
+-----------------------------------------------+
| 0 |
+-----------------------------------------------+
1 row in set (0.18 sec)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SELECT array_contains_seq([1, 2, NULL, 3, 4], [2,3]); SELECT array_contains_seq([1, 2, NULL, 3, 4], null); SELECT array_contains_seq(null, [2,3]) SELECT array_contains_seq([1, 2, NULL, 3, 4], [null,null]) SELECT array_contains_seq([1, 2, NULL], [null,2]) SELECT array_contains_seq(null, null)
all add to sqltest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here are all cases that just consider single array, please add some cases for array columns from a table, which would get some bugs. like following cases
CREATE TABLE array_test (
pk bigint not null ,
i_0 Array<BigInt>,
i_1 Array<BigInt>,
ai_0 Array<Array<BigInt>>,
ai_1 Array<Array<BigInt>>
) ENGINE=OLAP
DUPLICATE KEY(`pk`)
DISTRIBUTED BY HASH(`pk`) BUCKETS 3
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false"
);
insert into array_test values
(1,[null,1],[null],[null],[[]]),
(2,[null],[1,3,4,5,1,2],[[null,1],null],[[1,null]]),
(3,[],[],[[],null,[1,1]],[],[1,1]),
(4,null,[],[[1,1]],[[1,1],null),
(5,[4,4,4],[4,null],[null],[null]),
(6,[1,1,2,1,1,2,3,3],[1,2,3],[[1]],[[1],[2],null,[null]]);
select array_contains_seq(i_0,i_1),array_contains_seq(i_0,i_0),array_contains_seq(i_1,i_0), array_contains_seq(ai_0, ai_1),array_contains_seq(ai_1, ai_1),array_contains_seq(ai_1, ai_0),array_contains_seq(ai_0, i_1),array_contains_seq(ai_0,null),,array_contains_seq(ai_0, [1,2]) from array_test;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this case to the Test file
## Return value | ||
|
||
Returns a value of the BOOLEAN type. | ||
|
||
1 is returned if `arr2` is a subset of `arr1`. Otherwise, 0 is returned. | ||
|
||
## Examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
append more special msg like in https://clickhouse.com/docs/en/sql-reference/functions/array-functions#hassubstr, such as null, no supertype etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we prefer keeping the same behavior with trino
be/src/exprs/array_functions.cpp
Outdated
bool found = false; | ||
size_t i = target_start; | ||
size_t j = element_start; | ||
while (i < target_end && j < element_end) { | ||
if constexpr (std::is_same_v<ArrayColumn, ElementColumn> || std::is_same_v<MapColumn, ElementColumn> || | ||
std::is_same_v<StructColumn, ElementColumn> || std::is_same_v<JsonColumn, ElementColumn>) { | ||
found = (elements.equals(j, targets, i) == 1); | ||
} else { | ||
auto elements_ptr = (const ValueType*)(elements.raw_data()); | ||
auto targets_ptr = (const ValueType*)(targets.raw_data()); | ||
found = (elements_ptr[j] == targets_ptr[i]); | ||
} | ||
if (found) { | ||
i++; | ||
j++; | ||
} else { | ||
i = 0; | ||
j++; | ||
} | ||
} | ||
return i == target_end; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems not process nullable cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems not process nullable cases?
done
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
be/src/exprs/array_functions.cpp
Outdated
if (element_end < target_end) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not true, as your tests case just consider single array cases. Taking an array column as example,
elem: [null][null] (its offsets are [0,0,0]
target: [1,2,3,4],[null] (its offsets are [0,4,4]
array_contains_seq(elem, target) should return false, true. but your answer sould be wrong to false, false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if(element_end-element_start > target_end-target_start)
be/src/exprs/array_functions.cpp
Outdated
} else if (null_target || null_element) { | ||
found = false; | ||
} else { | ||
found = (elements_ptr[j] == targets_ptr[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if constexpr (std::is_same_v<ArrayColumn, ElementColumn> || std::is_same_v<MapColumn, ElementColumn> ||
std::is_same_v<StructColumn, ElementColumn> || std::is_same_v<JsonColumn, ElementColumn>) {
found = (elements.equals(j, targets, i) == 1);
} else
should move here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please refer to be/test/exprs/array_functions_test.cpp to write some unit tests.
array_contains_empty_array
array_contains_all
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
be/src/exprs/array_functions.cpp
Outdated
int k = j; | ||
int l = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove k,l;
i = target_start;
be/src/exprs/array_functions.cpp
Outdated
if (l == target_end) { | ||
return true; | ||
} | ||
i = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
be/src/exprs/array_functions.cpp
Outdated
l++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else break;
// array_contains_seq(["a", "b", NULL], [NULL]) -> 1 | ||
// array_contains_seq(["a", "b", "c"], ["d"]) -> 0 | ||
// array_contains_seq(["a", "b", "c"], ["a", "d"]) -> 0 | ||
// array_contains_all(["a", "b", "c"], ["a", "c"]) -> 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
??
@@ -5467,4 +5467,55 @@ TEST_F(ArrayFunctionsTest, array_match_only_null) { | |||
ASSERT_TRUE(dest_column->get(0).get_int8()); | |||
} | |||
} | |||
// NOLINTNEXTLINE | |||
TEST_F(ArrayFunctionsTest, array_contains_seq) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is the case nullable array1 and nullable array2, pls refer to array_contains_no_null to add some cases from nullable array1, not-nullable array2, not-nullable array1, nullable array2, not_nullable array1, not_nullable array2.
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
bool found = false; | ||
size_t i = target_start; | ||
size_t j = element_start; | ||
while (j < element_end) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
j <= element_end - (target_end - target_start), no need to loop if the left elements are less than target's elements
{ | ||
auto array = ColumnHelper::create_column(TYPE_ARRAY_ARRAY_VARCHAR, true); | ||
array->append_datum(DatumArray{Datum(DatumArray{"a"}), Datum(DatumArray{"b"}), Datum()}); | ||
array->append_datum(DatumArray{"a", "d", "c"}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be [[xxx]]
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
be/src/exprs/array_functions.cpp
Outdated
if ((element_end - j) < (target_end - target_start)) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if put here, it should be <=; prefer put this condition to the while() at L:780
select array_contains_seq(null, [2,3]); | ||
select array_contains_seq([1, 2, NULL, 3, 4], [null,null]); | ||
select array_contains_seq([1, 2, NULL], [null,2]); | ||
select array_contains_seq(null, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add these tests to this file
CREATE TABLE array_test (
pk bigint not null ,
i_0 Array<BigInt>,
i_1 Array<BigInt>,
ai_0 Array<Array<BigInt>>,
ai_1 Array<Array<BigInt>>
) ENGINE=OLAP
DUPLICATE KEY(`pk`)
DISTRIBUTED BY HASH(`pk`) BUCKETS 3
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false"
);
insert into array_test values
(1,[null,1],[null],[null],[[]]),
(2,[null],[1,3,4,5,1,2],[[null,1],null],[[1,null]]),
(3,[],[],[[],null,[1,1]],[],[1,1]),
(4,null,[],[[1,1]],[[1,1],null),
(5,[4,4,4],[4,null],[null],[null]),
(6,[1,1,2,1,1,2,3,3],[1,2,3],[[1]],[[1],[2],null,[null]]);
select array_contains_seq(i_0,i_1),array_contains_seq(i_0,i_0),array_contains_seq(i_1,i_0), array_contains_seq(ai_0, ai_1),array_contains_seq(ai_1, ai_1),array_contains_seq(ai_1, ai_0),array_contains_seq(ai_0, i_1),array_contains_seq(ai_0,null),,array_contains_seq(ai_0, [1,2]) from array_test;
… ck hasSubstr Signed-off-by: leoyy0316 <571684903@qq.com>
Kudos, SonarCloud Quality Gate passed!
|
[FE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]✅ pass : 48 / 54 (88.89%) file detail
|
Fixes #issue
ck:https://clickhouse.com/docs/en/sql-reference/functions/array-functions#hassubstr
trino:https://trino.io/docs/current/functions/array.html?highlight=contains_sequence#contains_sequence
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: