Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]Support array_contains_seq functions like trino contains_sequence and ck hasSubstr function #33929

Merged
merged 19 commits into from
Nov 9, 2023

Conversation

leoyy0316
Copy link
Contributor

@leoyy0316 leoyy0316 commented Oct 30, 2023

Fixes #issue
ck:https://clickhouse.com/docs/en/sql-reference/functions/array-functions#hassubstr
trino:https://trino.io/docs/current/functions/array.html?highlight=contains_sequence#contains_sequence

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.2
    • 3.1
    • 3.0
    • 2.5

… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
@leoyy0316 leoyy0316 requested a review from a team as a code owner October 30, 2023 08:57
… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
elements, element_offsets_ptr[i], element_offsets_ptr[i + 1], targets, target_offsets_ptr[0],
target_offsets_ptr[1], null_map_elements, null_map_targets);
} else {
if (ContainsSeq) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (ContainsSeq) {
if constexpr (ContainsSeq) {

… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 31, 2023
… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
LiShuMing
LiShuMing previously approved these changes Nov 1, 2023
Copy link
Contributor

@LiShuMing LiShuMing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -825,3 +825,19 @@ select array_intersect([row(1,2,3), row(3,4,5)], [row(3,4,5)]);
select array_contains_all([row(1,2,3), row(3,4,5)], [row(3,4,5)]);
select array_filter([row(1,2,3), row(3,4,5), row(4,5,6)], [0,1,0]);
select cardinality([row(1,2,3), row(3,4,5)]);

select array_contains_seq([1,2,3,4], [2,3]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if SELECT array_contains_seq([1, 2, NULL, 3, 4], ['a'])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SELECT array_contains_seq([1, 2, NULL, 3, 4], [2,3]);
SELECT array_contains_seq([1, 2, NULL, 3, 4], null);
SELECT array_contains_seq(null, [2,3])
SELECT array_contains_seq([1, 2, NULL, 3, 4], [null,null])
SELECT array_contains_seq([1, 2, NULL], [null,2])
SELECT array_contains_seq(null, null)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we prefer keep the same behavior with trino

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if SELECT array_contains_seq([1, 2, NULL, 3, 4], ['a'])

MySQL [(none)]> select array_contains_all([1, 2, NULL, 3, 4], ['a']);
+-----------------------------------------------+
| array_contains_all([1, 2, NULL, 3, 4], ['a']) |
+-----------------------------------------------+
| 0 |
+-----------------------------------------------+
1 row in set (0.18 sec)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SELECT array_contains_seq([1, 2, NULL, 3, 4], [2,3]); SELECT array_contains_seq([1, 2, NULL, 3, 4], null); SELECT array_contains_seq(null, [2,3]) SELECT array_contains_seq([1, 2, NULL, 3, 4], [null,null]) SELECT array_contains_seq([1, 2, NULL], [null,2]) SELECT array_contains_seq(null, null)

all add to sqltest

Copy link
Contributor

@fzhedu fzhedu Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here are all cases that just consider single array, please add some cases for array columns from a table, which would get some bugs. like following cases

CREATE TABLE array_test ( 
pk bigint not null ,
i_0   Array<BigInt>,
i_1   Array<BigInt>,
ai_0  Array<Array<BigInt>>,
ai_1  Array<Array<BigInt>>
) ENGINE=OLAP
DUPLICATE KEY(`pk`)
DISTRIBUTED BY HASH(`pk`) BUCKETS 3
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false"
);

insert into array_test values
(1,[null,1],[null],[null],[[]]),
(2,[null],[1,3,4,5,1,2],[[null,1],null],[[1,null]]),
(3,[],[],[[],null,[1,1]],[],[1,1]),
(4,null,[],[[1,1]],[[1,1],null),
(5,[4,4,4],[4,null],[null],[null]),
(6,[1,1,2,1,1,2,3,3],[1,2,3],[[1]],[[1],[2],null,[null]]);

select array_contains_seq(i_0,i_1),array_contains_seq(i_0,i_0),array_contains_seq(i_1,i_0), array_contains_seq(ai_0, ai_1),array_contains_seq(ai_1, ai_1),array_contains_seq(ai_1, ai_0),array_contains_seq(ai_0, i_1),array_contains_seq(ai_0,null),,array_contains_seq(ai_0, [1,2]) from array_test;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add this case to the Test file

Comment on lines 19 to 25
## Return value

Returns a value of the BOOLEAN type.

1 is returned if `arr2` is a subset of `arr1`. Otherwise, 0 is returned.

## Examples
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

append more special msg like in https://clickhouse.com/docs/en/sql-reference/functions/array-functions#hassubstr, such as null, no supertype etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we prefer keeping the same behavior with trino

Comment on lines 768 to 788
bool found = false;
size_t i = target_start;
size_t j = element_start;
while (i < target_end && j < element_end) {
if constexpr (std::is_same_v<ArrayColumn, ElementColumn> || std::is_same_v<MapColumn, ElementColumn> ||
std::is_same_v<StructColumn, ElementColumn> || std::is_same_v<JsonColumn, ElementColumn>) {
found = (elements.equals(j, targets, i) == 1);
} else {
auto elements_ptr = (const ValueType*)(elements.raw_data());
auto targets_ptr = (const ValueType*)(targets.raw_data());
found = (elements_ptr[j] == targets_ptr[i]);
}
if (found) {
i++;
j++;
} else {
i = 0;
j++;
}
}
return i == target_end;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems not process nullable cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems not process nullable cases?

done

… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
LiShuMing
LiShuMing previously approved these changes Nov 3, 2023
@leoyy0316 leoyy0316 requested a review from fzhedu November 3, 2023 02:22
Seaven
Seaven previously approved these changes Nov 3, 2023
@Seaven Seaven enabled auto-merge (squash) November 3, 2023 02:27
@fzhedu fzhedu disabled auto-merge November 3, 2023 02:32
Comment on lines 768 to 770
if (element_end < target_end) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not true, as your tests case just consider single array cases. Taking an array column as example,
elem: [null][null] (its offsets are [0,0,0]
target: [1,2,3,4],[null] (its offsets are [0,4,4]

array_contains_seq(elem, target) should return false, true. but your answer sould be wrong to false, false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if(element_end-element_start > target_end-target_start)

} else if (null_target || null_element) {
found = false;
} else {
found = (elements_ptr[j] == targets_ptr[i]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

            if constexpr (std::is_same_v<ArrayColumn, ElementColumn> || std::is_same_v<MapColumn, ElementColumn> ||
                          std::is_same_v<StructColumn, ElementColumn> || std::is_same_v<JsonColumn, ElementColumn>) {
                found = (elements.equals(j, targets, i) == 1);
            } else

should move here

@fzhedu fzhedu requested review from Seaven and LiShuMing November 3, 2023 03:09
Copy link
Contributor

@fzhedu fzhedu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please refer to be/test/exprs/array_functions_test.cpp to write some unit tests.
array_contains_empty_array
array_contains_all

… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
@leoyy0316 leoyy0316 dismissed stale reviews from Seaven and LiShuMing via 3bdd409 November 3, 2023 11:01
… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
Comment on lines 781 to 782
int k = j;
int l = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove k,l;
i = target_start;

if (l == target_end) {
return true;
}
i = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Comment on lines 811 to 812
l++;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else break;

// array_contains_seq(["a", "b", NULL], [NULL]) -> 1
// array_contains_seq(["a", "b", "c"], ["d"]) -> 0
// array_contains_seq(["a", "b", "c"], ["a", "d"]) -> 0
// array_contains_all(["a", "b", "c"], ["a", "c"]) -> 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

??

@@ -5467,4 +5467,55 @@ TEST_F(ArrayFunctionsTest, array_match_only_null) {
ASSERT_TRUE(dest_column->get(0).get_int8());
}
}
// NOLINTNEXTLINE
TEST_F(ArrayFunctionsTest, array_contains_seq) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is the case nullable array1 and nullable array2, pls refer to array_contains_no_null to add some cases from nullable array1, not-nullable array2, not-nullable array1, nullable array2, not_nullable array1, not_nullable array2.

… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
bool found = false;
size_t i = target_start;
size_t j = element_start;
while (j < element_end) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

j <= element_end - (target_end - target_start), no need to loop if the left elements are less than target's elements

{
auto array = ColumnHelper::create_column(TYPE_ARRAY_ARRAY_VARCHAR, true);
array->append_datum(DatumArray{Datum(DatumArray{"a"}), Datum(DatumArray{"b"}), Datum()});
array->append_datum(DatumArray{"a", "d", "c"});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be [[xxx]]

… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
Comment on lines 818 to 820
if ((element_end - j) < (target_end - target_start)) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if put here, it should be <=; prefer put this condition to the while() at L:780

select array_contains_seq(null, [2,3]);
select array_contains_seq([1, 2, NULL, 3, 4], [null,null]);
select array_contains_seq([1, 2, NULL], [null,2]);
select array_contains_seq(null, null);
Copy link
Contributor

@fzhedu fzhedu Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add these tests to this file

CREATE TABLE array_test ( 
pk bigint not null ,
i_0   Array<BigInt>,
i_1   Array<BigInt>,
ai_0  Array<Array<BigInt>>,
ai_1  Array<Array<BigInt>>
) ENGINE=OLAP
DUPLICATE KEY(`pk`)
DISTRIBUTED BY HASH(`pk`) BUCKETS 3
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false"
);

insert into array_test values
(1,[null,1],[null],[null],[[]]),
(2,[null],[1,3,4,5,1,2],[[null,1],null],[[1,null]]),
(3,[],[],[[],null,[1,1]],[],[1,1]),
(4,null,[],[[1,1]],[[1,1],null),
(5,[4,4,4],[4,null],[null],[null]),
(6,[1,1,2,1,1,2,3,3],[1,2,3],[[1]],[[1],[2],null,[null]]);

select array_contains_seq(i_0,i_1),array_contains_seq(i_0,i_0),array_contains_seq(i_1,i_0), array_contains_seq(ai_0, ai_1),array_contains_seq(ai_1, ai_1),array_contains_seq(ai_1, ai_0),array_contains_seq(ai_0, i_1),array_contains_seq(ai_0,null),,array_contains_seq(ai_0, [1,2]) from array_test;

… ck hasSubstr

Signed-off-by: leoyy0316 <571684903@qq.com>
Copy link

sonarqubecloud bot commented Nov 9, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

warning The version of Java (11.0.21) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
Read more here

Copy link

github-actions bot commented Nov 9, 2023

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

github-actions bot commented Nov 9, 2023

[BE Incremental Coverage Report]

pass : 48 / 54 (88.89%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 src/exprs/array_functions.cpp 48 54 88.89% [775, 849, 850, 851, 852, 1042]

@LiShuMing LiShuMing enabled auto-merge (squash) November 9, 2023 06:24
@LiShuMing LiShuMing merged commit 6320dbc into StarRocks:main Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants