Skip to content

Conversation

@ViniciusSouzaRoque
Copy link
Contributor

@ViniciusSouzaRoque ViniciusSouzaRoque commented Feb 10, 2022

Returns the first occurrence of str in strList where strList is a comma-delimited string. Returns null if either argument is null. Returns 0 if the first argument contains any commas.

For example, find_in_set('ab', 'abc,b,ab,c,def')
returns 3

@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@ViniciusSouzaRoque ViniciusSouzaRoque changed the title [C++][Gandiva] Implement Find_In_Set Function ARROW-15644: [C++][Gandiva] Implement Find_In_Set Function Feb 10, 2022
@ViniciusSouzaRoque ViniciusSouzaRoque force-pushed the feature/add-find-in-set-function branch from 01b0973 to b711271 Compare February 10, 2022 11:29
@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@ViniciusSouzaRoque ViniciusSouzaRoque force-pushed the feature/add-find-in-set-function branch 3 times, most recently from 3d6fb68 to ed5c2c8 Compare February 10, 2022 13:45
@ViniciusSouzaRoque ViniciusSouzaRoque force-pushed the feature/add-find-in-set-function branch from 86cea33 to e70dca9 Compare April 18, 2022 09:34
@ViniciusSouzaRoque ViniciusSouzaRoque force-pushed the feature/add-find-in-set-function branch from c095449 to a7a4753 Compare April 26, 2022 09:52
@ViniciusSouzaRoque ViniciusSouzaRoque force-pushed the feature/add-find-in-set-function branch from a7a4753 to 66c25c7 Compare May 19, 2022 16:05
@ViniciusSouzaRoque ViniciusSouzaRoque force-pushed the feature/add-find-in-set-function branch from 66c25c7 to b7c3316 Compare June 7, 2022 10:49
@ViniciusSouzaRoque ViniciusSouzaRoque force-pushed the feature/add-find-in-set-function branch 2 times, most recently from 86c145c to fcbd2ef Compare June 27, 2022 10:25
@ViniciusSouzaRoque ViniciusSouzaRoque force-pushed the feature/add-find-in-set-function branch 3 times, most recently from b6970fe to afba45c Compare July 1, 2022 10:43
@ViniciusSouzaRoque ViniciusSouzaRoque force-pushed the feature/add-find-in-set-function branch 4 times, most recently from e978d81 to 502072f Compare July 14, 2022 11:18
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not correct..if there are empty strings then result can be positive no.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@projjal
I followed the Hive results, should I return false validity?
I found one error in my return...
If find_len && list_len == 0 the return is 1.

@ViniciusSouzaRoque ViniciusSouzaRoque force-pushed the feature/add-find-in-set-function branch 2 times, most recently from 74ef20a to e4fc0bf Compare July 21, 2022 18:41
PHILO-HE added a commit to PHILO-HE/arrow that referenced this pull request Jul 22, 2022
@ViniciusSouzaRoque ViniciusSouzaRoque force-pushed the feature/add-find-in-set-function branch from e4fc0bf to d923d11 Compare July 22, 2022 11:33
zhouyuan pushed a commit to oap-project/arrow that referenced this pull request Jul 25, 2022
@ViniciusSouzaRoque ViniciusSouzaRoque force-pushed the feature/add-find-in-set-function branch from 095f1e0 to 92c8bd6 Compare July 25, 2022 10:08
@anthonylouisbsb
Copy link
Contributor

@kou @pitrou Can you check and merge that PR? It is approved by Projjal and me

@pitrou
Copy link
Member

pitrou commented Aug 3, 2022

Hmm... Arrow has a proper List type, so why take a comma-delimited string?

int32_t find_in_set_utf8_utf8(int64_t context, const char* to_find, int32_t to_find_len,
const char* string_list, int32_t string_list_len) {
// Return 0 if to search entry have commas
if (is_substr_utf8_utf8(to_find, to_find_len, reinterpret_cast<const char*>(","), 1)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are looking for a single unicode codepoint below 128, you can probably do this faster using memchr.

cur_length = 0;
}
} else {
if (cur_length + 1 <= string_list_len) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this condition? In which situation is it false?

}
} else {
if (cur_length + 1 <= string_list_len) {
if (!matching || (memcmp(string_list + i, to_find + cur_length, 1))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why call memcmp if you are only comparing a single byte?

@anthonylouisbsb
Copy link
Contributor

anthonylouisbsb commented Aug 3, 2022

Hmm... Arrow has a proper List type, so why take a comma-delimited string?

@pitrou there are two things:

In the future, after adding the support for complex types we can add support for it using lists

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants