-
Notifications
You must be signed in to change notification settings - Fork 3
[BUG] 658: add strtok function #1032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I have some doubts about this part of the function signature: TypeSignature::Exact(vec![DataType::Utf8, DataType::Utf8, DataType::Int64]),
TypeSignature::Exact(vec![
DataType::LargeUtf8,
DataType::LargeUtf8,
DataType::Int64,
]),
TypeSignature::Exact(vec![
DataType::Utf8View,
DataType::Utf8View,
DataType::Int64,
]), When I use this instead: Coercible(vec![
Coercion::new_exact(TypeSignatureClass::Native(logical_string())),
Coercion::new_exact(TypeSignatureClass::Native(logical_string())),
Coercion::new_exact(TypeSignatureClass::Integer),
]), calling with one of the string args as
Seems df assigns I have seen a number of ways in which |
Regarding this, I have added a new enum type
All tests pass now and no panics. I have edited the PR description to reflect this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! Thanks for your contribution. I apologize for the long waiting review time. We were in the process of function crate restructure. Now you can check the updated readme for embucket-functions https://github.com/Embucket/embucket/tree/main/crates/embucket-functions and Implementation guide https://github.com/Embucket/embucket/blob/main/crates/embucket-functions/docs/function_implementation_guide.md. Hope it can improve our processes and code quality in general.
Regarding this PR. Can you please add the function name to implemented_functions.csv and regenerate Unimplemented Functions list or delete this function from generated_snowflake_functions.rs
manually. So, this function can be called by our executors. More details about why we need it and how do it properly in in Implementation Guide
Great. Thank you |
@Vedin I have made the changes |
/// | ||
/// This is useful for writing UDFs that work across different string representations. | ||
#[derive(Debug, Clone)] | ||
pub(crate) enum Utf8LikeArray { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ravlio Can you please look into this? Is there any other ways to handle such cases? If I remember correctly you have already did some functions with string arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good approach to managing various specific string array types is to encapsulate them within a single enum that exposes common methods.
My implementation is more complex https://github.com/Embucket/embucket/blob/9b157f8623f2b68c01199adcd6e660c948dcb009/crates/embucket-functions/src/string-binary/split.rs
use std::sync::Arc; | ||
|
||
// strtok SQL function | ||
// Tokenizes a given string and returns the requested part. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use Doc comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -3525,7 +3519,13 @@ pub const TABLE_FUNCTIONS: &[(&str, FunctionInfo)] = &[ | |||
) | |||
.with_docs("https://docs.snowflake.com/en/sql-reference/functions/rest_event_history") | |||
), | |||
("SPLIT_TO_TABLE", FunctionInfo::new( | |||
("RESULT_SCAN", FunctionInfo::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please delete this change with RESULT_SCAN
manually for now. It's a bug with registering table_functions and should be fixed soon. For now please exclude it by hand.
9b58500
to
d35720f
Compare
Which issue does this PR close?
Changes
StrtokFunc
and implemented logic according to snowflake docsUtf8LikeArray
to make it easy working with different string representationsHow was this tested?