-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add string trimming and padding functions #248
Changes from 5 commits
f510b7d
3bfdb96
2b0e096
2ab528d
cf41bdf
bdcba3e
0b47bd5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -27,7 +27,9 @@ scalar_functions: | |
return: "BOOLEAN" | ||
- | ||
name: substring | ||
description: Extract a portion of a string from another string. | ||
description: >- | ||
Extract a substring of a specified length starting from position start. | ||
A start value of 1 refers to the first characters of the string. | ||
impls: | ||
- args: | ||
- value: "varchar<L1>" | ||
|
@@ -44,7 +46,7 @@ scalar_functions: | |
- value: i32 | ||
- value: i32 | ||
return: "string" | ||
- | ||
- | ||
name: starts_with | ||
description: Whether this string starts with another string. | ||
impls: | ||
|
@@ -222,7 +224,8 @@ scalar_functions: | |
name: "substring" | ||
description: The substring to count. | ||
return: i64 | ||
- name: replace | ||
- | ||
name: replace | ||
description: >- | ||
Replace all occurrences of the substring with the replacement string. | ||
impls: | ||
|
@@ -248,3 +251,177 @@ scalar_functions: | |
name: "replacement" | ||
description: The replacement string. | ||
return: "varchar<L1>" | ||
- | ||
name: ltrim | ||
description: >- | ||
Remove any occurrence of the characters from the left side of the string. | ||
If no characters are specified, spaces are removed. | ||
impls: | ||
- args: | ||
- value: "varchar<L1>" | ||
name: "input" | ||
description: "The string to remove characters from." | ||
- value: "varchar<L2>" | ||
name: "characters" | ||
description: "The set of characters to remove." | ||
return: "varchar<L1>" | ||
- args: | ||
- value: "string" | ||
name: "input" | ||
description: "The string to remove characters from." | ||
- value: "string" | ||
name: "characters" | ||
description: "The set of characters to remove." | ||
return: "string" | ||
- | ||
name: rtrim | ||
description: >- | ||
Remove any occurrence of the characters from the right side of the string. | ||
If no characters are specified, spaces are removed. | ||
impls: | ||
- args: | ||
- value: "varchar<L1>" | ||
name: "input" | ||
description: "The string to remove characters from." | ||
- value: "varchar<L2>" | ||
name: "characters" | ||
description: "The set of characters to remove." | ||
return: "varchar<L1>" | ||
- args: | ||
- value: "string" | ||
name: "input" | ||
description: "The string to remove characters from." | ||
- value: "string" | ||
name: "characters" | ||
description: "The set of characters to remove." | ||
return: "string" | ||
- | ||
name: trim | ||
description: >- | ||
Remove any occurrence of the characters from the left and right sides of | ||
the string. If no characters are specified, spaces are removed. | ||
impls: | ||
- args: | ||
- value: "varchar<L1>" | ||
name: "input" | ||
description: "The string to remove characters from." | ||
- value: "varchar<L2>" | ||
name: "characters" | ||
description: "The set of characters to remove." | ||
return: "varchar<L1>" | ||
- args: | ||
- value: "string" | ||
name: "input" | ||
description: "The string to remove characters from." | ||
- value: "string" | ||
name: "characters" | ||
description: "The set of characters to remove." | ||
return: "string" | ||
- | ||
name: lpad | ||
description: Left-pad the string with the characters until the specified length of the string has been reached. | ||
impls: | ||
- args: | ||
- value: "varchar<L1>" | ||
name: "input" | ||
description: "The string to pad." | ||
- value: i32 | ||
name: "length" | ||
description: "The length of the output string." | ||
- value: "varchar<L2>" | ||
name: "characters" | ||
description: "The set of characters to use for padding." | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I understand the "set of characters" concept for trim functions ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point. I'll update the wording to 'string'. DuckDB only allows one character, but postgresql let's you use a string for the padding functions. I figured we would use whatever is more flexible.
I'll update the lpad and rpad descriptions to address these types of scenarios. Not too sure about center though. To be honest, I also wasn't too convinced on the center function. I couldn't really find many places that implement it. I imagine the expectation is that someone could easily use a combination of lpad and rpad. I wonder if we should omit here as well? cc: @ianmcook There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another thought is that maybe we have center only take single character padding. Multiple character strings used for padding can be just for lpad/rpad There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jvanstraten in the interest of getting a bunch of these other more common functions in, i've removed center for now. I'll create a separate task to look more into center and if/how other SQL variants deal with it. |
||
return: "varchar<L1>" | ||
- args: | ||
- value: "string" | ||
name: "input" | ||
description: "The string to pad." | ||
- value: i32 | ||
name: "length" | ||
description: "The length of the output string." | ||
- value: "string" | ||
name: "characters" | ||
description: "The set of characters to use for padding." | ||
return: "string" | ||
- | ||
name: rpad | ||
description: Right-pad the string with the characters until the specified length of the string has been reached. | ||
impls: | ||
- args: | ||
- value: "varchar<L1>" | ||
name: "input" | ||
description: "The string to pad." | ||
- value: i32 | ||
name: "length" | ||
description: "The length of the output string." | ||
- value: "varchar<L2>" | ||
name: "characters" | ||
description: "The set of characters to use for padding." | ||
return: "varchar<L1>" | ||
- args: | ||
- value: "string" | ||
name: "input" | ||
description: "The string to pad." | ||
- value: i32 | ||
name: "length" | ||
description: "The length of the output string." | ||
- value: "string" | ||
name: "characters" | ||
description: "The set of characters to use for padding." | ||
return: "string" | ||
- | ||
name: center | ||
description: >- | ||
Pad the string with characters from each side until the specified length of the string has | ||
been reached. If padding is uneven, the default option is to have extra padding on the right. | ||
impls: | ||
- args: | ||
- options: [ RIGHT, LEFT ] | ||
required: false | ||
- value: "varchar<L1>" | ||
name: "input" | ||
description: "The string to pad." | ||
- value: i32 | ||
name: "length" | ||
description: "The length of the output string." | ||
- value: "varchar<L2>" | ||
name: "characters" | ||
description: "The set of characters to use for padding." | ||
return: "varchar<L1>" | ||
- args: | ||
- options: [ RIGHT, LEFT ] | ||
required: false | ||
- value: "string" | ||
name: "input" | ||
description: "The string to pad." | ||
- value: i32 | ||
name: "length" | ||
description: "The length of the output string." | ||
- value: "string" | ||
name: "characters" | ||
description: "The set of characters to use for padding." | ||
return: "string" | ||
- | ||
name: left | ||
description: Extract count characters starting from the left of the string. | ||
impls: | ||
- args: | ||
- value: "varchar<L1>" | ||
- value: i32 | ||
return: "varchar<L1>" | ||
- args: | ||
- value: "string" | ||
- value: i32 | ||
return: "string" | ||
- | ||
name: right | ||
description: Extract count characters starting from the right of the string. | ||
impls: | ||
- args: | ||
- value: "varchar<L1>" | ||
- value: i32 | ||
return: "varchar<L1>" | ||
- args: | ||
- value: "string" | ||
- value: i32 | ||
return: "string" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With these trimming functions where the user can specify characters to remove, is there anything to clarify whether the characters are interpreted as-is or as a regex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My assumption was as-is, since that seems to be how Postgresql and DuckDB do it. For regex, they use different functions. I was planning on looking into those functions in another PR.