Fix hex string to varchar conversion #2939
base: BABEL_4_X_DEV
Signed-off-by: Rishabh Tanwar <ritanwar@amazon.com>
if (*utf == '\0')
break;
{
These routines are used to send string data with different collations to the end client. How are we making sure that this change will not regress any existing behavior?
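For context on the `break` being discussed: as we understand PostgreSQL's `UtfToLocal`, which `TsqlUtfToLocal` appears to be modeled on, the `break` cases represent errors, and if the loop exits with input still remaining, an invalid-encoding error is reported. The sketch below contrasts that pattern with one that copies NUL bytes through. `strict_copy` and `lenient_copy` are hypothetical names, and the sketch only copies bytes. It does not perform a real encoding conversion.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the pre-fix pattern: a NUL byte triggers a "break",
 * and unconsumed input is treated as invalid.
 * Returns the number of bytes copied, or -1 to signal "invalid input".
 */
static long strict_copy(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++)
    {
        if (src[i] == '\0')
            break;              /* stop at the first NUL byte */
        dst[i] = src[i];
    }
    /* stopped early: the caller would report an encoding error */
    return (i < len) ? -1 : (long) i;
}

/*
 * Hypothetical sketch of the post-fix behaviour: the loop is bounded only by
 * len, and NUL bytes are copied through unchanged.
 */
static long lenient_copy(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++)
        dst[i] = src[i];        /* keep every byte, including 0x00 */
    return (long) i;
}
```

For the input bytes `61 00 62`, `strict_copy` reports an error, while `lenient_copy` returns all three bytes.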
@@ -271,6 +277,12 @@ TsqlUtfToLocal(const unsigned char *utf, int len,
}
iutf = (b1 << 24 | b2 << 16 | b3 << 8 | b4);

if (!pg_utf8_islegal(cur, l))
Just curious: how are we making sure that our test cases cover the case that previously hit the "break"?
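The line `iutf = (b1 << 24 | b2 << 16 | b3 << 8 | b4);` packs a multibyte UTF-8 sequence into a single 32-bit key. The following is a minimal sketch of that packing. It assumes, as we understand PostgreSQL's `UtfToLocal` to do, that shorter sequences are right-aligned, so the unused leading bytes stay zero. `pack_utf8` is a hypothetical helper, not a function from the PR.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack a 1- to 4-byte UTF-8 sequence into a 32-bit key.
 * Shorter sequences are right-aligned (leading bytes left as zero), matching
 * the assumption stated above.
 */
static uint32_t pack_utf8(const unsigned char *utf, int l)
{
    uint32_t b1 = 0, b2 = 0, b3 = 0, b4 = 0;

    assert(l >= 1 && l <= 4);
    if (l == 4)
        b1 = *utf++;
    if (l >= 3)
        b2 = *utf++;
    if (l >= 2)
        b3 = *utf++;
    b4 = *utf;
    /* same expression as in the diff; uint32_t avoids signed-shift overflow */
    return (b1 << 24 | b2 << 16 | b3 << 8 | b4);
}
```

For example, the euro sign (`E2 82 AC`) packs to `0x00E282AC`, and `é` (`C3 A9`) packs to `0x0000C3A9`.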
* instead of plain strlen since strlen calculates string up to first null character (\0)
* even though the input string might contain more characters.
*/
if (pltsql_plugin_handler_ptr->is_tsql_varchar_or_char_datatype(col->pgTypeOid))
Why do we need this check?
* ex: CAST(CAST('a' AS BINARY(10)) AS VARCHAR) should work
* and not fail because of the null bytes
*/
while(len>0 && data[len-1] == '\0')
It makes sense if we don't remove trailing zeroes for binary, but we should remove them for varbinary, right? What is the expected T-SQL behaviour?
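The trimming loop quoted above can be isolated as follows. This is a minimal sketch. `trim_trailing_nuls` is a hypothetical name, and it returns the trimmed length without modifying the data.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the trimming loop: drop trailing 0x00 bytes so that
 * zero padding from a fixed-length BINARY(n) value is not carried into the
 * converted string. Embedded NUL bytes are left alone. Returns the new length.
 */
static size_t trim_trailing_nuls(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    while (len > 0 && data[len - 1] == '\0')
        len--;
    return len;
}
```

For `CAST('a' AS BINARY(10))`, which is `0x61` followed by nine `0x00` padding bytes, the trimmed length is 1. An embedded NUL, as in `61 00 62`, is kept. Because the loop cannot tell padding apart from a genuine trailing `0x00` in a VARBINARY value, this is exactly the binary-versus-varbinary distinction raised in the review comment.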
@@ -437,7 +437,6 @@ BABEL-4672
permission_restrictions_from_pg
BABEL-730-before-15_6-or-16_1
GRANT_SCHEMA-before-15_7-16_3
babel-4475
Why?
@@ -198,6 +198,9 @@ DECLARE @inputString BINARY(10) = 0x61626364656667, @pattern BINARY(10) = 0x6364
SELECT replace(@inputString, @pattern, @replacement)
GO

UPDATE babel_4836_replace_t4 SET a = 0x6162636465, b = 0x6263, c = 0x747576;
Why is this needed?
Description
Varbinary-to-varchar conversion fails with an error if the binary input contains non-printable
UTF-8 characters or a NULL character. Throwing an error for untranslatable characters is PG's
native behavior, but T-SQL does not do this. Instead, it keeps the bytes of all unrecognized characters
in the resulting varchar string and leaves it to the client to handle them. Most clients replace
unrecognized characters with a "?". This commit fixes the issue by handling non-printable and
NULL characters in a string instead of throwing an error. The unrecognized bytes are kept as-is
in the output string, so the client decides how to handle them. Note that PG's internal handling
of the NULL (0x00) character is unchanged, so string operations on strings containing a NULL
character may produce different results than they would in T-SQL.
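The description leaves unrecognized bytes in the result and relies on the client to display them, typically as a "?". The sketch below shows that client-side step. It assumes UTF-8 output and uses the RFC 3629 well-formedness ranges. `utf8_seq_len` and `render_with_replacement` are hypothetical names and are not part of Babelfish or any client driver.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the length (1-4) of the well-formed UTF-8
 * sequence at s, or 0 if it is not well formed (RFC 3629 ranges: no
 * overlong forms, no surrogates, nothing above U+10FFFF).
 */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    unsigned char c = s[0];
    unsigned char lo = 0x80, hi = 0xBF;     /* allowed range for 2nd byte */
    size_t n, i;

    if (c < 0x80)
        return 1;                           /* ASCII, including 0x00 */
    else if (c >= 0xC2 && c <= 0xDF)
        n = 2;
    else if (c >= 0xE0 && c <= 0xEF)
    {
        n = 3;
        if (c == 0xE0) lo = 0xA0;           /* reject overlong forms */
        if (c == 0xED) hi = 0x9F;           /* reject UTF-16 surrogates */
    }
    else if (c >= 0xF0 && c <= 0xF4)
    {
        n = 4;
        if (c == 0xF0) lo = 0x90;           /* reject overlong forms */
        if (c == 0xF4) hi = 0x8F;           /* reject code points > U+10FFFF */
    }
    else
        return 0;                           /* 0x80-0xC1 or 0xF5-0xFF */

    if (avail < n || s[1] < lo || s[1] > hi)
        return 0;
    for (i = 2; i < n; i++)
        if (s[i] < 0x80 || s[i] > 0xBF)
            return 0;
    return n;
}

/*
 * Hypothetical client-side rendering: copy well-formed sequences and replace
 * each byte that does not start one with '?'. out needs len + 1 bytes.
 * Returns the number of bytes written, excluding the terminator.
 */
static size_t render_with_replacement(const unsigned char *in, size_t len, char *out)
{
    size_t i = 0, o = 0, n, k;

    while (i < len)
    {
        n = utf8_seq_len(in + i, len - i);
        if (n == 0)
        {
            out[o++] = '?';                 /* unrecognized byte */
            i++;
            continue;
        }
        for (k = 0; k < n; k++)
            out[o++] = (char) in[i + k];
        i += n;
    }
    out[o] = '\0';
    return o;
}
```

For example, the bytes `61 FF 62` render as `a?b`. This sketch emits one "?" per bad byte, so a truncated sequence such as `E2 82` becomes `??`. Decoders that follow the Unicode "maximal subpart" practice would emit a single replacement character instead, so exact rendering varies by client.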
Task: BABEL-4562
Signed-off-by: Rishabh Tanwar <ritanwar@amazon.com>
Test Scenarios Covered
Use case based -
Boundary conditions -
Arbitrary inputs -
Negative test cases -
Minor version upgrade tests -
Major version upgrade tests -
Performance tests -
Tooling impact -
Client tests -
Check List
By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.