Fix hex string to varchar conversion #2939

Open
wants to merge 7 commits into
base: BABEL_4_X_DEV
@rishabhtanwar29 rishabhtanwar29 commented Sep 17, 2024

Description

Varbinary-to-varchar conversion results in an error if the binary input contains non-printable
UTF-8 characters or a NULL character. PG natively throws an error for untranslatable
characters, but T-SQL does not. T-SQL keeps the bytes of all unrecognized characters in the
resulting varchar string instead of throwing an error, so it is up to the client to handle them;
most clients replace unrecognized characters with a `?`.

This commit fixes the issue by handling non-printable and NULL characters in a string instead
of throwing an error. It does so by keeping the unrecognized bytes as they are in the output
string, so that it is up to the client to handle such characters. Note that PG's internal logic
for handling the NULL (0x00) character remains the same, so string operations on strings
containing a NULL character might produce results that differ from T-SQL.
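
To illustrate the pass-through behaviour described above, here is a minimal sketch in C. It is not the code in this PR: `utf8_seq_len` and `copy_keep_unrecognized` are hypothetical names, and the UTF-8 check is simplified (it does not reject overlong encodings or surrogates).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the UTF-8 sequence starting at p,
 * or 0 if the lead byte or a continuation byte is malformed.
 * Simplified: overlong forms and surrogates are not rejected. */
static size_t utf8_seq_len(const unsigned char *p, size_t remaining)
{
    size_t need, i;

    if (p[0] < 0x80)
        need = 1;
    else if ((p[0] & 0xE0) == 0xC0)
        need = 2;
    else if ((p[0] & 0xF0) == 0xE0)
        need = 3;
    else if ((p[0] & 0xF8) == 0xF0)
        need = 4;
    else
        return 0;

    if (need > remaining)
        return 0;
    for (i = 1; i < need; i++)
        if ((p[i] & 0xC0) != 0x80)
            return 0;
    return need;
}

/* Hypothetical helper: copy all len bytes from src to dst without raising
 * an error. Returns how many bytes were kept as-is because they were
 * either 0x00 or not part of a well-formed UTF-8 sequence. */
static size_t copy_keep_unrecognized(const unsigned char *src, size_t len,
                                     unsigned char *dst)
{
    size_t i = 0, kept = 0;

    assert(src != NULL && dst != NULL);
    while (i < len)
    {
        size_t n = utf8_seq_len(src + i, len - i);

        if (n == 0 || src[i] == 0x00)
        {
            dst[i] = src[i];    /* keep the byte instead of erroring out */
            kept++;
            i++;
        }
        else
        {
            memcpy(dst + i, src + i, n);
            i += n;
        }
    }
    return kept;
}
```

For the input bytes `0x61 0x00 0xFF 0x62`, this sketch copies all four bytes unchanged and reports two of them (`0x00` and `0xFF`) as kept unrecognized bytes; the caller must track the length explicitly because the output may contain `0x00`.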

Task: BABEL-4562
Signed-off-by: Rishabh Tanwar <ritanwar@amazon.com>

Test Scenarios Covered

  • Use case based -

  • Boundary conditions -

  • Arbitrary inputs -

  • Negative test cases -

  • Minor version upgrade tests -

  • Major version upgrade tests -

  • Performance tests -

  • Tooling impact -

  • Client tests -

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Rishabh Tanwar <ritanwar@amazon.com>
@rishabhtanwar29 rishabhtanwar29 requested review from rohit01010, Deepesh125 and tanscorpio7 and removed request for Deepesh125 September 19, 2024 05:24
if (*utf == '\0')
break;
{

These routines are used to send string data with different collations to the end client. How are we making sure that this change does not regress any existing behaviour?

@@ -271,6 +277,12 @@ TsqlUtfToLocal(const unsigned char *utf, int len,
}
iutf = (b1 << 24 | b2 << 16 | b3 << 8 | b4);

if (!pg_utf8_islegal(cur, l))

Just curious: how are we making sure that our test cases cover the case that previously hit "break"?

* instead of plain strlen since strlen calculates string upto first null character (\0)
* even though the input string might contain more characters.
*/
if (pltsql_plugin_handler_ptr->is_tsql_varchar_or_char_datatype(col->pgTypeOid))

why do we need this check?
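
For context on the code comment in this hunk, the difference between `strlen` and a stored length can be sketched as follows. This is only an illustration: the `counted_bytes` struct and both helper names are hypothetical and do not appear in the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical value type: a byte buffer with its length stored separately,
 * so the buffer may legitimately contain embedded 0x00 bytes. */
typedef struct
{
    const char *data;
    size_t      len;
} counted_bytes;

/* Stops at the first 0x00, so bytes after it are not counted.
 * Requires that data is NUL-terminated somewhere. */
static size_t length_via_strlen(const counted_bytes *v)
{
    assert(v != NULL && v->data != NULL);
    return strlen(v->data);
}

/* Uses the stored length, so embedded 0x00 bytes are counted too. */
static size_t length_via_stored(const counted_bytes *v)
{
    assert(v != NULL);
    return v->len;
}
```

With the five bytes `a b 0x00 c d`, `length_via_strlen` reports 2 while `length_via_stored` reports 5, which is why a value that may contain null bytes needs its length taken from somewhere other than `strlen`.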

* ex: CAST(CAST('a' AS BINARY(10)) AS VARCHAR) should work
* and not fail because of null byte
*/
while(len>0 && data[len-1] == '\0')

It makes sense if we don't remove trailing zeroes for binary, but we should remove them for varbinary, right? What is the expected T-SQL behaviour?
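
For reference, the trailing-zero trimming that this hunk's `while` condition suggests can be sketched as below. The loop body is not visible in the excerpt, so decrementing the length is an assumption, and `trimmed_length` is a hypothetical name. The sketch does not settle which of binary or varbinary should be trimmed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the length of data once trailing 0x00 bytes
 * are ignored. Embedded 0x00 bytes before the last non-zero byte are kept. */
static size_t trimmed_length(const char *data, size_t len)
{
    assert(data != NULL || len == 0);
    while (len > 0 && data[len - 1] == '\0')
        len--;              /* assumed loop body: drop one padding byte */
    return len;
}
```

For a 10-byte buffer holding `'a'` followed by nine zero bytes, as `CAST('a' AS BINARY(10))` would produce, this returns 1. For `a 0x00 b 0x00 0x00` it returns 3, so the embedded zero survives.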

@@ -437,7 +437,6 @@ BABEL-4672
permission_restrictions_from_pg
BABEL-730-before-15_6-or-16_1
GRANT_SCHEMA-before-15_7-16_3
babel-4475

why?

@@ -198,6 +198,9 @@ DECLARE @inputString BINARY(10) = 0x61626364656667, @pattern BINARY(10) = 0x6364
SELECT replace(@inputString, @pattern, @replacement)
GO

UPDATE babel_4836_replace_t4 SET a = 0x6162636465, b = 0x6263, c = 0x747576;

why is this needed?


3 participants