Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Case insensitivity mismatch between duckdb and postgres can cause issues #116

Open
JelteF opened this issue Aug 13, 2024 · 8 comments
Open
Assignees
Labels
bug Something isn't working
Milestone

Comments

@JelteF
Copy link
Collaborator

JelteF commented Aug 13, 2024

DuckDB considers identifiers with different casing the same even if they are quoted.

The most minimal example of this is the following schema:

CREATE TABLE a(b int);
CREATE TABLE "A"(c int);
INSERT INTO a VALUES(1);

When using postgres execution this correctly outputs:

> SELECT * FROM a, "A" as aa;
 b │ c
───┼───
(0 rows)

But when enabling duckdb execution both a and "A" are interpreted as a. Resulting in the following incorrect result:

> SELECT * FROM a, "A" as aa;
 b │ b
───┼───
 1 │ 1
(1 row)

This happens even when I set preserver_identifier_case to true on the DuckDB connection.

@JelteF JelteF added the bug Something isn't working label Aug 13, 2024
@wuputah
Copy link
Collaborator

wuputah commented Aug 13, 2024

Good find. Some related discussion here: #56 (comment), and related to #43 -- though these are not specific to case insensitivity.

@Alzavio
Copy link

Alzavio commented Aug 25, 2024

This may be related, but in either of the two below, the field name cases aren't maintained in the new postgres table. Using column names in double quotes instead of a wildcard also does not resolve the issue

CREATE TABLE postgres_staging.public.{table_name} AS SELECT * FROM postgres_prod.public.{table_name}

duckdb.sql(f'CREATE TABLE {table_name} AS SELECT * FROM postgres_prod.public.{table_name}')
duckdb.sql(f'CREATE TABLE postgres_staging.public.{table_name} AS SELECT * FROM {table_name}')

@JelteF JelteF added this to the Long term milestone Sep 30, 2024
@JelteF JelteF removed the long-term label Sep 30, 2024
@pantonis
Copy link

This shouldn't be in the long term milestone. but an important bug to fix.

@Y-- Y-- self-assigned this Oct 30, 2024
@Y--
Copy link
Collaborator

Y-- commented Oct 30, 2024

@pantonis we'll try to address this asap.

@JelteF JelteF modified the milestones: Long term, 0.3.0 Nov 6, 2024
@JelteF
Copy link
Collaborator Author

JelteF commented Dec 18, 2024

Based on some research I did this should be quite easy to address, postgres_scanner is able to do this. I'm pretty sure it's a few lines of code only. So we can use the same approach as is used there. Let's wait until #477 is merged though, to avoid any merge conflicts.

@pantonis
Copy link

We had the same issue when we were using Hydra and it was fixed quite fast I can say.

@Leo-XM-Zeng
Copy link
Contributor

This seems to be solved in the current version.

We had the same issue when we were using Hydra and it was fixed quite fast I can say.

@JelteF
Copy link
Collaborator Author

JelteF commented Dec 19, 2024

If it's indeed resolved we should add some tests to make sure we don't regress again.

Leo-XM-Zeng added a commit to Leo-XM-Zeng/pg_duckdb that referenced this issue Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants