Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure to_timestamp behaves consistently with PostgreSQL #13351

Open
jayzhan211 opened this issue Nov 11, 2024 · 7 comments
Open

Ensure to_timestamp behaves consistently with PostgreSQL #13351

jayzhan211 opened this issue Nov 11, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@jayzhan211
Copy link
Contributor

Is your feature request related to a problem or challenge?

In postgres, to_timestamp has signature to_timestamp ( text, text ) → timestamp with time zone and to_timestamp ( double precision ) → timestamp with time zone.

But we have query that has more than 2 arguments

# to_timestamp with formatting
query I
SELECT COUNT(*) FROM ts_data_nanos where ts > to_timestamp('2020-09-08T12:00:00+00:00', '2020-09-08 12/00/00+00:00', '%c', '%+', '%Y-%m-%d %H/%M/%s%#z')
----
2

Other than that the result seems different too.

Postgres

postgres=# select to_timestamp(1);
       to_timestamp        
---------------------------
 1970-01-01 07:30:01+07:30

Duckdb

D select to_timestamp(1);
┌───────────────────────────┐
│      to_timestamp(1)      │
│ timestamp with time zone  │
├───────────────────────────┤
│ 1970-01-01 07:30:01+07:30 │
└───────────────────────────┘

Datafusion

query P
select to_timestamp(1);
----
1970-01-01T00:00:01

Describe the solution you'd like

Change the function signature to be consistent with Postgres.
Change the result to be consistent with Postgres.

Describe alternatives you've considered

No response

Additional context

Duckdb doesn't support (text, text) signature

D select to_timestamp('05 Dec 2000', 'DD Mon YYYY');
Binder Error: No function matches the given name and argument types 'to_timestamp(STRING_LITERAL, STRING_LITERAL)'. You might need to add explicit type casts.
	Candidate functions:
	to_timestamp(DOUBLE) -> TIMESTAMP WITH TIME ZONE

LINE 1: select to_timestamp('05 Dec 2000', 'DD Mon YYY...

https://www.postgresql.org/docs/current/functions-formatting.html

@Omega359
Copy link
Contributor

The difference in to_timestamp(1) is almost certainly tz related as DF udf's don't have access to the context's timezone.

As for why the DF version has multiple arguments - it's because it's better. None of those other systems were designed with handling human entered data in mind. Dates and times especially are consistently different, especially if you go outside a particular locality (US - mm/dd/yyyy, most everywhere else dd/mm/yyyy as a very simple example).

If you ever have to clean 100's of billions of records and try to get consist dates and timestamps out of them having a function that accepts many possible formats to try is very refreshing. The alternative of when/when/when/when/when/............./otherwise statements is substantially poorer ux. For a single field I have up to 10 different formats I must try - and this is just for US specific data!

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Nov 12, 2024

Great, but what each argument represented for is unclear to me.

I can only find this query in test.
to_timestamp('2020-09-08T12:00:00+00:00', '2020-09-08 12/00/00+00:00', '%c', '%+', '%Y-%m-%d %H/%M/%s%#z')
Can I have %+ in 3rd argument and what does %c and %+ mean? Is there related document about this?

@Omega359
Copy link
Contributor

I'll point you to the documentation :)

The first arg is the string to parse, the remaining are chrono formats to test.

More examples in ... the examples

@jayzhan211
Copy link
Contributor Author

In the documentation the signature seems there can only be one expression?

to_timestamp(expression[, ..., format_n])

to_timestamp('2020-09-08T12:00:00+00:00', '2020-09-08 12/00/00+00:00', '%c', '%+', '%Y-%m-%d %H/%M/%s%#z')

The query has 2 expressions and 3 follow on chrono format

@Omega359
Copy link
Contributor

Ah, I see the confusion.

So that query was a test of to_timestamp with multiple chrono formats, one of which is malformed. So what you are actually seeing is 1 expression and 4 formats of which only the last 3 are valid chrono formats. I should likely have used a different string for the first chrono format ('this is an invalid chrono format' maybe).

@jayzhan211
Copy link
Contributor Author

Should we return error if we have invalid chrono format?

@Omega359
Copy link
Contributor

Should we return error if we have invalid chrono format?

That is a very good question. There must have been I reason I didn't do that when I coded this up but right now I can't recall why it would have been. Let me think about that for a day and see if I can recall a good reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants