-
Notifications
You must be signed in to change notification settings - Fork 1.7k
to_timestamp_millis(), to_timestamp_micros(), to_timestamp_seconds()
#567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/cc @alamb |
|
BTW I plan to review this later today |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me. Thank you @velvia
One thing I noticed while testing was that there wasn't a convenient way to cast a timestamp (nanoseconds) -- one had to go through in64.
So for example I can't apply to_timestamp_millis to the output of to_timestamp():
> select to_timestamp('2020-09-08T12:00:00+00:00')
> ;
+------------------------------------------+
| TimestampNanosecond(1599566400000000000) |
+------------------------------------------+
| 2020-09-08 12:00:00 |
+------------------------------------------+
1 row in set. Query took 0.002 seconds.
> select to_timestamp_millis(to_timestamp('2020-09-08T12:00:00+00:00'));
Internal("Unsupported data type Ok(Timestamp(Nanosecond, None)) for function to_timestamp_millis")
However, there is a (very ugly) workaround using cast:
select to_timestamp_millis((cast(to_timestamp('2020-09-08T12:00:00+00:00') as bigint)) / 1000000);
+---------------------------------------------------------------------+
| totimestampmillis(Int64(1599566400000000000) Divide Int64(1000000)) |
+---------------------------------------------------------------------+
| 2020-09-08 12:00:00 |
+---------------------------------------------------------------------+
| } | ||
|
|
||
| /// Internal cast function for casting ColumnarValue -> ColumnarValue for cast_type | ||
| pub fn cast_array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this would be more clearly named cast_column or cast_columnar_value (as it might be casting an array or a Scalar)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| BuiltinScalarFunction::ToTimestamp => Signature::Uniform(1, vec![DataType::Utf8]), | ||
| BuiltinScalarFunction::ToTimestampMillis => Signature::Uniform( | ||
| 1, | ||
| vec![DataType::Utf8, DataType::LargeUtf8, DataType::Int64], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These signatures claim to support arguments of typeLargeUtf8 but LargeUtf8 isn't handled in the implementations above. I think we should make them consistent (I don't have a preference about actually supporting LargeUtf, btw)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb I actually don't understand what LargeUTF is about, but since ToTimestamp only supports regular Utf8, is it OK if all of the timestamp functions just support regular Utf8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb I actually don't understand what LargeUTF is about,
Utf8uses i32 to store offsets and thus can store strings up to 2GB. LargeUtf8 stores string offsets using u64s and thus can support strings that are larger than 2GB. I do not know how widely LargeUtf8 is used in practice.
is it OK if all of the timestamp functions just support regular Utf8?
yes I think so
| 1599568949190855000, // 2020-09-08T12:42:29.190855+00:00 | ||
| 1599565349190855000, | ||
| ]; // 2020-09-08T11:42:29.190855+00:00 | ||
| let divisor = match A::get_time_unit() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
|
@alamb thanks for the review.
I think I'll go ahead and add support for the use case of Also, I had asked about docs - I think the |
There is a list of supported functions here: Up to now, we have mostly been following the postgres functions and thus didn't need to document functions within DataFusion. I suggest:
|
|
I'm getting a very odd local compile error after merging with master latest: |
@velvia I think you need to do We should probably update the version requirements in datafusion to reflect that it requires arrow 4.3 or later (not 4.0 or later) |
|
@alamb thanks, I can update cargo.toml for datafusion, that did the trick |
Codecov Report
@@ Coverage Diff @@
## master #567 +/- ##
==========================================
+ Coverage 76.02% 76.23% +0.20%
==========================================
Files 156 156
Lines 27063 27280 +217
==========================================
+ Hits 20575 20796 +221
+ Misses 6488 6484 -4
Continue to review full report at Codecov.
|
|
@alamb I believe all changes and feedback have been addressed. I added the capability of to_timestamp_xx() to convert from other timestamp types for convenience, and also added documentation. :) |
| | `BOOLEAN` | `Boolean` | | ||
| | `DATE` | `Date32` | | ||
| | `TIME` | `Time64(TimeUnit::Millisecond)` | | ||
| | `TIMESTAMP` | `Timestamp(TimeUnit::Nanosecond)` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
|
||
| ## `to_timestamp` | ||
|
|
||
| `to_timestamp()` is similar to the standard SQL function. It performs conversions to type `Timestamp(Nanoseconds, None)`, from: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great docs
|
Thanks @velvia -- this looks great! |
Which issue does this PR close?
Closes #355 .
Rationale for this change
This PR implements the timestamp functions above in #355, basically extending the following capabilities for all Timestamp types (specifically for Millis, micros, and seconds) via the above named functions:
What changes are included in this PR?
Pretty straightforward additions to
functions.rs,datetime_expressionsetc.What is NOT included so far but could be depending on discussion:
Are there any user-facing changes?
Three new user-facing SQL functions.