fix: Update time grain expressions for Spark >= 3.x #18690
Spark removed the date format string 'u' in Spark 3.0. Switch to using date_trunc, which has been available since Spark 2.3.
cc @betodealmeida (who seems to have done most of the work on the databricks/spark engine spec). If you could at least boop the "allow tests to run" button, that'd be great :D
This looks great, thanks for updating it!
@@ -37,7 +51,7 @@ class DatabricksODBCEngineSpec(BaseEngineSpec):
     # the syntax for the ODBC engine is identical to the Hive one, so
     # we can reuse the expressions from `HiveEngineSpec`
     # pylint: disable=protected-access
-    _time_grain_expressions = HiveEngineSpec._time_grain_expressions
+    _time_grain_expressions = DatabricksHiveEngineSpec._time_grain_expressions
You can also define time_grain_expressions outside the classes and then reuse it in both; I think it might be clearer:
    time_grain_expressions = { ... }

    class DatabricksHiveEngineSpec(HiveEngineSpec):
        _time_grain_expressions = time_grain_expressions
        ...

    class DatabricksODBCEngineSpec(BaseEngineSpec):
        _time_grain_expressions = time_grain_expressions
        ...
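A runnable version of the suggestion above might look like the following. This is only a sketch: `BaseEngineSpec` and `HiveEngineSpec` are empty stand-ins for the real Superset classes, and the dictionary is abbreviated to two illustrative entries rather than the PR's actual contents.

```python
# Hypothetical stand-ins for Superset's base classes so the sketch runs alone.
class BaseEngineSpec:
    _time_grain_expressions: dict = {}


class HiveEngineSpec(BaseEngineSpec):
    pass


# Defined once at module level; entries are illustrative, not the full mapping.
time_grain_expressions = {
    None: "{col}",
    "P1D": "date_trunc('day', {col})",
}


class DatabricksHiveEngineSpec(HiveEngineSpec):
    _time_grain_expressions = time_grain_expressions


class DatabricksODBCEngineSpec(BaseEngineSpec):
    _time_grain_expressions = time_grain_expressions


# Both specs point at the very same dict, so neither needs to reach into the
# other's protected attribute.
same_object = (
    DatabricksHiveEngineSpec._time_grain_expressions
    is DatabricksODBCEngineSpec._time_grain_expressions
)
```

With this layout the ODBC spec no longer depends on the Hive spec's protected attribute, which would also make the `pylint: disable=protected-access` comment unnecessary.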
Sounds good here! :D
Codecov Report
@@            Coverage Diff             @@
##           master   #18690      +/-   ##
==========================================
+ Coverage   66.28%   66.40%   +0.12%
==========================================
  Files        1605     1619      +14
  Lines       62863    62940      +77
  Branches     6341     6341
==========================================
+ Hits        41666    41798     +132
+ Misses      19545    19490      -55
  Partials     1652     1652
@betodealmeida Sorry, gentle ping to boop the "approve running tests" button. 🙇
FYI @thomasdesr CI tests started
a982006 to df0f950
Okay, I've now gone through https://github.com/apache/superset/blob/master/.github/workflows/superset-python-misc.yml and validated that, while a few things are still failing locally, they don't seem to be related to this change. Hopefully this should be the last time I need to ask for a boop xD (cc @villebro 🙇)
Just booped it! :)
Merged! Thanks for the PR, @thomasdesr!
* Fix the time grain expressions for Spark >= 2.3.0
  Spark removed date format string 'u' in Spark 3.0. Switch to using date_trunc which has been around since 2.3
* Review: Pull out time_grain_expressions into its own thing
(cherry picked from commit 03b2b06)
SUMMARY
Spark removed the date format string 'u' in Spark 3.0 [0], so the grain expressions that depend on that format string are failing 😢 and need to be updated. I switched the behavior to use date_trunc, which has been around since 2.3 [1], based on how the Athena db engine spec handles this [2]. I think it's probably fine to raise the minimum supported version to 2.3 because this is a "Databricks" db_engine_spec and Databricks hasn't publicly supported anything that old since ~Jan 2020 [3].
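To make the change concrete, here is a minimal sketch of what date_trunc-based grain expressions could look like and how a `{col}` template is filled in. The ISO 8601 duration keys and the `{col}` placeholder are assumed here to follow Superset's engine-spec convention, `render_time_grain` and the column name `created_at` are hypothetical names for illustration, and the mapping is not necessarily identical to the one merged in this PR.

```python
from typing import Dict, Optional

# Assumed mapping from time grain (ISO 8601 duration, or None for "no grain")
# to a Spark SQL template. Only plain date_trunc units are shown.
TIME_GRAIN_EXPRESSIONS: Dict[Optional[str], str] = {
    None: "{col}",
    "PT1S": "date_trunc('second', {col})",
    "PT1M": "date_trunc('minute', {col})",
    "PT1H": "date_trunc('hour', {col})",
    "P1D": "date_trunc('day', {col})",
    "P1W": "date_trunc('week', {col})",
    "P1M": "date_trunc('month', {col})",
    "P3M": "date_trunc('quarter', {col})",
    "P1Y": "date_trunc('year', {col})",
}


def render_time_grain(grain: Optional[str], column: str) -> str:
    """Fill the column name into the template for the requested grain."""
    return TIME_GRAIN_EXPRESSIONS[grain].format(col=column)


# Example: the SQL fragment for weekly bucketing of a hypothetical column.
print(render_time_grain("P1W", "created_at"))  # date_trunc('week', created_at)
```

Because none of these templates go through a date format pattern, they do not depend on the 'u' pattern letter that Spark 3.0 removed.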
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
I can't reproduce the old behavior (I don't have a Spark cluster that old xD), but here's a table of timestamps from a current version of Spark run through these same expressions:
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION