fix: Update time grain expressions for Spark >= 3.x #18690

thomasdesr · 2022-02-12T09:08:58Z

SUMMARY

Spark removed date format string 'u' in Spark 3.0 [0] so the grain expressions need to be updated because ones that depend on that format string are failing 😢. I switched the behavior to use date_trunc which has been around since 2.3 [1] based on how the Athena db engine spec handles this [2].

I think its probably fine to raise the minimum supported version to 2.3 because this is a "Databricks" db_engine_spec and Databricks hasn't publicly supported something that old since ~Jan 2020 [3].

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

I can't reproduce the old behavior (I don't have that old a spark cluster xD) but here's a table of timestamps from a current version of spark run through these same expressions:

_timestamp	2021-02-08T23:58:02	2021-02-08T23:58:41	2021-02-08T23:59:31
PT1S	2021-02-08T23:58:02	2021-02-08T23:58:41	2021-02-08T23:59:31
PT1M	2021-02-08T23:58:00	2021-02-08T23:58:00	2021-02-08T23:59:00
PT1H	2021-02-08T23:00:00	2021-02-08T23:00:00	2021-02-08T23:00:00
P1D	2021-02-08T00:00:00	2021-02-08T00:00:00	2021-02-08T00:00:00
P1W	2021-02-08T00:00:00	2021-02-08T00:00:00	2021-02-08T00:00:00
P1M	2021-02-01T00:00:00	2021-02-01T00:00:00	2021-02-01T00:00:00
P3M	2021-01-01T00:00:00	2021-01-01T00:00:00	2021-01-01T00:00:00
P1Y	2021-01-01T00:00:00	2021-01-01T00:00:00	2021-01-01T00:00:00
P1W/1970-01-03T00:00:00Z	2021-02-13T00:00:00	2021-02-13T00:00:00	2021-02-13T00:00:00
1969-12-28T00:00:00Z/P1W	2021-02-07T00:00:00	2021-02-07T00:00:00	2021-02-07T00:00:00

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

Spark removed date format string 'u' in Spark 3.0. Switch to using date_trunc which has been around since 2.3

thomasdesr · 2022-02-16T23:48:04Z

cc @betodealmeida (who seems to have done most of the work on the databricks/spark engine spec)

If you could at least boop the "allow tests to run" button that'd be great :D

betodealmeida

This looks great, thanks for updating it!

betodealmeida · 2022-02-17T00:13:01Z

superset/db_engine_specs/databricks.py

@@ -37,7 +51,7 @@ class DatabricksODBCEngineSpec(BaseEngineSpec):
    # the syntax for the ODBC engine is identical to the Hive one, so
    # we can reuse the expressions from `HiveEngineSpec`
    # pylint: disable=protected-access
-    _time_grain_expressions = HiveEngineSpec._time_grain_expressions
+    _time_grain_expressions = DatabricksHiveEngineSpec._time_grain_expressions


You can also define time_grain_expressions outside the classes, and then reuse it in both classes, I think it might be clearer:

time_grain_expression = { ... } class DatabricksHiveEngineSpec(HiveEngineSpec): _time_grain_expressions = time_grain_expressions ... class DatabricksODBCEngineSpec(BaseEngineSpec): _time_grain_expressions = time_grain_expressions ...

Sounds good here! :D

codecov · 2022-02-17T00:24:13Z

Codecov Report

Merging #18690 (d2a8fd7) into master (225015f) will increase coverage by 0.12%.
The diff coverage is 100.00%.

❗ Current head d2a8fd7 differs from pull request most recent head df0f950. Consider uploading reports for the commit df0f950 to get more accurate results

@@            Coverage Diff             @@
##           master   #18690      +/-   ##
==========================================
+ Coverage   66.28%   66.40%   +0.12%     
==========================================
  Files        1605     1619      +14     
  Lines       62863    62940      +77     
  Branches     6341     6341              
==========================================
+ Hits        41666    41798     +132     
+ Misses      19545    19490      -55     
  Partials     1652     1652

Flag	Coverage Δ
hive	`52.28% <100.00%> (+0.14%)`	⬆️
mysql	`81.55% <100.00%> (+0.22%)`	⬆️
postgres	`81.60% <100.00%> (+0.22%)`	⬆️
presto	`52.13% <100.00%> (+0.14%)`	⬆️
python	`82.03% <100.00%> (+0.22%)`	⬆️
sqlite	`81.29% <100.00%> (+0.22%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
superset/db_engine_specs/databricks.py	`90.90% <100.00%> (+0.90%)`	⬆️
superset/key_value/commands/create.py	`80.95% <0.00%> (-3.05%)`	⬇️
superset/sql_parse.py	`98.61% <0.00%> (-0.87%)`	⬇️
superset/models/core.py	`88.91% <0.00%> (-0.19%)`	⬇️
superset/sql_lab.py	`81.64% <0.00%> (ø)`
superset/key_value/utils.py	`100.00% <0.00%> (ø)`
superset/dashboards/filter_state/api.py	`100.00% <0.00%> (ø)`
superset/explore/form_data/commands/parameters.py	`100.00% <0.00%> (ø)`
superset/utils/pandas_postprocessing.py
superset/utils/pandas_postprocessing/compare.py	`88.88% <0.00%> (ø)`
... and 32 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 225015f...df0f950. Read the comment docs.

thomasdesr · 2022-03-07T01:54:13Z

@betodealmeida Sorry, gentle ping to boop the "approve running tests" button. 🙇

villebro · 2022-03-07T08:41:25Z

FYI @thomasdesr CI tests started

thomasdesr · 2022-03-08T18:43:44Z

Okay I've now gone through https://github.com/apache/superset/blob/master/.github/workflows/superset-python-misc.yml and validated that while a few things are still failing locally they don't seem to be related to this change. Hopefully this should be the last time I need to ask for a boop xD

(cc @villebro 🙇)

betodealmeida · 2022-03-08T18:56:39Z

Just booped it! :)

betodealmeida · 2022-03-08T21:21:25Z

Merged! Thanks for the PR, @thomasdesr!

* Fix the time grain expressions for Spark >= 2.3.0 Spark removed date format string 'u' in Spark 3.0. Switch to using date_trunc which has been around since 2.3 * Review: Pull out time_grain_expressoins into its own thing (cherry picked from commit 03b2b06)

Fix the time grain expressions for Spark >= 2.3.0

a25eb28

Spark removed date format string 'u' in Spark 3.0. Switch to using date_trunc which has been around since 2.3

pull-request-size bot added the size/S label Feb 12, 2022

thomasdesr changed the title ~~fix: Update time grain expressions for Spark >= 2.3.0~~ fix: Update time grain expressions for Spark >= 3.x Feb 16, 2022

betodealmeida approved these changes Feb 17, 2022

View reviewed changes

thomasdesr requested a review from betodealmeida February 17, 2022 05:01

Review: Pull out time_grain_expressoins into its own thing

df0f950

thomasdesr force-pushed the fix-spark-time-grain-expressions branch from a982006 to df0f950 Compare March 8, 2022 18:41

betodealmeida merged commit 03b2b06 into apache:master Mar 8, 2022

villebro added the lts-v1 label Mar 29, 2022

SusurHe mentioned this pull request Jun 22, 2022

feat: add Spark Sql DB engine spec and support Spark 3.x #20462

Merged

9 tasks

mistercrunch added 🍒 1.5.0 🍒 1.5.1 🍒 1.5.2 🍒 1.5.3 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.0.0 labels Mar 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Update time grain expressions for Spark >= 3.x #18690

fix: Update time grain expressions for Spark >= 3.x #18690

thomasdesr commented Feb 12, 2022 •

edited

Loading

thomasdesr commented Feb 16, 2022

betodealmeida left a comment

betodealmeida Feb 17, 2022

thomasdesr Feb 17, 2022

codecov bot commented Feb 17, 2022 •

edited

Loading

thomasdesr commented Mar 7, 2022 •

edited

Loading

villebro commented Mar 7, 2022

thomasdesr commented Mar 8, 2022

betodealmeida commented Mar 8, 2022

betodealmeida commented Mar 8, 2022

fix: Update time grain expressions for Spark >= 3.x #18690

fix: Update time grain expressions for Spark >= 3.x #18690

Conversation

thomasdesr commented Feb 12, 2022 • edited Loading

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

thomasdesr commented Feb 16, 2022

betodealmeida left a comment

Choose a reason for hiding this comment

betodealmeida Feb 17, 2022

Choose a reason for hiding this comment

thomasdesr Feb 17, 2022

Choose a reason for hiding this comment

codecov bot commented Feb 17, 2022 • edited Loading

Codecov Report

thomasdesr commented Mar 7, 2022 • edited Loading

villebro commented Mar 7, 2022

thomasdesr commented Mar 8, 2022

betodealmeida commented Mar 8, 2022

betodealmeida commented Mar 8, 2022

thomasdesr commented Feb 12, 2022 •

edited

Loading

codecov bot commented Feb 17, 2022 •

edited

Loading

thomasdesr commented Mar 7, 2022 •

edited

Loading