feat: Save column data into json_metadata for all Query executions #20059
Conversation
Is there a reason not to reuse the ExtraJSONMixin?
I'm using the extra mixin here.
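For context, a minimal sketch of the ExtraJSONMixin pattern referenced above (the column definition and parsing details are assumptions about its shape, not a verbatim copy of Superset's implementation):

```python
import json
from typing import Any

import sqlalchemy as sa


class ExtraJSONMixin:
    """Mixin that stores arbitrary JSON in an extra_json text column."""

    extra_json = sa.Column(sa.Text, default="{}")

    @property
    def extra(self) -> dict:
        # Parse the serialized blob lazily, tolerating empty values.
        try:
            return json.loads(self.extra_json or "{}")
        except json.JSONDecodeError:
            return {}

    def set_extra_json_key(self, key: str, value: Any) -> None:
        # Merge a single key into the stored JSON blob.
        extra = self.extra
        extra[key] = value
        self.extra_json = json.dumps(extra)
```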
Codecov Report
@@ Coverage Diff @@
## master #20059 +/- ##
===========================================
- Coverage 66.45% 54.56% -11.89%
===========================================
Files 1721 1721
Lines 64479 64476 -3
Branches 6795 6794 -1
===========================================
- Hits 42852 35184 -7668
- Misses 19897 27564 +7667
+ Partials 1730 1728 -2
superset/queries/dao.py
Outdated
        # save payload into query object
        db.session.add(query)
        query.set_extra_json_key("columns", columns)
        return None
why do these have a return statement?
superset/queries/dao.py
Outdated
    def save_metadata(query: Query, query_payload: str) -> None:
        # parse payload
        try:
            payload: Dict[str, Any] = json.loads(query_payload)
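The diff view truncates the function above. For context, a plausible completion of save_metadata based on this thread (a sketch; the payload's columns key and the import paths are assumptions, not necessarily the committed code):

```python
import json
from typing import Any, Dict

from superset import db
from superset.models.sql_lab import Query


def save_metadata(query: Query, query_payload: str) -> None:
    # parse payload
    try:
        payload: Dict[str, Any] = json.loads(query_payload)
    except json.JSONDecodeError:
        return

    # pull the column metadata out of the executed query's payload
    columns = payload.get("columns", [])

    # save payload into query object
    query.set_extra_json_key("columns", columns)
    db.session.add(query)
```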
The fact that we are dumping JSON and then reloading it within the same process doesn't bode well for me. It's probably also not a great idea to pass strings around between functions.
Can we refactor _execution_context_convertor to return the raw dict instead? (Maybe add a new public function?)
Yeah, but can we maybe update ExecutionContextConvertor.to_payload to return a Python dict and only do the JSON dumps in ExecuteSqlCommand.run? It looks like to_payload is only used in this one place.
Or better yet, replace json_success in _create_response_from_execution_context with self.json_response(...), which dumps the dict to a string automatically.
We should do JSON serialization only when we are very close to sending it in the response.
cc @ofekisr
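A rough sketch of what this suggestion amounts to (the class and method names come from the thread; the bodies and wiring are illustrative assumptions):

```python
import json
from typing import Any, Dict


class ExecutionContextConvertor:
    """Sketch: build the response payload as a plain dict."""

    def to_payload(self, execution_context: Any, status: Any) -> Dict[str, Any]:
        # Return the raw dict; leave serialization to the caller.
        return {
            "query_id": getattr(execution_context, "query_id", None),
            "status": str(status),
        }


def run_command(convertor: ExecutionContextConvertor,
                execution_context: Any, status: Any) -> str:
    # ExecuteSqlCommand.run would dump to JSON once, at the boundary.
    payload = convertor.to_payload(execution_context, status)
    return json.dumps(payload, default=str)
```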
My apologies. The fact that the command is called …
"payload": self._execution_context_convertor.to_payload( | ||
self._execution_context, status | ||
), | ||
"payload": self._execution_context_convertor.serialize_payload(), |
Can we maybe just let the command return a raw dict here, too, and use self.json_response in whatever view handler consumes the result, so that the json.dumps in the executor itself can be cleaned up?
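For reference, a minimal sketch of that idea in a Flask view (the view function and its wiring are hypothetical; json_response mirrors the shared helper being referenced):

```python
import json
from typing import Any

from flask import Response


def json_response(obj: Any, status: int = 200) -> Response:
    # Serialize only here, at the view boundary.
    return Response(
        json.dumps(obj, default=str),
        status=status,
        mimetype="application/json",
    )


def sql_json_view(command: Any) -> Response:
    # The command returns a raw dict; the view owns serialization.
    result = command.run()
    return json_response(result)
```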
But there is specific logic in the json.dumps that needs the status to understand how to serialize. I don't see a clean way to do this unless we move all of this logic outside of the command.
Can we revisit this in another refactor ticket?
Sounds good.
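To make the earlier concern concrete: the status-dependent serialization being discussed looks roughly like this (a hedged reconstruction; the status value, row cap, and encoder are assumptions, not Superset's exact code):

```python
import json
from datetime import datetime
from typing import Any, Dict

MAX_DISPLAY_ROWS = 10_000  # hypothetical display cap


def _json_default(obj: Any) -> Any:
    # Serialize datetimes as ISO strings; stringify everything else.
    if isinstance(obj, datetime):
        return obj.isoformat()
    return str(obj)


def serialize_payload(payload: Dict[str, Any], status: str) -> str:
    if status == "results":
        # Payloads that carry result rows get capped before dumping,
        # which is why the status must be known at serialization time.
        data = payload.get("data", [])
        payload = {**payload, "data": data[:MAX_DISPLAY_ROWS]}
    return json.dumps(payload, default=_json_default)
```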
I still think using the shared .json_response …
…pache#20059)
* add save_metadata function to QueryDAO
* use set_extra_json_key
* added test
* Update queries_test.py
* fix pylint
* add to session
* add to session
* refactor
* forgot the return
SUMMARY
Anytime a user executes a query in SQL Lab, we want to store column data inside the new json_metadata field so that we can leverage this information whenever the user wants to run the same query inside Explore.
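As an illustration, the column data stored under the columns key might look like the following (field names are an assumption modeled on SQL Lab's result-set column descriptions):

```python
import json

# Hypothetical column descriptions for an executed query's result set.
columns = [
    {"name": "ds", "type": "DATE", "is_dttm": True},
    {"name": "gender", "type": "VARCHAR", "is_dttm": False},
    {"name": "num", "type": "BIGINT", "is_dttm": False},
]

# What the query's extra JSON would contain after save_metadata runs.
print(json.dumps({"columns": columns}, indent=2))
```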
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION