feat: Save column data into json_metadata for all Query executions #20059

Merged
hughhhh merged 12 commits into apache:master on May 18, 2022

Conversation

hughhhh
Member

@hughhhh hughhhh commented May 13, 2022

SUMMARY

Any time a user executes a query in SQL Lab, we want to store the column data inside the new json_metadata field so that we can leverage this information whenever a user wants to run the same query in Explore.
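For readers unfamiliar with the pattern, here is a minimal sketch of keeping column metadata under a key in a JSON text field. Only the method name `set_extra_json_key` and the `"columns"` key come from this PR's diff; the `ExtraJSONSketch` class, the `extra` property, the `save_columns` helper, and the shape of each column dict are assumptions made for illustration and are not Superset's actual implementation.

```python
import json
from typing import Any, Dict, List


class ExtraJSONSketch:
    """Hypothetical stand-in for a model that keeps metadata in a JSON text column."""

    def __init__(self) -> None:
        # In a real model this would be a text column persisted by the ORM.
        self.extra_json: str = "{}"

    @property
    def extra(self) -> Dict[str, Any]:
        # Decode the stored text; fall back to an empty dict if it is unusable.
        try:
            return json.loads(self.extra_json)
        except (TypeError, json.JSONDecodeError):
            return {}

    def set_extra_json_key(self, key: str, value: Any) -> None:
        # Read-modify-write so that other keys already stored are preserved.
        extra = self.extra
        extra[key] = value
        self.extra_json = json.dumps(extra)


def save_columns(query: ExtraJSONSketch, columns: List[Dict[str, Any]]) -> None:
    # Hypothetical helper: store the result-set columns under the "columns" key.
    query.set_extra_json_key("columns", columns)


query = ExtraJSONSketch()
save_columns(query, [{"name": "id", "type": "INTEGER"}, {"name": "ds", "type": "DATE"}])
```

Because the setter rewrites the whole JSON document, a later call with a different key leaves the stored columns untouched.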

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@hughhhh hughhhh changed the title Save-col-on-query feat: Save column data into json_metadata for all Query executions May 13, 2022
@hughhhh hughhhh marked this pull request as ready for review May 16, 2022 16:35
@ktmud
Member

ktmud commented May 16, 2022

Is there a reason not to reuse the ExtraJSONMixin?

@hughhhh
Member Author

hughhhh commented May 16, 2022

> Is there a reason not to reuse the ExtraJSONMixin?

I'm using the extra mixin here

https://github.com/apache/superset/pull/20059/files#diff-68d044828e32dbc106f7cece56d7aeeccbe2eb455cf33307df4fc9d07dfb1fd5R67

@codecov

codecov bot commented May 16, 2022

Codecov Report

Merging #20059 (7b7d7e5) into master (ddc01ea) will decrease coverage by 11.88%.
The diff coverage is 73.13%.

❗ Current head 7b7d7e5 differs from pull request most recent head 6af71ee. Consider uploading reports for the commit 6af71ee to get more accurate results

@@             Coverage Diff             @@
##           master   #20059       +/-   ##
===========================================
- Coverage   66.45%   54.56%   -11.89%     
===========================================
  Files        1721     1721               
  Lines       64479    64476        -3     
  Branches     6795     6794        -1     
===========================================
- Hits        42852    35184     -7668     
- Misses      19897    27564     +7667     
+ Partials     1730     1728        -2     
| Flag | Coverage Δ | |
|---|---|---|
| hive | 53.70% <65.00%> (+0.01%) | ⬆️ |
| mysql | ? | |
| postgres | ? | |
| presto | 53.56% <65.00%> (+0.01%) | ⬆️ |
| python | 58.00% <65.00%> (-24.62%) | ⬇️ |
| sqlite | ? | |
| unit | 49.40% <47.50%> (+0.01%) | ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Impacted Files | Coverage Δ | |
|---|---|---|
| .../explore/components/ExploreViewContainer/index.jsx | 52.57% <ø> (+0.46%) | ⬆️ |
| ...rset-frontend/src/explore/exploreUtils/formData.ts | 85.71% <ø> (-3.18%) | ⬇️ |
| superset/importexport/api.py | 100.00% <ø> (ø) | |
| superset/sqllab/sqllab_execution_context.py | 90.55% <ø> (ø) | |
| superset/databases/commands/test_connection.py | 32.43% <7.14%> (-67.57%) | ⬇️ |
| ...-frontend/src/components/AlteredSliceTag/index.jsx | 88.57% <85.18%> (+1.33%) | ⬆️ |
| superset/sqllab/execution_context_convertor.py | 81.48% <92.30%> (-5.48%) | ⬇️ |
| superset/queries/dao.py | 100.00% <100.00%> (ø) | |
| superset/sql_lab.py | 79.15% <100.00%> (-2.71%) | ⬇️ |
| superset/sqllab/command.py | 86.08% <100.00%> (+0.86%) | ⬆️ |
| ... and 279 more | | |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ddc01ea...6af71ee. Read the comment docs.

# save payload into query object
db.session.add(query)
query.set_extra_json_key("columns", columns)
return None
Member

Why do these have a return statement?

def save_metadata(query: Query, query_payload: str) -> None:
    # parse payload
    try:
        payload: Dict[str, Any] = json.loads(query_payload)
Member

The fact that we are dumping a JSON string and then reloading it within the same process doesn't bode well for me. It's probably also not a great idea to pass strings around between functions.

Can we refactor _execution_context_convertor to return the raw dict instead? (Maybe add a new public function?)

Member

@ktmud ktmud May 17, 2022

Yeah, but can we maybe update ExecutionContextConvertor.to_payload to return a Python dict and only do the JSON dumps in ExecuteSqlCommand.run? It looks like to_payload is only used in this one place.

Or better yet, replace json_success in _create_response_from_execution_context with self.json_response(...), which dumps dict to strings automatically.

We should do JSON serialization only when we are very close to sending it to the response.

cc @ofekisr
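The suggestion above, passing a dict between functions and serializing only at the response boundary, can be sketched roughly as follows. All names here (`to_payload`, `save_metadata`, `run`, and the plain-dict `store`) are hypothetical stand-ins for illustration; they do not reproduce the signatures of Superset's `ExecutionContextConvertor` or `ExecuteSqlCommand`.

```python
import json
from typing import Any, Dict, List


def to_payload(rows: List[Any], columns: List[Dict[str, Any]], status: str) -> Dict[str, Any]:
    """Hypothetical converter that returns a plain dict rather than a JSON string."""
    return {"status": status, "data": rows, "columns": columns}


def save_metadata(store: Dict[str, Any], payload: Dict[str, Any]) -> None:
    """Hypothetical persistence step: reads the dict directly, so no json.loads is needed."""
    store["columns"] = payload.get("columns", [])


def run(rows: List[Any], columns: List[Dict[str, Any]], status: str, store: Dict[str, Any]) -> str:
    payload = to_payload(rows, columns, status)
    save_metadata(store, payload)
    # Serialize exactly once, at the point where the response body is produced.
    return json.dumps(payload)


store: Dict[str, Any] = {}
body = run([[1]], [{"name": "id"}], "success", store)
```

With this shape, intermediate steps never parse a string that the same process just produced, and the JSON encoding lives in one place.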

@ktmud
Member

ktmud commented May 16, 2022

My apologies. The fact that the command is called save_metadata made me think the ExtraJSONMixin was not used. Not sure if we need to change it, but it's worth checking what other modules are doing for similar operations.

"payload": self._execution_context_convertor.to_payload(
self._execution_context, status
),
"payload": self._execution_context_convertor.serialize_payload(),
Member

Can we maybe just let the command return raw dict here, too, and use self.json_response in whatever view handler the result may be used, so that json.dumps in the executor itself can be cleaned up?

Member Author

But there is specific logic in the json.dumps call that needs the status to understand how to serialize. I don't see a clean way to do this unless we move all of this logic outside of the command.

Can we revisit this in another refactor ticket?

Member

Sounds good.

Member

@ktmud ktmud left a comment

I still think using the shared .json_response

@hughhhh hughhhh merged commit 660af40 into apache:master May 18, 2022
philipher29 pushed a commit to ValtechMobility/superset that referenced this pull request Jun 9, 2022
…pache#20059)

* add save_metadata function to QueryDAO

* use set_extra_json_key

* added test

* Update queries_test.py

* fix pylint

* add to session

* add to session

* refactor

* forgot the return
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.0.0 labels Mar 13, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 2.0.0

4 participants