Skip to content

Comments

fix: ODPS(MaxCompute)data source table preview failed#32556

Draft
zhutong6688 wants to merge 12 commits intoapache:masterfrom
zhutong6688:fix-new-odps-partition-bug-32301
Draft

fix: ODPS(MaxCompute)data source table preview failed#32556
zhutong6688 wants to merge 12 commits intoapache:masterfrom
zhutong6688:fix-new-odps-partition-bug-32301

Conversation

@zhutong6688
Copy link

SUMMARY

I fixed an error when using sqllab to preview data after configuring the odps data source in superset, and added partition restrictions for previewing SQL
Unlike before, this is a separate collection of the logic of ODPS into a single class(before #32310

fixes: #32301

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

This is a preview of the odps table data before the repair:
image

This is a preview of the repaired ODPS table data:
5f49bcaba6d6cd31bdfdfae4ada6970
d2de24aca26b1c8560deaabbf3d1144
8aeca80279b56aaf4b3ee4ffeec96d5
In addition, when using other data sources, there is no problem with testing and it does not affect other data sources

TESTING INSTRUCTIONS

  1. First, configure the ODPS data source
  2. Create a new SQLlab page in the "SQL" column, select the database Schema
  3. Select partitioned tables, non partitioned tables, and multi partitioned tables in sequence to test the correctness of data preview
  4. Choose another data source, such as an example data source like SQLite, and follow the same steps as above to verify the validity of the data preview

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@github-actions github-actions bot added the api Related to the REST API label Mar 10, 2025
@dosubot dosubot bot added data:connect Namespace | Anything related to db connections / integrations sqllab Namespace | Anything related to the SQL Lab labels Mar 10, 2025
Copy link

@korbit-ai korbit-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category Issue Fix Detected
Design Database-specific logic in generic DAO ▹ view
Design Tight coupling with concrete engine implementation ▹ view
Performance Redundant Column Metadata Fetch ▹ view
Security SQL Injection Risk in Partition Query Construction ▹ view
Functionality Ineffective Partition Filtering ▹ view
Logging Unnecessary Debug Print Statement ▹ view
Readability Undocumented Complex Regex Pattern ▹ view
Functionality Overly Strict ODPS URI Pattern Matching ▹ view
Files scanned
File Path Reviewed
superset/daos/database.py
superset/db_engine_specs/odps.py
superset/sql/parse.py
superset/databases/api.py

Explore our documentation to understand the languages and file types we support and the files we ignore.

Need a new review? Comment /korbit-review on this PR and I'll review your latest changes.

Korbit Guide: Usage and Customization

Interacting with Korbit

  • You can manually ask Korbit to review your PR using the /korbit-review command in a comment at the root of your PR.
  • You can ask Korbit to generate a new PR description using the /korbit-generate-pr-description command in any comment on your PR.
  • Too many Korbit comments? I can resolve all my comment threads if you use the /korbit-resolve command in any comment on your PR.
  • On any given comment that Korbit raises on your pull request, you can have a discussion with Korbit by replying to the comment.
  • Help train Korbit to improve your reviews by giving a 👍 or 👎 on the comments Korbit posts.

Customizing Korbit

  • Check out our docs on how you can make Korbit work best for you and your team.
  • Customize Korbit for your organization through the Korbit Console.

Feedback and Support

A print was omitted and deleted during development, and this time it is to delete the print
@zhutong6688
Copy link
Author

Just assign this task to me, thank you

@sadpandajoe sadpandajoe changed the title Fix: ODPS data source table preview failed(#32301)-New fix: ODPS data source table preview failed Mar 10, 2025
@sadpandajoe sadpandajoe requested a review from justinpark March 10, 2025 17:27
@codecov
Copy link

codecov bot commented Mar 11, 2025

Codecov Report

❌ Patch coverage is 46.29630% with 58 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.34%. Comparing base (76d897e) to head (9b9a7f0).
⚠️ Report is 2361 commits behind head on master.

Files with missing lines Patch % Lines
superset/db_engine_specs/odps.py 34.37% 42 Missing ⚠️
superset/daos/database.py 52.00% 12 Missing ⚠️
superset/sql/parse.py 70.00% 3 Missing ⚠️
superset/databases/api.py 88.88% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #32556       +/-   ##
===========================================
+ Coverage   60.48%   83.34%   +22.85%     
===========================================
  Files        1931      550     -1381     
  Lines       76236    39545    -36691     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    32957    -13157     
+ Misses      28017     6588    -21429     
+ Partials     2105        0     -2105     
Flag Coverage Δ
hive 48.41% <34.25%> (-0.75%) ⬇️
javascript ?
mysql 75.56% <34.25%> (?)
postgres 75.63% <34.25%> (?)
presto 52.88% <34.25%> (-0.92%) ⬇️
python 83.34% <46.29%> (+19.83%) ⬆️
sqlite 75.15% <34.25%> (?)
unit 61.36% <46.29%> (+3.73%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@hainenber
Copy link
Contributor

@zhutong6688 Workday happened and I did missed your updates, my mistake! I've assigned this to you and left 1 review. I'll let folks with Oracle knowledge assessing added LoCs.

@zhutong6688
Copy link
Author

zhutong6688 commented Mar 13, 2025

@justinpark Please kindly agree to proceed with the inspection
image

@zhutong6688
Copy link
Author

zhutong6688 commented Mar 13, 2025

@justinpark @hainenber Please approve the execution of the workflow check

@zhutong6688
Copy link
Author

zhutong6688 commented Mar 16, 2025

@zhutong6688 Workday happened and I did missed your updates, my mistake! I've assigned this to you and left 1 review. I'll let folks with Oracle knowledge assessing added LoCs.

I am extending the support for Alibaba Cloud's MaxCompute data source, also known as ODPS, Not Oracle

@zhutong6688 zhutong6688 changed the title fix: ODPS data source table preview failed fix: ODPS(MaxCompute) data source table preview failed Mar 17, 2025
@zhutong6688 zhutong6688 changed the title fix: ODPS(MaxCompute) data source table preview failed fix: ODPS(MaxCompute)data source table preview failed Mar 17, 2025
@yujun2021
Copy link

@hainenber
I'm very excited to see this fixed functionality in a new version as soon as possible.

@hainenber
Copy link
Contributor

hainenber commented Mar 18, 2025

The credit is all to @zhutong6688. I'll try my best to assist the author to have the change qualified to be merged.

@zhutong6688
Copy link
Author

zhutong6688 commented Mar 19, 2025

@hainenber @justinpark I have just resolved the conflict that was detected during the inspection below
requirements/base.in

@yujun2021
Copy link

The credit is all to @zhutong6688. I'll try my best to assist the author to have the change qualified to be merged.

thanks @zhutong6688 @hainenber

@zhutong6688
Copy link
Author

@hainenber @justinpark Please approve the execution of the workflow check

@rusackas rusackas requested a review from hainenber March 21, 2025 14:47
@yujun2021
Copy link

@hainenber @justinpark Please approve the execution of the workflow check

I'm curious about how this case is going.

@rusackas
Copy link
Member

rusackas commented Apr 2, 2025

Looks like CI passed... we just need a qualified reviewer to take a fresh pass at it.

@rusackas
Copy link
Member

@hainenber @justinpark @mistercrunch anyone feel qualified to give this a proper review and get it merged?

# -> https://github.com/aio-libs/async-timeout/blob/master/CHANGES.rst#500-2024-10-31
async_timeout>=4.0.0,<5.0.0

pyodps>=0.12.2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh very few people use odps AFAIK, this dependency should be treated like the other analytics databases here: https://github.com/apache/superset/blob/master/pyproject.toml#L160-L188

return ssh_tunnel

@classmethod
def is_odps_partitioned_table(
Copy link
Member

@mistercrunch mistercrunch May 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of the database-engine-specific code belongs in this package -> https://github.com/apache/superset/tree/master/superset/db_engine_specs

I see you you're setting this up as part of your PR, but this odps-specific method/logic belongs in that module.



@dataclass(eq=True, frozen=True)
class Partition:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this module is getting refactored as we speak by @betodealmeida , he can probably help making sure this lands safely

# loop over all subclasses of BaseEngineSpec
for engine in load_engine_specs():
if engine is not BaseEngineSpec:
if (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for unit tests, if they dependencies on external drivers (pyodps in this case) , there's a decorator for conditional unit tests that run only when the dependency is there. Currently we only install some of the database drivers in our CI builds (the ones that end up in development.txt). Most likely we don't want to install all possible drivers and run unit tests against it because there's just too many databases and drivers for the core maintainers to manage... Many drivers are finicky, and some drivers/integrations only few people care about.

Anyhow, not a bad thing to have unit tests in the repo, but won't run as part of our CI. Maybe in your fork/CI you can run tests that are odps-specific.

@rusackas
Copy link
Member

Looks like we have some conflicts and comments that haven't been addressed yet. Setting to draft for now, but we'd love to se this marked "ready for review" and merged!

@rusackas rusackas marked this pull request as draft August 12, 2025 20:46
@helmiazizm
Copy link

Bumping this up as the resolution of this PR will help us a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api Related to the REST API data:connect Namespace | Anything related to db connections / integrations review:draft size/L sqllab Namespace | Anything related to the SQL Lab

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ODPS data source table preview failed

7 participants