fix: ODPS (MaxCompute) data source table preview failed#38174
Conversation
Sequence Diagram

Shows how the table metadata endpoint detects ODPS partitioned tables, constructs a Partition object, and routes metadata/preview generation to OdpsEngineSpec, which injects a partition filter into the SELECT used for preview.

```mermaid
sequenceDiagram
    participant Client as SQLLab/UI
    participant API as Databases API (table_metadata)
    participant DAO as DatabaseDAO.is_odps_partitioned_table
    participant Engine as OdpsEngineSpec
    participant DB as Database (sqla / pyodps)
    Client->>API: GET /table_metadata?name=table...
    API->>DAO: is_odps_partitioned_table(database, table_name)
    DAO-->>API: (is_partitioned=True, partition_fields=["p1",...])
    API->>Engine: get_table_metadata(database, table, Partition(...))
    Engine->>DB: get columns/metadata & select_star(...)
    Engine->>DB: select_star includes partition WHERE for ODPS partitioned table
    DB-->>Engine: table metadata + selectStar SQL
    Engine-->>API: TableMetadataResponse (with partition-aware selectStar)
    API-->>Client: 200 OK (table metadata and preview SQL)
```

Generated by CodeAnt AI
superset/db_engine_specs/odps.py
Outdated
```python
from superset.sql.parse import Partition, SQLScript
from superset.sql_parse import Table
```

Suggestion: The module incorrectly imports `Table` from `superset.sql_parse`, which does not exist, causing an `ImportError` when loading the ODPS engine spec; `Table` should be imported from `superset.sql.parse` along with `Partition` and `SQLScript`. [logic error]

Severity Level: Critical 🚨
- ❌ DatabaseRestApi import fails, breaking database REST endpoints.
- ❌ ODPS engine spec unavailable for SQLLab table metadata.
- ⚠️ Any feature importing OdpsEngineSpec crashes on startup.

Suggested change:

```diff
-from superset.sql.parse import Partition, SQLScript
-from superset.sql_parse import Table
+from superset.sql.parse import Partition, SQLScript, Table
```
Steps of Reproduction ✅
1. Start the Superset application or any context that imports the database REST API; this
imports `superset/databases/api.py` (see file header and imports at
`superset/databases/api.py:1-115`).
2. `DatabaseRestApi` imports the ODPS engine spec via `from superset.db_engine_specs.odps
import OdpsEngineSpec` at `superset/databases/api.py:114`.
3. Python loads `superset/db_engine_specs/odps.py`, executing its top-level imports,
including `from superset.sql_parse import Table` at
`superset/db_engine_specs/odps.py:35-36`.
4. Since there is no `superset/sql_parse.py` module on disk (confirmed by the missing file
error when reading `/workspace/superset/superset/sql_parse.py` and the actual `Table`
dataclass being defined in `superset/sql/parse.py:282-291`), module import fails with
`ModuleNotFoundError: No module named 'superset.sql_parse'`, preventing `DatabaseRestApi`
(and thus `/api/v1/database/*` endpoints and ODPS support) from loading.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/db_engine_specs/odps.py
**Line:** 35:36
**Comment:**
*Logic Error: The module incorrectly imports `Table` from `superset.sql_parse`, which does not exist, causing an `ImportError` when loading the ODPS engine spec; `Table` should be imported from `superset.sql.parse` along with `Partition` and `SQLScript`.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
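The failure mode flagged above — one bad top-level import taking down every module that imports it — can be reproduced in isolation. This sketch uses a deliberately nonexistent placeholder module name, not Superset code:

```python
import types

# Simulate a module whose top-level import targets a nonexistent module,
# mirroring how `from superset.sql_parse import Table` would fail at load time.
bad_source = "from superset_sql_parse_does_not_exist import Table\n"

module = types.ModuleType("odps_spec_demo")
try:
    exec(compile(bad_source, "odps_spec_demo.py", "exec"), module.__dict__)
    failed = None
except ModuleNotFoundError as err:
    # The importing module never finishes loading, so anything that depends
    # on it (here, the REST API module) fails too.
    failed = err.name

print(failed)  # superset_sql_parse_does_not_exist
```

Because the exception fires during module execution, every importer up the chain sees the same `ModuleNotFoundError`, which is why a single wrong import path breaks all `/api/v1/database/*` endpoints.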
Code Review Agent Run #3ebe85
Actionable Suggestions - 2
- requirements/base.in - 1
  - Dependency Versioning Inconsistency · Line 53-53
- superset/db_engine_specs/odps.py - 1
  - Infinite Recursion Bug · Line 60-60
Additional Suggestions - 3
- superset/sql/parse.py - 2
  - Unnecessary and fragile `__eq__` override · Line 351-352

    The custom `__eq__` method compares string representations, which is unnecessary (dataclass handles it) and risky if `other` isn't str-able, potentially causing `TypeError`. Rely on the default field comparison instead.

    Code suggestion
    ```diff
    @@ -351,3 +351,0 @@
    -    def __eq__(self, other: Any) -> bool:
    -        return str(self) == str(other)
    -
    ```
  - Docstring typos and inaccuracies · Line 327-332

    The docstring has a typo ('partition_comlumn' instead of 'partition_column') and the example uses 'ispartitioned_table' instead of the actual field name 'is_partitioned_table', which could confuse readers.

    Code suggestion
    ```diff
    @@ -327,7 +327,7 @@
    -    """
    -    Partition object, with two attribute keys:
    -    ispartitioned_table and partition_comlumn,
    -    used to provide partition information
    -    Here is an example of an object:
    -    {"ispartitioned_table":true,"partition_column":["month","day"]}
    -    """
    +    """
    +    Partition object, with two attribute keys:
    +    is_partitioned_table and partition_column,
    +    used to provide partition information
    +    Here is an example of an object:
    +    {"is_partitioned_table":true,"partition_column":["month","day"]}
    +    """
    ```
- requirements/base.txt - 1
  - Missing dependency source comment · Line 488-488

    The pyodps dependency line lacks the standard `# via -r requirements/base.in` comment that pip-compile adds for dependencies sourced from base.in, breaking consistency with other entries in the file.

    Code suggestion
    ```diff
    @@ -488,1 +488,2 @@
    -pyodps==0.12.2
    +pyodps==0.12.2
    +# via -r requirements/base.in
    ```
Filtered by Review Rules
Bito filtered these suggestions based on rules created automatically for your feedback. Manage rules.
- superset/db_engine_specs/odps.py - 1
  - Infinite recursion in method implementation · Line 60-60
Review Details
Files reviewed - 8 · Commit Range: 3a7e8d4..3a7e8d4
- requirements/base.in
- requirements/base.txt
- requirements/development.txt
- superset/daos/database.py
- superset/databases/api.py
- superset/db_engine_specs/odps.py
- superset/sql/parse.py
- tests/integration_tests/db_engine_specs/base_engine_spec_tests.py
Files skipped - 0
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.
- `/review` - Manually triggers a full AI review.
- `/pause` - Pauses automatic reviews on this pull request.
- `/resume` - Resumes automatic reviews.
- `/resolve` - Marks all Bito-posted review comments as resolved.
- `/abort` - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset. You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help
requirements/base.in
Outdated

```
setuptools<81

# ODPS (MaxCompute) data source support
pyodps>=0.12.2
```

The new pyodps dependency uses only a lower bound (`>=0.12.2`), which differs from the pattern in this file, where other dependencies include upper bounds to avoid breakage (e.g., `async_timeout>=4.0.0,<5.0.0`). Consider adding an upper bound like `<0.13.0` to maintain consistency and reduce the risk of future incompatibilities.
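The effect of the suggested bound `>=0.12.2,<0.13.0` can be illustrated with a simple tuple comparison. This helper is only an illustration of the version range, not Superset's dependency tooling:

```python
def pyodps_version_ok(version: str) -> bool:
    # Accept versions in the suggested range >=0.12.2,<0.13.0.
    # Tuple comparison gives the right ordering for dotted numeric versions.
    parts = tuple(int(p) for p in version.split("."))
    return (0, 12, 2) <= parts < (0, 13, 0)

print(pyodps_version_ok("0.12.2"))  # True  - lower bound is inclusive
print(pyodps_version_ok("0.12.9"))  # True  - patch releases stay compatible
print(pyodps_version_ok("0.13.0"))  # False - excluded by the upper bound
```

Capping below the next minor release trades automatic upgrades for protection against API changes pip-compile would otherwise pull in silently.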
Code Review Run #3ebe85
```python
    :param partition: A Table partition info
    :return: Basic table metadata
    """
    return cls.get_table_metadata(database, table, partition)
```

The method calls itself recursively, leading to infinite recursion. It should call the base implementation from utils instead.
Code suggestion
Check the AI-generated fix before applying

```diff
 from superset.databases.utils import (
     get_col_type,
     get_foreign_keys_metadata,
     get_indexes_metadata,
+    get_table_metadata,
 )
@@ -60,1 +60,1 @@
-        return cls.get_table_metadata(database, table, partition)
+        return get_table_metadata(database, table)
```
Code Review Run #3ebe85
Code Review Agent Run #b4a3ae — Actionable Suggestions: 0
```python
    :param partition: A Table partition info
    :return: Basic table metadata
    """
    return cls.get_table_metadata(database, table, partition)
```

Suggestion: The `OdpsBaseEngineSpec.get_table_metadata` classmethod calls `cls.get_table_metadata` with the same signature it defines, which for `OdpsBaseEngineSpec` itself results in infinite recursion and a `RecursionError` whenever it is invoked; it should instead delegate to the base implementation on `BaseEngineSpec` and ignore the extra `partition` argument. [logic error]

Severity Level: Major ⚠️
- ❌ Any direct OdpsBaseEngineSpec.get_table_metadata call crashes.
- ⚠️ Blocks reuse of OdpsBaseEngineSpec in future subclasses.
- ⚠️ Risk for test code importing OdpsBaseEngineSpec directly.

Suggested change:

```diff
-        return cls.get_table_metadata(database, table, partition)
+        # Delegate to the base implementation; the partition argument is currently unused.
+        return super().get_table_metadata(database, table)
```
Steps of Reproduction ✅
1. Import the ODPS engine spec base class from the PR code: `from
superset.db_engine_specs.odps import OdpsBaseEngineSpec` (defined at
`superset/db_engine_specs/odps.py:44`).
2. In any Python context (tests, shell, or application code), call
`OdpsBaseEngineSpec.get_table_metadata(database=None, table=None)`; the arguments are not
used before the recursion (`superset/db_engine_specs/odps.py:46-59`).
3. The classmethod body executes `return cls.get_table_metadata(database, table,
partition)` which re-invokes the same `OdpsBaseEngineSpec.get_table_metadata`
implementation with identical arguments (line 59).
4. This self-call repeats with no termination condition until Python's recursion limit is
exceeded, raising a `RecursionError` before any actual metadata lookup can occur.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/db_engine_specs/odps.py
**Line:** 59:59
**Comment:**
*Logic Error: The `OdpsBaseEngineSpec.get_table_metadata` classmethod calls `cls.get_table_metadata` with the same signature it defines, which for `OdpsBaseEngineSpec` itself results in infinite recursion and a `RecursionError` whenever it is invoked; it should instead delegate to the base implementation on `BaseEngineSpec` and ignore the extra `partition` argument.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
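The recursion and the proposed `super()` fix are easy to demonstrate with a stripped-down stand-in for the engine-spec hierarchy. Class names and the returned metadata shape here are simplified stand-ins, not the real Superset classes:

```python
class BaseSpecSketch:
    @classmethod
    def get_table_metadata(cls, database, table):
        # Stand-in for the base implementation's real metadata lookup.
        return {"name": table, "partitions": None}


class BrokenOdpsSketch(BaseSpecSketch):
    @classmethod
    def get_table_metadata(cls, database, table, partition=None):
        # Bug: `cls` resolves to BrokenOdpsSketch, so this call re-enters
        # this very method with identical arguments, forever.
        return cls.get_table_metadata(database, table, partition)


class FixedOdpsSketch(BaseSpecSketch):
    @classmethod
    def get_table_metadata(cls, database, table, partition=None):
        # Fix: delegate explicitly past the override via super().
        return super().get_table_metadata(database, table)


try:
    BrokenOdpsSketch.get_table_metadata(None, "t1")
except RecursionError:
    print("RecursionError")  # the self-call exhausts the stack

print(FixedOdpsSketch.get_table_metadata(None, "t1"))
# {'name': 't1', 'partitions': None}
```

Either `super().get_table_metadata(...)` or calling the module-level helper directly (as the Bito suggestion does) breaks the cycle; what matters is that the call no longer dispatches back through `cls`.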
```python
        if show_cols:
            fields = cls._get_fields(cols)
        full_table_name = cls.quote_table(table, engine.dialect)
```
Suggestion: OdpsEngineSpec.select_star assumes its third argument engine always has a .dialect attribute, but in the generic code path Database.select_star passes a SQLAlchemy Dialect object instead, causing an AttributeError; the method should accept either an engine or a dialect and normalize to a dialect before calling quote_table. [logic error]
Severity Level: Critical 🚨
- ❌ /api/v1/database/<pk>/select_star fails for ODPS.
- ❌ SQLLab cannot generate SELECT * for ODPS tables.
- ⚠️ ODPS engine diverges from BaseEngineSpec.select_star contract.| full_table_name = cls.quote_table(table, engine.dialect) | |
| dialect = getattr(engine, "dialect", engine) | |
| full_table_name = cls.quote_table(table, dialect) |
Steps of Reproduction ✅
1. Configure an ODPS (MaxCompute) database and ensure its backend is `"odps"`, so
`Database.db_engine_spec` resolves to `OdpsEngineSpec` (see
`superset/models/core.py:1021-1038` and `superset/db_engine_specs/odps.py:62-65`).
2. Trigger the `/api/v1/database/<pk>/select_star/<table_name>/` endpoint defined in
`DatabaseRestApi.select_star` (`superset/databases/api.py:1181-1242`) for this ODPS
database, for example from the UI "select star" feature in SQLLab.
3. Inside `DatabaseRestApi.select_star`, Superset calls
`database.select_star(Table(table_name, schema_name, database.get_default_catalog()),
latest_partition=True)` (`superset/databases/api.py:1232-1237`).
4. `Database.select_star` (`superset/models/core.py:825-845`) computes `dialect =
self.get_dialect()` and then calls `self.db_engine_spec.select_star(self, table,
dialect=dialect, limit=..., show_cols=..., indent=..., latest_partition=..., cols=cols)`.
Because `OdpsEngineSpec.select_star` is defined with signature `(database, table, engine,
...)` and no `dialect` keyword (`superset/db_engine_specs/odps.py:129-140`), Python raises
`TypeError: select_star() got an unexpected keyword argument 'dialect'` before reaching
the `engine.dialect` access at lines 166-167.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/db_engine_specs/odps.py
**Line:** 166:166
**Comment:**
*Logic Error: `OdpsEngineSpec.select_star` assumes its third argument `engine` always has a `.dialect` attribute, but in the generic code path `Database.select_star` passes a SQLAlchemy `Dialect` object instead, causing an `AttributeError`; the method should accept either an engine or a dialect and normalize to a dialect before calling `quote_table`.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
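The proposed `getattr(engine, "dialect", engine)` normalization can be sketched independently of SQLAlchemy with minimal stand-in objects; `FakeEngine` and `FakeDialect` are illustrations, not real SQLAlchemy types:

```python
class FakeDialect:
    # Stand-in for a SQLAlchemy Dialect; only needs a distinguishing attribute.
    name = "odps"


class FakeEngine:
    # Stand-in for a SQLAlchemy Engine, which carries a .dialect attribute.
    dialect = FakeDialect()


def normalize_dialect(engine_or_dialect):
    # Accept either an Engine (has .dialect) or a bare Dialect: if the
    # attribute is missing, assume we were handed a Dialect already.
    return getattr(engine_or_dialect, "dialect", engine_or_dialect)


bare = FakeDialect()
print(normalize_dialect(FakeEngine()).name)  # odps
print(normalize_dialect(bare) is bare)       # True
```

This keeps `select_star` usable from both call sites — the ODPS-specific path that passes an engine and `Database.select_star`, which passes a dialect — without changing either caller.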
Code Review Agent Run #ede303 — Actionable Suggestions: 0
When using the ODPS (MaxCompute) data source, previewing partitioned tables in SQLLab would fail because ODPS requires a partition to be specified for partitioned tables. This PR adds ODPS-specific handling:
- New OdpsEngineSpec with partition detection support
- Modified select_star to add a partition filter for ODPS partitioned tables
- New Partition dataclass in sql/parse.py
- New is_odps_partitioned_table method in DatabaseDAO

Closes #32301

Co-Authored-By: zhutong6688 <zhutong66@163.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changed import from superset.sql_parse to superset.sql.parse to match the new module location. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add missing engine and engine_name class attributes to OdpsEngineSpec required for the supports_backend() method to work correctly. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add validation for external DB engine specs loaded via entry points to ensure they are proper subclasses of BaseEngineSpec. This prevents errors when third-party packages (like pyodps) provide outdated or incompatible engine specs that don't inherit from BaseEngineSpec. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add unit tests for the Partition class to cover __str__ and __eq__ methods and achieve 100% code coverage. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Split long log message string in db_engine_specs/__init__.py to stay within the 88-character line limit. Move pyodps==0.12.2 to its proper alphabetical position in requirements/base.txt and remove the manually appended entry from the end of both requirements files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move the `from odps import ODPS` import from module level to inside the `is_odps_partitioned_table` method where it is used, wrapped in try/except ImportError. Also move the `OdpsEngineSpec` import in `databases/api.py` to be lazy. Remove pyodps from base requirements since it is an optional dependency (like trino, bigquery, etc). This prevents ModuleNotFoundError when pyodps is not installed, which was causing all CI tests to fail with "Failed to create app". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
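The lazy-import pattern this commit describes — importing the optional dependency inside the method and degrading gracefully — can be sketched generically. `some_optional_dep` is a placeholder, not a real package, and the fallback return value is an assumption about sensible behavior:

```python
def is_partitioned_table_sketch(table_name: str) -> tuple[bool, list[str]]:
    # Lazy import: the optional dependency is only needed when this feature
    # actually runs, so a missing package must not break application startup.
    try:
        import some_optional_dep  # placeholder standing in for pyodps
    except ImportError:
        # Dependency not installed: report "not partitioned" rather than crash.
        return False, []
    # A real implementation would query the backend here via the imported module.
    return True, ["ds"]


print(is_partitioned_table_sketch("sales"))  # (False, []) when the package is absent
```

A module-level `from odps import ODPS` would instead raise `ModuleNotFoundError` the moment the module is imported — which is exactly the "Failed to create app" CI failure the commit message describes.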
Force-pushed d8b4eb3 to aabc26d (Compare)
```python
        is_partitioned_table, partition_fields = DatabaseDAO.is_odps_partitioned_table(
            database, table_name
        )
        try:
            security_manager.raise_for_access(database=database, table=table)
        except SupersetSecurityException as ex:
            # instead of raising 403, raise 404 to hide table existence
            raise TableNotFoundException("No such table") from ex
```
Suggestion: The ODPS partition detection is performed before checking the user's table access, meaning an unauthorized caller can still trigger ODPS API calls (and potentially learn about table existence or cause backend errors) before a permission check is enforced; move the partition lookup after the security check so that only authorized users can hit the ODPS backend for metadata. [security]
Severity Level: Major ⚠️
- ❌ Unauthorized users can trigger ODPS table metadata lookups.
- ⚠️ Table existence may leak via ODPS error behavior or timing.
- ⚠️ Extra ODPS calls for denied requests increase backend load.| is_partitioned_table, partition_fields = DatabaseDAO.is_odps_partitioned_table( | |
| database, table_name | |
| ) | |
| try: | |
| security_manager.raise_for_access(database=database, table=table) | |
| except SupersetSecurityException as ex: | |
| # instead of raising 403, raise 404 to hide table existence | |
| raise TableNotFoundException("No such table") from ex | |
| try: | |
| security_manager.raise_for_access(database=database, table=table) | |
| except SupersetSecurityException as ex: | |
| # instead of raising 403, raise 404 to hide table existence | |
| raise TableNotFoundException("No such table") from ex | |
| is_partitioned_table, partition_fields = DatabaseDAO.is_odps_partitioned_table( | |
| database, table_name | |
| ) |
Steps of Reproduction ✅
1. Configure a Superset Database record pointing to an ODPS (MaxCompute) instance so that
`Database.backend == "odps"` (used in `DatabaseDAO.is_odps_partitioned_table` at
`superset/daos/database.py:245-257`).
2. Start Superset and authenticate as a user that has API access to the Database REST API
but lacks table-level access for a specific ODPS table (table access is enforced via
`security_manager.raise_for_access(database=database, table=table)` at
`superset/databases/api.py:1087-1091` and `SupersetSecurityManager.can_access_table` at
`superset/security/manager.py:906-920`).
3. Call the table metadata endpoint implemented by `DatabaseRestApi.table_metadata` in
`superset/databases/api.py:1023-1100` with `GET
/<int:pk>/table_metadata/?name=<table_name>&schema=<schema>&catalog=<catalog>` where `pk`
is the ODPS database id and `<table_name>` is any table name (including ones the user is
not allowed to access).
4. Observe that before the permission check runs, `table_metadata` invokes
`DatabaseDAO.is_odps_partitioned_table(database, table_name)` at
`superset/databases/api.py:1084-1085`, which in turn creates an ODPS client and calls
`odps_client.get_table(table_name)` at `superset/daos/database.py:278-280`, hitting the
external ODPS API with Superset's credentials; only after this remote call does
`security_manager.raise_for_access` run and potentially raise `TableNotFoundException` for
unauthorized users, meaning unauthorized callers can still trigger ODPS metadata lookups
and, if `get_table` raises (e.g. non-existent table), cause backend errors before access
is denied.

Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/databases/api.py
**Line:** 1084:1091
**Comment:**
*Security: The ODPS partition detection is performed before checking the user's table access, meaning an unauthorized caller can still trigger ODPS API calls (and potentially learn about table existence or cause backend errors) before a permission check is enforced; move the partition lookup after the security check so that only authorized users can hit the ODPS backend for metadata.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
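The check-before-fetch ordering the suggestion asks for can be illustrated with hypothetical stand-ins for the security manager and the ODPS lookup (none of these are the real Superset functions):

```python
calls = []  # records every hit on the (simulated) remote ODPS backend


def raise_for_access(user, table):
    # Stand-in for security_manager.raise_for_access.
    if table not in user["allowed"]:
        # 404-style response hides whether the table exists.
        raise PermissionError("No such table")


def odps_partition_lookup(table):
    # Stand-in for DatabaseDAO.is_odps_partitioned_table; in the real code
    # this creates an ODPS client and calls the remote API.
    calls.append(table)
    return True, ["ds"]


def table_metadata(user, table):
    # Authorize first, so denied requests never reach the ODPS backend.
    raise_for_access(user, table)
    return odps_partition_lookup(table)


user = {"allowed": {"sales"}}
try:
    table_metadata(user, "secret_table")
except PermissionError:
    pass

print(calls)                          # [] - denied request made no backend call
print(table_metadata(user, "sales"))  # (True, ['ds'])
print(calls)                          # ['sales']
```

With the original ordering the lookup would run first, so `calls` would contain `"secret_table"` even though the caller is ultimately denied — the information leak and extra backend load the review describes.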
Code Review Agent Run #60811b
Actionable Suggestions - 1
- superset/sql/parse.py - 1
  - Incorrect equality implementation · Line 351-352
Additional Suggestions - 3
- superset/daos/database.py - 2
  - Incorrect ODPS URI regex pattern · Line 265-267

    The regex pattern extracts a password from the URI but uses database.password instead, which is inconsistent and may cause parsing failures. It also contains a syntax error with '\\?' instead of '\?'. Based on the code's use of database.password, the URI likely does not include the password.

    Code suggestion
    ```diff
    @@ -265,4 +265,3 @@
    -        pattern = re.compile(
    -            r"odps://(?P<username>[^:]+):(?P<password>[^@]+)@(?P<project>[^/]+)/(?:\\?"
    -            r"endpoint=(?P<endpoint>[^&]+))"
    -        )
    +        pattern = re.compile(
    +            r"odps://(?P<username>[^@]+)@(?P<project>[^/]+)/(?:\?endpoint=(?P<endpoint>[^&]+))"
    +        )
    ```
  - Redundant docstring wording · Line 248-252

    The docstring says 'whether the partition table is partitioned', which is redundant. It should simply say 'whether the table is partitioned'.

    Code suggestion
    ```diff
    @@ -250,2 +250,2 @@
    -        The return values are whether the partition
    -        table is partitioned and the names of all partition fields.
    +        The return values are whether the
    +        table is partitioned and the names of all partition fields.
    ```
- superset/sql/parse.py - 1
  - Docstring typo and inconsistency · Line 326-331

    The docstring has a typo and uses an inconsistent attribute name in the example, which could confuse readers.

    Code suggestion
    ```diff
    @@ -327,3 +327,3 @@
    -    Partition object, with two attribute keys:
    -    ispartitioned_table and partition_comlumn,
    -    used to provide partition information
    +    Partition object, with two attribute keys:
    +    is_partitioned_table and partition_column,
    +    used to provide partition information
    @@ -331,1 +331,1 @@
    -    {"ispartitioned_table":true,"partition_column":["month","day"]}
    +    {"is_partitioned_table":true,"partition_column":["month","day"]}
    ```
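The corrected URI pattern from the first suggestion above can be checked against a sample URI. The URI layout (and the sample values) are inferred from the suggestion itself and are assumptions, not a documented ODPS connection-string format:

```python
import re

# Pattern as proposed in the review suggestion: username@project, with the
# endpoint carried as a query parameter and no password in the URI.
pattern = re.compile(
    r"odps://(?P<username>[^@]+)@(?P<project>[^/]+)/(?:\?endpoint=(?P<endpoint>[^&]+))"
)

m = pattern.match("odps://alice@my_project/?endpoint=https://service.example.com/api")
print(m.group("username"))  # alice
print(m.group("project"))   # my_project
print(m.group("endpoint"))  # https://service.example.com/api
```

Note how the original pattern's `\\?` would have matched a literal backslash followed by an optional character rather than a literal `?`, which is the syntax error the review calls out.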
Review Details
Files reviewed - 7 · Commit Range: 3e0f47a..aabc26d
- superset/daos/database.py
- superset/databases/api.py
- superset/db_engine_specs/__init__.py
- superset/db_engine_specs/odps.py
- superset/sql/parse.py
- tests/integration_tests/db_engine_specs/base_engine_spec_tests.py
- tests/unit_tests/sql/parse_tests.py
Files skipped - 0
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
```python
    def __eq__(self, other: Any) -> bool:
        return str(self) == str(other)
```

The custom `__eq__` compares string representations, which is incorrect and violates hash consistency for the frozen dataclass.

Code suggestion
Check the AI-generated fix before applying

```diff
-    def __eq__(self, other: Any) -> bool:
-        return str(self) == str(other)
```
Code Review Run #60811b
Summary
When using the ODPS (MaxCompute) data source, previewing partitioned tables in SQLLab would fail because ODPS requires a partition to be specified for partition tables.
This PR adds ODPS-specific handling:
- New `OdpsEngineSpec` with partition detection support
- Modified `select_star` to add a partition filter for ODPS partitioned tables
- New `Partition` dataclass in `sql/parse.py`
- New `is_odps_partitioned_table` method in `DatabaseDAO`

This is an adoption of #32556 by @zhutong6688, rebased on current master with:
- `allow_limit_clause` attribute
- `list[str] | None` instead of `Optional[List[str]]`

Closes #32301
Supersedes #32556
Test plan
🤖 Generated with Claude Code
Co-Authored-By: zhutong6688 zhutong66@163.com