-
Notifications
You must be signed in to change notification settings - Fork 16.6k
fix(results): handle unnamed result columns safely . #37570
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fix(results): handle unnamed result columns safely . #37570
Conversation
Code Review Agent Run #b2c24aActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
| if value is None: | ||
| return f"_col{index}" | ||
|
|
||
| name = convert_to_string(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Decoding column names can raise UnicodeDecodeError when convert_to_string tries to decode non-UTF-8 bytes; catch decoding errors and fallback to a safe decode (errors='replace') or str conversion so the normalization step never raises. [possible bug]
Severity Level: Major ⚠️
- ❌ SupersetResultSet fails on non-UTF8 column names.
- ⚠️ Query display can error for weird driver encodings.
- ⚠️ Downstream callers may see exceptions during result parsing.| name = convert_to_string(value) | |
| try: | |
| name = convert_to_string(value) | |
| except UnicodeDecodeError: | |
| # Fallback for bytes that are not valid UTF-8: replace invalid bytes | |
| if isinstance(value, (bytes, bytearray)): | |
| name = value.decode("utf-8", errors="replace") | |
| else: | |
| name = str(value) |
Steps of Reproduction ✅
1. Open `superset/result_set.py` and find `convert_to_string` at
`superset/result_set.py:89-99` and `normalize_column_name` at
`superset/result_set.py:102-124`.
2. Create a cursor description entry containing non-UTF8 bytes, e.g. `cursor_description =
[(b"\xff", None, None, None, None, None, None)]`.
3. Instantiate `SupersetResultSet` in a test or REPL:
`SupersetResultSet(data=[], cursor_description=cursor_description,
db_engine_spec=BaseEngineSpec)` (`__init__` begins at `superset/result_set.py:128`).
4. During name normalization the code calls `convert_to_string` (invoked from
`normalize_column_name` at `superset/result_set.py:118`), which executes the bytes decode
at `superset/result_set.py:96-97` and raises `UnicodeDecodeError` for invalid UTF-8 bytes,
causing `SupersetResultSet.__init__` to error and abort result construction.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/result_set.py
**Line:** 118:118
**Comment:**
*Possible Bug: Decoding column names can raise UnicodeDecodeError when `convert_to_string` tries to decode non-UTF-8 bytes; catch decoding errors and fallback to a safe decode (errors='replace') or str conversion so the normalization step never raises.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| column_names = dedup( | ||
| [convert_to_string(col[0]) for col in cursor_description] | ||
| [ | ||
| normalize_column_name(col[0], idx) | ||
| for idx, col in enumerate(cursor_description) | ||
| ] | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Accessing col[0] without validating that col is a sequence (or non-None) can raise TypeError/IndexError if a cursor description entry is None or not indexable; iterate and safely extract the 0th element before calling normalize_column_name. [possible bug]
Severity Level: Major ⚠️
- ❌ SupersetResultSet construction fails on malformed cursor descriptions.
- ⚠️ Query rendering flows may error when cursor.description malformed.
- ⚠️ Alerts/dimension creation pipelines using result set fail.| column_names = dedup( | |
| [convert_to_string(col[0]) for col in cursor_description] | |
| [ | |
| normalize_column_name(col[0], idx) | |
| for idx, col in enumerate(cursor_description) | |
| ] | |
| ) | |
| column_names_values: list[str] = [] | |
| for idx, col in enumerate(cursor_description): | |
| try: | |
| name_value = col[0] | |
| except Exception: | |
| # description entry missing or not indexable; treat as unnamed | |
| name_value = None | |
| column_names_values.append(normalize_column_name(name_value, idx)) | |
| column_names = dedup(column_names_values) |
Steps of Reproduction ✅
1. In a Python REPL or unit test import the class: open `superset/result_set.py` and
locate `SupersetResultSet.__init__` at `superset/result_set.py:128` (constructor start).
2. Construct a malformed cursor description and invoke the constructor directly:
- Create `cursor_description = [None]` or `cursor_description = [123]` in the test.
- Call `SupersetResultSet(data=[], cursor_description=cursor_description,
db_engine_spec=BaseEngineSpec)` (the call site is the `__init__` at
`superset/result_set.py:128`).
3. When the list comprehension at `superset/result_set.py:146-150` executes, it tries to
evaluate `col[0]` (the call to `normalize_column_name(col[0], idx)` is at
`superset/result_set.py:148`), which raises:
- `TypeError: 'NoneType' object is not subscriptable` for `None`, or
- `TypeError: 'int' object is not subscriptable` for `123`.
4. Observe the exception bubbles out of `SupersetResultSet.__init__` and prevents
result-set construction; this reproduces the failure caused by unguarded `col[0]` access.
- Note: This is reproducible without other parts of Superset by directly calling the
constructor described above.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/result_set.py
**Line:** 146:151
**Comment:**
*Possible Bug: Accessing `col[0]` without validating that `col` is a sequence (or non-None) can raise TypeError/IndexError if a cursor description entry is None or not indexable; iterate and safely extract the 0th element before calling `normalize_column_name`.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
SUMMARY
Fixes failures caused by SQL queries that return unnamed result columns
(e.g.
SELECT COUNT(*) FROM my_table), which previously caused errors inSQL Lab, alerts, and dimension creation.
Superset assumed all result columns had explicit names. Some databases
(including MSSQL) return unnamed columns for valid SQL expressions.
This change normalizes unnamed result columns at the shared result-set
handling layer by assigning stable, deterministic names, while preserving
explicitly named columns unchanged.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
Not applicable.
TESTING INSTRUCTIONS
Automated testing:
AS total) are preserved unchangedManual verification (optional):
SELECT COUNT(*) FROM my_tablein SQL LabADDITIONAL INFORMATION