fix(time grain): time grain clickhouse db error #26735

@hxxdraised hxxdraised commented Jan 22, 2024

SUMMARY

Charts backed by a ClickHouse source fail to load with a DB error when a time grain is set. Error example: Error: Orig exception: Code: 215. DB::Exception: Column `Column2` is not under aggregate function and not in GROUP BY: While processing toStartOfDay(toDateTime(Column2)) AS Column2, count() AS count.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

before:
image
after:
image

TESTING INSTRUCTIONS

  1. Create a ClickHouse database
  2. Create a table with a temporal column
  3. Install the ClickHouse driver by adding the line clickhouse-sqlalchemy==0.2.5 to docker/requirements-local.txt
  4. Connect the ClickHouse DB to Superset
  5. Create a line chart from the table datasource and select the temporal column in the X-Axis option

ADDITIONAL INFORMATION

  • Has associated issue: Fixes #23384
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

Contributor

@github-actions github-actions bot left a comment

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

codecov bot commented Jan 23, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (6443001) 69.07% compared to head (b4da5ce) 69.13%.
Report is 51 commits behind head on master.

Files                        Patch %   Lines
superset/models/helpers.py   50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #26735      +/-   ##
==========================================
+ Coverage   69.07%   69.13%   +0.06%     
==========================================
  Files        1930     1930              
  Lines       75279    75010     -269     
  Branches     8429     8429              
==========================================
- Hits        51999    51859     -140     
+ Misses      21133    21004     -129     
  Partials     2147     2147              
Flag       Coverage Δ
hive       53.85% <75.00%> (+0.28%) ⬆️
mysql      78.02% <75.00%> (+0.19%) ⬆️
postgres   78.12% <75.00%> (+0.19%) ⬆️
presto     53.80% <75.00%> (+0.27%) ⬆️
python     83.01% <75.00%> (+0.23%) ⬆️
sqlite     77.71% <75.00%> (+0.19%) ⬆️
unit       56.38% <75.00%> (+0.36%) ⬆️

Flags with carried forward coverage won't be shown.

@geido geido requested review from zhaoyongjie and villebro January 23, 2024 18:40
@zhaoyongjie
Member

zhaoyongjie commented Jan 23, 2024

@hxxdraised please post the original SQL and the fixed SQL in the PR description for review. I guess this is a known issue on the CH SQL parser side, but there is a workaround on the Superset side.

Create a verbose name at the Dataset level in Superset; the SQL snippet will then be toStartOfDay(toDateTime(Column2)) AS THE_NEW_VERBOSE_NAME
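
The effect of the verbose-name workaround can be sketched with plain string building. This is only an illustration, not Superset code: `time_grain_select` and `alias_shadows_column` are hypothetical helpers, and the expression template is copied from the error message in the PR description.

```python
def time_grain_select(column: str, alias: str) -> str:
    """Build the SELECT item for a day-level time grain (illustrative only)."""
    return f"toStartOfDay(toDateTime({column})) AS {alias}"


def alias_shadows_column(column: str, alias: str) -> bool:
    """Return True when the alias reuses the source column's name."""
    return alias == column


# Default: the alias is the column name itself, as in the reported error.
default_item = time_grain_select("Column2", "Column2")

# Workaround: a verbose name set on the dataset yields a distinct alias.
renamed_item = time_grain_select("Column2", "THE_NEW_VERBOSE_NAME")

print(default_item)
print(renamed_item)
```

With a distinct alias, the name `Column2` in the query can only refer to the source column, which is presumably why the workaround avoids the error.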

image

Member

@john-bodley john-bodley left a comment

Thanks @hxxdraised for the PR. Your change augments some existing shared logic in superset/models/helpers.py, which I'm worried could break existing behavior.

Would you mind adding some unit tests? This would give reviewers the necessary level of confidence that the change is valid.

@@ -397,6 +397,9 @@ class BaseEngineSpec: # pylint: disable=too-many-public-methods
# Can the catalog be changed on a per-query basis?
supports_dynamic_catalog = False

# Use column alias instead of aggregate function in GROUP BY
Member

Suggested change
# Use column alias instead of aggregate function in GROUP BY
# Use column alias instead of aggregate function in GROUP BY

@@ -397,6 +397,9 @@ class BaseEngineSpec: # pylint: disable=too-many-public-methods
# Can the catalog be changed on a per-query basis?
supports_dynamic_catalog = False

# Use column alias instead of aggregate function in GROUP BY
use_column_alias_in_groupby = False
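
As a rough sketch of what such a flag could control, an engine-spec attribute might select whether GROUP BY repeats the expression or refers to its alias. The PR's actual change to superset/models/helpers.py is not shown in this thread, so everything below is an assumption: `SpecSketch`, `ClickHouseSpecSketch`, and `group_by_target` are hypothetical names, not Superset APIs.

```python
class SpecSketch:
    """Hypothetical stand-in for BaseEngineSpec; not the real class."""

    # Use column alias instead of the expression in GROUP BY
    use_column_alias_in_groupby = False


class ClickHouseSpecSketch(SpecSketch):
    """Hypothetical ClickHouse spec that opts in to the flag (assumed)."""

    use_column_alias_in_groupby = True


def group_by_target(spec: type, expression: str, alias: str) -> str:
    """Return what a query builder would place in GROUP BY for this spec."""
    return alias if spec.use_column_alias_in_groupby else expression


expr = "toStartOfDay(toDateTime(Column2))"
print(group_by_target(SpecSketch, expr, "Column2"))
print(group_by_target(ClickHouseSpecSketch, expr, "Column2"))
```

Keeping the default at False would leave every other engine spec on the existing code path, which is the kind of behavior the requested unit tests could pin down.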
Member

I'm surprised that the SQLAlchemy dialect doesn't handle this. The visit_label method is likely relevant. Ideally we adopt the "shift left" mentality and have the underlying dialect (rather than Superset) handle this logic.

@TechAuditBI
Contributor

@john-bodley @betodealmeida Shouldn't we somehow connect this PR with SIPs 115 and 117?

@john-bodley
Member

@TechAuditBI this change seems unrelated to SIP-115 and SIP-117. Personally, if we adopt the "shift left" mentality, the necessary change should be made in the clickhouse-connect SQLAlchemy dialect.

@rusackas
Member

rusackas commented Apr 5, 2024

Hi all! Just wondering if anyone sees a way to move forward on this, and close out the issue (which is getting pretty stale).

I'm also wondering if this is even still an issue, since clickhouse-connect was 0.5.3 at the time of reporting and is now on 0.7.7 - maybe we'll get lucky :D

@round3d
round3d commented Jul 9, 2024

Hi @rusackas,

I use clickhouse (v 5.7.30) with superset master, and this isn't a problem for me with clickhouse-connect 0.7.16.
The generated SQL works correctly, and the GROUP BY does not use the alias.

image

@rusackas
Member

rusackas commented Jul 9, 2024

Thanks @round3d! I'll go ahead and close this PR and the related issue. If anyone has issues and sees the need to reopen this, say the word!

6 participants