Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(Filters): Apply native & cross filters on common columns #30438

Merged
merged 21 commits into from
Oct 15, 2024

Conversation

geido
Copy link
Member

@geido geido commented Sep 30, 2024

SUMMARY

When in scope (global or specific) charts that are not applicable to the filter (do not share the same datasource or columns) were being applied anyway, causing unnecessary queries. This PR checks that the charts have columns in common or will exclude the charts.

AFTER

COVID.Vaccine.Dashboard.mp4
COVID.Vaccine.Dashboard.1.mp4

TESTING INSTRUCTIONS

  1. Create a native filter that has no common columns with a chart
  2. Add the chart in scope
  3. Apply the filter
  4. The chart should not reload and emit a query

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@dosubot dosubot bot added dashboard:cross-filters Related to the Dashboard cross filters dashboard:native-filters Related to the native filters of the Dashboard labels Sep 30, 2024
@michael-s-molina michael-s-molina linked an issue Sep 30, 2024 that may be closed by this pull request
3 tasks
@kgabryje
Copy link
Member

/testenv up

Copy link
Contributor

@kgabryje Ephemeral environment spinning up at http://35.90.248.247:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

@geido
Copy link
Member Author

geido commented Oct 1, 2024

Just realized that highlighting a filter that is not applicable should also be added as an enhancement for this PR. Working on that.

@geido geido marked this pull request as draft October 1, 2024 18:07
@geido geido marked this pull request as ready for review October 4, 2024 14:09
@sadpandajoe sadpandajoe added the v4.1 Label added by the release manager to track PRs to be included in the 4.1 branch label Oct 4, 2024
@sadpandajoe
Copy link
Member

/testenv up

Copy link
Contributor

@sadpandajoe Ephemeral environment spinning up at http://35.88.232.227:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

@sadpandajoe
Copy link
Member

Just a quick question, should it affect all charts if it's from the same database table even if one is virtual? Example here: http://35.88.232.227:8080/superset/dashboard/11/?native_filters_key=vORa9YtGn5rLPgi2AjgirnIJB_30m_cpUE3kzlqWFgMGjPIw3CKoFrTKjNNAUkan

  1. go to sql lab
  2. run the query
SELECT * 
FROM covid_vaccines 
WHERE country_name = "United States"
  1. save as a new dataset
  2. copy one of the charts that use covid_vaccines
  3. switch the dataset to the copied one
  4. put this on the covid vaccines dashboard
  5. do a cross filter on the newly created dataset chart

right now following this step, all charts are affected by the virtual dataset.

Copy link
Member

@sadpandajoe sadpandajoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke with @yousoph and it sounds like my case is invalid so going to approve.

@sadpandajoe sadpandajoe merged commit 3629483 into master Oct 15, 2024
35 of 36 checks passed
@sadpandajoe sadpandajoe deleted the geido/fix/filters-applicability branch October 15, 2024 23:13
Copy link
Contributor

Ephemeral environment shutdown and build artifacts deleted.

sadpandajoe pushed a commit that referenced this pull request Oct 15, 2024
@Mainer-g00t
Copy link

All our dashboards utilize cross-filters across various virtual datasets, and we use Jinja to capture and apply filters, even if the filtered columns in the filter are not present in the dataset.
For example:

  • dataset & chart 1: city, number of habitants
  • dataset & chart 2: profession, number of proffesionals
    We filter between both using the filter in the WHERE statement. The reason for having different datasets is that the cardinality in our tables is very high and a single dataset ruins performance.

I'm afraid that if this is applied automatically to every case could basically break all of our dashboards

@fmannhardt
Copy link
Contributor

fmannhardt commented Oct 20, 2024

Not sure if this should go here or to a new issue (and I won't manage to find time to create a reproducible example in the next week). However, I re-tested this feature together with cross filters and the following setup:

  • A renamed column (colum_a AS "Colum A") is used as source for the cross filter from chart A (with data source A)
  • A second chart B from a different data source B has the same underlying column (column_a) but this column is only used in a filter of that chart that is based on an JINJA expression (using get_filters() as explained here: https://superset.apache.org/docs/configuration/sql-templating/)
  • Result is that the cross filter on chart A does not affect chart B.

I looked through the code and it, in fact, does not consider searching for JINJA in the filter of other charts:

function checkForExpression(formData?: Record<string, any>) {
  const groupby = ensureIsArray(formData?.groupby) ?? [];
  const allColumns = ensureIsArray(formData?.all_columns) ?? [];
  const checkColumns = groupby.concat(allColumns);
  return checkColumns.some(
    (col: string | Record<string, any>) =>
      typeof col !== 'string' && col.expressionType !== undefined,
  );
}

Shouldn't this also look through the adhoc_filters?

There is a workaround to define a calculated column which matches the label of the emitter column. So, adding a calculated column "Colum A", which is simply using colum_a as SQL expression, to dataset B helps. But the whole approach to cross filter based on labels seems very brittle.

@villebro
Copy link
Member

I didn't review these changes, but it was my understanding that

  1. by default, native and cross filters will only be scoped to charts using the same dataset
  2. If one wants the filters to apply to charts that aren't using the same dataset, one simply adds them to the filter scope

Similar to the comments above, I know of many use cases where there's jinja logic that intercepts filters for columns that aren't present in the dataset, and use those to modify the chart query somehow. I'm not sure if this has been explicitly stated as a feature that Superset supports, but I think this use case has been well established for a longer time already, hence I feel removing it will be problematic.

Does this PR break said workflow? I'm happy to jump on a sync call to discuss this if needed.

@michael-s-molina
Copy link
Member

  1. by default, native and cross filters will only be scoped to charts using the same dataset

@geido was really careful to not break the compatibility. For that reason, columns with the same name are also affected even if they belong to different datasets. This was some legacy behavior that @geido rightfully maintained.

  1. If one wants the filters to apply to charts that aren't using the same dataset, one simply adds them to the filter scope

Looking at the video below, where each chart is from a different dataset, we can see that if I modify the scope, the chart is reloaded according to the scope. I noticed that the query is not being modified when reloading the chart but I don't know if this was intentional or if we found a bug. @geido

Screen.Recording.2024-10-21.at.16.16.14.mov

@villebro
Copy link
Member

@Mainer-g00t @fmannhardt as you raised concerns here, did you find that this PR broke your use cases?

@Mainer-g00t
Copy link

@Mainer-g00t @fmannhardt as you raised concerns here, did you find that this PR broke your use cases?

Yes, it does. In our case, charts export filters that are picked up by jinja logic to filter dataset with data that is not part of the columns in the dataset. Specifically, our dataset looks like:

  • customer (1k), products (1k), providers(1k), metrics...

This leads to a roughly 500M row table. We split it in 3 datasets, one for each dimension and use x-filters to select on the other datasets. ie, clicking on one customer on the first table will filter all products and providers for that customer and so on.

This allows us to display relationships in big datasets breaking it up in dimensions and using x-filters

@villebro
Copy link
Member

@Mainer-g00t @fmannhardt as you raised concerns here, did you find that this PR broke your use cases?

Yes, it does. In our case, charts export filters that are picked up by jinja logic to filter dataset with data that is not part of the columns in the dataset. Specifically, our dataset looks like:

  • customer (1k), products (1k), providers(1k), metrics...

This leads to a roughly 500M row table. We split it in 3 datasets, one for each dimension and use x-filters to select on the other datasets. ie, clicking on one customer on the first table will filter all products and providers for that customer and so on.

This allows us to display relationships in big datasets breaking it up in dimensions and using x-filters

Just to confirm, have you tested the changes here and can confirm that they do indeed break your use case? Because to me it seems like you should still be able to do the same cross filtering even after this change, as long as you've defined your filter scopes correctly.

@Mainer-g00t
Copy link

@Mainer-g00t @fmannhardt as you raised concerns here, did you find that this PR broke your use cases?

Yes, it does. In our case, charts export filters that are picked up by jinja logic to filter dataset with data that is not part of the columns in the dataset. Specifically, our dataset looks like:

  • customer (1k), products (1k), providers(1k), metrics...

This leads to a roughly 500M row table. We split it in 3 datasets, one for each dimension and use x-filters to select on the other datasets. ie, clicking on one customer on the first table will filter all products and providers for that customer and so on.
This allows us to display relationships in big datasets breaking it up in dimensions and using x-filters

Just to confirm, have you tested the changes here and can confirm that they do indeed break your use case? Because to me it seems like you should still be able to do the same cross filtering even after this change, as long as you've defined your filter scopes correctly.

Yes, we just tested it on 4.1 and breaks cross filtering for us. It does not apply filters if the column is not in the dataset even when it is used in the WHERE statement with jinja

@geido
Copy link
Member Author

geido commented Oct 22, 2024

Thanks @Mainer-g00t and @fmannhardt. Let me have a deeper look at what you are mentioning.

@fmannhardt
Copy link
Contributor

fmannhardt commented Oct 22, 2024 via email

@michael-s-molina
Copy link
Member

Hi @fmannhardt @Mainer-g00t @geido. May I suggest opening a specific GitHub issue for this so we can track it here?

@Mainer-g00t
Copy link

Hi @fmannhardt @Mainer-g00t @geido. May I suggest opening a specific GitHub issue for this so we can track it here?

Hello, sure, I'll try to do it today with an example of the behaviour before and after the change

@fbgeobit
Copy link

Hi @michael-s-molina ,
We opened a new issue with use case commented by @Mainer-g00t : #30687

Thanks

@michael-s-molina
Copy link
Member

Thank you! I added the issue to the 4.1 board.

@github-actions github-actions bot added 🍒 4.1.0 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels labels Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels dashboard:cross-filters Related to the Dashboard cross filters dashboard:native-filters Related to the native filters of the Dashboard doc Namespace | Anything related to documentation packages size/XL v4.1 Label added by the release manager to track PRs to be included in the 4.1 branch 🍒 4.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cross-filtering is reloading unaffected charts
9 participants