This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
[SIP] Propose visualizations based on data #12724
Comments
Thanks for suggesting! @wernerdaehn
It is aligned with our long-term product roadmap. In fact, when we implemented the new time picker in Superset, we thought about allowing users to query the earliest (min) and latest (max) time available in the timestamp dimension. We couldn't get to it by v1.0 because of potential performance issues and our time constraints. Collecting more metadata about datasets is something we want to do once we get to refactoring the major control fields such as metrics and filters.
Something we will consider. It will probably require 'thickening' our semantic layer in Superset and will steepen Superset's learning curve.
Both are features available in Tableau. I agree they provide a nice user experience and enable non-technical users to create visualizations intuitively. We would love to get to both someday. Attachment: Screen.Recording.2021-01-25.at.1.26.05.AM.mov
@wernerdaehn if you would like to contribute any of the above items to Superset in any way, we would love to work with you!
@junlincc Thanks for the feedback. Just for the record, what Tableau does is just the very beginning!
Any suggestions on what I can do for you in that regard? Otherwise I will try to come up with something to discuss, but I would love to get your guidance.
Thanks for bringing up this topic! This is definitely an interesting area of work and has a lot of potential for Superset. What you described is often called automated chart specification, or automated Exploratory Data Analysis (EDA), which is also quite big among DataViz academics (see https://github.com/mstaniak/autoEDA-resources). It would be tremendously valuable if we could somehow integrate the latest research findings into open source or commercial BI software. This SIP is a good starting point and seems to have identified a couple of items we can already do. I'd recommend continuing to research this topic and digging into the Superset codebase and architecture to form a more concrete action plan. We should at least be able to answer:
Some other useful links:
I just wanted to chime in and say that I love this idea, and it's something that my team is starting to investigate more seriously. @wernerdaehn would you be interested in joining discussions (synchronously or otherwise) around this and being a part of implementing the solution? If not, I think we may need more clarification on the approaches to implementation and any risks/dependencies involved, as @ktmud was suggesting. In other words, I think this is a great idea for a SIP, but we need more details to be able to put it to a vote and carry it out effectively.
@rusackas By all means, Evan! More than happy to contribute. As a preliminary start, here is my thinking: According to explanatory statistics there are four types of scales, ordered by capabilities:
If somebody wants to visualize a nominal value and a ratio value, e.g. revenue per color, a bar chart is one of the few that makes sense. For two ratio values, e.g. revenue per customer age, a scatter plot is suited. The next type of decision is the number of axes.
The type of axis can further be refined:
One side effect of these types is how missing values are rendered. A country without revenue should still be present (geomap) or not (bar chart). A month without revenue should still be shown; you do not want to see just 11 months. The number of distinct values of nominal and ordinal scales is an important decision point as well. A pie chart with 5000 categories might not be the best-suited chart type. The revenue per country over time mentioned above could be shown as a line chart with one line per country: excellent for comparisons between countries, unless you have 100 countries and hence 100 lines. The final decision type is the purpose of the visualization:
The nice thing is that we can start small and grow the solution. Initially we just categorize each column of the result set by scale type, and each chart carries the information about which scale types it allows on which axis. That by itself would considerably reduce the list of charts to offer. From there we can keep growing with the available metadata on the data and the chart info.
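To make that first step concrete, here is a minimal Python sketch. It is not Superset code: the `Scale`, `Column`, and `ChartSpec` types, the `CHARTS` registry, the accepted scale types per axis, and the `max_categories` limit of 10 for pie charts are all assumptions made up for illustration. It only shows the idea of tagging each result column with a scale type, letting each chart declare what it accepts per axis, and using the distinct-value count as an extra filter.

```python
from dataclasses import dataclass
from enum import IntEnum
from itertools import permutations
from typing import Optional


class Scale(IntEnum):
    """The four scale types, ordered by capability."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


@dataclass(frozen=True)
class Column:
    """A result-set column tagged with its scale type (hypothetical metadata)."""
    name: str
    scale: Scale
    distinct_values: Optional[int] = None


@dataclass(frozen=True)
class ChartSpec:
    """What a chart accepts: one set of allowed scale types per axis."""
    name: str
    axes: tuple
    max_categories: Optional[int] = None  # assumed limit, None means no limit


CATEGORICAL = frozenset({Scale.NOMINAL, Scale.ORDINAL})
NUMERIC = frozenset({Scale.INTERVAL, Scale.RATIO})

# Illustrative registry only; real charts would declare their own rules.
CHARTS = [
    ChartSpec("bar", (CATEGORICAL, frozenset({Scale.RATIO}))),
    ChartSpec("pie", (CATEGORICAL, frozenset({Scale.RATIO})), max_categories=10),
    ChartSpec("line", (frozenset({Scale.ORDINAL, Scale.INTERVAL}), NUMERIC)),
    ChartSpec("scatter", (NUMERIC, NUMERIC)),
]


def chart_fits(chart, columns):
    """Return True if the columns can be assigned to the chart's axes."""
    if len(columns) != len(chart.axes):
        return False
    # Reject charts that cannot sensibly show this many categories.
    for col in columns:
        if (
            chart.max_categories is not None
            and col.scale in CATEGORICAL
            and col.distinct_values is not None
            and col.distinct_values > chart.max_categories
        ):
            return False
    # Try every assignment of columns to axes.
    return any(
        all(col.scale in allowed for col, allowed in zip(ordering, chart.axes))
        for ordering in permutations(columns)
    )


def suggest_charts(columns):
    """Reduce the chart list to those whose axes accept the given columns."""
    return [chart.name for chart in CHARTS if chart_fits(chart, columns)]
```

With these assumed rules, `suggest_charts([Column("color", Scale.NOMINAL, 8), Column("revenue", Scale.RATIO)])` offers the bar and pie charts, the same query with 5000 distinct colors drops the pie chart, and two ratio columns such as customer age and revenue leave only the scatter plot.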
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue |
[SIP] Propose visualizations based on data
Motivation
I have worked for Business Objects and SAP and have been in the Business Intelligence market for more than 20 years. One thing that is still not satisfying is how charting options are chosen.
Over time, the number of available charts and their variants will keep increasing, and selecting from a long list is cumbersome. Also, not everybody knows all the visualization options for every case.
But given that Superset has a semantic layer, the visualizations can be preselected.
Example: 2 Attributes & 2 Measures? Very likely a Pie Chart will not be the proper visualization.
There is an entire academic theory about different axis types (nominal scale, ordinal scale, interval scale, ratio scale), for example. If you are interested, we can work on the details.
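As a rough illustration of the "2 Attributes & 2 Measures" example, the preselection could start from nothing more than counts. The sketch below is hypothetical: the `Arity` type, the `CHART_ARITY` table, and every limit in it are assumptions chosen for the example, not values taken from Superset.

```python
from typing import NamedTuple


class Arity(NamedTuple):
    """How many attributes and measures a chart can display (assumed limits)."""
    min_attributes: int
    max_attributes: int
    min_measures: int
    max_measures: int


# Hypothetical table; each chart type would declare its own limits.
CHART_ARITY = {
    "pie": Arity(1, 1, 1, 1),
    "bar": Arity(1, 2, 1, 4),
    "scatter": Arity(0, 1, 2, 3),
    "table": Arity(0, 10, 0, 10),
}


def preselect(n_attributes, n_measures):
    """Return the chart names whose limits cover the selected columns."""
    return [
        name
        for name, arity in CHART_ARITY.items()
        if arity.min_attributes <= n_attributes <= arity.max_attributes
        and arity.min_measures <= n_measures <= arity.max_measures
    ]
```

Under these assumed limits, `preselect(2, 2)` no longer offers the pie chart, while `preselect(1, 1)` still does.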
Proposed Change
Please let me know if you are interested, and I will spend some time working out the details.