Kedro-Viz to show preview of data #907
Comments
Would love this! One note on implementation - we need a workflow to avoid opening enormous files for no reason.
@datajoely I think we should add an optional
Yeah agreed
I like this idea and have thought about similar schemes in the past. Since you've brought it up, let me dump some of my earlier thoughts here too... Two basic questions:
Just using plotly for pandas and/or spark dataframes would be totally great for an MVP and to get user feedback, but I just want to brainstorm how we might want to make this more generic in the longer term.
The question of adding custom properties to datasets comes up quite a bit, e.g. #662 (put number of rows in dataset on kedro-viz), https://github.com/quantumblacklabs/private-kedro/issues/1148 (add metadata to catalog entries that can be consumed by plugins), kedro-org/kedro#1076 (very long-standing issue on how to add metadata to catalog entries). This is not just limited to kedro-viz: there's a more general kedro question of how to attach metadata to a catalog entry. Let me just focus on the kedro-viz question here though.
#662 (comment) spells out my rough idea for this: user-customisable dataset widgets. This is quite similar to the idea of kedro-viz extensions, only:
According to this scheme, previewing the first 5 rows of a dataset would be some kind of
Is the idea of a marketplace of custom widgets for kedro-viz datasets huge overkill for this? At the moment, absolutely yes. We could achieve what @rashidakanchwala describes much more simply. And at the moment I think kedro-viz extensions would be better to work on than dataset widgets. But I think it's worth thinking about where this might end up in future, since it might spark other people's ideas and potentially affects design decisions up front, e.g.
This seems too ad-hoc and hacky to me, like the current implementation of
Notes from Technical Design session: The team discussed a possible solution to preview data in Viz both on the metadata panel and the experiment tracking panel. Some questions raised around the goal of showing a preview:
The consensus is that a blanket preview showing the first 5-10 rows wouldn't be useful for all data, and thus the preview should be customisable. Possible solution:
A downside of this solution is that we would essentially be adding visualisation-specific code to the framework side, blurring the boundaries between Kedro-Viz and Kedro Framework. But the
Follow-up questions/actions:
A few more thoughts on the
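The simplest way to implement this would be for the user to write two new sorts of dataset, something like this:

    # Kedro's pandas datasets module at the time of this discussion, not the pandas library
    from kedro.extras.datasets import pandas

    class CSVDataSetWithNumberOfRows(pandas.CSVDataSet):
        def preview(self):
            # Number of rows in the loaded DataFrame
            return len(self._load())

    class CSVDataSetWithHead(pandas.CSVDataSet):
        def preview(self):
            # First five rows of the loaded DataFrame
            return self._load().head()

Then in the catalog file you need to change the relevant dataset
This seems quite unsatisfactory: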
Fundamentally I think the problem here is that datasets are not easily composed. I cannot easily "mix in" a new behaviour without creating a whole new class. @limdauto mentioned once that Dmitrii had prototyped some new component-based dataset architecture that looks more like my widgets example above. This might be a major change to how kedro datasets work though, which I don't think is on the cards for the foreseeable future.
In reality, is this a problem? Possibly not; maybe we just hard code a sensible default
Problem is, I'm not sure I have a better alternative... Maybe hooks + a viz.yml config file somehow? Certainly this would keep the functionality on the kedro-viz side much more. Let me ponder this and write it up as an alternative proposal.
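To make the composition point concrete, here is a minimal sketch of pairing a dataset with a preview function without subclassing it. Everything in it is hypothetical: `RowsDataset` is a stand-in and not a Kedro class, and `WithPreview` is not the component-based architecture mentioned above, just one way the idea could look.

```python
from typing import Any, Callable


class RowsDataset:
    """Stand-in for a dataset with a load() method; NOT a Kedro class."""

    def __init__(self, rows):
        self._rows = rows

    def load(self):
        return list(self._rows)


class WithPreview:
    """Hypothetical wrapper that attaches a preview function to any loadable dataset."""

    def __init__(self, dataset, preview_fn: Callable[[Any], Any]):
        self._dataset = dataset
        self._preview_fn = preview_fn

    def load(self):
        # Loading is delegated unchanged to the wrapped dataset.
        return self._dataset.load()

    def preview(self):
        # The preview is whatever the supplied function makes of the loaded data.
        return self._preview_fn(self._dataset.load())


base = RowsDataset([{"id": 1}, {"id": 2}, {"id": 3}])
row_count = WithPreview(base, len).preview()
head = WithPreview(base, lambda rows: rows[:2]).preview()
```

The same dataset gets a "number of rows" preview and a "head" preview here without a new class per behaviour, which is the property the subclassing approach lacks.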
I think
Hi team, I was thinking maybe the _preview method could live in Viz, as it is a Viz implementation. Then within the Kedro project's catalog.yml we define it like below, so that Viz knows how/what to handle for different datasets?
feature_engineering_output:
@MerelTheisenQB, @datajoely, @tynandebold, @idanov
What about adding preview logic to the pandas ->
Notes from Technical Design session:
A question: what icon would we have for a node with a data preview inside it?
Closing this ticket, as the design and implementation work for the feature is covered in ticket #1136
Update - I had a discussion with @merelcht: the preview function will be written on the Kedro side. We are unsure whether it should return only the preview, or whether we should also share metadata about the dataset (number of rows/columns etc.). I am reopening this ticket because the front-end design is done but there are still ongoing discussions around implementation.
This work will touch Kedro datasets as well as the backend and frontend of Viz. The first dataset we should add a preview method to is
For the frontend work, the design was done in #1136, so check there for reference.
Description
Kedro-viz supports Plotly.
Plotly has cool tables: https://plotly.com/python/table/
The idea is simply to show the first 5 or 10 rows of the dataset on Kedro-viz.
Implementation
Since we already support Plotly, this would be easy to do: we just read the first 5 rows of the data and display them as a table.
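As a rough sketch of the "read the first 5 rows" step, the snippet below reads only the header and the first few rows of a CSV, so a large file is never loaded in full. The function names are hypothetical and this is not Kedro or Kedro-viz code. The returned dictionary uses the header/cells layout that Plotly tables take (cell values arranged column by column), on the assumption that it would be handed to something like `plotly.graph_objects.Table`.

```python
import csv
import io
from itertools import islice


def preview_rows(handle, n_rows=5):
    """Read the header and at most n_rows data rows from an open CSV handle."""
    reader = csv.reader(handle)
    header = next(reader, [])
    # islice stops after n_rows, so the rest of the file is never read.
    rows = list(islice(reader, n_rows))
    return header, rows


def to_table_payload(header, rows):
    """Arrange the rows column-wise under 'cells', with column names under 'header'."""
    columns = [list(col) for col in zip(*rows)] if rows else [[] for _ in header]
    return {"header": {"values": header}, "cells": {"values": columns}}


# In-memory stand-in for a CSV file on disk.
sample = io.StringIO("id,price\n1,9.5\n2,3.0\n3,7.25\n")
header, rows = preview_rows(sample, n_rows=2)
payload = to_table_payload(header, rows)
```

Values stay as strings here because the standard-library `csv` module does not infer types; a pandas-based version could use `read_csv(..., nrows=5)` instead.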
There is an argument that loading so many datasets might make Kedro-viz slow. But loading only happens when the metadata panel is clicked, which is one dataset at a time. Also, maybe on the Kedro side we can allow users to specify which datasets they want to preview on Kedro-viz using catalog.yml, e.g. preview = true.
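A minimal sketch of the opt-in idea, assuming the catalog has already been parsed into a dictionary and that entries carry a hypothetical `preview` key. That key is an assumption for illustration only; Kedro would likely need changes to accept it in a catalog entry. The dataset names and file paths are made up.

```python
def datasets_to_preview(catalog_config):
    """Return the names of catalog entries that explicitly opt in to previews."""
    return [
        name
        for name, entry in catalog_config.items()
        # Only an explicit `preview: true` enables the preview; a missing key means no.
        if isinstance(entry, dict) and entry.get("preview") is True
    ]


# Parsed catalog with a hypothetical `preview` flag on one entry.
catalog_config = {
    "companies": {
        "type": "pandas.CSVDataSet",
        "filepath": "data/01_raw/companies.csv",
        "preview": True,
    },
    "reviews": {
        "type": "pandas.CSVDataSet",
        "filepath": "data/01_raw/reviews.csv",
    },
}

enabled = datasets_to_preview(catalog_config)
```

Defaulting to "no preview" when the key is absent keeps large or sensitive datasets from being loaded unless the user asks for it.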