Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Saving plotly figures made inside pipeline and reload in Kedro Viz. #839

Closed
lucasjamar opened this issue Jul 19, 2021 · 3 comments
Closed
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@lucasjamar
Copy link
Contributor

Description

Many plotly figures are too complex to be defined inside of the data catalog (ex: mixing lines and bar charts, double y axis, faceted plots). It would be great to be able to build the plots inside of a node and then pass the plotly figure to the data catalog to be saved as a json and reloaded in the Kedro Viz.

Possible Implementation

  1. Use the current implementation, if a pd.DataFrame is passed, build the plotly figure based on the save args and save it to json. If a plotly.goFigure is passed, save it to straight to json.
  2. Split the datasets into two datasets? Rename the current implementation as plotly.PandasDataset and create a plotly.JSONDataSet for saving plotly figures as json.

Please let me know your thoughts!

@lucasjamar lucasjamar added the Issue: Feature Request New feature or improvement to existing feature label Jul 19, 2021
@limdauto
Copy link
Contributor

Hi @lucasjamar thanks for the feedback. I think option 2. is a really good idea. We were considering it for the first release but in the end went with the current implementation as it was already being used internally by one of our teams. Having a pure JSONDataSet for plotly would be the good natural next step.

austospumanto pushed a commit to austospumanto/kedro that referenced this issue Aug 24, 2021
@antonymilne
Copy link
Contributor

antonymilne commented Aug 31, 2021

Thanks for writing this @lucasjamar - I agree with what you say, and in my (personal, not kedro team) opinion our current implementation of the plotly dataset isn't quite right. Here's what we have currently:
image

This is confusing because the save/load operations aren't symmetric, since the process of going from dataframe to figure is obviously irreversible. We should consider the two different ways of generating a plotly plot:

  1. Simple standard plots on a pd.DataFrame that can be defined inside the data catalog
  2. More complex or general plots that are written inside a node. These would most likely be based on pd.DataFrame but shouldn't have to be: I should be able to make a go.Figure object any way I like and then save it so that it works with kedro viz

As you say, the current implementation doesn't cater for 2.

Here's my proposed solution:
image

The pandas.PlotlyWriter (name inspired by matplotlib.MatplotlibWriter) enables method 1; plotly.JSONDataSet enables method 2. Method 2 is a general, conventional kedro sort of dataset; method 1 is really just a convenience wrapper to enable you to do plots from the data catalog.

The only issue I see here is that kedro viz would need to load up pandas.PlotlyWriter using something other than the dataset's load method if that's not defined. I would expect the same issue to be true for matplotlib.MatplotlibWriter though (hopefully we should be able to load up pngs on kedro viz, even if you can't load them in a pipeline?).

Also note that if you do want to load up a pandas.PlotlyWriter dataset as part of your pipeline (e.g. to write most of your plot definition in the catalog but then tweak it in a node) you would be able to do so by transcoding and loading it as a plotly.JSONDataSet. This sounds cleaner to me than defining pandas.PlotlyWrite.load in an asymmetric way.

@antonymilne
Copy link
Contributor

antonymilne commented Oct 22, 2021

In #981 we've implemented the new plotly.JSONDataSet as suggested by @lucasjamar (thank you for raising the issue). This will be part of 0.17.6. In the future we might return to renaming plotly.PlotlyDataSet and/or removing its load method but these would both be breaking changes.

Also, possibly in future we should be moving plotly_args to a subkey of save_args rather than a top-level key? Not sure...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Feature Request New feature or improvement to existing feature
Projects
None yet
Development

No branches or pull requests

3 participants