-
Notifications
You must be signed in to change notification settings - Fork 407
plots: document images support #2839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Data series plots utilize [Vega-Lite](https://vega.github.io/vega-lite/) for | ||
| rendering. Vega-Lite is a declarative grammar for defining plots using JSON. The | ||
| plots can also be saved as SVG or PNG image filed from the browser. Image type | ||
| plots are rendered using `<img>` tag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity - how do we deal with an image file from a past commit? when I do dvc plots diff? do we embed it into HTML, copy it nearby?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we just insert files with identical path into same <div> tag.:
During treeverse/dvc#6603 I will try to make it look less terrible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, but where does it take the image itself from HEAD? the png file - how does it read from the Git history?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We read it as a binary file and later write the same way into the webpage.
It can become problematic (memory-wise) in case of multiple images and revision, so further development assumes
using cache URL instead of binary data (in cached plots use case). In the case of git, well this approach will probably have to remain.
related:
treeverse/dvc#6145 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shcheklein Not sure if you meant how images are collected or how they are sourced into the HTML. If you meant the latter, dvc plots diff now generates a directory that looks like:
dvc_plots
├── index.html
└── static
├── HEAD_image.png
└── workspace_image.png
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dberenbaum ! I'm not sure if then We read it as a binary file by Pawel is still relevant?
I was asking this because we'll need access to these data from the VS Code env. We are iterating right now on the requirements and I'm curious what kind of interface we can expect/build for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comment from @pared is relevant for Studio, but probably not for VSCode. We don't have any --show-json or other machine-readable output from dvc plots diff today for images, so either VSCode can read directly from the generated dvc_plots directory, or we will need to add some other option to provide the needed info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine for us to start with dvc plots diff if it supports experiments/checkpoints. But it doesn't sound like a very efficient thing. Ideally yes, we'll need an interface to see the list of available things across commits/experiments and either path to those or a command that would stream them for us (if it's a Git history) and/or find another way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @rogermparent ^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it doesn't sound like a very efficient thing.
Do you mean by compute, memory, or storage? @pared linked to the discussion about memory concerns above, where we can make optimizations if needed. We chose to make copies of the files in the dvc_plots directory so that the results are self-contained and can be shared, moved, etc. This seems like a case where the user requirements and downstream product requirements might not align well.
|
Related: treeverse/dvc#6431 |
| A set of commands to visualize and compare _plot metrics_: | ||
| [show](/doc/command-reference/plots/show), | ||
| [diff](/doc/command-reference/plots/diff), and | ||
| [modify](/doc/command-reference/plots/modify). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should clarify here (in the usage block below) and in https://dvc.org/doc/command-reference/plots/modify that it only works with structured "data series" file plots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related treeverse/dvc#6669
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorgeorpinel added context to modify
Also created:
treeverse/dvc#6742
jorgeorpinel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few first quick copy edits (committing)...
Co-authored-by: Restyled.io <commits@restyled.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More copy edits (committing)
Co-authored-by: Restyled.io <commits@restyled.io>
Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
Co-authored-by: Jorge Orpinel <jorgeorpinel@users.noreply.github.com>
|
Sorry for the delay and thanks @pared ! |
| DVC can work with two types of plots files: | ||
|
|
||
| In contrast to `dvc metrics`, these metrics should be stored as data series. | ||
| Unlike its `dvc metrics` counterpart, `dvc plots diff` cannot calculate numeric | ||
| differences between the metrics in different experiments. | ||
| 1. Data series files, which can be JSON, YAML, CSV or TSV. | ||
| 2. Image files in JPEG, GIF, or PNG format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know what we're going to need to revisit this explanation. I think... We already have 2 types of metrics in this same doc so this is going to be confusing... I'm not sure how or where else to present this feature so that it's clearer though 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the handling of different types of plots should be explained in a separate document?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah not sure... Maybe the types of metrics (metrics vs. plots) shouldn't be there. Moved to #2956

Per treeverse/dvc#6431 (specs)
closes #2807