Description
This is a summary of the discussions @kieferrm and I had about how to support notebook diffing in VS Code: what the UX looks like and which gaps in the API need to be closed to get it working properly.
UX (Rich/Plaintext diff)
Notebooks are represented in a rich UI (markdown previews, code editors, and outputs rendered in various forms), but notebook documents are usually stored on the file system in text form, which is how source control systems track them. Text-based diffing already works in VS Code today, but it is limited: for example, if an image or chart output changes, a text diff can't show what actually changed.
We therefore want to offer rich diffs by rendering both notebooks side by side in the Notebook Editor and aligning the cell positions (similar to how we align lines in the text diff editor). The catch with this approach is that it doesn't present all the data in the document. For example, Jupyter Notebook stores custom metadata (kernel info, document schema version, etc.) that is never presented in the VS Code UI. To still give users a full picture of what changed behind the scenes, we may want to keep supporting text-based diffing and let users switch easily between the two views.
FS & Source Control
Currently the two responsibilities of a `vscode.NotebookContentProvider` are
- resolving the content for a resource (identified by a `vscode.Uri`) and converting it to the structured data `vscode.NotebookDocument`, and
- serializing a `vscode.NotebookDocument` and saving its text form to the file system.
Since resources are identified by `vscode.Uri`s, which are always backed by a file system provider, notebook content providers should use `vscode.workspace.fs` to resolve the raw content instead of Node's `fs`.
The source control API in VS Code works seamlessly with `vscode.Uri`. For example, if you have a pending file change in a git repo, the git extension can provide two resource `Uri`s for the file: a `file:///` uri for the current content and a `git:///` uri for the content prior to the change. We can then ask the notebook content provider to resolve the content for both `Uri`s.
SCM API
export interface SourceControl {
    quickDiffProvider?: QuickDiffProvider;
}

interface QuickDiffProvider {
    provideOriginalResource?(uri: Uri, token: CancellationToken): ProviderResult<Uri>;
}
Notebook Content Provider
export interface NotebookContentProvider {
    /**
     * Resolve content from `uri` and convert it to `NotebookData`.
     * Extensions should use `vscode.workspace.fs` for resolving the raw content for `uri`.
     */
    openNotebook(uri: Uri, openContext: NotebookDocumentOpenContext): NotebookData | Promise<NotebookData>;
}
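The flow described in this section can be sketched roughly as follows. This is a minimal, self-contained illustration and not VS Code's implementation: `Uri` is reduced to a string, `NotebookData` to a list of cell sources, and `resolveDiffPair` is a hypothetical helper name. Only `provideOriginalResource` and `openNotebook` mirror the interfaces above, with the cancellation token and open context omitted for brevity.

```typescript
// Simplified stand-ins for the vscode types; assumptions for illustration only.
type Uri = string;
interface NotebookData { cells: string[]; }

interface QuickDiffProvider {
    provideOriginalResource?(uri: Uri): Promise<Uri | undefined>;
}
interface NotebookContentProvider {
    openNotebook(uri: Uri): Promise<NotebookData>;
}

// Hypothetical helper: resolve both sides of a diff through the same content provider.
async function resolveDiffPair(
    modifiedUri: Uri,
    scm: QuickDiffProvider,
    provider: NotebookContentProvider
): Promise<{ original?: NotebookData; modified: NotebookData }> {
    // Ask source control for the resource holding the pre-change content (e.g. a git: uri).
    const originalUri = await scm.provideOriginalResource?.(modifiedUri);
    const modified = await provider.openNotebook(modifiedUri);
    // Without an original resource (e.g. an untracked file) there is nothing to diff against.
    const original = originalUri ? await provider.openNotebook(originalUri) : undefined;
    return { original, modified };
}
```

Because both sides go through `openNotebook`, the content provider does not need to know whether a `Uri` points at the working tree or at a source control snapshot, as long as it reads the raw content through the file system provider behind the `Uri`.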
Dirty changes in workspace
`Uri`s work well for source control because the content changes are already saved to the file system. However, if users have a dirty notebook document in the workspace (say, auto save is turned off), we can't differentiate the content on disk from the content in the workspace, as they share the same `Uri` and VS Code core doesn't know how to turn the dirty `vscode.NotebookDocument` into text.
Since the `vscode.NotebookContentProvider` is the only component that knows how to convert a `vscode.NotebookDocument` to text, we will delegate this to the content provider:
export interface NotebookContentProvider {
    /**
     * Save the text form of `notebookDocument` into `textDocument`
     */
    notebookAsText(notebookDocument: NotebookDocument, textDocument: TextDocument): Promise<void>;
}
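How the text ends up in `textDocument` is left open here, so the sketch below only covers the serialization step a provider might run inside `notebookAsText`. The types `NotebookLike` and `CellLike` and the function `notebookToText` are hypothetical stand-ins, and sorting keys is an assumption made so that two equal documents always produce identical text and the text diff stays free of ordering noise.

```typescript
// Simplified stand-ins for the vscode types; assumptions for illustration only.
interface CellLike { kind: "markdown" | "code"; source: string; outputs?: unknown[]; }
interface NotebookLike { metadata: Record<string, unknown>; cells: CellLike[]; }

// Recursively sort object keys so that equal documents always serialize to equal text.
function sortKeys(value: unknown): unknown {
    if (Array.isArray(value)) {
        return value.map(sortKeys);
    }
    if (value !== null && typeof value === "object") {
        const obj = value as Record<string, unknown>;
        const sorted: Record<string, unknown> = {};
        for (const key of Object.keys(obj).sort()) {
            sorted[key] = sortKeys(obj[key]);
        }
        return sorted;
    }
    return value;
}

// Hypothetical serializer: pretty-printed JSON, so a line-based text diff stays readable.
function notebookToText(doc: NotebookLike): string {
    return JSON.stringify(sortKeys(doc), null, 2) + "\n";
}
```

A real provider would emit its own on-disk format (for Jupyter, the `.ipynb` JSON) rather than this generic layout, so that the dirty text can be compared directly with the saved file.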
Diff
We have a complex two-way diff algorithm for text files, which can probably be used for notebook documents too. The catch is how to compare `NotebookCell`s efficiently. If notebook providers can provide a unique id for each cell, that would be great. If not, we have to do a deep comparison of `NotebookCell` content.
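To make the trade-off concrete, here is a small sketch of cell alignment. It does not use the text diff algorithm mentioned above; it swaps in a plain longest-common-subsequence table, which is enough to show the idea. `CellLike`, `cellKey`, and `diffCells` are hypothetical names, and the optional `id` field stands for the provider-supplied identifier.

```typescript
// Simplified stand-in for vscode.NotebookCell; `id` is the optional provider-supplied identifier.
interface CellLike { id?: string; source: string; outputs?: unknown[]; }

type CellChange =
    | { type: "matched"; original: number; modified: number }
    | { type: "removed"; original: number }
    | { type: "added"; modified: number };

// Prefer the cheap unique id; otherwise fall back to a deep comparison via serialization.
function cellKey(cell: CellLike): string {
    return cell.id !== undefined
        ? "id:" + cell.id
        : "content:" + JSON.stringify([cell.source, cell.outputs ?? []]);
}

// Align two cell lists using a classic longest-common-subsequence (LCS) table.
function diffCells(original: CellLike[], modified: CellLike[]): CellChange[] {
    const a = original.map(cellKey);
    const b = modified.map(cellKey);
    // lcs[i][j] holds the LCS length of a[i..] and b[j..].
    const lcs: number[][] = Array.from({ length: a.length + 1 }, () => new Array<number>(b.length + 1).fill(0));
    for (let i = a.length - 1; i >= 0; i--) {
        for (let j = b.length - 1; j >= 0; j--) {
            lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
        }
    }
    const changes: CellChange[] = [];
    let i = 0;
    let j = 0;
    while (i < a.length && j < b.length) {
        if (a[i] === b[j]) {
            changes.push({ type: "matched", original: i++, modified: j++ });
        } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
            changes.push({ type: "removed", original: i++ });
        } else {
            changes.push({ type: "added", modified: j++ });
        }
    }
    while (i < a.length) { changes.push({ type: "removed", original: i++ }); }
    while (j < b.length) { changes.push({ type: "added", modified: j++ }); }
    return changes;
}
```

With ids, a `matched` pair only says that the two entries are the same cell; its content may still differ and would need a per-cell comparison. Without ids, a cell whose content changed shows up as a removal followed by an addition, which is one reason provider-supplied ids would be preferable.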
The comparison algorithm for `NotebookCell`s might differ between notebook providers. For example, the GitHub Notebook wants to exclude outputs, while Jupyter Notebook may include them. It is not yet clear whether this can be described declaratively through metadata or whether we need to introduce new APIs.
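As one possible shape for the declarative option, the sketch below lets a provider state whether outputs take part in the comparison. Everything here is hypothetical: `CellComparisonOptions`, its `includeOutputs` flag, and `cellsEqual` are illustrative names, not existing or proposed VS Code API.

```typescript
// Simplified stand-in for vscode.NotebookCell; an assumption for illustration only.
interface CellLike { source: string; outputs?: unknown[]; }

// Hypothetical declarative description a notebook provider could register.
interface CellComparisonOptions { includeOutputs: boolean; }

// Compare two cells according to the provider's options.
function cellsEqual(a: CellLike, b: CellLike, options: CellComparisonOptions): boolean {
    if (a.source !== b.source) {
        return false;
    }
    if (!options.includeOutputs) {
        return true;
    }
    // Deep comparison via serialization; note that this is sensitive to object key order.
    return JSON.stringify(a.outputs ?? []) === JSON.stringify(b.outputs ?? []);
}

// Example settings matching the two providers mentioned in the text.
const githubOptions: CellComparisonOptions = { includeOutputs: false };
const jupyterOptions: CellComparisonOptions = { includeOutputs: true };
```

If a flag like this turns out to be too coarse (for instance, a provider wants to ignore only some output fields), that would be an argument for a dedicated comparison API instead of metadata.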