-
-
Notifications
You must be signed in to change notification settings - Fork 46
Description
Introduction
An out-of-band (abbrev. OOB) change is an edit of a collaborative file not done through JupyterLab's UI. For example, suppose user A runs git reset --hard <other-branch> while another user B is editing a collaborative document within the same repository. A's git reset deletes all edits made by B and sets the contents of that document to some fixed value. Then the YStore table containing B's edits needs to be dropped, and this event must be somehow communicated to all Yjs clients connected to the document. Here, we say that an out-of-band change occurred on B's document.
Description
The current implementation of OOB change detection results in more data loss than necessary. Fixing this issue will likely reduce instances of user data loss such as the one reported in #219. I've written this issue in two parts below.
Context: how OOB changes are detected currently
OOB change detection is implemented via polling in FileLoader. Here are the relevant portions of its source:
class FileLoader:
"""
A class to centralize all the operation on a file.
"""
def __init__(
self, ...
) -> None:
...
self._subscriptions: dict[str, Callable[[], Coroutine[Any, Any, None]]] = {}
self._watcher = asyncio.create_task(self._watch_file()) if self._poll_interval else None
self.last_modified = None
def observe(self, id: str, callback: Callable[[], Coroutine[Any, Any, None]]) -> None:
"""
Subscribe to the file to get notified about out-of-band file changes.
Parameters:
id (str): Room ID
callback (Callable): Callback for notifying the room.
"""
self._subscriptions[id] = callback
async def _watch_file(self) -> None:
"""
Async task for watching a file.
"""
...
while True:
try:
await asyncio.sleep(self._poll_interval)
try:
await self.maybe_notify()
except Exception as e:
self._log.error(f"Error watching file: {self.path}\n{e!r}", exc_info=e)
except asyncio.CancelledError:
break
...
async def maybe_notify(self) -> None:
"""
Notifies subscribed rooms about out-of-band file changes.
"""
do_notify = False
async with self._lock:
# Get model metadata; format and type are not need
model = await ensure_async(self._contents_manager.get(self.path, content=False))
if self.last_modified is not None and self.last_modified < model["last_modified"]:
do_notify = True
self.last_modified = model["last_modified"]
if do_notify:
# Notify out-of-band change
# callbacks will load the file content, thus release the lock before calling them
for callback in self._subscriptions.values():
await callback()Here, self.last_modified is a value that is set by FileLoader's' filesystem methods, FileLoader.load_content() and FileLoader.maybe_save_content(). FileLoader.maybe_notify() is repeatedly called to check the latest last_modified time (mtime) of the file against its previously recorded mtime. If the latest mtime is greater, then the file was probably modified directly by the server, and an OOB change probably occurred. Each parent is then notified of this OOB change by their callback, previously appended by calling FileLoader.observe().
In jupyter_collaboration, DocumentRoom is the parent initializing the FileLoader. DocumentRoom subscribes to FileLoader with a callback after it is initialized:
class DocumentRoom(YRoom):
"""A Y room for a possibly stored document (e.g. a notebook)."""
def __init__(
self, ...
):
super().__init__(ready=False, ystore=ystore, log=log)
...
self._file.observe(self.room_id, self._on_outofband_change)Issue part 1: Using mtime alone is unreliable
When using mtime alone, false negatives (i.e. OOB change going undetected) occur because:
-
A file's mtime can have an imprecision of up to a few seconds (depending on the filesystem). Within that interval, the server can modify a file without updating its mtime, resulting in a false negative.
-
The POSIX standard allows for some system calls like
mmap()to update the content of a file without updating its mtime.
When using mtime alone, false positives (i.e. an OOB change event is emitted without an OOB change) occur because:
- Writing identical content to a file usually updates the
mtimebut leaves the content unaffected. If an OOB change is signaled here, then a false positive occurs, and collaboration is halted for no reason.- This can be triggered by anything that saves a file to disk. For example, if a user opens a file with
vimand immediately exits by running:wq, that would cause a OOB change to be wrongly signaled.
- This can be triggered by anything that saves a file to disk. For example, if a user opens a file with
Both false positives and false negatives result in data loss of at least one client's intent. If we consider the previous example with A and B, a false positive results in A's git reset going ignored, and a false negative results in halting collaboration on B's notebook no reason. In both cases, either A or B's intended content is ignored.
False positives are especially dangerous because of how easily they can be triggered and how they completely halt collaboration on a document. I suspect that there are specific common user scenarios that trigger these OOB false positives, which contributes towards a higher frequency of data loss issues for users.
Issue part 2: Redundant way of signaling OOB changes
FileLoader has a redundant way of signaling OOB changes to its parent object, but only when a file is saved through FileLoader.maybe_save_content():
async def maybe_save_content(self, model: dict[str, Any]) -> None:
"""
Save the content of the file.
Parameters:
model (dict): A dictionary with format, type, last_modified, and content of the file.
Raises:
OutOfBandChanges: if the file was modified at a latter time than the model
### Note:
If there is changes on disk, this method will raise an OutOfBandChanges exception.
"""
async with self._lock:
...
if self.last_modified == m["last_modified"]:
...
else:
# file changed on disk, raise an error
self.last_modified = m["last_modified"]
raise OutOfBandChangesHere, it signals an OOB change having occurred by raising an OutOfBandChanges exception. The redundant signaling of an OOB change causes DocumentRoom to similarly provide a redundant handling of an OOB change:
async def _maybe_save_document(self, saving_document: asyncio.Task | None) -> None:
"""
Saves the content of the document to disk.
### Note:
There is a save delay to debounce the save since we could receive a high
amount of changes in a short period of time. This way we can cancel the
previous save.
"""
...
try:
...
await self._file.maybe_save_content(...)
...
except OutOfBandChanges:
... # <= the block here does not call `self._on_outofband_change()`
...Proposed new implementation
- Use other file metadata from
os.stat()to detect OOB changes more accurately. Changes in file size should be the deciding factor in whether a file was changed, not changes in mtime. - Replace the redundant signaling of OOB changes with the
FileLoader.observe()signaling API.