Description
🐛🐛 Bug Report
To start with, awesome work getting the merge functionality implemented! I have used it for a while now and discovered a few issues with it:
Text tensors are not merged
When two branches are merged, where both branches has the same tensors, some tensors are empty after merge. This applies to e.g. htype="text"
and htype="generic"
. See example on colab here.
UI shows errors during a merge
If using the ActiveLoop user interface while two branches are merging the dataset fails to load and there are error messages "Dataset is corrupted or does not exist".
Exception about modification on read-only tensor during merge
While merging two branches I got this exception once. Rerunning with the same exact code a second time worked.
Error in sys.excepthook:
Traceback (most recent call last):
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/humbug/report.py", line 498, in _hook
self.error_report(error=exception_instance, tags=tags, publish=publish)
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/humbug/report.py", line 244, in error_report
traceback.format_exception(
TypeError: format_exception() got an unexpected keyword argument 'etype'
Original exception was:
Traceback (most recent call last):
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/diff_commits.py", line 84, in <module>
diff_commits()
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/diff_commits.py", line 79, in diff_commits
ds_new.merge(f"debug4/diff-{hash1}-{hash2}-removed")
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/humbug/report.py", line 445, in wrapped_callable
return callable(*args, **kwargs)
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/deeplake/util/invalid_view_op.py", line 22, in inner
return callable(x, *args, **kwargs)
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/deeplake/util/iteration_warning.py", line 11, in inner
res = callable(x, *args, **kwargs)
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/deeplake/core/dataset/dataset.py", line 1193, in merge
merge(self, target_id, conflict_resolution, delete_removed_tensors, force)
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/deeplake/util/merge.py", line 59, in merge
merge_common_tensors(common_tensors, dataset, target_ds, nodes, conflict_resolution)
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/deeplake/util/merge.py", line 289, in merge_common_tensors
merge_tensor_data(
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/deeplake/util/merge.py", line 472, in merge_tensor_data
original_tensor.append(sample)
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/deeplake/util/invalid_view_op.py", line 22, in inner
return callable(x, *args, **kwargs)
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/deeplake/core/tensor.py", line 377, in append
self.extend([sample], progressbar=False)
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/deeplake/util/invalid_view_op.py", line 22, in inner
return callable(x, *args, **kwargs)
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/deeplake/core/tensor.py", line 306, in extend
self._write_initialization()
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/deeplake/core/tensor.py", line 259, in _write_initialization
self.storage.check_readonly()
File "/Users/danielfalk/testprgm/sandbox/deeplake/versioning_tutorial/venv/lib/python3.10/site-packages/deeplake/core/storage/provider.py", line 153, in check_readonly
raise ReadOnlyModeError()
deeplake.util.exceptions.ReadOnlyModeError: Modification when in read-only mode is not supported!
Removed samples are not removed after a merge
Data samples that are removed on a branch does not get removed when that branch is merged.
This might be a design decision, that when merging it is the current state of the source branch that is merged, not the changes/deltas on the branch. This "union" behaviour feels very unnatural for a "merge" and is not how it works in e.g. git. I could not find any documentation explaining this behaviour.
Merged commits are not visible in history
One of the main advantages with version control of the data is that I can look in the log and see what different changes people has done to the dataset. This does not really work as is now since a merge only creates a "merged ..." commit message, it does not show the commits that was done to other branches. This means that if using trunk-based development where the changes are always done on a feature branch, committed with a self-explaining commit message and then merged with master, there will not be any useful commit messages on the main branch, only "merge ..." messages. I think we would really need something like gitk etc. that shows the different commits on the branches that are merged.