Fix partial updates in Python using dicts #1298
Merged
Fixes #1268 by refactoring the data loading path for dict-shaped datasets.
There are some inconsistencies between how null-like values (`None` in Python, `null` and `undefined` in JS) are handled in JS and in Python. In JS, `null` and `undefined` in a dataset mean different things in a partial update: `undefined` resolves to a no-op, so values that are `undefined` will not be modified or overwritten, whereas a value replaced with `null` is overwritten with `null`.

Python has no equivalent of `undefined`. To remedy this, the Python library uses a method called `_has_column`, which checks whether a given row contains a given column name. If the column name does not exist, the update is a no-op for that column, similar to `undefined`; if the column name exists, the stored value is overwritten with the new value, even when that value is `None`.

Thus, a row-oriented update of `[{a: undefined}]`, which works as a no-op update in JavaScript, does not work in Python, where the literal translation `[{"a": None}]` overwrites `a` with null. In the column-oriented case, #1268 illustrates the issue: a missing column in a columnar dataset was treated as an overwrite rather than a no-op. This PR fixes the behavior by treating missing columns in columnar datasets as no-ops. The fix has been tested, and this behavior is now equivalent between JS and Python.

There will always remain Python-specific idiosyncrasies around partial updates. For example, a columnar update in JS can mark an individual cell as a no-op by setting it to `undefined`, but this does not work in Python: one cannot satisfy the "all columns must have the same number of rows" requirement and also mark a single `column[row]` as a no-op, and there is no way to reconcile this behavior.
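The row-oriented rule (missing column means no-op, present column means overwrite) can be modeled in a few lines of plain Python. This is a hypothetical, library-independent sketch, not Perspective's implementation: `apply_row_update`, its `index` and `columns` parameters, and the list-of-dicts table are assumptions made for illustration, and the `column in update` check only stands in for what `_has_column` is described as doing.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_row_update(
    existing: List[Row], updates: List[Row], index: str, columns: List[str]
) -> List[Row]:
    """Hypothetical sketch: apply a row-oriented partial update keyed on `index`.

    A column missing from an update row is a no-op (like `undefined` in JS);
    a column that is present overwrites the stored value, even with None.
    """
    # Copy rows so the caller's data is not mutated.
    table = {row[index]: dict(row) for row in existing}
    for update in updates:
        key = update[index]
        # A new index value creates a row whose unset columns start as None.
        target = table.setdefault(key, {column: None for column in columns})
        for column in columns:
            # Stand-in for `_has_column`: only columns present in the row are written.
            if column in update:
                target[column] = update[column]
    return list(table.values())


existing = [{"id": 1, "a": 10, "b": "x"}]
columns = ["id", "a", "b"]

# Omitting "a" leaves it untouched, the Python counterpart of {a: undefined}.
print(apply_row_update(existing, [{"id": 1, "b": "y"}], "id", columns))
# → [{'id': 1, 'a': 10, 'b': 'y'}]

# An explicit None is an overwrite, like null in JS.
print(apply_row_update(existing, [{"id": 1, "a": None}], "id", columns))
# → [{'id': 1, 'a': None, 'b': 'x'}]
```

In this model the only way to express "leave this column alone" is to omit the key, which is why `None` cannot double as a no-op.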
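The column-oriented case can also be modeled in plain Python. Again, this is a hypothetical illustration rather than Perspective's code: `apply_column_update` and its parameters are assumptions, and treating `None` inside a column list as an explicit null is assumed by analogy with the row-oriented `_has_column` rule. Columns absent from the update are never written, which mirrors the no-op behavior this PR introduces; a column that is present must supply one value per row, so there is no slot in which to say "leave just this cell alone".

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_column_update(
    existing: List[Row], update: Dict[str, List[Any]], index: str, columns: List[str]
) -> List[Row]:
    """Hypothetical sketch: apply a column-oriented partial update keyed on `index`."""
    if index not in update:
        raise ValueError(f"update must include the index column {index!r}")
    # Every column in a columnar update must have the same number of rows.
    lengths = {len(values) for values in update.values()}
    if len(lengths) != 1:
        raise ValueError("all columns must have the same number of rows")
    # Copy rows so the caller's data is not mutated.
    table = {row[index]: dict(row) for row in existing}
    for i, key in enumerate(update[index]):
        target = table.setdefault(key, {column: None for column in columns})
        for column in columns:
            # A column missing from the update is a no-op; a present column
            # overwrites every row it covers, including with None.
            if column in update:
                target[column] = update[column][i]
    return list(table.values())


existing = [{"id": 1, "a": 10, "b": "x"}, {"id": 2, "a": 20, "b": "y"}]
columns = ["id", "a", "b"]

# "a" is absent, so it is left as-is for both rows.
print(apply_column_update(existing, {"id": [1, 2], "b": ["p", "q"]}, "id", columns))
# → [{'id': 1, 'a': 10, 'b': 'p'}, {'id': 2, 'a': 20, 'b': 'q'}]

# No per-cell "undefined": None overwrites row 1 ...
print(apply_column_update(existing, {"id": [1, 2], "a": [None, 5]}, "id", columns))
# → [{'id': 1, 'a': None, 'b': 'x'}, {'id': 2, 'a': 5, 'b': 'y'}]

# ... and a shorter list to skip a row breaks the equal-length requirement.
try:
    apply_column_update(existing, {"id": [1, 2], "a": [5]}, "id", columns)
except ValueError as err:
    print(err)
# → all columns must have the same number of rows
```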