Fix partial updates in Python using dicts #1298

Merged: 1 commit, Jan 28, 2021

Contributor

@sc1f sc1f commented Jan 20, 2021

Fixes #1268 by refactoring the data loading path for dict-shaped datasets.

There are some inconsistencies between how missing values are handled in JS (null and undefined) and in Python (None). In JS, null and undefined in a dataset mean different things in a partial update:

table = perspective.table({a: [1, 2, 3, 4], b: ["a", "b", "c", "d"], c: [10, 20, 30, 40]}, {index: "a"})
table.update({a: [3], c: [100]}) // "b" is `undefined` here, so values in `b` are not modified.

undefined resolves to a no-op, so values that are undefined will not be modified or overwritten. However, if the value is replaced with null:

table.update({a: [3], b: [null], c: [100]}) // "b" is explicitly "null", which will unset the value at "b" for pkey `3` and set it to null.

Python has no counterpart to undefined, so the Python library uses a method called _has_column, which checks whether a given row contains a given column name. If the column name is absent, the update is a no-op, similar to undefined; if the column name is present, the existing value is overwritten with the new value.
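
The row-level check described above can be pictured with a small pure-Python model. This is only a sketch of the semantics, not Perspective's implementation: `has_column` and `apply_row_update` are hypothetical names, and the table is modeled as a plain dict keyed by the index column.

```python
def has_column(row, name):
    """Return True if the row explicitly carries `name`, even when its value is None."""
    return name in row


def apply_row_update(table, rows, index, columns):
    """Apply row-oriented partial updates to `table` (dict of index value -> row dict)."""
    for row in rows:
        key = row[index]
        # Create an all-None row if the index value is new.
        target = table.setdefault(key, {name: None for name in columns})
        for name in columns:
            if has_column(row, name):
                # Key present: overwrite, even if the new value is None.
                target[name] = row[name]
            # Key absent: no-op, like `undefined` in JS.
    return table


table = {
    1: {"a": 1, "b": "a", "c": 10},
    3: {"a": 3, "b": "c", "c": 30},
}

# "b" is absent from the row, so it is left alone.
apply_row_update(table, [{"a": 3, "c": 100}], index="a", columns=["a", "b", "c"])
after_noop = dict(table[3])

# "b" is present with None, so it is unset.
apply_row_update(table, [{"a": 3, "b": None}], index="a", columns=["a", "b", "c"])
after_unset = dict(table[3])
```

The distinction rests on key membership (`name in row`) rather than on the value, which is what lets an explicit None mean "unset" while an absent key means "leave as-is".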

Thus, a row-oriented update of [{a: undefined}], which is a no-op in JavaScript, does not work in Python. In the column-oriented case, #1268 illustrates the issue: a missing column in a columnar dataset was treated as an overwrite rather than a no-op. This PR fixes the behavior by treating missing columns in columnar datasets as no-ops, and the fix has been tested. The handling of missing columns is now equivalent between JS and Python.
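
A minimal sketch of the intended columnar semantics, assuming a table modeled as a dict of column name to list. `apply_columnar_update` is a hypothetical helper written for illustration, not the code path changed in this PR, and it assumes every index value in the update already exists in the table.

```python
def apply_columnar_update(table, data, index):
    """Apply a column-oriented partial update to `table` (dict of column name -> list)."""
    lengths = {len(values) for values in data.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")

    for i, key in enumerate(data[index]):
        position = table[index].index(key)  # assumes the key already exists
        for name, values in data.items():
            # Only columns present in `data` are written. Columns missing
            # from `data` are never visited, so they behave as no-ops.
            table[name][position] = values[i]
    return table


table = {"a": [1, 2, 3, 4], "b": ["a", "b", "c", "d"], "c": [10, 20, 30, 40]}

# "b" is missing from the update, so its values are not modified.
apply_columnar_update(table, {"a": [3], "c": [100]}, index="a")
b_after_missing = list(table["b"])

# "b" is present with None, so the value for index 3 is unset.
apply_columnar_update(table, {"a": [3], "b": [None]}, index="a")
b_after_none = list(table["b"])
```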

There will always remain Python-specific idiosyncrasies around partial updates. For example, this update works in JS:

table.update({
  a: [2, 3, 4],
  b: [null, undefined, null] // delete, no-op, delete
})

but it would not work in Python: one cannot both satisfy the "all columns must have the same number of rows" requirement and mark a single column[row] cell as a no-op, so there is no way to reconcile this behavior in a columnar update:

table.update({
  "a": [2, 3, 4],
  "b": [None, None, None] # delete, delete, delete
})
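
One possible way to express the mixed JS update from Python is to fall back to the row-oriented form and omit the key for cells that should be left alone, relying on the per-row column check described earlier. The sketch below is an assumption-laden illustration: `NOOP` and `columns_to_rows` are hypothetical names that are not part of Perspective, and the conversion happens entirely in user code before calling `table.update`.

```python
NOOP = object()  # hypothetical sentinel marking "leave this cell alone"


def columns_to_rows(columns, noop=NOOP):
    """Convert a columnar update into row dicts, dropping cells marked with `noop`."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    n_rows = lengths.pop() if lengths else 0

    rows = []
    for i in range(n_rows):
        # Omitting the key, rather than storing None, is what signals a no-op.
        rows.append({
            name: values[i]
            for name, values in columns.items()
            if values[i] is not noop
        })
    return rows


rows = columns_to_rows({
    "a": [2, 3, 4],
    "b": [None, NOOP, None],  # delete, no-op, delete
})
# `rows` could then be passed to table.update(rows) as a row-oriented update.
```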

@sc1f sc1f force-pushed the fix-py-partial-update branch from b0f69ad to bc85acd Compare January 20, 2021 15:44
@sc1f sc1f requested a review from texodus January 20, 2021 15:50
@sc1f sc1f marked this pull request as ready for review January 20, 2021 15:50
Member

@texodus texodus left a comment

Looks good! Thanks for the PR!

@texodus texodus merged commit ff7ad22 into master Jan 28, 2021
@texodus texodus deleted the fix-py-partial-update branch January 28, 2021 05:28
@texodus texodus removed the 0.6.1 label Jan 30, 2021
@texodus texodus added this to the 0.6.1 milestone Jan 30, 2021
Labels
bug Concrete, reproducible bugs Python
Development

Successfully merging this pull request may close these issues.

Inconsistency with table.update() in Python with Dicts
2 participants