All except simplification: Option 1 #1036
Closed
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #761
This PR rewrites
except
to disallow pointing to nested columns. The columns selection DSL should only be allowed to "select" columns, not restructure it; we havemove
,remove
, etc. for that.When a user now calls, say
they will receive an
IllegalArgumentException
:Cannot exclude the nested columns '[userData/age]' from the column set '[userData/{age, name}, otherCol1, otherCol2]' with the except-functions. Use the `DataFrame<*>.remove { }` operation instead.
because
userData.age
is not directly selected byall()
, only its parent,userdata
, is.TODO:
Some internal struggle: It seems like
except
was used internally in a couple places and it relied on the old behavior.Rewriting this now pushes me to this horrible notation:
Before:
df.groupBy { allExcept(columnSetWithNestedCols) }...
After:
(something like
df.remove {}.groupBy {}
wouldn't not work, because you still want the columns inside the group)Having to write something like this makes me rethink what we should do about this. Clearly there's a need for selecting the top-level columns of a version of the current dataframe where a (set of) column(s) and their parents are removed. Though, it seems out of the scope for
except
(and the 0.15 version ofexcept
even breaks in some cases). We could:except
and keep it as isremove
on it. Probably also needs adf.toColumnSet()
function.@zaleslaw @koperagen @nikitinas @belovrv WDYT?