Description
Similar for the single arg getindex (getindex(df, col::Symbol)), and eachcol.
We are progressively knocking out ways of ending up with columns of different sizes.
Right now, here are the ways that you can't end up with inconsistent sizes:
- You can't assign in a new column that doesn't have the right size, as `setproperty!` (and 1-arg `setindex!`) check the size of the column being added.
- You can't pass in a `Vector` and then resize it using another reference, as the `DataFrame` constructor copies.
- You can't use normal indexing to get access to columns that you then resize, as normal indexing returns copies.
- You can't use `@view` indexing either: it returns `SubDataFrame`s, which disallow size-mutating operations, and whose column vectors are views anyway, so they also disallow resizing operations (right?)
Right now the only way I can think of to get access to an inner column and then mutate its size
is through `getproperty`, single-argument `getindex`, or `eachcol`,
all of which return the actual underlying `Vector`.
And I was thinking: it would be great if we could just wrap those in some kind of view-like array wrapper that doesn't allow resizing in place, but allows all the other operations one would hope for from a `Vector`, including in-place `setindex!` of elements.
Turns out such a view-like wrapper does exist.
The SubArray.
We can just return the equivalent of `@view df.col[:]`.
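To make the idea concrete, here is a minimal sketch using only Base Julia. `ToyFrame`, `rawcol` and `viewcol` are hypothetical names invented for illustration; this is not how DataFrames.jl is implemented, just a stand-in showing the difference between handing out the stored vector and handing out a full-range view of it.

```julia
# Hypothetical stand-in for a data frame; not the DataFrames.jl implementation.
struct ToyFrame
    columns::Dict{Symbol, Vector{Int}}
end

# Behaviour described above: hand out the stored vector itself.
rawcol(tf::ToyFrame, name::Symbol) = tf.columns[name]

# Proposed behaviour: hand out a full-range view of the stored vector.
viewcol(tf::ToyFrame, name::Symbol) = view(tf.columns[name], :)

tf = ToyFrame(Dict(:a => [1, 2, 3], :b => [4, 5, 6]))

# Through the view, element writes still reach the stored column...
v = viewcol(tf, :a)
v[1] = 10

# ...but resizing is rejected, because resize! has no method for SubArray.
resize_blocked = try
    resize!(v, 5)
    false
catch err
    err isa MethodError
end

# Through the raw vector, the column can be grown behind the frame's back,
# which leaves the two columns with different lengths.
push!(rawcol(tf, :b), 7)
lengths = (length(tf.columns[:a]), length(tf.columns[:b]))
```

In this toy version the only accessor that can produce columns of unequal length is `rawcol`, which is the kind of leak the proposal aims to close.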
The overhead of creating a view is tiny, as is the overhead of working with a view (at least when the indexing is simple which it is in this case).
You can do all the things to a SubArray as long as you don't call resize! on it -- that is a MethodError.
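As a rough illustration of what still works on such a view, the following Base-only sketch runs a few common in-place operations on a `SubArray` and then attempts a `resize!`. The variable names are arbitrary, and the set of operations shown is just a sample, not an exhaustive list.

```julia
col = [3, 1, 2]
v = view(col, :)      # same as @view col[:]

sort!(v)              # in-place sort reorders the parent
v .*= 2               # in-place broadcasting writes through
v[1] = 0              # element assignment writes through
total = sum(v)        # reductions work as for any AbstractVector

# Resizing is the one thing that fails: resize! has no method for SubArray.
resize_error = try
    resize!(v, 5)
    nothing
catch err
    err
end
```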
If someone needs to actually access the raw array, then they can use parent(df.col).
The only reason I can think of that this would really be needed is when a method has an overly strict set of type constraints; accessing the parent would then be an alternative to `collect` (with its own trade-offs).
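A small sketch of that trade-off, again in Base Julia only. `strict_sum` is a hypothetical function standing in for any method whose signature demands a concrete `Vector`:

```julia
col = [1.0, 2.0, 3.0]
v = view(col, :)

# Hypothetical function with an overly strict signature.
strict_sum(x::Vector{Float64}) = sum(x)

# The view itself is rejected by dispatch.
rejected = try
    strict_sum(v)
    false
catch err
    err isa MethodError
end

# parent: no copy, but the caller holds the raw, resizable vector again.
p = parent(v)
via_parent = strict_sum(p)

# collect: a fresh copy, so it is safe to resize, but it costs an allocation
# and later changes to the copy do not reach the original column.
c = collect(v)
via_collect = strict_sum(c)
```

Here `parent(v)` is the very same array as `col`, whereas `collect(v)` is an independent `Vector`, so which one is appropriate depends on whether the callee might resize or mutate its argument.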
I think this would knock off the last possible way to end up with a corrupt DataFrame,
via column related shenanigans.