Make `getproperty(df, col)` return a full length view of the column

The same applies to single-argument `getindex` (`getindex(df, col::Symbol)`) and to `eachcol`.

We are progressively knocking out ways of ending up with columns of different sizes.

Right now, these are the ways you can't end up with inconsistent sizes:
 - You can't assign a new column that doesn't have the right size, as `setproperty!` (and single-argument `setindex!`) check the size of the column being added.
 - You can't pass in a `Vector` and then resize it through another reference, as the `DataFrame` constructor copies.
 - You can't use normal indexing to get access to columns that you then resize, as normal indexing returns copies.
 - You can't use `@view` indexing, as it returns `SubDataFrame`s, which disallow size-mutating operations, and whose column vectors are views anyway, so they also disallow resizing operations (right?).

Right now, the only ways I can think of to get access to an inner column and then mutate its size
are `getproperty`, single-argument `getindex`, and `eachcol`,
which return the actual underlying `Vector`.
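A minimal sketch of the hazard in plain Julia, with no DataFrames.jl dependency. `ToyTable` is a hypothetical stand-in for a `DataFrame`, assumed only to store its columns in a `Dict` and to return the stored `Vector` itself from property access, the way `df.col` currently does.

```julia
# Hypothetical toy column store for illustration only; this is NOT DataFrames.jl.
# Like `df.col` today, property access hands back the stored Vector itself.
struct ToyTable
    columns::Dict{Symbol,Vector}
end

Base.getproperty(t::ToyTable, name::Symbol) = getfield(t, :columns)[name]

t = ToyTable(Dict{Symbol,Vector}(:a => [1, 2, 3], :b => [4.0, 5.0, 6.0]))

col = t.a        # an alias of the stored column, not a copy
push!(col, 4)    # a size-changing mutation through the alias

# The table now holds columns of different lengths.
println((length(t.a), length(t.b)))  # prints (4, 3)
```

`push!` is used here, but `resize!`, `append!`, or `deleteat!` on the alias would break the invariant in the same way.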

And I was thinking: it would be great if we could just wrap those in some kind of `view`-like array wrapper that doesn't allow resizing in place but allows all the other operations one would hope for from a `Vector`, including in-place `setindex!` of elements.

It turns out such a view-like wrapper does exist:
the `SubArray`.
We can just return the equivalent of `@view df.col[:]`.
The overhead of creating a view is tiny, as is the overhead of working with one (at least when the indexing is simple, which it is in this case).
You can do everything with a `SubArray` except call `resize!` on it; that is a `MethodError`.
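The `SubArray` behaviour described above can be checked with Base Julia alone. In this sketch a plain `Vector` named `v` stands in for a `DataFrame` column (an assumption for illustration), and `sv` plays the role of what `df.col` would return under the proposal.

```julia
v = [1, 2, 3]
sv = view(v, :)          # same as @view v[:]; a SubArray sharing v's memory

sv[1] = 10               # element-wise writes are allowed...
@assert v[1] == 10       # ...and go straight through to the parent

sv .+= 1                 # in-place broadcasting works too
@assert v == [11, 3, 4]

# Size-changing operations have no method for SubArray.
err = try
    resize!(sv, 5)
    nothing
catch e
    e
end
@assert err isa MethodError
@assert length(v) == 3   # the parent was not resized
```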

If someone actually needs the raw array, they can use `parent(df.col)`.
The only reason I can think of for really needing that is a method with an overly strict set of type constraints; in that case, accessing the `parent` would be an alternative to `collect` (with its own trade-offs).
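A sketch of the two escape hatches, again using a plain `Vector` as a stand-in column. `strict_sum` is a hypothetical method with an overly strict signature, invented here only for illustration. `parent` is free but hands back the real `Vector` (so resizing becomes possible again), while `collect` allocates an independent copy that is safe to resize.

```julia
v = [1, 2, 3]
sv = view(v, :)               # what `df.col` would hand out under the proposal

raw = parent(sv)
@assert raw === v             # no copy: the very same Vector

c = collect(sv)
@assert c == v && c !== v     # an independent copy

# Hypothetical method with an overly strict signature, for illustration only.
strict_sum(x::Vector{Int}) = sum(x)

strict_sum(parent(sv))        # works, no allocation
strict_sum(collect(sv))       # also works, at the cost of a copy
# strict_sum(sv) would throw a MethodError: a SubArray is not a Vector.
```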

I think this would close off the last possible way to end up with a corrupt `DataFrame`
via column-related shenanigans.

Make `getproperty(df, col)` return a full length view of the column #1844

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Make getproperty(df, col) return a full length view of the column #1844

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Make `getproperty(df, col)` return a full length view of the column #1844