Why isn't df.col .= v in-place? #3200

@gustafsson

This issue is a question about an entry in the most recent release notes:

On Julia 1.7 or newer broadcasting assignment into an existing column of a data frame replaces it. Under Julia 1.6 or older it is an in place operation. (#3022)

I expected df.col .= v to broadcast and assign in place, but I see that's no longer the case in DataFrames.jl 1.4.

The recent release broke some code of mine (I must have missed any deprecation warnings). A simple workaround was coltemp = df.col; coltemp .= v, but I don't understand the reason for the new behaviour. To me this makes DataFrame inconsistent with other containers in Julia, and it left me wondering why that inconsistency would be desirable.
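
For reference, a minimal sketch of that workaround, assuming DataFrames.jl 1.4 on Julia 1.7 or newer. The data frame and the name coltemp are only illustrative. The second half relies on my reading of the indexing docs, namely that df[:, :x] .= v (with : instead of ! as the row selector) broadcasts in place into an existing column; treat that as an assumption to verify rather than a statement about intended behaviour.

```julia
using DataFrames

df = DataFrame(x = [1, 2, 3])
x = df.x                  # df.x returns the stored vector itself, not a copy

# Workaround from above: broadcast into the vector, not through the data frame.
coltemp = df.x
coltemp .= 0
@assert x === df.x        # the column is still the same object
@assert df.x == [0, 0, 0]

# Assumed alternative: `:` as row selector writes into the existing column.
df[:, :x] .= 7
@assert x === df.x
@assert x == [7, 7, 7]
```

Both forms keep the element type Int64, so broadcasting 1.5 through either of them should fail with an InexactError, like the plain vector case below.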

This issue equally applies to df[!, :x] .= v.

Compare:

julia> x = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> x .= 1.5
ERROR: InexactError: Int64(1.5)

Whereas

julia> df = DataFrame(x=[1, 2, 3])
3×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3

julia> x = df.x;

julia> df.x .= 1.5
3-element Vector{Float64}:
 1.5
 1.5
 1.5
 
julia> x === df.x
false

As advertised, df.x .= 1.5 does not work in place but replaces the column, even changing its element type.

If I put the vector in any other container, say a Dict, NamedTuple, or struct:

julia> dt = Dict(:x=>[1, 2, 3])
Dict{Symbol, Vector{Int64}} with 1 entry:
  :x => [1, 2, 3]

julia> dt[:x] .= 1.5
ERROR: InexactError: Int64(1.5)

julia> nt = (x = [1, 2, 3],)
(x = [1, 2, 3],)

julia> nt.x .= 1.5
ERROR: InexactError: Int64(1.5)

julia> struct S
       x
       end

julia> s = S([1, 2, 3])
S([1, 2, 3])

julia> s.x .= 1.5
ERROR: InexactError: Int64(1.5)

They all behave the same. But a DataFrame behaves differently. Why is that?


The docs state "Since df[!, :col] does not make a copy", which makes it unexpected to me that broadcasting into it would create a new column rather than modify the existing one.


For the use case of "create/replace column" we have df.x = v (akin to s.x = v or dict[:x] = v). Would there be any adverse side-effects of letting = broadcast scalars into new/replaced columns?


I understand there was a decision a year ago (#2804) to make df.x .= v work like df[!, :x] .= v, but wouldn't a change to instead make df[!, :x] .= v work like df.x .= v have been more consistent with how containers in Julia typically work?
