Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow permutaion of names in rename! #1974

Merged
merged 8 commits into from
Oct 14, 2019
Merged

Conversation

innerlee
Copy link
Contributor

@innerlee innerlee commented Oct 5, 2019

Before:

julia> df = DataFrame(A = 1:3, B = 'A':'C')
3×2 DataFrame
│ Row │ A     │ B    │
│     │ Int64 │ Char │
├─────┼───────┼──────┤
│ 11'A'  │
│ 22'B'  │
│ 33'C'  │

julia> rename!(df, :A=>:B, :B=>:A)
ERROR: Tried renaming A to B, when B already exists in the Index.
Stacktrace:
...

This pr:

julia> df = DataFrame(A = 1:3, B = 'A':'C')
3×2 DataFrame
│ Row │ A     │ B    │
│     │ Int64 │ Char │
├─────┼───────┼──────┤
│ 11'A'  │
│ 22'B'  │
│ 33'C'  │

julia> rename!(df, :A=>:B, :B=>:A)
3×2 DataFrame
│ Row │ B     │ A    │
│     │ Int64 │ Char │
├─────┼───────┼──────┤
│ 11'A'  │
│ 22'B'  │
│ 33'C'

@bkamins
Copy link
Member

bkamins commented Oct 5, 2019

This is a nice proposal.
Can you please rebase it against current master (otherwise it will keep failing CI - I have just merged a PR that fixes this)?
After it is rebased I will review it today.
Thank you!

@bkamins
Copy link
Member

bkamins commented Oct 5, 2019

I have given a quick glance at the PR. Is your proposal is atomic?

When we make this change it would probably good to ensure this. What I mean is either rename! succeeds or if it fails it should not render the original DataFrame partially changed.

@innerlee
Copy link
Contributor Author

innerlee commented Oct 5, 2019

The current implementation will cause the df not able to print when wrong:

julia> df = DataFrame(A = 1:3, B = 'A':'C')
3×2 DataFrame
│ Row │ A     │ B    │
│     │ Int64 │ Char │
├─────┼───────┼──────┤
│ 11'A'  │
│ 22'B'  │
│ 33'C'  │

julia> rename!(df, :A => :B)
ERROR: Tried renaming to B, when it already exists in the Index.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] rename!(::DataFrames.Index, ::Array{Pair{Symbol,Symbol},1}) at /mnt/hdd/dev/DataFrames.jl/src/other/index.jl:62
 [3] rename!(::DataFrames.Index, ::Pair{Symbol,Symbol}) at /mnt/hdd/dev/DataFrames.jl/src/other/index.jl:67
 [4] rename!(::DataFrame, ::Pair{Symbol,Symbol}) at /mnt/hdd/dev/DataFrames.jl/src/abstractdataframe/abstractdataframe.jl:116
 [5] top-level scope at none:0

julia> df
Error showing value of type DataFrame:
ERROR: AssertionError: length(idx.names) == length(idx.lookup) == ncols
Stacktrace:
 [1] _check_consistency(::DataFrame) at /mnt/hdd/dev/DataFrames.jl/src/dataframe/dataframe.jl:290
 [2] #_show#343(::Bool, ::Bool, ::Bool, ::Symbol, ::Bool, ::Nothing, ::Function, ::IOContext{REPL.Terminals.TTYTerminal}, ::DataFrame) at /mnt/hdd/dev/DataFrames.jl/src/abstractdataframe/show.jl:517
 [3] #_show at ./none:0 [inlined]
 [4] #show#344 at /mnt/hdd/dev/DataFrames.jl/src/abstractdataframe/show.jl:609 [inlined]
 [5] #show at ./none:0 [inlined]
 [6] #show#357(::Bool, ::Bool, ::Bool, ::Symbol, ::Bool, ::Function, ::IOContext{REPL.Terminals.TTYTerminal}, ::MIME{Symbol("text/plain")}, ::DataFrame) at /mnt/hdd/dev/DataFrames.jl/src/abstractdataframe/io.jl:53
 [7] show(::IOContext{REPL.Terminals.TTYTerminal}, ::MIME{Symbol("text/plain")}, ::DataFrame) at /mnt/hdd/dev/DataFrames.jl/src/abstractdataframe/io.jl:53
 [8] display(::REPL.REPLDisplay, ::MIME{Symbol("text/plain")}, ::Any) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:131
 [9] display(::REPL.REPLDisplay, ::Any) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:135
 [10] display(::Any) at ./multimedia.jl:287
 [11] #invokelatest#1 at ./essentials.jl:742 [inlined]
 [12] invokelatest at ./essentials.jl:741 [inlined]
 [13] print_response(::IO, ::Any, ::Any, ::Bool, ::Bool, ::Any) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:155
 [14] print_response(::REPL.AbstractREPL, ::Any, ::Any, ::Bool, ::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:140
 [15] (::getfield(REPL, Symbol("#do_respond#38")){Bool,getfield(REPL, Symbol("##48#57")){REPL.LineEditREPL},REPL.LineEditREPL,REPL.LineEdit.Prompt})(::Any, ::Any, ::Any) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:714
 [16] #invokelatest#1 at ./essentials.jl:742 [inlined]
 [17] invokelatest at ./essentials.jl:741 [inlined]
 [18] run_interface(::REPL.Terminals.TextTerminal, ::REPL.LineEdit.ModalInterface, ::REPL.LineEdit.MIState) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/LineEdit.jl:2273
 [19] run_frontend(::REPL.LineEditREPL, ::REPL.REPLBackendRef) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:1035
 [20] run_repl(::REPL.AbstractREPL, ::Any) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:192
 [21] (::getfield(Base, Symbol("##734#736")){Bool,Bool,Bool,Bool})(::Module) at ./client.jl:362
 [22] #invokelatest#1 at ./essentials.jl:742 [inlined]
 [23] invokelatest at ./essentials.jl:741 [inlined]
 [24] run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at ./client.jl:346
 [25] exec_options(::Base.JLOptions) at ./client.jl:284
 [26] _start() at ./client.jl:436

Do we need to restore it to original when it fails?

@bkamins
Copy link
Member

bkamins commented Oct 5, 2019

Yes - I think rename! should be atomic - this is what I have just commented in parallel 😄.

@innerlee
Copy link
Contributor Author

innerlee commented Oct 5, 2019

yea we thought the same issue. Note the original implementation is not atomic also:

julia> df = DataFrame(A = 1:3, B = 'A':'C', C = [:x, :y, :z])
3×3 DataFrame
│ Row │ A     │ B    │ C      │
│     │ Int64 │ Char │ Symbol │
├─────┼───────┼──────┼────────┤
│ 11'A'  │ x      │
│ 22'B'  │ y      │
│ 33'C'  │ z      │

julia> rename!(df, :C=>:D, :B=>:A, :A=>:B)
ERROR: Tried renaming B to A, when A already exists in the Index.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] rename!(::DataFrames.Index, ::Array{Pair{Symbol,Symbol},1}) at /home/lizz/.julia/packages/DataFrames/yH0f6/src/other/index.jl:52
 [3] rename!(::DataFrames.Index, ::Pair{Symbol,Symbol}, ::Pair{Symbol,Symbol}, ::Vararg{Pair{Symbol,Symbol},N} where N) at /home/lizz/.julia/packages/DataFrames/yH0f6/src/other/index.jl:60
 [4] rename!(::DataFrame, ::Pair{Symbol,Symbol}, ::Pair{Symbol,Symbol}, ::Vararg{Pair{Symbol,Symbol},N} where N) at /home/lizz/.julia/packages/DataFrames/yH0f6/src/abstractdataframe/abstractdataframe.jl:117
 [5] top-level scope at none:0

julia> df
3×3 DataFrame
│ Row │ A     │ B    │ D      │
│     │ Int64 │ Char │ Symbol │
├─────┼───────┼──────┼────────┤
│ 11'A'  │ x      │
│ 22'B'  │ y      │
│ 33'C'  │ z      │

How about simply backup the original index and restore if goes wrong?

@innerlee innerlee force-pushed the zz/rename branch 2 times, most recently from a76f0d2 to 667baea Compare October 5, 2019 08:33
src/other/index.jl Outdated Show resolved Hide resolved
src/other/index.jl Outdated Show resolved Hide resolved
src/other/index.jl Outdated Show resolved Hide resolved
test/dataframe.jl Outdated Show resolved Hide resolved
Copy link
Member

@bkamins bkamins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fixes.

I have left some comments. Additionally before we merge this can you please update the documentation of rename! and rename to reflect the change you propose?

Thank you!

@innerlee
Copy link
Contributor Author

innerlee commented Oct 5, 2019

The new implementation is a breaking change, because it changed the old behavior New names are processed sequentially. A new name must not already exist in the `DataFrame` at the moment an attempt to rename a column is performed.

Signed-off-by: lizz <lizz@sensetime.com>
@bkamins
Copy link
Member

bkamins commented Oct 5, 2019

It is breaking, but I would say that it is "mildly breaking" - i.e. we allow something that was an error earlier, therefore we do not have to go through deprecation.

src/other/index.jl Outdated Show resolved Hide resolved
src/other/index.jl Outdated Show resolved Hide resolved
@bkamins
Copy link
Member

bkamins commented Oct 5, 2019

Thinking more about it - it is breaking even if something worked (I guess this is what you meant, like :A=>:B, :B=>:C with a data frame having only column :A, but I would assume that this was a rare/non-existent use case, and writing a correct deprecation code would be hard).

@innerlee
Copy link
Contributor Author

innerlee commented Oct 5, 2019

yes the mental model of the two approaches are different. The original one is some kind of online processing, the current version is not

@innerlee
Copy link
Contributor Author

innerlee commented Oct 5, 2019

Well, actually the new model is a preparation for #1975

innerlee and others added 2 commits October 5, 2019 21:38
Co-Authored-By: Bogumił Kamiński <bkamins@sgh.waw.pl>
Co-Authored-By: Bogumił Kamiński <bkamins@sgh.waw.pl>
Copy link
Member

@bkamins bkamins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. For the randomized tests I thought of something like:

import DataFrames: index # this should go on top of the test file

n = Symbol.('a':'z')
Random.seed!(1234)
for k in 1:20
    sn = shuffle(n)
    df = DataFrame(zeros(1,26), n)
    p = Dict(Pair.(n, sn))
    cyclelength = Int[]
    for x in n
        i = 0
        y = x
        while true
            y = p[y]
            i += 1
            x == y && break
        end
        push!(cyclelength, i)
    end
    i = lcm(cyclelength)
    while true
        rename!(df, p)
        @test sort(names(df)) == n
        @test sort(collect(keys(index(df).lookup))) == n
        @test sort(collect(values(index(df).lookup))) == 1:26
        @test all(index(df).lookup[x] == i for (i,x) in enumerate(names(df)))
        i -= 1
        names(df) == n && break
    end
    @test i == 0
end

@innerlee
Copy link
Contributor Author

innerlee commented Oct 5, 2019

the randomized test passed locally.

@innerlee
Copy link
Contributor Author

innerlee commented Oct 6, 2019

The 1.0 CI error is due to

julia> rename!(df, :X => :Y)
ERROR: MethodError: no method matching copy!(::Dict{Symbol,Int64}, ::Dict{Symbol,Int64})
Stacktrace:
 [1] rename!(::DataFrames.Index, ::Array{Pair{Symbol,Symbol},1}) at /mnt/hdd/dev/DataFrames.jl/src/other/index.jl:68
 [2] rename!(::DataFrames.Index, ::Pair{Symbol,Symbol}) at /mnt/hdd/dev/DataFrames.jl/src/other/index.jl:87
 [3] rename!(::DataFrame, ::Pair{Symbol,Symbol}) at /mnt/hdd/dev/DataFrames.jl/src/abstractdataframe/abstractdataframe.jl:116
 [4] top-level scope at none:0

Signed-off-by: lizz <lizz@sensetime.com>
src/other/index.jl Outdated Show resolved Hide resolved
test/dataframe.jl Outdated Show resolved Hide resolved
Signed-off-by: lizz <363664470@qq.com>
Signed-off-by: lizz <363664470@qq.com>
Signed-off-by: lizz <363664470@qq.com>
@bkamins
Copy link
Member

bkamins commented Oct 13, 2019

Thank you for the fixes. Regarding the copy! issue I leave it to @nalimilan to decide, as he has a better feel for such things.

In general please note (this is a general comment to make sure we are on the same page in the future) that DataFrames.jl intends to support Julia 1.0 at least as long as it is supported by JuliaLang team.

Signed-off-by: lizz <lizz@sensetime.com>
@bkamins
Copy link
Member

bkamins commented Oct 14, 2019

@nalimilan - OK to merge this?

Then #1985 would be rebased against the changes in this PR as it also imports copy! from Future (CC @Ellipse0934).

@nalimilan nalimilan merged commit a237d23 into JuliaData:master Oct 14, 2019
@nalimilan
Copy link
Member

Thanks @innerlee!

@innerlee innerlee deleted the zz/rename branch October 15, 2019 01:55
@bkamins bkamins mentioned this pull request Oct 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants