-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow permutaion of names in rename! #1974
Conversation
This is a nice proposal. |
I have given a quick glance at the PR. Is your proposal is atomic? When we make this change it would probably good to ensure this. What I mean is either |
The current implementation will cause the df not able to print when wrong: julia> df = DataFrame(A = 1:3, B = 'A':'C')
3×2 DataFrame
│ Row │ A │ B │
│ │ Int64 │ Char │
├─────┼───────┼──────┤
│ 1 │ 1 │ 'A' │
│ 2 │ 2 │ 'B' │
│ 3 │ 3 │ 'C' │
julia> rename!(df, :A => :B)
ERROR: Tried renaming to B, when it already exists in the Index.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] rename!(::DataFrames.Index, ::Array{Pair{Symbol,Symbol},1}) at /mnt/hdd/dev/DataFrames.jl/src/other/index.jl:62
[3] rename!(::DataFrames.Index, ::Pair{Symbol,Symbol}) at /mnt/hdd/dev/DataFrames.jl/src/other/index.jl:67
[4] rename!(::DataFrame, ::Pair{Symbol,Symbol}) at /mnt/hdd/dev/DataFrames.jl/src/abstractdataframe/abstractdataframe.jl:116
[5] top-level scope at none:0
julia> df
Error showing value of type DataFrame:
ERROR: AssertionError: length(idx.names) == length(idx.lookup) == ncols
Stacktrace:
[1] _check_consistency(::DataFrame) at /mnt/hdd/dev/DataFrames.jl/src/dataframe/dataframe.jl:290
[2] #_show#343(::Bool, ::Bool, ::Bool, ::Symbol, ::Bool, ::Nothing, ::Function, ::IOContext{REPL.Terminals.TTYTerminal}, ::DataFrame) at /mnt/hdd/dev/DataFrames.jl/src/abstractdataframe/show.jl:517
[3] #_show at ./none:0 [inlined]
[4] #show#344 at /mnt/hdd/dev/DataFrames.jl/src/abstractdataframe/show.jl:609 [inlined]
[5] #show at ./none:0 [inlined]
[6] #show#357(::Bool, ::Bool, ::Bool, ::Symbol, ::Bool, ::Function, ::IOContext{REPL.Terminals.TTYTerminal}, ::MIME{Symbol("text/plain")}, ::DataFrame) at /mnt/hdd/dev/DataFrames.jl/src/abstractdataframe/io.jl:53
[7] show(::IOContext{REPL.Terminals.TTYTerminal}, ::MIME{Symbol("text/plain")}, ::DataFrame) at /mnt/hdd/dev/DataFrames.jl/src/abstractdataframe/io.jl:53
[8] display(::REPL.REPLDisplay, ::MIME{Symbol("text/plain")}, ::Any) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:131
[9] display(::REPL.REPLDisplay, ::Any) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:135
[10] display(::Any) at ./multimedia.jl:287
[11] #invokelatest#1 at ./essentials.jl:742 [inlined]
[12] invokelatest at ./essentials.jl:741 [inlined]
[13] print_response(::IO, ::Any, ::Any, ::Bool, ::Bool, ::Any) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:155
[14] print_response(::REPL.AbstractREPL, ::Any, ::Any, ::Bool, ::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:140
[15] (::getfield(REPL, Symbol("#do_respond#38")){Bool,getfield(REPL, Symbol("##48#57")){REPL.LineEditREPL},REPL.LineEditREPL,REPL.LineEdit.Prompt})(::Any, ::Any, ::Any) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:714
[16] #invokelatest#1 at ./essentials.jl:742 [inlined]
[17] invokelatest at ./essentials.jl:741 [inlined]
[18] run_interface(::REPL.Terminals.TextTerminal, ::REPL.LineEdit.ModalInterface, ::REPL.LineEdit.MIState) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/LineEdit.jl:2273
[19] run_frontend(::REPL.LineEditREPL, ::REPL.REPLBackendRef) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:1035
[20] run_repl(::REPL.AbstractREPL, ::Any) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:192
[21] (::getfield(Base, Symbol("##734#736")){Bool,Bool,Bool,Bool})(::Module) at ./client.jl:362
[22] #invokelatest#1 at ./essentials.jl:742 [inlined]
[23] invokelatest at ./essentials.jl:741 [inlined]
[24] run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at ./client.jl:346
[25] exec_options(::Base.JLOptions) at ./client.jl:284
[26] _start() at ./client.jl:436 Do we need to restore it to original when it fails? |
Yes - I think |
yea we thought the same issue. Note the original implementation is not atomic also: julia> df = DataFrame(A = 1:3, B = 'A':'C', C = [:x, :y, :z])
3×3 DataFrame
│ Row │ A │ B │ C │
│ │ Int64 │ Char │ Symbol │
├─────┼───────┼──────┼────────┤
│ 1 │ 1 │ 'A' │ x │
│ 2 │ 2 │ 'B' │ y │
│ 3 │ 3 │ 'C' │ z │
julia> rename!(df, :C=>:D, :B=>:A, :A=>:B)
ERROR: Tried renaming B to A, when A already exists in the Index.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] rename!(::DataFrames.Index, ::Array{Pair{Symbol,Symbol},1}) at /home/lizz/.julia/packages/DataFrames/yH0f6/src/other/index.jl:52
[3] rename!(::DataFrames.Index, ::Pair{Symbol,Symbol}, ::Pair{Symbol,Symbol}, ::Vararg{Pair{Symbol,Symbol},N} where N) at /home/lizz/.julia/packages/DataFrames/yH0f6/src/other/index.jl:60
[4] rename!(::DataFrame, ::Pair{Symbol,Symbol}, ::Pair{Symbol,Symbol}, ::Vararg{Pair{Symbol,Symbol},N} where N) at /home/lizz/.julia/packages/DataFrames/yH0f6/src/abstractdataframe/abstractdataframe.jl:117
[5] top-level scope at none:0
julia> df
3×3 DataFrame
│ Row │ A │ B │ D │
│ │ Int64 │ Char │ Symbol │
├─────┼───────┼──────┼────────┤
│ 1 │ 1 │ 'A' │ x │
│ 2 │ 2 │ 'B' │ y │
│ 3 │ 3 │ 'C' │ z │ How about simply backup the original index and restore if goes wrong? |
a76f0d2
to
667baea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the fixes.
I have left some comments. Additionally before we merge this can you please update the documentation of rename!
and rename
to reflect the change you propose?
Thank you!
The new implementation is a breaking change, because it changed the old behavior |
Signed-off-by: lizz <lizz@sensetime.com>
It is breaking, but I would say that it is "mildly breaking" - i.e. we allow something that was an error earlier, therefore we do not have to go through deprecation. |
Thinking more about it - it is breaking even if something worked (I guess this is what you meant, like |
yes the mental model of the two approaches are different. The original one is some kind of online processing, the current version is not |
Well, actually the new model is a preparation for #1975 |
Co-Authored-By: Bogumił Kamiński <bkamins@sgh.waw.pl>
Co-Authored-By: Bogumił Kamiński <bkamins@sgh.waw.pl>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. For the randomized tests I thought of something like:
import DataFrames: index # this should go on top of the test file
n = Symbol.('a':'z')
Random.seed!(1234)
for k in 1:20
sn = shuffle(n)
df = DataFrame(zeros(1,26), n)
p = Dict(Pair.(n, sn))
cyclelength = Int[]
for x in n
i = 0
y = x
while true
y = p[y]
i += 1
x == y && break
end
push!(cyclelength, i)
end
i = lcm(cyclelength)
while true
rename!(df, p)
@test sort(names(df)) == n
@test sort(collect(keys(index(df).lookup))) == n
@test sort(collect(values(index(df).lookup))) == 1:26
@test all(index(df).lookup[x] == i for (i,x) in enumerate(names(df)))
i -= 1
names(df) == n && break
end
@test i == 0
end
the randomized test passed locally. |
The 1.0 CI error is due to julia> rename!(df, :X => :Y)
ERROR: MethodError: no method matching copy!(::Dict{Symbol,Int64}, ::Dict{Symbol,Int64})
Stacktrace:
[1] rename!(::DataFrames.Index, ::Array{Pair{Symbol,Symbol},1}) at /mnt/hdd/dev/DataFrames.jl/src/other/index.jl:68
[2] rename!(::DataFrames.Index, ::Pair{Symbol,Symbol}) at /mnt/hdd/dev/DataFrames.jl/src/other/index.jl:87
[3] rename!(::DataFrame, ::Pair{Symbol,Symbol}) at /mnt/hdd/dev/DataFrames.jl/src/abstractdataframe/abstractdataframe.jl:116
[4] top-level scope at none:0 |
Signed-off-by: lizz <lizz@sensetime.com>
Signed-off-by: lizz <363664470@qq.com>
Signed-off-by: lizz <363664470@qq.com>
Signed-off-by: lizz <363664470@qq.com>
Thank you for the fixes. Regarding the In general please note (this is a general comment to make sure we are on the same page in the future) that DataFrames.jl intends to support Julia 1.0 at least as long as it is supported by JuliaLang team. |
Signed-off-by: lizz <lizz@sensetime.com>
@nalimilan - OK to merge this? Then #1985 would be rebased against the changes in this PR as it also imports |
Thanks @innerlee! |
Before:
This pr: