Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tables.jl uses eachcolumn and a new PR to base does too #1611

Closed
pdeffebach opened this issue Nov 29, 2018 · 16 comments
Closed

Tables.jl uses eachcolumn and a new PR to base does too #1611

pdeffebach opened this issue Nov 29, 2018 · 16 comments

Comments

@pdeffebach
Copy link
Contributor

JuliaLang/julia#29749 (comment) seeks to implement an eachcolumn feature on AbstractArray types.

There is a discussion about how to handle the backporting there, but we can't do import eachcolumn is our function is named eachcol.

Tables.jl also calls their column iterator eachcolumn.

Maybe we should follow suit for consistency and change the name? It's massively breaking, though.

@quinnj
Copy link
Member

quinnj commented Nov 30, 2018

Tables.jl doesn't export eachcolumn, so the use has been and always will be Tables.eachcolumn for precisely this kind of scenario.

@pdeffebach
Copy link
Contributor Author

Ah. It makes sense that it was designed to avoid confusion.

Still a bit odd to have (rows, eachcolumn), (eachrow, eachcol), and (eachrow, eachcolumn). But that's another issue.

@nalimilan
Copy link
Member

That PR implements eachcol, not eachcolumn AFAICT.

What worries me, though, is that eachcol currently returns a name => column pair, while Base.eachcol will return a view on the column. For consistency, it would be nice if we returned just the column (like columns does). We could deprecate eachcol(df) in favor of eachcol(df, true) or something like that. I wish we had considered this when discussing #1586. @bkamins What do you think?

@bkamins
Copy link
Member

bkamins commented Nov 30, 2018

I was unaware of JuliaLang/julia#29749 when making the changes (it was submitted exactly in parallel).

Acutally initially I have proposed something like eachcol(df, true) for current behavior if I remember correctly. The problem is that this way eachcol would not be type stable. We would have to use eachcol(df, Val(true)) or somethng similar. The good thing is that the columns change was not tagged yet so we can change it.

Given https://github.com/JuliaLang/julia/pull/29749 I think what we should do is:

  • in this release:
    • do not export columns(df)
    • make eachcol(df, Val(false)) call columns(df)
    • change eachcol(df) to eachcol(df, Val(true))
    • when eachcol(df) is called execute eachcol(df, Val(true)) and throw depwarn
  • in the next release
    • when eachcol(df) is called execute eachcol(df, Val(false))
    • get rid of internal columns(df) and move its implementation to eachcol(df, Val(false))

This is a simple fix and I can push a PR that does this.

The only problem is that writing eachcol(df, Val(true)) etc. is clumsy. If there are proposals for a better naming scheme please write.

@nalimilan
Copy link
Member

nalimilan commented Nov 30, 2018

The only problem is that writing eachcol(df, Val(true)) etc. is clumsy. If there are proposals for a better naming scheme please write.

Yeah, that's the main problem. I'm really tempted to deprecate eachcol in favor of zip(names(df), columns(df)) or if we don't export columns(df) in favor of zip(names(df), eachcol(df, Val(false))). Then when we remove the deprecation that can become zip(names(df), eachcol(df)).

@bkamins
Copy link
Member

bkamins commented Nov 30, 2018

Actually - having thought of that - if we want to allow zip(names(df), eachcol(df)) then we can also allow eachcol(df, Val(true)) as it is even shorter. So this means that we could move forward with what I proposed above if all agree.

@nalimilan
Copy link
Member

My idea was that we would eventually remove these methods based on Val, which are not really more convenient than zip and much uglier.

@bkamins
Copy link
Member

bkamins commented Nov 30, 2018

OK, so do we move forward with what I have outlined?

@nalimilan
Copy link
Member

Actually, if we inline eachcol, eachcol(df, true) be type-stable in the typical case where true is passed as a literal, right?

@bkamins
Copy link
Member

bkamins commented Nov 30, 2018

Good point. This is correct as constant propagation will be performed and the outer calling function shall be type-stable.

@pdeffebach
Copy link
Contributor Author

Yeah, that's the main problem. I'm really tempted to deprecate eachcol in favor of zip(names(df), columns(df)) or if we don't export columns(df) in favor of zip(names(df), Val(false))

This seems too verbose for what might be a common operation.

Why not have them change their iterator to columns?

Could we use a wrapper eachcol(df, names = true) and then just use a function barrier for efficiency? Or am I over-estimating type stability.

@bkamins
Copy link
Member

bkamins commented Nov 30, 2018

eachcol(df) will be type stable always and eachcol(df, x) will be type stable inside a function when x is a literal Bool (and this is when we care about it).

This is how it should work. I will doublecheck this when submitting a PR.

Should we go forward with this now or wait for JuliaLang/julia#29749 to be decided?

@nalimilan
Copy link
Member

I guess you can prepare a PR to be ready when it's merged. Not sure whether it will be in 1.1.

@nalimilan
Copy link
Member

BTW, I'm not even sure type stability matters at all since iterating over columns is inherently type-unstable already.

@bkamins
Copy link
Member

bkamins commented Nov 30, 2018

I will prepare a PR now. I think we should not introduce columns function now just to remove it in 3 months. I will submit the PR soon (it is relatively simple) and then we can decide what to do.

@bkamins
Copy link
Member

bkamins commented Dec 3, 2018

Closing as this is handled by #1612.

@bkamins bkamins closed this as completed Dec 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants