Adding Vector conversion for a DataFrame #1461
Comments
I think we should add an |
So the todo would be:
If we are OK with this plan I can implement it. |
Would changing the output of the iterator based on a keyword argument lead to type instability? If it is true, it returns a tuple, if not, it returns a vector? |
The idea is the following:
and then in any function you specify:
or
And it will be type stable AFAIK. |
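A minimal sketch of how a `Bool` type parameter can keep the iterator's element type inferable, so that choosing between tuples and plain vectors does not depend on a runtime value. `ColumnIterator` and its fields are hypothetical names used only for illustration, not DataFrames API, and the columns are fixed to `Vector{Float64}` to keep the example small.

```julia
# Hypothetical toy iterator; the Bool parameter records whether names are returned.
struct ColumnIterator{WithNames}
    names::Vector{Symbol}
    columns::Vector{Vector{Float64}}
end

Base.length(it::ColumnIterator) = length(it.columns)

# The element type is decided by the type parameter, so it is known at compile time.
Base.eltype(::Type{ColumnIterator{true}}) = Tuple{Symbol,Vector{Float64}}
Base.eltype(::Type{ColumnIterator{false}}) = Vector{Float64}

# With names: yield (name, column) tuples.
function Base.iterate(it::ColumnIterator{true}, i::Int=1)
    i > length(it.columns) && return nothing
    return ((it.names[i], it.columns[i]), i + 1)
end

# Without names: yield the column vectors only.
function Base.iterate(it::ColumnIterator{false}, i::Int=1)
    i > length(it.columns) && return nothing
    return (it.columns[i], i + 1)
end

colnames = [:a, :b]
cols = [[1.0, 2.0], [3.0, 4.0]]

with_names = collect(ColumnIterator{true}(colnames, cols))
without_names = collect(ColumnIterator{false}(colnames, cols))
```

Because the two behaviours are separate methods selected by the type parameter, each `iterate` method has a single return shape; a runtime keyword argument would instead make one method return either shape.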
I think that is a good idea. Maybe |
Ah - I see what you mean - we could use |
It might be hard to make this as fast as
Perhaps this performance difference will go away if we stop returning |
This in tests is actually faster:
than |
Cool. Let's add the functionality then, and if we decide we need it to be a separate iterator rather than a parametric type, we can always do that. |
OK - @nalimilan - do you have any comments before the PR? |
This actually might be a good time to think about #1335 and type stability of columns. What if DataFrames isn't type stable, but its iterator is? |
Yes - this is the issue, but fortunately only noticeable if work done on a column is small; if the work is large enough then it is usually delegated to a function that works as barrier-function. |
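A small sketch of the barrier-function pattern mentioned here, assuming columns are held in a `Vector{Any}` so the compiler does not know their element types at the call site. The names `column_sum` and `sum_all` are hypothetical and chosen only for this example.

```julia
# Columns stored without concrete type information (an assumption that mimics
# a container whose column types are unknown to the compiler).
columns = Any[[1.0, 2.0, 3.0], [10, 20, 30]]

# Kernel: Julia compiles a specialized version for each concrete column type.
function column_sum(col::AbstractVector)
    total = zero(eltype(col))
    for x in col
        total += x
    end
    return total
end

# Outer loop: `col` is only known as `Any` here, so each call to `column_sum`
# is a dynamic dispatch, but the loop inside the kernel runs on a concretely
# typed vector.
function sum_all(columns)
    return [column_sum(col) for col in columns]
end

totals = sum_all(columns)
```

The cost of the dynamic dispatch is paid once per column, which is why the instability matters mostly when the per-column work is small.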
True. I've been benchmarking some large |
Sorry, I recognize this is a touch off topic, but I'm not sure where it's best to bring it up: it seems like a lack of type stability of columns underlies many of the performance issues for DataFrames (if I understand what's going on). Is moving to type-stable columns a subject of discussion? If so, where? |
And #744 - an old issue that is still open |
thx |
@bkamins Do you think something still needs to be done here? |
It can be closed given our current implementation (there is still a deprecation period finished by #1613, but we will not lose track of it). The type stability issue is important, but I would discuss it in the other threads. |
Every once in a while I need to access the columns of a DataFrame as a vector of vectors. This is exactly what df.columns is, but of course we should not expose it. On the other hand, eachcol is not very user friendly now.

What we could do:
1. add a Vector(df::DataFrame) = copy(df.columns) conversion;
2. change eachcol to be more usable.

Actually I prefer the first option, as we already have a Matrix conversion. Any thoughts?
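To make the first option concrete, here is a minimal sketch on a toy type. `ToyFrame` is a hypothetical stand-in with a `columns` field, not the real DataFrame, so this only illustrates what `copy(df.columns)` would and would not copy.

```julia
# Hypothetical stand-in for a data frame: names plus a vector of columns.
struct ToyFrame
    names::Vector{Symbol}
    columns::Vector{Any}
end

# The proposed conversion: copy the outer vector only (a shallow copy).
Base.Vector(df::ToyFrame) = copy(df.columns)

df = ToyFrame([:a, :b], Any[[1, 2, 3], ["x", "y", "z"]])
cols = Vector(df)

# Growing the returned vector does not touch the frame's own column list...
push!(cols, [0.0])

# ...but the column vectors themselves are shared, so this change is
# visible through `df` as well.
cols[1][1] = 100
```

Since `copy` is shallow, callers can reorder or extend the returned vector safely, while element-level mutation still writes through to the original columns.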