Description
Now that NamedTuple is in Julia Base (on 0.7), we should use it in DataFrames. A possible approach which has been discussed on Slack would be to define a new TypedDataFrame{C<:NamedTuple} <: AbstractDataFrame type, which would just wrap a NamedTuple and include information regarding the column names and types in its type parameter C. Then DataFrame would be a convenience type wrapping a TypedDataFrame object, with the advantages that 1) columns can be added/removed/renamed, and 2) functions do not need to be compiled for each combination of column types when it doesn't improve performance. TypedDataFrame would be useful to improve performance where the user provides a custom function operating on each row (notably groupby), as specialization is essential for performance in that case.
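To make the proposal concrete, here is a minimal sketch of the two-layer design, assuming Julia 0.7 or later. The field names and the helpers maprows and addcolumn! are hypothetical and chosen for illustration only; this is not the code from the nl/typed branch.

```julia
# Hypothetical sketch of the proposed layering; not the actual DataFrames code.
abstract type AbstractDataFrame end

# Column names and column types are both encoded in the type parameter C.
struct TypedDataFrame{C<:NamedTuple} <: AbstractDataFrame
    columns::C
end

# The wrapper hides C, so its type does not change when columns do.
mutable struct DataFrame <: AbstractDataFrame
    tdf::TypedDataFrame
end

# Row-wise map on the typed frame: this method is specialized on C, so `f`
# receives a concretely typed NamedTuple for each row.
function maprows(f, tdf::TypedDataFrame)
    cols = tdf.columns
    n = length(first(cols))
    return [f(map(col -> col[i], cols)) for i in 1:n]
end

# Function barrier: one dynamic dispatch, then the specialized method runs.
maprows(f, df::DataFrame) = maprows(f, df.tdf)

# Adding a column builds a TypedDataFrame of a different type and rebinds it.
function addcolumn!(df::DataFrame, name::Symbol, col::AbstractVector)
    newcols = merge(df.tdf.columns, NamedTuple{(name,)}((col,)))
    df.tdf = TypedDataFrame(newcols)
    return df
end

df = DataFrame(TypedDataFrame((a = [1, 2, 3], b = [1.0, 2.0, 3.0])))
addcolumn!(df, :c, ["x", "y", "z"])
sums = maprows(row -> row.a + row.b, df)
```

In this sketch, code that only touches the DataFrame wrapper is compiled once, while maprows is recompiled for each distinct set of column names and types, which is the trade-off described above.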
I have experimented with this approach in the nl/typed branch, using the NamedTuples.jl package (since I used Julia 0.6 at first). It's mostly working (and passes some tests), but lots of fixes are needed to pass all tests. In particular, I originally removed the Index type since I thought NamedTuple could replace it (hence the sym_colinds and int_colinds attempts), but I'm not sure that's possible since we sometimes need to convert symbols to integer indices and vice versa (cf. #1305). Handling of duplicate column names needs to be improved (rebased on top of #1308).
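For reference, the two conversions in question can be written directly against a Base NamedTuple, as in the sketch below (Julia 0.7 or later assumed). The helper names colindex and colname are hypothetical and are not the sym_colinds/int_colinds functions from the branch. Whether lookups of this kind are enough to replace Index is exactly the open question.

```julia
# Hypothetical helpers for illustration; not part of DataFrames.

# Symbol -> integer position (linear scan over the names; `nothing` if absent).
colindex(nt::NamedTuple, name::Symbol) = findfirst(isequal(name), keys(nt))

# Integer position -> symbol.
colname(nt::NamedTuple, i::Integer) = keys(nt)[i]

cols = (a = [1, 2], b = ["x", "y"])
colindex(cols, :b)   # 2
colname(cols, 1)     # :a
colindex(cols, :z)   # nothing
```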
@bkamins This touches areas you improved recently, and I'm not sure I'll be able to find the time to finish this right now. To avoid stepping on each other's toes (since this is required to implement the getindex improvements you've mentioned), if you want to take this up just let me know.