Base DataFrame on NamedTuple #1335

Closed
@nalimilan

Now that NamedTuple is in Julia Base (on 0.7), we should use it in DataFrames. A possible approach which has been discussed on Slack would be to define a new TypedDataFrame{C<:NamedTuple} <: AbstractDataFrame type, which would just wrap a NamedTuple, and include information regarding the column names and types in its type parameter C. Then DataFrame would be a convenience type wrapping a TypedDataFrame object, with the advantages that 1) columns can be added/removed/renamed, and 2) functions do not need to be compiled for each combination of column types when it doesn't improve performance. TypedDataFrame would be useful to improve performance where the user provides a custom function operating on each row (notably groupby), as specialization is essential for performance in that case.
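To make the proposal concrete, here is a minimal sketch of the two-layer design, assuming Julia 1.x. It is not the code from the nl/typed branch: `AbstractTable`, `TypedTable`, `Table`, `addcolumn!` and `maprows` are hypothetical stand-ins for `AbstractDataFrame`, `TypedDataFrame`, `DataFrame` and the operations built on them.

```julia
# Hypothetical names throughout; a sketch of the idea, not the DataFrames API.
abstract type AbstractTable end  # stand-in for AbstractDataFrame

# Column names and column types are encoded in the type parameter C.
struct TypedTable{C<:NamedTuple} <: AbstractTable
    columns::C
end

# Convenience wrapper: its type does not depend on the columns,
# so columns can be added, removed or renamed in place.
mutable struct Table <: AbstractTable
    typed::TypedTable
end

Table(; cols...) = Table(TypedTable((; cols...)))

# Adding a column builds a new NamedTuple and swaps the wrapped object.
function addcolumn!(t::Table, name::Symbol, col::AbstractVector)
    newcols = merge(t.typed.columns, NamedTuple{(name,)}((col,)))
    t.typed = TypedTable(newcols)
    return t
end

# Specialized on C: each row is a NamedTuple of concrete type,
# so the user-provided function f can be compiled for it.
function maprows(f, t::TypedTable{C}) where {C}
    cols = t.columns
    n = length(first(cols))
    return [f(map(c -> c[i], cols)) for i in 1:n]
end

# One dynamic dispatch on the wrapped object, then specialized code.
maprows(f, t::Table) = maprows(f, t.typed)

t = Table(a = [1, 2, 3], b = [1.0, 2.0, 3.0])
addcolumn!(t, :c, ["x", "y", "z"])
sums = maprows(row -> row.a + row.b, t)
```

The `maprows(f, t::Table)` method acts as a function barrier: code working on `Table` is compiled once whatever the columns are, and only the inner method is recompiled for each combination of column names and types.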

I have experimented with this approach in the nl/typed branch, using the NamedTuples.jl package (since I used Julia 0.6 at first). It's mostly working (and passes some tests), but lots of fixes are needed to pass all tests. In particular, I originally removed the Index type since I thought NamedTuple could replace it (hence the sym_colinds and int_colinds attempts), but I'm not sure that's possible since we sometimes need to convert symbols to integer indices and vice versa (cf. #1305). Handling of duplicate column names needs to be improved (rebased on top of #1308).
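For reference, the two-way conversion mentioned above can be sketched as follows. This is only an illustration of the requirement, not the actual Index type nor the sym_colinds/int_colinds code: `ColumnIndex`, `colindex` and `colname` are hypothetical names, and duplicate column names are not handled (with duplicates, the last position would win in the lookup).

```julia
# Hypothetical stand-in for an Index-like structure.
struct ColumnIndex
    names::Vector{Symbol}      # integer position -> column name
    lookup::Dict{Symbol,Int}   # column name -> integer position
end

ColumnIndex(names::Vector{Symbol}) =
    ColumnIndex(names, Dict(n => i for (i, n) in enumerate(names)))

colindex(idx::ColumnIndex, name::Symbol) = idx.lookup[name]
colname(idx::ColumnIndex, i::Integer) = idx.names[i]

idx = ColumnIndex([:a, :b, :c])
pos = colindex(idx, :b)
name = colname(idx, 3)
```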

@bkamins This touches areas you improved recently, and I'm not sure I'll be able to find the time to finish this right now. To avoid stepping on each other's toes (since this is required to implement the getindex improvements you've mentioned), if you want to take this up just let me know.
