Base DataFrame on NamedTuple

Now that `NamedTuple` is in Julia Base (as of 0.7), we should use it in DataFrames. One possible approach, which has been discussed on Slack, would be to define a new `TypedDataFrame{C<:NamedTuple} <: AbstractDataFrame` type that simply wraps a `NamedTuple` and carries the column names and types in its type parameter `C`. `DataFrame` would then be a convenience type wrapping a `TypedDataFrame` object, with two advantages: 1) columns can be added, removed or renamed, and 2) functions do not need to be compiled for each combination of column types when doing so doesn't improve performance. `TypedDataFrame` would be useful where the user provides a custom function operating on each row (notably with `groupby`), since specialization on the column types is essential for performance in that case.
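To make the idea concrete, here is a minimal sketch of the two-layer design, assuming Julia 0.7 or later. It is only an illustration, not the code in the branch: the field names and the helpers `addcol!`, `nrows`, `row` and `maprows` are hypothetical, and the types are redefined locally so the snippet should not be loaded alongside DataFrames itself.

```julia
# Hypothetical sketch; these are not the actual DataFrames.jl definitions.
abstract type AbstractDataFrame end

# Column names and element types are encoded in the type parameter C.
struct TypedDataFrame{C<:NamedTuple} <: AbstractDataFrame
    columns::C
end

# The field is abstractly typed, so the set of columns can change
# without changing the type of the DataFrame object itself.
mutable struct DataFrame <: AbstractDataFrame
    typed::TypedDataFrame
end

# Build from keyword arguments; values(kwargs) is a NamedTuple.
DataFrame(; kwargs...) = DataFrame(TypedDataFrame(values(kwargs)))

# Adding a column swaps in a new TypedDataFrame with a different parameter.
function addcol!(df::DataFrame, name::Symbol, col::AbstractVector)
    cols = merge(df.typed.columns, NamedTuple{(name,)}((col,)))
    df.typed = TypedDataFrame(cols)
    return df
end

nrows(t::TypedDataFrame) = length(first(t.columns))

# map over a NamedTuple keeps the names, so each row is itself a NamedTuple.
row(t::TypedDataFrame, i::Integer) = map(c -> c[i], t.columns)

# Specialized on C: the compiler knows the type of every column here.
maprows(f, t::TypedDataFrame) = [f(row(t, i)) for i in 1:nrows(t)]

# Function barrier: a single dynamic dispatch, then the specialized method.
maprows(f, df::DataFrame) = maprows(f, df.typed)

df = DataFrame(a = [1, 2, 3], b = [0.5, 1.5, 2.5])
addcol!(df, :c, [10, 20, 30])
result = maprows(r -> r.a + r.b + r.c, df)   # [11.5, 23.5, 35.5]
```

In this sketch only `maprows(f, ::TypedDataFrame)` is compiled per combination of column types, while `addcol!` and other `DataFrame` methods are compiled once.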

I have experimented with this approach in the [nl/typed](https://github.com/JuliaData/DataFrames.jl/compare/nl/typed) branch, using the NamedTuples.jl package (since I used Julia 0.6 at first). It's mostly working and passes some tests, but many fixes are still needed to pass all of them. In particular, I originally removed the `Index` type since I thought `NamedTuple` could replace it (hence the `sym_colinds` and `int_colinds` attempts), but I'm not sure that's possible, since we sometimes need to convert symbols to integer indices and vice versa (cf. #1305). Handling of duplicate column names also needs to be improved (the branch is rebased on top of #1308).
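For reference, the two-way conversion in question can be sketched as follows. This is a simplified, hypothetical stand-in for the existing `Index` type, not its actual definition; `SimpleIndex`, `colindex` and `colname` are names made up for this example.

```julia
# Hypothetical stand-in for Index: maps names to positions and back.
struct SimpleIndex
    lookup::Dict{Symbol,Int}   # name => position
    names::Vector{Symbol}      # position => name
end

function SimpleIndex(names::Vector{Symbol})
    # Duplicate names would make the symbol lookup ambiguous, so reject them.
    allunique(names) || throw(ArgumentError("duplicate column names"))
    lookup = Dict(n => i for (i, n) in enumerate(names))
    return SimpleIndex(lookup, copy(names))
end

colindex(idx::SimpleIndex, name::Symbol) = idx.lookup[name]
colname(idx::SimpleIndex, i::Integer) = idx.names[i]

idx = SimpleIndex([:a, :b, :c])
colindex(idx, :b)   # 2
colname(idx, 3)     # :c
```

A `NamedTuple` holds the same information in its type parameter, but the lookup above works on runtime values and does not depend on the column names being part of the type.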

@bkamins This touches areas you improved recently, and I'm not sure I'll be able to find the time to finish this right now. To avoid stepping on each other's toes (since this is required to implement the `getindex` improvements you've mentioned), if you want to take this up just let me know.

Base DataFrame on NamedTuple #1335
