Fast row aggregation in DataFrames.jl

As you might know, `DataFrame` is optimised for column operations, and row operations are not efficient. Several solutions to this have been discussed previously (#2440, #2757, #2439, #952, ...).

Based on my knowledge, the most efficient way to do this (with performance similar to working with a matrix), when the problem fits a map-and-reduce pattern, is to use `mapreduce`, e.g.

```
using DataFrames
df = DataFrame(rand(10^5, 100), :auto)
# Add each column into the accumulator in place, avoiding allocations
op(x, y) = x .+= y
mapreduce(identity, op, eachcol(df), init = zeros(nrow(df)))
```
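The same pattern seems to carry over to other reductions. As a rough sketch, a row-wise maximum could be written as below; the helper name `rowmax_op!` is made up for illustration and is not part of `DataFrames.jl`, and the table is kept small only so that the cross-check against a matrix is cheap.

```julia
using DataFrames

df = DataFrame(rand(10^3, 20), :auto)

# Hypothetical helper: overwrite the accumulator with the element-wise maximum.
rowmax_op!(acc, col) = acc .= max.(acc, col)

# Start from -Inf so that any finite value replaces the initial accumulator.
rowmax = mapreduce(identity, rowmax_op!, eachcol(df), init = fill(-Inf, nrow(df)))

# Cross-check against the matrix-based computation.
rowmax == vec(maximum(Matrix(df), dims = 2))
```

Because the operator mutates its first argument, this relies on the accumulator always being passed first; `foldl` could be used instead of `mapreduce` if an explicit left-to-right order is preferred.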

At first I thought this was trivial and that some documentation about `mapreduce` would be enough for `DataFrames.jl` users. However, after thinking about it for a while, I guess it is not as trivial as I thought (particularly if `missing` values are present). Thus, I think having a set of common row operations inside `DataFrames.jl` would be helpful, particularly operations that take care of `missing` automatically. Since I know this may be controversial, for the moment I am developing a package, `DFRowOperation.jl`, to define and store a set of common row operations. Users may use, contribute to, and evaluate this package, and if it makes sense, it would be great to add its functionality to `DataFrames.jl` in the future.
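To illustrate why `missing` complicates things, here is a minimal sketch of a row-wise sum that skips `missing` values with plain `mapreduce`. The helper name `add_skipmissing!` is hypothetical, and treating `missing` as zero (so that an all-`missing` row sums to `0.0`) is just one possible convention, not necessarily what the package does.

```julia
using DataFrames

df = DataFrame(a = [1.0, missing, 3.0],
               b = [missing, missing, 2.0],
               c = [4.0, 5.0, 6.0])

# Hypothetical helper: add a column to the accumulator, treating `missing` as zero.
add_skipmissing!(acc, col) = acc .+= coalesce.(col, 0.0)

# The accumulator stays a plain Vector{Float64}, so no `missing` leaks into the result.
rowsum = mapreduce(identity, add_skipmissing!, eachcol(df), init = zeros(nrow(df)))
```

Without the `coalesce` step, the in-place `.+=` would try to store `missing` into a `Float64` accumulator and fail, which is the kind of detail a ready-made row operation could hide from the user.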

You may access the package at

https://github.com/sl-solution/DFRowOperation.jl

It currently contains ~~the following functions~~ one function, `byrow`, which supports the following optimised row-wise operations:
```
- sum
- prod
- cumsum
- cumsum!
- cumprod
- cumprod!
- mean
- count
- any
- all
- minimum
- maximum
- var
- std
- stdze!
- stdze
- sort!
- sort
- nunique
- mapreduce
- reduce
```

Fast row aggregation in DataFrames.jl #2768

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Fast row aggregation in DataFrames.jl #2768

Description

Activity

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions