Description
Continuation of #558, which fixed the most annoying bugs related to `describe`.
See #558 for more information.
Our statistics functions need some more love. We used to have many missing types (mostly fixed by #937), but there are still some inconsistencies to resolve:
As mentioned in #543, some functions like `median(ints)` might return an unexpectedly rounded `Int`. It might be better to let all functions return `Double` and handle `BigInteger`/`BigDecimal` separately, as they're Java-specific for now.
There are plenty of public overloads on `Iterable` and `Sequence`. It's fine to have them internally, but I feel like we're clogging the public scope here. `mean`, for instance, is already covered in the stdlib.
We'll need to hide public functions that are not on DataColumn as @AndreiKingsley will probably make a statistics library for that anyway.
We need to honor a conversion table (see below).
We won't support `UByte`, `UShort`, `UInt`, and `ULong`, since they don't inherit `Number`.
We also drop support for `BigInteger` and `BigDecimal`, as supporting them makes generic typing and conversion very difficult and unpredictable.
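
The rounding concern from #543 can be illustrated with a minimal sketch. Both helpers below (`medianKeepingInt` and `medianAsDouble`) are hypothetical names made up for this example and are not the library's actual implementation; they only show why returning `Double` avoids the surprise.

```kotlin
// Hypothetical helpers for illustration only; not the library's API.

// Median that keeps the input type: for an even count the two middle values
// are averaged with integer division, so the fractional part is lost.
fun medianKeepingInt(values: List<Int>): Int? {
    if (values.isEmpty()) return null
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2
}

// Median that converts every value to Double first, so nothing is rounded.
fun medianAsDouble(values: List<Number>): Double? {
    if (values.isEmpty()) return null
    val sorted = values.map { it.toDouble() }.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2.0
}

fun main() {
    val ints = listOf(1, 2, 3, 4)
    println(medianKeepingInt(ints)) // 2
    println(medianAsDouble(ints))   // 2.5
}
```

Note that the unsigned types cannot be passed to a `Number`-based helper like `medianAsDouble` at all, which is the reason given above for not supporting them.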
Progress:
- underlying fixes: Aggregator implementation rework #1078
- mean: Mean statistics fixes #1091
- sum: Sum statistics and aggregator improvements #1103
- min: `Aggregator` dependency injection, `min`/`max`, and `skipNaN` #1108
- max: `Aggregator` dependency injection, `min`/`max`, and `skipNaN` #1108
- std: Overhaul for std #1119
- median: Median overhaul #1122
- percentile: Percentile #1149
- cumSum: CumSum #1152

| Function | Conversion | Extra information | Nulls in input |
|---|---|---|---|
| mean | Int -> Double | For all: Double.NaN if no elements | All nulls are filtered out |
| | Short -> Double | | |
| | Byte -> Double | | |
| | Long -> Double | | |
| | Double -> Double | skipNaN option, false by default | |
| | Float -> Double | skipNaN option, false by default | |
| | Number -> Conversion(Common number type) -> Double | skipNaN option, false by default | |
| | Nothing / no values -> Double.NaN | | |
| sum | Int -> Int | All default to zero if no values | All nulls are filtered out |
| | Short -> Int | | |
| | Byte -> Int | | |
| | Long -> Long | | |
| | Double -> Double | skipNaN option, false by default | |
| | Float -> Float | skipNaN option, false by default | |
| | Number -> Conversion(Common number type) -> Number | skipNaN option, false by default | |
| | Nothing / no values -> Double (0.0) | | |
| cumSum | Int -> Int | All default to zero if no values | All can optionally skip nulls in input with skipNull option, true by default |
| | Short -> Int | important because order matters with cumSum | |
| | Byte -> Int | | |
| | Long -> Long | | |
| | Double -> Double | skipNaN option, true by default | |
| | Float -> Float | skipNaN option, true by default | |
| | Number -> Conversion(Common number type) -> Number | skipNaN option, true by default | |
| | Nothing / no values -> Double (0.0) | | |
| min/max | T -> T? where T : Comparable<T> | For all: null if no elements, has -OrNull overloads | All nulls are filtered out |
| | Int -> Int? | | |
| | Short -> Short? | | |
| | Byte -> Byte? | | |
| | Long -> Long? | | |
| | Double -> Double? | skipNaN option, false by default, returns NaN when in the input | |
| | Float -> Float? | skipNaN option, false by default, returns NaN when in the input | |
| | Would need more overloads and more work | | |
| | Nothing / no values -> Nothing? (null) | | |
| median/percentile | T -> T? where T : Comparable<T> | For all: median of even list will cause conversion to Double if possible, else lower middle | All nulls are filtered out |
| | Int -> Double? | null if no elements | |
| | Short -> Double? | | |
| | Byte -> Double? | | |
| | Long -> Double? | | |
| | Double -> Double? | | |
| | Float -> Double? | | |
| | Would need more overloads and more work | | |
| | Nothing / no values -> Nothing? (null) | | |
| std | Int -> Double | All have DDoF (Delta Degrees of Freedom) argument, and Double.NaN if no elements | All nulls are filtered out |
| | Short -> Double | | |
| | Byte -> Double | | |
| | Long -> Double | | |
| | Double -> Double | skipNaN option, false by default | |
| | Float -> Double | skipNaN option, false by default | |
| | Number -> Conversion(Common number type) -> Double | skipNaN option, false by default | |
| | Nothing / no values -> Double.NaN | | |
| var (want to add?) | same as std | | |
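
To make a few rows of the table concrete, here is a rough sketch of the `mean`, `sum` (for `Short` input), and `std` rules. The function names `meanOf`, `sumOfShorts`, and `stdOf` are hypothetical and exist only for this illustration; the real aggregators may be structured quite differently. `ddof` is a required parameter here because the table does not state a default.

```kotlin
import kotlin.math.sqrt

// Hypothetical helpers mirroring a few table rows; not the library's API.

// mean: nulls are always filtered out, NaN is skipped only when skipNaN = true,
// and an input without values yields Double.NaN.
fun meanOf(values: Iterable<Number?>, skipNaN: Boolean = false): Double {
    var sum = 0.0
    var count = 0
    for (value in values) {
        val d = value?.toDouble() ?: continue // filter out nulls
        if (skipNaN && d.isNaN()) continue
        sum += d
        count++
    }
    return if (count == 0) Double.NaN else sum / count
}

// sum for Short input: the result is widened to Int, zero if there are no values.
fun sumOfShorts(values: Iterable<Short?>): Int {
    var sum = 0
    for (value in values) {
        if (value != null) sum += value
    }
    return sum
}

// std: converts to Double, divides by (n - ddof), Double.NaN if no elements.
fun stdOf(values: Iterable<Number?>, ddof: Int): Double {
    val doubles = values.filterNotNull().map { it.toDouble() }
    if (doubles.isEmpty()) return Double.NaN
    val mean = doubles.average()
    val squaredDiffs = doubles.sumOf { (it - mean) * (it - mean) }
    return sqrt(squaredDiffs / (doubles.size - ddof))
}

fun main() {
    println(meanOf(listOf(1, null, 3)))                        // 2.0
    println(meanOf(listOf(1.0, Double.NaN)))                   // NaN
    println(meanOf(listOf(1.0, Double.NaN), skipNaN = true))   // 1.0
    println(sumOfShorts(listOf(1.toShort(), null, 2.toShort()))) // 3
    println(stdOf(listOf(1, 3), ddof = 0))                     // 1.0
}
```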