☂ Statistics streamlining #961

Open
@Jolanrensen

Continuation of #558 which fixed the most annoying bugs related to describe.

See #558 for more information.

Our statistics functions need some more love. We used to have many missing types (mostly fixed by #937), but there are still some inconsistencies to resolve:

As mentioned in #543, some functions, such as median(ints), can return an unexpectedly rounded Int. It might be better to have all functions return Double and handle BigInteger / BigDecimal separately, since they are Java-specific for now.

There are plenty of public overloads on Iterable and Sequence. Having them internally is fine, but I feel like we're clogging the public scope here. mean, for instance, is already covered in the stdlib (as average()).

We'll need to hide public functions that are not on DataColumn as @AndreiKingsley will probably make a statistics library for that anyway.

We need to honor a conversion table (see below).

We won't support UByte, UShort, UInt, and ULong since they don't inherit Number.

We also drop support for BigInteger and BigDecimal, as they make generic typing and conversion very difficult and unpredictable.
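The rounding concern and the unsigned-type exclusion from the points above can be reproduced with the Kotlin stdlib alone. This is a minimal sketch: `medianTruncated` and `medianAsDouble` are hypothetical helpers written for illustration, not functions from the library.

```kotlin
// Hypothetical helpers for illustration only; these are not the library's API.

// Median that stays in Int: for an even-sized list the two middle values are
// averaged with integer division, so the fractional part is silently lost.
fun medianTruncated(values: List<Int>): Int? {
    if (values.isEmpty()) return null
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2
}

// Median that always widens to Double, as the issue proposes.
fun medianAsDouble(values: List<Int>): Double? {
    if (values.isEmpty()) return null
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) {
        sorted[mid].toDouble()
    } else {
        (sorted[mid - 1].toDouble() + sorted[mid].toDouble()) / 2.0
    }
}

fun main() {
    val ints = listOf(1, 2, 3, 4)
    println(medianTruncated(ints)) // 2
    println(medianAsDouble(ints))  // 2.5

    // Unsigned types are not subtypes of Number, so a Number-based API cannot accept them.
    val unsigned: Any = 42u
    println(unsigned is Number) // false
}
```

Returning Double everywhere trades exactness for very large Long values against a predictable return type, which seems to be the trade-off the issue is leaning towards.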

Progress:

| Function | Conversion | Extra information | Nulls in input |
| --- | --- | --- | --- |
| mean | `Int -> Double` | For all: Double.NaN if no elements | All nulls are filtered out |
| | `Short -> Double` | | |
| | `Byte -> Double` | | |
| | `Long -> Double` | | |
| | `Double -> Double` | skipNaN option, false by default | |
| | `Float -> Double` | skipNaN option, false by default | |
| | `Number -> Conversion(Common number type) -> Double` | skipNaN option, false by default | |
| | `Nothing / no values -> Double.NaN` | | |
| sum | `Int -> Int` | All default to zero if no values | All nulls are filtered out |
| | `Short -> Int` | | |
| | `Byte -> Int` | | |
| | `Long -> Long` | | |
| | `Double -> Double` | skipNaN option, false by default | |
| | `Float -> Float` | skipNaN option, false by default | |
| | `Number -> Conversion(Common number type) -> Number` | skipNaN option, false by default | |
| | `Nothing / no values -> Double (0.0)` | | |
| cumSum | `Int -> Int` | All default to zero if no values | All can optionally skip nulls in input with skipNull option, true by default; important because order matters with cumSum |
| | `Short -> Int` | | |
| | `Byte -> Int` | | |
| | `Long -> Long` | | |
| | `Double -> Double` | skipNaN option, true by default | |
| | `Float -> Float` | skipNaN option, true by default | |
| | `Number -> Conversion(Common number type) -> Number` | skipNaN option, true by default | |
| | `Nothing / no values -> Double (0.0)` | | |
| min/max | `T -> T? where T : Comparable<T>` | For all: null if no elements, has -OrNull overloads | All nulls are filtered out |
| | `Int -> Int?` | | |
| | `Short -> Short?` | | |
| | `Byte -> Byte?` | | |
| | `Long -> Long?` | | |
| | `Double -> Double?` | skipNaN option, false by default, returns NaN when in the input | |
| | `Float -> Float?` | skipNaN option, false by default, returns NaN when in the input | |
| | `Number -> Number?` | Would need more overloads and more work | |
| | `Nothing / no values -> Nothing? (null)` | | |
| median/percentile | `T -> T? where T : Comparable<T>` | For all: median of even list will cause conversion to Double if possible, else lower middle; null if no elements | All nulls are filtered out |
| | `Int -> Double?` | | |
| | `Short -> Double?` | | |
| | `Byte -> Double?` | | |
| | `Long -> Double?` | | |
| | `Double -> Double?` | | |
| | `Float -> Double?` | | |
| | `Number -> Conversion(Common number type) -> Double` | Would need more overloads and more work | |
| | `Nothing / no values -> Nothing? (null)` | | |
| std | `Int -> Double` | All have DDoF (Delta Degrees of Freedom) argument and Double.NaN if no elements | All nulls are filtered out |
| | `Short -> Double` | | |
| | `Byte -> Double` | | |
| | `Long -> Double` | | |
| | `Double -> Double` | skipNaN option, false by default | |
| | `Float -> Double` | skipNaN option, false by default | |
| | `Number -> Conversion(Common number type) -> Double` | skipNaN option, false by default | |
| | `Nothing / no values -> Double.NaN` | | |
| var (want to add?) | same as std | | |
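To make a few rows of the table concrete, here is a rough sketch of how the mean, sum, and std rules could behave on plain collections. Everything here is an assumption for illustration: `meanOf`, `sumOfShorts`, and `stdOf` are hypothetical names, the default of `ddof = 1` is not stated in the table, and returning NaN when there are no more elements than `ddof` is an extrapolation of the "Double.NaN if no elements" rule.

```kotlin
import kotlin.math.sqrt

// Hypothetical helpers sketching the table's rules; names and signatures are assumptions.

// mean: nulls are always dropped; NaN is dropped only when skipNaN = true;
// an empty input yields Double.NaN.
fun meanOf(values: Iterable<Number?>, skipNaN: Boolean = false): Double {
    val doubles = values.filterNotNull()
        .map { it.toDouble() }
        .filter { !skipNaN || !it.isNaN() }
    return if (doubles.isEmpty()) Double.NaN else doubles.sum() / doubles.size
}

// sum: Short (like Byte) is widened to Int; nulls are dropped; an empty input yields 0.
fun sumOfShorts(values: Iterable<Short?>): Int =
    values.filterNotNull().sumOf { it.toInt() }

// std: ddof is subtracted from the element count in the divisor.
// Assumption: default ddof = 1, and NaN when there are too few elements.
fun stdOf(values: Iterable<Number?>, skipNaN: Boolean = false, ddof: Int = 1): Double {
    val doubles = values.filterNotNull()
        .map { it.toDouble() }
        .filter { !skipNaN || !it.isNaN() }
    if (doubles.size <= ddof) return Double.NaN
    val mean = doubles.sum() / doubles.size
    val squaredDeviations = doubles.sumOf { (it - mean) * (it - mean) }
    return sqrt(squaredDeviations / (doubles.size - ddof))
}

fun main() {
    val data = listOf(1.0, null, Double.NaN, 3.0)
    println(meanOf(data))                 // NaN, because NaN propagates by default
    println(meanOf(data, skipNaN = true)) // 2.0
    println(meanOf(emptyList()))          // NaN
    println(sumOfShorts(listOf(1.toShort(), null, 2.toShort()))) // 3
    println(stdOf(listOf(1, 2, 3, 4)))    // sample standard deviation, roughly 1.29
}
```

Keeping skipNaN off by default means a single NaN poisons the result, which matches the "false by default" entries for mean, sum, and std in the table.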

Labels

bug (Something isn't working), ☂ umbrella issue (Label assigned to issues that are collections of smaller issues)
