
Describe breaks on Number column #558

Closed
@Jolanrensen

Description


Describe breaks on Number columns. This happens because Iterable<Number>.std() accepts Number values but, unlike mean(), does not convert them to Double first.
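A minimal sketch of the conversion being asked for, assuming a hypothetical extension stdOfNumbers (not part of the Kotlin DataFrame API). Every element is widened with Number.toDouble() before the standard deviation is computed, so a column typed as plain Number no longer breaks it. The ddof parameter is assumed to mean "delta degrees of freedom" as in NumPy, so the divisor is n - ddof.

```kotlin
import kotlin.math.sqrt

/**
 * Hypothetical helper, not the library's API: standard deviation of any
 * Iterable<Number>, converting every element to Double first (as mean() does).
 *
 * ddof ("delta degrees of freedom", an assumed meaning borrowed from NumPy) is
 * subtracted from the element count in the denominator:
 * ddof = 0 gives the population std, ddof = 1 the sample std.
 */
fun Iterable<Number>.stdOfNumbers(ddof: Int = 1): Double {
    // Widen Byte, Short, Int, Long, Float, Double, BigDecimal, ... to Double.
    val values = map { it.toDouble() }
    val n = values.size
    // Nothing sensible to divide by: return NaN instead of throwing.
    if (n - ddof <= 0) return Double.NaN
    val mean = values.average()
    val sumOfSquares = values.sumOf { (it - mean) * (it - mean) }
    return sqrt(sumOfSquares / (n - ddof))
}
```

For example, listOf<Number>(1, 2.0, 3f, 4L, 5.toByte(), 6.toShort()).stdOfNumbers() mixes six different boxed types and yields the sample standard deviation sqrt(3.5).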

Several other functions are missing overloads or behave inconsistently as well:

  • cumSum

    • Misses Byte, Short
    • Has DataColumn overloads but not Iterable/Sequence
  • mean

    • Has Sequence<Double> and Sequence<Float> overloads, but none for other Number types
  • median

    • Misses Float, Byte, Short, Number (it only works on Comparable)
    • Needs to handle other types consistently
    • No Sequence overloads
    • Has no skipNA option (if applicable)
  • min and max

    • The internal Iterable<T>.min and max are unused and can be removed; the stdlib functions for Comparable sequences and iterables are used instead.
    • Misses Number (it only works on Comparable)
    • Short and Byte are converted to Int for no clear reason
  • std

    • Breaks if type is Number
    • Short and Byte are cast to Int which works but is a bit iffy
    • Iterable overloads missing for Number, Short, Byte
    • Sequence overloads missing
    • Nullable overloads missing for Iterable (and Sequence)
  • varianceAndMean

    • Also provides a std(ddof: Int) function, plus a count, without documenting what ddof (delta degrees of freedom) even means. Could have a better name. It can also produce nulls?? This really needs documentation.
    • variance functions are missing on DataColumns entirely (had to be added separately for Kandy)
    • Misses Short, Byte, Number, and nullable overloads
    • Misses Sequence overloads
  • sum

    • Has TODOs where types are missing
    • Misses Float(!), Short, Byte, Number in various Iterable overloads.
  • All of them are also missing BigInteger, which should be supported since BigDecimal already is.

  • There are plenty of public overloads on Iterable and Sequence. It's fine to have them internally, but I feel like we're clogging the public scope here. mean, for instance, is already covered by the stdlib's average().

  • We need to honor some conversion table (see below)

  • Describe currently only shows min, median, and max for columns of type <T : Comparable<T>>, which excludes Number. This makes sense technically, but not from a user's perspective. We can simply convert the values to Double first and then compute these statistics.
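Converting to Double first could look roughly like the sketch below. NumberSummary and summarizeAsDouble are hypothetical names, not the library's API. Dropping null and NaN values stands in for an assumed skipNA-like behaviour, and averaging the two middle values for an even count is one common median convention that may differ from what the library's median does for Comparable columns.

```kotlin
/** Hypothetical result type for a describe-like summary of a Number column. */
data class NumberSummary(val min: Double, val median: Double, val max: Double)

/**
 * Hypothetical sketch: widen every Number to Double, then compute min, median and max,
 * so no T : Comparable<T> bound is needed. Returns null if no usable values remain.
 */
fun Iterable<Number?>.summarizeAsDouble(): NumberSummary? {
    // Drop nulls and NaN values (assumed skipNA-like behaviour), widen to Double, sort.
    val sorted = mapNotNull { it?.toDouble() }
        .filterNot { it.isNaN() }
        .sorted()
    if (sorted.isEmpty()) return null
    val mid = sorted.size / 2
    // Even count: average the two middle values; odd count: take the middle one.
    val median = if (sorted.size % 2 == 0) (sorted[mid - 1] + sorted[mid]) / 2 else sorted[mid]
    return NumberSummary(min = sorted.first(), median = median, max = sorted.last())
}
```

Because everything is a Double after conversion, the Comparable bound disappears; the trade-off is lost precision for Long values beyond 2^53 and for large or very precise BigDecimal values.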

Metadata

Labels

bug (Something isn't working)
