Closed
Description
Umbrella'd under #827
Reported on slack: https://kotlinlang.slack.com/archives/C4W52CFEZ/p1728885330465379
The CSV https://kotlinlang.slack.com/files/U16CM33AB/F07R98VJ7AT/msleep.csv contains several columns of Double values and "NA"s, representing null. This causes some curious cases:
| Expected | Actual |
|---|---|
| `DataFrame.readCSV()` should be able to recognise that "NA" means null and parse the column as `Double?`. | The column `brainwt` is parsed as `BigDecimal` because it doesn't recognize "3e-04" as a Double and doesn't handle "NA" well. |
| `DataFrame.readCSV("NA" in nullStrings)` should help with recognizing "NA" as null. | Recognizes "NA" as null, but the result is still `BigDecimal?`. |
| `"NA" in nullStrings` and `colTypes = "brainwt" to ColType.Double` should work for sure. | "java.lang.IllegalStateException: Couldn't parse 'NA' into type kotlin.Double". Apparently, giving a colType grabs the Double parser directly and does not take `nullStrings` into account. Plus, if the result is null, it's assumed the parsing failed. We need to give `ColType.String` and call `parse` or `convert` afterwards manually. |
| `parse()` and `convert().toDouble()` should behave the same. | `parse()` uses `NumberFormat` with a locale and doesn't recognize "3e-04". `convert()` uses `Double.parseDouble()` without a locale and can parse it. |
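
To make the third row concrete, here is a minimal stdlib-only sketch of the ordering it asks for: check the null strings first, and only treat a null from the parser as a failure afterwards. `parseNullableDouble` is a hypothetical helper for illustration, not the DataFrame implementation or its API.

```kotlin
// Hypothetical helper (not part of DataFrame): consult nullStrings BEFORE
// parsing, so a null result can be told apart from a failed parse.
fun parseNullableDouble(raw: String, nullStrings: Set<String>): Double? {
    // A configured null string is a legitimate null, not a parse failure.
    if (raw in nullStrings) return null
    // Only a value that is neither a null string nor a valid Double is an error.
    return raw.toDoubleOrNull()
        ?: error("Couldn't parse '$raw' into type kotlin.Double")
}

fun main() {
    val nullStrings = setOf("NA")
    // Sample cells chosen for illustration, not copied from msleep.csv.
    val cells = listOf("0.0155", "NA", "3e-04")
    val parsed: List<Double?> = cells.map { parseNullableDouble(it, nullStrings) }
    println(parsed)
}
```

With this ordering, "NA" becomes a null cell in a `Double?` column, while a genuinely malformed value still fails loudly.
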
Most of the issues here are solved by the new CSV implementation under the umbrella issue: #827. The case for "3e-04" requires a different Double parser, which is solved by #935.
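
The difference between the two parsing routes can be sketched with the JDK alone. The snippet below assumes a "strict" `NumberFormat` parse that only accepts the input when the whole string was consumed; `parseWithNumberFormat` is a hypothetical stand-in, not the code DataFrame actually uses. On this assumption, "3e-04" is expected to be rejected by the `NumberFormat` route (it stops at the lowercase `e`), while `String.toDouble()`, which delegates to `Double.parseDouble()` on the JVM, accepts it.

```kotlin
import java.text.NumberFormat
import java.text.ParsePosition
import java.util.Locale

// Hypothetical helper: locale-aware parse that returns null unless
// NumberFormat consumed the entire input string.
fun parseWithNumberFormat(raw: String, locale: Locale = Locale.US): Double? {
    val position = ParsePosition(0)
    val number = NumberFormat.getInstance(locale).parse(raw, position)
    return if (number != null && position.index == raw.length) number.toDouble() else null
}

fun main() {
    // Plain decimal notation works with both routes.
    println(parseWithNumberFormat("0.0155"))
    println("0.0155".toDouble())

    // Scientific notation with a lowercase exponent is where they diverge.
    println(parseWithNumberFormat("3e-04"))
    println("3e-04".toDouble())
}
```
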