Reading CSV with custom nullString impossible #921

Closed
@Jolanrensen

Umbrella'd under #827

Reported on Slack: https://kotlinlang.slack.com/archives/C4W52CFEZ/p1728885330465379

The CSV https://kotlinlang.slack.com/files/U16CM33AB/F07R98VJ7AT/msleep.csv contains several columns of Double values and "NA"s, representing null. This causes some curious cases:

| Expected | Actual |
| --- | --- |
| `DataFrame.readCSV()` should be able to recognise that "NA" means null and parse the column as `Double?`. | The column `brainwt` is parsed as `BigDecimal` because it doesn't recognize "3e-04" as `Double` and doesn't handle "NA" well. |
| `DataFrame.readCSV()` with "NA" in `nullStrings` should help recognize "NA" as null. | Recognizes "NA" as null, but the result is still `BigDecimal?`. |
| "NA" in `nullStrings` and `colTypes = "brainwt" to ColType.Double` should work for sure. | "java.lang.IllegalStateException: Couldn't parse 'NA' into type kotlin.Double". Apparently giving a `colType` grabs the `Double` parser directly and does not take `nullStrings` into account. Plus, if the result is null, it's assumed the parsing failed. We need to give `ColType.String` and call `parse` or `convert` afterwards manually. |
| `parse()` and `convert().toDouble()` should behave the same. | `parse()` uses `NumberFormat` with a locale and doesn't recognize "3e-04". `convert` uses `Double.parseDouble()` without a locale and can parse it. |
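
To make the third row concrete, here is a minimal sketch of the behaviour one would expect when a column type is forced: configured null strings are checked first, and only non-null cells reach the typed parser. `parseDoubleColumn` is a hypothetical helper written for illustration in plain Kotlin. It is not part of the DataFrame API and not how the library is implemented; the sample cells are made up, apart from "NA" and "3e-04", which come from this report.

```kotlin
// Hypothetical helper (not DataFrame API): parse raw CSV cells into Double?,
// treating any cell listed in nullStrings as null instead of as a parse failure.
fun parseDoubleColumn(raw: List<String>, nullStrings: Set<String>): List<Double?> =
    raw.map { cell ->
        if (cell in nullStrings) {
            // A configured null marker is a valid result, not an error.
            null
        } else {
            // Only a genuinely unparsable value should fail.
            cell.toDoubleOrNull()
                ?: throw IllegalStateException("Couldn't parse '$cell' into type kotlin.Double")
        }
    }

fun main() {
    val brainwt = listOf("NA", "0.0155", "3e-04") // illustrative cells
    println(parseDoubleColumn(brainwt, nullStrings = setOf("NA")))
    // prints [null, 0.0155, 3.0E-4]
}
```

Without "NA" in `nullStrings`, the same helper throws for the first cell, which mirrors the exception quoted above.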

Most of the issues here are solved by the new CSV implementation under the umbrella issue: #827. The case for "3e-04" requires a different Double parser, which is solved by #935.
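
The "3e-04" difference can be reproduced with the JDK alone. The sketch below compares a locale-aware `NumberFormat` parse with the locale-free `Double.parseDouble()` path (via Kotlin's `toDoubleOrNull()`). `strictLocaleParse` and `plainParse` are hypothetical names, and requiring the whole string to be consumed is an assumption made here to model a strict parser; this is not the library's actual code.

```kotlin
import java.text.NumberFormat
import java.text.ParsePosition
import java.util.Locale

// Locale-aware parse that only succeeds when the entire input was consumed.
// (Assumption: a strict parser rejects trailing, unparsed characters.)
fun strictLocaleParse(text: String, locale: Locale = Locale.US): Double? {
    val position = ParsePosition(0)
    val number = NumberFormat.getInstance(locale).parse(text, position) ?: return null
    return if (position.index == text.length) number.toDouble() else null
}

// Locale-free parse; delegates to java.lang.Double parsing and accepts "e" or "E" exponents.
fun plainParse(text: String): Double? = text.toDoubleOrNull()

fun main() {
    for (text in listOf("0.00112", "3e-04", "NA")) {
        println("$text -> locale=${strictLocaleParse(text)}, plain=${plainParse(text)}")
    }
}
```

With the default `DecimalFormat` symbols, the exponent separator is an uppercase "E", so the locale-aware parse is expected to stop after the "3" in "3e-04" and be rejected here, while the plain parse yields 3.0E-4. Neither path accepts "NA", which is why null strings have to be handled before either parser runs.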

Labels

bug (Something isn't working), csv (CSV / delim related issues)