Standardization gobbles milliseconds from timestamp #465

@benedeki

Description

Describe the bug

Standardization loses millisecond precision (despite the pattern supporting it) whenever to_timestamp is used, i.e. in all cases where the source column is not already a timestamp or date. It seems to be a Spark issue.
It can perhaps be addressed with some smart parsing, and definitely with a UDF.

To Reproduce

    // Spark imports needed by the snippet (plus the project's StandardizationInterpreter)
    import org.apache.spark.sql.functions.{col, lit, to_timestamp}
    import org.apache.spark.sql.types._
    import spark.implicits._ // for .toDF on the Seq

    val pattern = "dd-MMM-yyyy HH:mm:ss.SSS"
    val desiredSchema = StructType(Seq(
      StructField("description", StringType, nullable = false),
      StructField("timestamp_to_test", TimestampType, nullable = false,
        new MetadataBuilder().putString("pattern", pattern).build)
    ))

    val seq = Seq(
      ("U03", "21-MAR-2019 19:00:00.223"),
      ("L03", "21-MAR-2019 19:00:00.000"),
      ("M03", "21-MAR-2019 19:00:00.224"),
      ("L03", "21-MAR-2019 19:00:01.000")
    )
    val src = seq.toDF("description", "timestamp_to_test")

    val std = StandardizationInterpreter.standardize(src, desiredSchema, "").cache()
      .withColumn("comp", to_timestamp(lit("21-MAR-2019 19:00:00.224"), pattern) === col("timestamp_to_test"))
      .withColumn("tz1", to_timestamp(lit("21-MAR-2019 19:00:00.224"), pattern))
      .withColumn("tz2", lit(1547521021.83301).cast(TimestampType))
      .withColumn("tz3", lit(1547521021.83301).cast(TimestampType).cast(StringType))

    logDataFrameContent(std)
    std.show(false)
    std.printSchema()

Expected behaviour

Milliseconds are preserved
Hints for possible solutions
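
One possible workaround, along the lines hinted at above: parse the string on the JVM side with SimpleDateFormat, which keeps the .SSS fraction that to_timestamp drops, and wrap the parser in a UDF. This is only a sketch, not the actual Enceladus implementation; the names parseWithMillis and toTimestampMs are made up for illustration.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat
import java.util.Locale

// Same pattern as in the repro above
val pattern = "dd-MMM-yyyy HH:mm:ss.SSS"

// Parse a string to a Timestamp, preserving the millisecond component
def parseWithMillis(s: String): Timestamp = {
  // SimpleDateFormat is not thread-safe, so create one per invocation
  val fmt = new SimpleDateFormat(pattern, Locale.ENGLISH)
  new Timestamp(fmt.parse(s).getTime)
}

// In a Spark job this could be registered as a UDF and used in place of
// to_timestamp, e.g.:
//   val toTimestampMs = udf(parseWithMillis _)
//   src.withColumn("timestamp_to_test", toTimestampMs(col("timestamp_to_test")))
```

A UDF costs serialization overhead compared to a native expression, but it sidesteps the truncation entirely since the parsing never goes through Spark's timestamp cast.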

Metadata

Labels

Standardization (Standardization Job affected), bug (Something isn't working), priority: medium (Important but not urgent)
