Closed
Labels: Standardization (Standardization Job affected), bug (Something isn't working), priority: medium (Important but not urgent)
Description
Describe the bug
Standardization loses millisecond precision (despite the pattern supporting it) when to_timestamp is used (which happens in all cases where the source type is not timestamp or date). It seems to be a Spark issue.
This could perhaps be addressed with some smart parsing, and definitely with a UDF.
To Reproduce
val pattern = "dd-MMM-yyyy HH:mm:ss.SSS"
val desiredSchema = StructType(Seq(
  StructField("description", StringType, nullable = false),
  StructField("timestamp_to_test", TimestampType, nullable = false,
    new MetadataBuilder().putString("pattern", pattern).build)
))
val seq = Seq(
("U03", "21-MAR-2019 19:00:00.223"),
("L03", "21-MAR-2019 19:00:00.000"),
("M03", "21-MAR-2019 19:00:00.224"),
("L03", "21-MAR-2019 19:00:01.000")
)
val src = seq.toDF("description", "timestamp_to_test")
val std = StandardizationInterpreter.standardize(src, desiredSchema, "").cache()
.withColumn("comp", to_timestamp(lit("21-MAR-2019 19:00:00.224"), pattern) === col("timestamp_to_test"))
.withColumn("tz1", to_timestamp(lit("21-MAR-2019 19:00:00.224"), pattern))
.withColumn("tz2", lit(1547521021.83301).cast(TimestampType))
.withColumn("tz3", lit(1547521021.83301).cast(TimestampType).cast(StringType))
logDataFrameContent(std)
std.show(false)
std.printSchema()
Expected behaviour
Milliseconds are preserved
Hints for possible solutions
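One possible direction, per the UDF idea above: parse the string on the JVM side with java.time, which keeps sub-second precision, and only then hand the resulting java.sql.Timestamp back to Spark. A minimal sketch (the names parseMillis and parseTsUdf are hypothetical, and the case-insensitive formatter is an assumption needed for upper-case month names like "MAR"):

```scala
import java.sql.Timestamp
import java.time.LocalDateTime
import java.time.format.DateTimeFormatterBuilder
import java.util.Locale

// Case-insensitive formatter so "21-MAR-2019" parses as well as "21-Mar-2019"
val fmt = new DateTimeFormatterBuilder()
  .parseCaseInsensitive()
  .appendPattern("dd-MMM-yyyy HH:mm:ss.SSS")
  .toFormatter(Locale.ENGLISH)

// Parse with java.time, preserving the millisecond component
def parseMillis(s: String): Timestamp =
  Timestamp.valueOf(LocalDateTime.parse(s, fmt))
```

In a Spark job this could then be wrapped as a UDF, e.g. val parseTsUdf = udf(parseMillis _), and applied to the string column instead of to_timestamp.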