-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace DvDateTimeZone, DvDateTime, DvTimeSpan with .NET standard types. #693
Conversation
@TomFinley @eerhardt Do we need missing value support for datetime, timespan, datetimeoffset? These are structs so may be use default as missing values? For now I have assumed there is no missing value indicator but I can add it back. #Resolved |
@@ -54,11 +54,11 @@ public CodecFactory(IHostEnvironment env, MemoryStreamPool memPool = null) | |||
RegisterSimpleCodec(new UnsafeTypeCodec<ulong>(this)); | |||
RegisterSimpleCodec(new UnsafeTypeCodec<Single>(this)); | |||
RegisterSimpleCodec(new UnsafeTypeCodec<Double>(this)); | |||
RegisterSimpleCodec(new UnsafeTypeCodec<DvTimeSpan>(this)); | |||
RegisterSimpleCodec(new UnsafeTypeCodec<TimeSpan>(this)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UnsafeTypeCodec [](start = 36, length = 15)
Is it still need to be UnsafeTypeCodec? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should it be? I have defined TimeSpanUnsafeTypeOps to handle this type.
In reply to: 211110052 [](ancestors = 211110052)
src/Microsoft.ML/Data/TextLoader.cs
Outdated
kind = DataKind.DT; | ||
else if (type == typeof(DvDateTimeZone) || type == typeof(TimeZoneInfo)) | ||
else if (type == typeof(DateTimeOffset) || type == typeof(TimeZoneInfo)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TimeZoneInfo [](start = 70, length = 12)
In what way does a TimeZoneInfo
map into a DateTimeOffset
? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a miss, I did intend to remove it.
In reply to: 211662608 [](ancestors = 211662608,211315779)
@@ -135,7 +135,7 @@ public bool IsTimeSpan | |||
} | |||
|
|||
/// <summary> | |||
/// Whether this type is a DvDateTime. | |||
/// Whether this type is a DateTime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DateTime [](start = 35, length = 8)
<see
tags perhaps #Resolved
I know you didn't write this code but this Maybe: Contracts.Assert((this == DateTimeType.Instance) == (this is DateTimeType));
return this is DateTimeType Similar cleanup possible for the other Refers to: src/Microsoft.ML.Core/Data/ColumnType.cs:148 in e0d66b0. [](commit_id = e0d66b0, deletion_comment = False) |
return true; | ||
} | ||
dst = DvDateTime.NA; | ||
|
||
return IsStdMissing(ref src); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsStdMissing(ref src); [](start = 19, length = 22)
So I feel like if you parse missing into a value type that does not support missing, we might expect the result to be that we would throw, rather than just the default value. #Resolved
return true; | ||
} | ||
dst = DvDateTimeZone.NA; | ||
|
||
return IsStdMissing(ref src); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsStdMissing [](start = 19, length = 12)
Similar for this. #Resolved
I might have expected to see some change where we take out |
{ | ||
private List<short> _offsets; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure on how concerned we need to be with sizes, but an offset can only be in the range of -14
to 14
hours. So using a long
seems like a bit of an overkill.
It also appears that previously the _offsets
was the number of minutes in the offset. Now we are using the number of ticks in the offset. Do we need to be concerned with this difference? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.NET standard type has offsets in ticks. We could convert offset in minutes when writing to disk and convert it back to ticks when reading but we may lose precision. What do you think?
In reply to: 211667357 [](ancestors = 211667357)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DateTimeOffset
exposes its offset as a TimeSpan
, but internally it uses short
and in minutes.
From everything I can find online (ISO8601, RFC3339, SQL Server doc), the offset supports the range -14
to 14
hours, and only supports minute precision.
In reply to: 212488658 [](ancestors = 212488658,211667357)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot! this is helpful. I'm now writing the ticks in minutes as short values because they are already validated when TimeSpan is created and when reading I convert the minutes back into Ticks when creating the TimeSpan object.
In reply to: 212670585 [](ancestors = 212670585,212488658,211667357)
@@ -906,11 +906,11 @@ protected Func<bool> GetColumnComparer(IRow r1, IRow r2, int col, ColumnType typ | |||
case DataKind.Bool: | |||
return GetComparerOne<DvBool>(r1, r2, col, (x, y) => x.Equals(y)); | |||
case DataKind.TimeSpan: | |||
return GetComparerOne<DvTimeSpan>(r1, r2, col, (x, y) => x.Equals(y)); | |||
return GetComparerOne<TimeSpan>(r1, r2, col, (x, y) => x.Ticks == y.Ticks); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TimeSpan [](start = 42, length = 8)
As near as I can see, both TimeSpan
and DateTime
are directly comparable, that might be better. #Resolved
//https://github.com/dotnet/coreclr/blob/9499b08eefd895158c3f3c7834e185a73619128d/src/System.Private.CoreLib/shared/System/DateTimeOffset.cs#L51-L53 | ||
//https://github.com/dotnet/coreclr/blob/9499b08eefd895158c3f3c7834e185a73619128d/src/System.Private.CoreLib/shared/System/DateTimeOffset.cs#L286-L292 | ||
//From everything online(ISO8601, RFC3339, SQL Server doc, the offset supports the range -14 to 14 hours, and only supports minute precision. | ||
_offsets.Add((short)(value.Offset.Ticks / TimeSpan.TicksPerMinute)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't this just be value.TotalMinutes
? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't see totalMinutes under value, did you mean under offset?
In reply to: 213152272 [](ancestors = 213152272)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, value.Offset.TotalMinutes #Resolved
{ | ||
Contracts.Assert(!_disposed); | ||
value = new DvDateTimeZone(_ticks[_index], _offsets[_index]); | ||
value = new DateTimeOffset(new DateTime(_ticks[_index]), new TimeSpan(_offsets[_index] * TimeSpan.TicksPerMinute)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new TimeSpan(_offsets[_index] * TimeSpan.TicksPerMinute)
should be new TimeSpan(0, _offsets[_index], 0)
#Resolved
RegisterOtherCodec("VBuffer", GetVBufferCodec); | ||
RegisterOtherCodec("DvDateTimeZone", new DateTimeOffsetCodec(this).GetCodec); | ||
RegisterOtherCodec("DvDateTime", new DateTimeCodec(this).GetCodec); | ||
RegisterOtherCodec("DvTimeSpan", new UnsafeTypeCodec<TimeSpan>(this).GetCodec); | ||
RegisterOtherCodec("Key", GetKeyCodec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the new test helped show up some issues.
BTW where are we testing the writing of data into what amounts to a new format? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a test that reads from the old IDV format then it writes back as text and that is where writing of data into new format is tested. Also for date/time the serialization format has not changed. I believe it only changed for bools.
In reply to: 213349875 [](ancestors = 213349875)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a simple way to do this is modify your test a little bit.
Currently it is something like this.
TestCore("savedata", idvPath, "loader=binary", "saver=text", textOutputPath.Arg("dout"));
If we modify it to be this:
var intermediateData = /// some temp IDV file.
TestCore("savedata", idvPath, "loader=binary", "saver=text", textOutputPath.Arg("dout"));
TestCore("savedata", idvPath, "loader=binary", "saver=binary", intermediateData.ArgOnly("dout"));
TestCore("savedata", intermediateData, "loader=binary", "saver=text", textOutputPath.Arg("dout"));
That will test everything. #Resolved
…to dvdatetime # Conflicts: # test/Microsoft.ML.TestFramework/DataPipe/TestDataPipeBase.cs
…to dvdatetime # Conflicts: # test/BaselineOutput/SingleDebug/SavePipe/TestParquetPrimitiveDataTypes-Data.txt # test/BaselineOutput/SingleDebug/SavePipe/TestParquetPrimitiveDataTypes-Schema.txt # test/BaselineOutput/SingleRelease/SavePipe/TestParquetPrimitiveDataTypes-Data.txt # test/BaselineOutput/SingleRelease/SavePipe/TestParquetPrimitiveDataTypes-Schema.txt # test/data/Parquet/alltypes.parquet
This change was committed via #863 |
fixes #673