Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace DvText with .NET Standard type. #705

Closed
wants to merge 23 commits into from
Closed

Conversation

codemzs
Copy link
Member

@codemzs codemzs commented Aug 21, 2018

fixes #673

scan.Span = DvText.NA;
else if (_sb.Length == 0)
scan.Span = DvText.Empty;
if (scan.QuotingError || _sb.Length == 0)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scan.QuotingError [](start = 28, length = 17)

I wonder if throwing an exception is more appropriate here

labelNames = new VBuffer<DvText>(2, new[] { new DvText("positive"), new DvText("negative") });
DvText[] names = new DvText[2];
labelNames = new VBuffer<ReadOnlyMemory<char>>(2, new[] { "positive".AsMemory(), "negative".AsMemory() });
ReadOnlyMemory<char>[] names = new ReadOnlyMemory<char>[2];
Copy link
Member Author

@codemzs codemzs Aug 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReadOnlyMemory[] names = new ReadOnlyMemory[2]; [](start = 12, length = 59)

indent #Resolved

@TomFinley
Copy link
Contributor

TomFinley commented Aug 22, 2018

Hi @codemzs! Do note that our usual practice for "personal" code reviews is that they happen in a personal fork. That is, there is nothing preventing one from requesting a "pull request" into master in ones own fork, where one can review the changes in convenience as much as they like. Indeed this is our usual practice. #Resolved

var stratVals = foldCol >= 0 ? new[] { DvText.NA, DvText.NA } : new[] { DvText.NA };
//REVIEW: Not sure if empty string makes sense here.

var stratVals = foldCol >= 0 ? new[] { "".AsMemory(),"".AsMemory() } : new[] { "".AsMemory() };
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check on this.

@codemzs codemzs changed the title WIP DO NOT REVIEW - Replace DvText with .NET Standard type. WIP Replace DvText with .NET Standard type. Aug 23, 2018
@codemzs codemzs changed the title WIP Replace DvText with .NET Standard type. RIP Replace DvText with .NET Standard type. Aug 23, 2018
/// </summary>
public static string GetRawUnderlyingBufferInfo(out int ichMin, out int ichLim, ReadOnlyMemory<char> memory)
{
MemoryMarshal.TryGetString(memory, out string outerBuffer, out ichMin, out int length);
Copy link
Contributor

@TomFinley TomFinley Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My attitude towards this is either we support ReadOnlyMemory<char> or we don't. If we do, then our methods on top of it should work, even if this fails. Fortunately it seems like this was only introduced because we wanted to just have a light "shim" on top of existing DvText methods, which is not really something we want to do. #Resolved

/// </summary>
public static bool Equals(ReadOnlyMemory<char> b, ReadOnlyMemory<char> memory)
{
if (memory.Length != b.Length)
Copy link
Contributor

@TomFinley TomFinley Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method and the one below could probably efficiently enough be done over direct indices over the ReadOnlyMemory<char> structure, at least, so I would hope. This would simplify the method considerably. #Resolved

}

/// <summary>
/// Does not propagate NA values. Returns true if both are NA (same as a.Equals(b)).
Copy link
Contributor

@TomFinley TomFinley Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Returns true if both are NA" suggests this is copy-pasted from somewhere, since of course now we will no longer have the notion of an NA string. This suggests an incomplete conversion, and is an opportunity for code simplification that we should do right now. #Resolved

/// Returns a text span with leading and trailing spaces trimmed. Note that this
/// will remove only spaces, not any form of whitespace.
/// </summary>
public static ReadOnlyMemory<char> Trim(ReadOnlyMemory<char> memory)
Copy link
Contributor

@TomFinley TomFinley Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two major motivations for moving from DvText to ROM<char> is that we get to avoid declaring our own special type for text, and also get to exploit the functionality that the .NET framework gives us. It seems like we did the first, and not the second: we have implementations of some things that appear to have relatively straightforward close implementations in System.MemoryExtensions. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These extension methods are part of ReadOnlySpan and they return ROS as well which you cannot assign to ROM or create a ROM out of it and that effectively makes them useless for someone that is working with ROM. I'm starting a thread with the guys that wrote these methods to make sure I'm not missing anything.


In reply to: 212748423 [](ancestors = 212748423)

/// <summary>
/// This produces zero for an empty string.
/// </summary>
public static bool TryParse(out Single value, ReadOnlyMemory<char> memory)
Copy link
Contributor

@TomFinley TomFinley Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, now that .NET has it, is float.TryParse(ReadOnlyMemory<char>, ...) unusable? #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is, you can also get rid of DoubleParser.


In reply to: 212748642 [](ancestors = 212748642)

Copy link
Member Author

@codemzs codemzs Sep 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well. Unfortunately, these methods are only available in .netstandard 2.1, not in 2.0 and our product code targets 2.0. We cannot target 2.1 because then product code won't run on desktop framework anymore because these methods are not available there.

reference: https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder.append?view=netcore-2.1

https://docs.microsoft.com/en-us/dotnet/api/system.text.stringbuilder.append?view=netframework-4.7.2

It would have been really nice if they were though...


In reply to: 212759741 [](ancestors = 212759741,212748642)

@TomFinley
Copy link
Contributor

TomFinley commented Aug 24, 2018

    private bool TryParseCore(string text, int ich, int lim, out ulong dst)

Certainly ulong.TryParse(ReadOnlyMemory<char>, ...) exists now. So can we get rid of this? #Resolved


Refers to: src/Microsoft.ML.Data/Data/Conversion.cs:1265 in ef32b4a. [](commit_id = ef32b4a, deletion_comment = False)

@TomFinley
Copy link
Contributor

    private bool TryParseCore(string text, int ich, int lim, out ulong dst)

Same for all TryParse methods.


In reply to: 415887609 [](ancestors = 415887609)


Refers to: src/Microsoft.ML.Data/Data/Conversion.cs:1265 in ef32b4a. [](commit_id = ef32b4a, deletion_comment = False)

public static NormStr FindInPool(NormStr.Pool pool, ReadOnlyMemory<char> memory)
{
Contracts.CheckValue(pool, nameof(pool));
MemoryMarshal.TryGetString(memory, out string outerBuffer, out int ichMin, out int length);
Copy link
Contributor

@TomFinley TomFinley Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before. You're either supporting ReadOnlyMemor<char> or you're not. Also of course as with every other usage of this function we are ignoring whether this succeeded or not. #Resolved

@TomFinley
Copy link
Contributor

    private bool TryParseCore(string text, int ich, int lim, out ulong dst)

You will of course have to handle mapping the empty string to 0, which things like double.TryParse and the like of course will not do, but that still represents a sizable improvement.


In reply to: 415887757 [](ancestors = 415887757,415887609)


Refers to: src/Microsoft.ML.Data/Data/Conversion.cs:1265 in ef32b4a. [](commit_id = ef32b4a, deletion_comment = False)

@codemzs codemzs changed the title RIP Replace DvText with .NET Standard type. Replace DvText with .NET Standard type. Aug 26, 2018
@codemzs
Copy link
Member Author

codemzs commented Sep 4, 2018

    private bool TryParseCore(string text, int ich, int lim, out ulong dst)

Please see my previous reply on the other comment similar to this.


In reply to: 415892086 [](ancestors = 415892086,415887757,415887609)


Refers to: src/Microsoft.ML.Data/Data/Conversion.cs:1265 in ef32b4a. [](commit_id = ef32b4a, deletion_comment = False)

…to dvtext

# Conflicts:
#	src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs
#	src/Microsoft.ML.Data/Evaluators/EvaluatorUtils.cs
#	src/Microsoft.ML.Data/Transforms/TermTransform.cs
#	src/Microsoft.ML.Data/Transforms/TermTransformImpl.cs
#	src/Microsoft.ML.ImageAnalytics/ImageLoaderTransform.cs
#	test/Microsoft.ML.TestFramework/DataPipe/TestDataPipeBase.cs
@@ -237,7 +237,7 @@ public class QuoteInput
public float ID;

[Column("1")]
public string Text;
public ReadOnlyMemory<char> Text;
Copy link
Member

@eerhardt eerhardt Sep 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still support regular string types, right? I think that would be bad if we didn't. #Resolved


[Column("1")]
public float Number_1;
public ReadOnlyMemory<char> Number_1;
Copy link
Member

@eerhardt eerhardt Sep 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wrong - it was float before, and the property is called Number_1. #Resolved

@@ -33,6 +33,7 @@
</ItemGroup>

<ItemGroup>
<PackageReference Include="System.Memory" Version="4.5.1" />
Copy link
Member

@eerhardt eerhardt Sep 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary. All our test projects are netcoreapp2.1, and so Span and ReadOnlyMemory should be available by default. #Resolved

@@ -18,4 +18,8 @@
<NativeAssemblyReference Include="LdaNative" />
<NativeAssemblyReference Include="FastTreeNative" />
</ItemGroup>

<ItemGroup>
<PackageReference Include="System.Memory" Version="4.5.1" />
Copy link
Member

@eerhardt eerhardt Sep 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary. All our test projects are netcoreapp2.1, and so Span and ReadOnlyMemory should be available by default. #Resolved

public float? fFloat;
public double? fDouble;
public bool? fBool;
public string fString;
Copy link
Member

@eerhardt eerhardt Sep 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should still support string types, and as such, string can be null. So we should have some tests for null strings. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will null string map to? empty string?


In reply to: 215970412 [](ancestors = 215970412)

@codemzs codemzs closed this Sep 20, 2018
@codemzs codemzs deleted the dvtext branch September 20, 2018 18:15
@codemzs
Copy link
Member Author

codemzs commented Sep 20, 2018

This change was committed via #863

@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.NET data type system instead of DvTypes
3 participants