
Learning with counts (Dracula) transformer #4514

Merged
merged 27 commits into from
Jun 4, 2020

Conversation

yaeldekel

Fixes #4016.

@yaeldekel yaeldekel requested a review from a team as a code owner December 2, 2019 09:53
@@ -188,9 +187,9 @@ private static VersionInfo GetVersionInfo()
IDataView input,
string name,
string source = null,
bool join = Defaults.Join,
bool join = Defaults.Combine,
Member

@eerhardt eerhardt Dec 2, 2019


bool join => bool combine. And update the xml docs #Resolved

Member

@eerhardt eerhardt Dec 10, 2019


Actually, I am rethinking this comment. What's weird to me is that this is the "hash joining transform", and we are calling this behavior "combine". That seems inconsistent. We should use the same term to mean the same thing. Either always use "join" or always use "combine". #Resolved

Author


I was actually thinking that the name HashJoiningTransform is not very informative. What do you think about VectorHashingTransform?


In reply to: 356308668 [](ancestors = 356308668)

Contributor


VectorHashingTransform implies each element is going to be hashed separately (which is the default behavior). With respect to @eerhardt's comment that we should use either Join or Combine consistently, a better name might be HashCombiningTransform.


In reply to: 356607566 [](ancestors = 356607566,356308668)

Author


I have reverted the changes in this file, since it is no longer being used in this PR. This transform is now not being used directly anywhere in the code, and just being kept around for backward compatibility of maml, and for its entry point.


In reply to: 427561266 [](ancestors = 427561266,356607566,356308668)

}
}

public static class Dracula
Member

@eerhardt eerhardt Dec 2, 2019


No need for this to be public, it only has an internal method. #Resolved

}
}

public static class CountTable
Member

@eerhardt eerhardt Dec 2, 2019


should be internal #Resolved

/// <summary>
/// Signature for CountTableBuilder.
/// </summary>
public delegate void SignatureCountTableBuilder();
Member

@eerhardt eerhardt Dec 2, 2019


Does this need to be public? #Resolved

@codemzs
Member

codemzs commented Dec 2, 2019

@yaeldekel Shall we give it a better name as opposed to calling it "Dracula"? Seems a little informal to me. #Resolved

private static VersionInfo GetVersionInfo()
{
return new VersionInfo(
modelSignature: "CM CT",
Member

@codemzs codemzs Dec 2, 2019


"CM CT",

what kind of model signature is this? Why is there a tab in between? #Resolved


namespace Microsoft.ML.Transforms
{
public class DraculaEstimator : IEstimator<DraculaTransformer>
Contributor


Moving @codemzs' comment to a thread:

@yaeldekel Shall we give it a better name as opposed to calling it "Dracula"? Seems a little informal to me.

Contributor


If Dracula is not used, CountTargetEncoder might be suitable.

Our old blog post lists DRACULA as "Distributed Robust Algorithm for Count-based Learning": https://blogs.technet.microsoft.com/machinelearning/2015/02/17/big-learning-made-easy-with-counts/

As an aside, there's also public slides:
https://www.slideshare.net/SessionsEvents/misha-bilenko-principal-researcher-microsoft
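For readers following along, count-based target encoding ("learning with counts") replaces a high-cardinality categorical feature with its per-label counts and a smoothed log-odds derived from them. A minimal Python sketch of the idea, under stated assumptions — the function names, the prior, and the binary-label setup here are illustrative stand-ins, not the ML.NET API:

```python
import math
from collections import defaultdict

def fit_count_table(categories, labels):
    """Count occurrences of each binary label (0/1) per category value."""
    table = defaultdict(lambda: [0, 0])
    for cat, y in zip(categories, labels):
        table[cat][y] += 1
    return table

def encode(table, cat, prior=1.0):
    """Replace a category with [pos count, neg count, smoothed log-odds]."""
    neg, pos = table.get(cat, [0, 0])
    log_odds = math.log((pos + prior) / (neg + prior))
    return [pos, neg, log_odds]

table = fit_count_table(["a", "a", "b", "a"], [1, 1, 0, 0])
print(encode(table, "a"))  # pos=2, neg=1, log-odds ln(3/2)
```

Unseen categories fall back to the prior alone, which is one reason the smoothing term matters.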

Author


Good suggestion. Changing the name to CountTargetEncoder, unless someone has a different idea?

@codecov

codecov bot commented Dec 3, 2019

Codecov Report

Merging #4514 into master will decrease coverage by 2.13%.
The diff coverage is 93.99%.

@@            Coverage Diff             @@
##           master    #4514      +/-   ##
==========================================
- Coverage   75.65%   73.52%   -2.14%     
==========================================
  Files         990     1016      +26     
  Lines      179573   190109   +10536     
  Branches    19308    20467    +1159     
==========================================
+ Hits       135853   139774    +3921     
- Misses      38453    44768    +6315     
- Partials     5267     5567     +300     
Flag Coverage Δ
#Debug 73.52% <93.99%> (-2.14%) ⬇️
#production 69.27% <93.02%> (-2.29%) ⬇️
#test 87.61% <100.00%> (-1.29%) ⬇️
Impacted Files Coverage Δ
src/Microsoft.ML.Core/Data/IEstimator.cs 84.53% <ø> (ø)
test/Microsoft.ML.TestFramework/TestCommandBase.cs 44.56% <ø> (+0.40%) ⬆️
src/Microsoft.ML.Transforms/Dracula/CountTable.cs 86.76% <86.76%> (ø)
...Microsoft.ML.Transforms/Dracula/MultiCountTable.cs 88.51% <88.51%> (ø)
...oft.ML.Transforms/Dracula/CountTableTransformer.cs 90.08% <90.08%> (ø)
...crosoft.ML.Transforms/Dracula/CountTableBuilder.cs 93.75% <93.75%> (ø)
...ansforms/Dracula/CountTargetEncodingTransformer.cs 93.75% <93.75%> (ø)
src/Microsoft.ML.Transforms/Dracula/Featurizer.cs 96.66% <96.66%> (ø)
...rc/Microsoft.ML.Transforms/Dracula/CMCountTable.cs 98.82% <98.82%> (ø)
...c/Microsoft.ML.Data/DataLoadSave/EstimatorChain.cs 89.65% <100.00%> (+0.36%) ⬆️
... and 235 more

@sharwell
Member

sharwell commented Dec 10, 2019

📝 The OutOfMemoryException test failures appear to be true failures #Resolved


namespace Microsoft.ML.Transforms
{
public sealed class CountTableEstimator : IEstimator<CountTableTransformer>
Member

@eerhardt eerhardt Dec 10, 2019


Does this class need to be public? How can an external user use this class? #Resolved

Member


Also, xml comments on all public surface area.


In reply to: 356307263 [](ancestors = 356307263)


// Assumes features are filled with the respective counts.
// Adds laplacian noise if set and returns the sum of the counts.
private float AddLaplacianNoisePerLabel(int iCol, Random rand, Span<float> counts)
Contributor

@justinormont justinormont Dec 10, 2019


Is the noise still added once the ML model is fully created?

@davidbrownellWork was mentioning that there is added noise at the prediction time. This complicates the ONNX conversion. The AutoML team is looking to use Dracula for its target encoding and we need it in ML.NET, NimbusML, and all the way to ONNX. #Resolved

Author


The noise is only used at prediction time. From your experience using this transform, do you think it is needed?


In reply to: 356322013 [](ancestors = 356322013)

Author

@yaeldekel yaeldekel Jan 8, 2020


I looked at the EstimatorChain class, and it seems to me that it would be possible to add a method similar to AppendCacheCheckPoint that would add noise to a requested column. Here is the code that is currently in the Fit method of EstimatorChain:

            IDataView current = input;
            var xfs = new ITransformer[_estimators.Length];
            for (int i = 0; i < _estimators.Length; i++)
            {
                var est = _estimators[i];
                xfs[i] = est.Fit(current);
                current = xfs[i].Transform(current);
                if (_needCacheAfter[i] && i < _estimators.Length - 1)
                {
                    Contracts.AssertValue(_host);
                    current = new CacheDataView(_host, current, null);
                }
            }

If we add, similarly to how we have current = new CacheDataView(...), something like current = new NoiseDataView(...), then we can let the user write something like

var pipeline = mlContext.Transforms.CountTargetEncoding()
    .AppendNoise("Features").Append(mlContext.Binary.Sdca());

then the SDCA component in the pipeline will see the Features column with noise added to it when training, however, the resulting TransformerChain will not have noise as part of the model.

Together with good samples and documentation emphasizing the recommended way to use the CountTargetEncodingEstimator (which is either to train the transformer in a separate pipeline from the trainer and on different data, or to use the AppendNoise API), I think that may be a better solution than having a noise option as part of the estimator constructor.

Please let me know what you think of this solution and whether I should go ahead and implement that.


In reply to: 357085107 [](ancestors = 357085107,356322013)
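To make the proposal concrete, here is a minimal Python sketch of a Fit loop like the one quoted above, with a training-time-only noise step injected between stages. All names here (fit_pipeline, ScaleEstimator, noise_cols) are hypothetical stand-ins for illustration, not ML.NET API:

```python
import random

class ScaleEstimator:
    """Toy estimator/transformer pair standing in for an IEstimator."""
    def __init__(self, col, factor):
        self.col, self.factor = col, factor
    def fit(self, data):
        return self  # stateless for the sketch
    def transform(self, data):
        return [{**row, self.col: row[self.col] * self.factor} for row in data]

def fit_pipeline(estimators, data, noise_cols=(), noise_scale=0.1, rng=None):
    """Mirrors an EstimatorChain.Fit-style loop, but perturbs the named
    columns between stages during training only. The returned transformers
    carry no noise step, so the fitted model stays deterministic."""
    rng = rng or random.Random(0)
    transformers, current = [], data  # rows as dicts stand in for IDataView
    for est in estimators:
        xf = est.fit(current)
        transformers.append(xf)
        current = xf.transform(current)
        # Training-time-only noise; analogous to the proposed NoiseDataView.
        current = [{k: v + rng.uniform(-noise_scale, noise_scale)
                       if k in noise_cols else v
                    for k, v in row.items()}
                   for row in current]
    return transformers
```

The key property is that the noise affects only what downstream estimators see during fitting; transforming fresh data through the fitted chain is noise-free.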

Contributor

@gvashishtha gvashishtha Jan 9, 2020


Hi Yael, what is the name of this transform in TLC? I went to our TLC telemetry to look for internal users of the dracula transform but I did not see any usage for a transformer named "Dracula" #Resolved

Member

@antoniovs1029 antoniovs1029 Jan 9, 2020


@gvashishtha In the TLC GUI it's called "Dracula Transform" (under "Featurizers: Supervised") and in the TLC code it is called DraculaTransform… but I wouldn't know if other names are used in the telemetry. #Resolved

Contributor

@justinormont justinormont Jan 10, 2020


If I recall, the log-odds are returned approximately as:

CountPositive = CountPositive + LaplacianNoise(); // Add noise to pos count
CountNegative = CountNegative + LaplacianNoise();  // Add noise to neg count
logOdds = ln( (CountPositive + PriorCount) / (CountPositive + CountNegative + 2*PriorCount) );  // Includes 2x noise, plus prior as smoothing

I don't think we'll be able to just add noise outside of the Dracula transform and end up with the same result.

My recommendation is keeping the noise computation as is, and disable noise for the ONNX unit tests, or add a window (e.g. +/- 10%) of acceptable values when checking for (near) equality of the outputs. #Resolved
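A hedged Python sketch of the noisy log-odds computation described above. The Laplace sampler and parameter names are illustrative, not the actual Dracula implementation, and the formula is the approximate one quoted in this thread:

```python
import math
import random

def noisy_log_odds(pos, neg, prior_count=1.0, noise_scale=0.0, rng=None):
    """Log-odds from per-label counts, optionally perturbing each count
    with Laplacian noise first, mirroring the pseudocode above."""
    if noise_scale > 0:
        rng = rng or random.Random()
        def laplace(b):
            # Symmetric Laplace(0, b) sample via inverse CDF; the max()
            # guards against log(0) at the tail.
            u = rng.random() - 0.5
            return -b * math.copysign(math.log(max(1 - 2 * abs(u), 1e-12)), u)
        pos += laplace(noise_scale)
        neg += laplace(noise_scale)
    return math.log((pos + prior_count) / (pos + neg + 2 * prior_count))

print(noisy_log_odds(30, 10))  # noise off: ln(31/42)
```

With noise enabled, two independent Laplace draws perturb the numerator and denominator, which is why adding noise outside the transform cannot reproduce the same distribution.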

Author


The two possible solutions as I see it are:

  • Leaving the computation as it is right now.
    • Pros: easy solution for fitting a pipeline that has a trainer after the Dracula transform without having to split into two pipelines and training each part on a different dataset.
    • Cons: if the ONNX export of this transform will not include the noise part (which it likely won't), then this will result in a model that gives different results in the ML.NET format and in the ONNX format.
  • Making the transformer apply the noise, but not serialize the noise parameter, i.e. after deserialization it won't add noise.
    • Pros: consistency with the ONNX version of the model. Also, the noise is really not needed at prediction time, it is only needed to train the next component in the pipeline.
    • Cons: if you call Transform on the pipeline with the same data right after it is trained, and after serializing and deserializing the model, it will give different results.

I don't particularly like either of these solutions; I think they are both problematic. But currently I can't think of a solution that avoids all of the cons listed above... Which solution do you think I should go with?

cc @justinormont, @eerhardt


In reply to: 365470116 [](ancestors = 365470116)


@davidbrownellWork davidbrownellWork Jan 27, 2020


The introduction of noise is a transformation step that is only used when the featurizer is included in a pipeline that is used during training time. Because ONNX exports are only used for inference, it seems OK that the export doesn't include noise, as the usage in ONNX will be the same as when this featurizer is used in an inferencing ML.Net pipeline.

It seems that the first point is the solution, as long as the code is written such that noise is not introduced when the featurizer is part of an inferencing pipeline. #Resolved

}
}

public sealed class CountTargetEncodingTransformer : ITransformer
Member

@codemzs codemzs Dec 26, 2019


CountTargetEncodingTransformer

documentation? #Resolved


namespace Microsoft.ML.Transforms
{
public class CountTargetEncodingEstimator : IEstimator<CountTargetEncodingTransformer>
Member

@codemzs codemzs Dec 26, 2019


CountTargetEncodingEstimator

Documentation? #Resolved

@@ -0,0 +1,553 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
Member

@codemzs codemzs Dec 26, 2019


Please rename the file to a more relevant name. If we are calling it CountTargetEncoding then we should not be seeing Dracula in the codebase. #Resolved

@harishsk
Contributor

harishsk commented Feb 4, 2020

{

This class doesn't follow the typical pattern of using a Mapper. Any particular reason why?
#Resolved


Refers to: src/Microsoft.ML.Transforms/HashJoiningTransform.cs:35 in 91887da. [](commit_id = 91887da, deletion_comment = False)

@harishsk
Contributor

harishsk commented Feb 4, 2020

{

How is this different from HashingEstimator? #Resolved


Refers to: src/Microsoft.ML.Transforms/HashJoiningTransform.cs:35 in 91887da. [](commit_id = 91887da, deletion_comment = False)

@harishsk
Contributor

harishsk commented Feb 4, 2020

            };

Internally this too uses the MurmurHash that is used by HashingTransformer. Is it possible to rationalize this transform with the HashingTransformer? #Resolved


Refers to: src/Microsoft.ML.Transforms/HashJoiningTransform.cs:647 in 91887da. [](commit_id = 91887da, deletion_comment = False)

@harishsk
Contributor

harishsk commented Feb 4, 2020

                return Hashing.MurmurHash(seed, sb, 0, sb.Length);

We have had a number of backward compatibility issues to address with MurmurHash when trying to export to onnx. Can we defer making this public until we have implemented the onnx export functionality for this transformer? #Resolved


Refers to: src/Microsoft.ML.Transforms/HashJoiningTransform.cs:646 in 91887da. [](commit_id = 91887da, deletion_comment = False)

@yaeldekel
Author

{

This class was never converted to IEstimator/ITransformer because it was never a priority. It is currently only used in the entry points for train-test and cv split. The cv command and the public train-test and cv APIs all use the HashingEstimator.
There are a couple of differences between HashJoiningTransform and HashingEstimator:

  • HashJoiningTransform hashes floats/doubles using murmur hash, and the rest of the types by converting the value to string and then hashing. HashingEstimator has a hash function for the simple data view types: floats, doubles, all the integer types, all the unsigned integer types, all the key types, bool and text. I think that it was implemented that way for simplicity, not because of some kind of advantage of hashing strings over other types.
  • HashJoiningTransform has an option to hash a vector of values into a single hash value, as opposed to the HashingEstimator, which always handles vectors by hashing each slot separately and producing a vector of hashes. This is an advantage that is useful for cv and train-test splits, since these splits apply a RangeFilter on top of the output of HashingTransformer, which would cause an exception if given a vector column.
  • HashJoiningTransform has an option to specify subsets of the vector input that should be hashed together. For example, the user can specify that they wish to hash slots 1,3,5 into a single hash value, and slots 0,2,4 into a single hash value (this option is where the name of the transform came from :-)). I think that this capability was added specifically for the dracula transform, although I don't know how useful it is (@justinormont, can you say anything about that?).

In reply to: 582084074 [](ancestors = 582084074)


Refers to: src/Microsoft.ML.Transforms/HashJoiningTransform.cs:35 in c3df0e0. [](commit_id = c3df0e0, deletion_comment = False)
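The per-slot versus combined hashing distinction described above can be sketched in a few lines of Python. A toy hash function stands in for MurmurHash; the names are illustrative, not ML.NET API:

```python
def toy_hash(value, seed=42, bits=10):
    """Stand-in for MurmurHash: hashes a value's string form to `bits` bits."""
    h = seed
    for byte in str(value).encode():
        h = (h * 31 + byte) & 0xFFFFFFFF
    return h % (1 << bits)

def hash_per_slot(vector):
    """HashingEstimator-style: one hash per slot, output is a vector."""
    return [toy_hash(v) for v in vector]

def hash_combined(vector):
    """HashJoiningTransform-style 'join': whole vector to one scalar hash."""
    return toy_hash(tuple(vector))

v = [1.5, 2.5, 3.5]
print(hash_per_slot(v))  # a vector of three hash values
print(hash_combined(v))  # a single scalar hash
```

The combined form is what makes the output usable by a downstream RangeFilter, which expects a scalar column.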

@yaeldekel
Author

            };

I think that if we add the option to hash a vector into a single hash to HashingTransformer then we can switch the CountTargetEncodingTransformer to use it instead of HashJoiningTransform, but I would like to wait for @justinormont to comment about the importance of the Combine option first.


In reply to: 582085386 [](ancestors = 582085386)


Refers to: src/Microsoft.ML.Transforms/HashJoiningTransform.cs:647 in c3df0e0. [](commit_id = c3df0e0, deletion_comment = False)

@yaeldekel
Author

                return Hashing.MurmurHash(seed, sb, 0, sb.Length);

Is there a plan to switch HashingTransformer to use something other than Murmur hash?


In reply to: 582086748 [](ancestors = 582086748)


Refers to: src/Microsoft.ML.Transforms/HashJoiningTransform.cs:646 in c3df0e0. [](commit_id = c3df0e0, deletion_comment = False)

@harishsk
Contributor

                return Hashing.MurmurHash(seed, sb, 0, sb.Length);

No, there is no plan to switch to something other than MurmurHash. But we have had a number of discussions on how to keep compatibility between the version of MurmurHash in ML.NET and the one in Onnx. We have had to increment the version of the HashingEstimator to make them compatible.

Since the HashJoiningTransform also uses MurmurHash we should discuss its Onnx export mechanism and implement it before making this public so that if any changes are required in the model versions or on the onnx side, we make it only once.


In reply to: 582428008 [](ancestors = 582428008,582086748)


Refers to: src/Microsoft.ML.Transforms/HashJoiningTransform.cs:646 in c3df0e0. [](commit_id = c3df0e0, deletion_comment = False)

@harishsk
Contributor

            };

It would be great if the CountTargetEncodingTransformer also used the HashingTransformer.


In reply to: 582427616 [](ancestors = 582427616,582085386)


Refers to: src/Microsoft.ML.Transforms/HashJoiningTransform.cs:647 in c3df0e0. [](commit_id = c3df0e0, deletion_comment = False)

// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System.Collections.Generic;
Contributor

@harishsk harishsk May 19, 2020


Can you please rename this file to not call it Dracula? #Resolved

Contributor

@harishsk harishsk left a comment


:shipit:

@yaeldekel yaeldekel merged commit c0eeea9 into dotnet:master Jun 4, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022
Development

Successfully merging this pull request may close these issues.

Add a "learning with counts" transformer
10 participants