fix LdaWorkoutEstimatorCore #4927

Merged

9 changes: 7 additions & 2 deletions test/Microsoft.ML.Tests/Transformers/TextFeaturizerTests.cs
@@ -714,7 +714,6 @@ public void LdaWorkout()
  }

  [Fact]
- [Trait("Category", "SkipInCI")]
  public void LdaWorkoutEstimatorCore()
  {
      var ml = new MLContext(1);
@@ -729,7 +728,13 @@ public void LdaWorkoutEstimatorCore()
      builder.AddColumn("F1V", NumberDataViewType.Single, data);
      var srcView = builder.GetDataView();

Contributor

Perhaps a comment could be added here, for future readability, explaining why the random number generator needs to be reset for each received document, given that ML.NET and LdaNative use multiple threads differently.

Contributor Author

sure, thanks



-     var est = ml.Transforms.Text.LatentDirichletAllocation("F1V");
+     // Attention: resetRandomGenerator must be true here because multiple comparisons are performed later.
+     // lda_engine first creates a queue of samplers of size (num_of_threads - 2).
+     // Each comparison changes the internal state (the random number generator rng_) of one sampler,
+     // so if the queue is smaller than the number of comparisons performed, a sampler with stale state
+     // is reused and the results stop matching. Setting resetRandomGenerator to true resets rng_
+     // before every LDA calculation.
+     var est = ml.Transforms.Text.LatentDirichletAllocation("F1V", resetRandomGenerator: true);
      TestEstimatorCore(est, srcView);
  }
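
The failure mode described in the code comment can be illustrated with a small toy model. This is only a sketch, not the LdaNative implementation: `ToySampler`, `SamplerPoolDemo`, the shared seed of 42, and the round-robin reuse policy are all hypothetical stand-ins chosen for illustration. The only thing taken from the comment is the idea that a fixed-size queue of samplers, each carrying its own random number generator state, gets reused once the number of comparisons exceeds the queue size.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an LDA sampler: its only state is a random number generator.
public sealed class ToySampler
{
    private readonly int _seed;
    private Random _rng;

    public ToySampler(int seed)
    {
        _seed = seed;
        _rng = new Random(seed);
    }

    // Restore the generator to its initial state (the effect the comment attributes
    // to resetRandomGenerator: true).
    public void Reset() => _rng = new Random(_seed);

    // Draw one pseudo topic id per token; the result depends on the current RNG state.
    public int[] Infer(int tokenCount, int topicCount)
    {
        var topics = new int[tokenCount];
        for (int i = 0; i < tokenCount; i++)
            topics[i] = _rng.Next(topicCount);
        return topics;
    }
}

public static class SamplerPoolDemo
{
    // Process the same document `passes` times, taking samplers round-robin from a
    // fixed-size queue. All samplers start from the same seed (an assumption of this sketch).
    public static List<int[]> Run(int poolSize, int passes, bool resetBeforeUse)
    {
        var pool = new Queue<ToySampler>();
        for (int i = 0; i < poolSize; i++)
            pool.Enqueue(new ToySampler(seed: 42));

        var results = new List<int[]>();
        for (int p = 0; p < passes; p++)
        {
            var sampler = pool.Dequeue();
            if (resetBeforeUse)
                sampler.Reset();
            results.Add(sampler.Infer(tokenCount: 8, topicCount: 1000));
            pool.Enqueue(sampler); // the sampler goes back with its RNG state advanced
        }
        return results;
    }

    // True when every pass produced the same output as the first pass.
    public static bool AllEqual(List<int[]> results) =>
        results.All(r => r.SequenceEqual(results[0]));

    public static void Main()
    {
        Console.WriteLine(AllEqual(Run(poolSize: 2, passes: 3, resetBeforeUse: false)));
        Console.WriteLine(AllEqual(Run(poolSize: 2, passes: 3, resetBeforeUse: true)));
    }
}
```

In this toy model, the passes agree as long as the number of passes does not exceed the pool size. Once a sampler is reused without a reset, its output diverges from the first pass, which mirrors why the test compares equal only when the generator is reset before each calculation.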
