
TextCatalog.ApplyWordEmbedding to KMeans Trainer generates IndexOutOfRangeException #4397

@MaxAkbar

System information

  • OS version/distro: Windows 10 PRO 10.0.18362
  • .NET Version (eg., dotnet --info): 3.1.100-preview1-014459

Issue

I am trying to cluster a group of documents. For this sample, I used the short descriptions of news articles. If I run the sample with FeaturizeText, it builds a model. If I instead apply TextCatalog.ApplyWordEmbedding, I get a System.IndexOutOfRangeException.

  • What did you do? Applied word embedding (ApplyWordEmbedding) ahead of the KMeans trainer
  • What happened? IndexOutOfRangeException
  • What did you expect? For ML.NET to build my model

Source code / logs

Sample code to reproduce the problem can be found here.
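
The linked sample is not reproduced in this report. As a rough illustration only, the two pipeline variants described above might look like the sketch below. The `NewsItem` class, the column names, the cluster count parameter, and the normalize/tokenize steps ahead of `ApplyWordEmbedding` are assumptions and are not taken from the original sample; only the catalog methods themselves (`FeaturizeText`, `NormalizeText`, `TokenizeIntoWords`, `ApplyWordEmbedding`, `KMeans`) are ML.NET API.

```csharp
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

// Hypothetical input row; the real sample's schema is not shown in the report.
public class NewsItem
{
    public string ShortDescription { get; set; }
}

public static class PipelineSketch
{
    // Variant the report says trains successfully: FeaturizeText feeding KMeans.
    public static IEstimator<ITransformer> WithFeaturizeText(MLContext ml, int clusters) =>
        ml.Transforms.Text
            .FeaturizeText("Features", nameof(NewsItem.ShortDescription))
            .Append(ml.Clustering.Trainers.KMeans("Features", numberOfClusters: clusters));

    // Variant the report says fails: word embeddings feeding KMeans.
    // ApplyWordEmbedding expects a vector of text tokens, so the text is
    // normalized and tokenized first (assumed; the original steps are unknown).
    public static IEstimator<ITransformer> WithWordEmbedding(MLContext ml, int clusters) =>
        ml.Transforms.Text
            .NormalizeText("Normalized", nameof(NewsItem.ShortDescription))
            .Append(ml.Transforms.Text.TokenizeIntoWords("Tokens", "Normalized"))
            .Append(ml.Transforms.Text.ApplyWordEmbedding(
                "Features",
                "Tokens",
                WordEmbeddingEstimator.PretrainedModelKind.SentimentSpecificWordEmbedding))
            .Append(ml.Clustering.Trainers.KMeans("Features", numberOfClusters: clusters));
}
```

Calling `Fit` on the second pipeline corresponds to the `EstimatorChain.Fit` frame in the stack trace below. Whether this sketch triggers the same exception depends on the data and is not confirmed here.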

StackTrace:
System.AggregateException: One or more errors occurred. (Index was outside the bounds of the array.) (Index was outside the bounds of the array.) (Index was outside the bounds of the array.)
---> System.IndexOutOfRangeException: Index was outside the bounds of the array.
at Microsoft.ML.Trainers.KMeansBarBarInitialization.<>c__DisplayClass3_1.b__2(VBuffer`1& point, Int32 pointRowIndex, Single[] weights, Random rand)
at Microsoft.ML.Trainers.KMeansUtils.<>c__DisplayClass8_1`2.b__0()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.WaitAllCore(Task[] tasks, Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Task.WaitAll(Task[] tasks)
at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw(Exception source)
at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection exceptions, CancellationToken cancelToken, Exception otherException)
at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
at Microsoft.ML.Trainers.KMeansUtils.ParallelMapReduce[TPartitionState,TGlobalState](Int32 numThreads, IHost baseHost, Factory factory, RowIndexGetter rowIndexGetter, InitAction`1 initChunk, MapAction`1 mapper, ReduceAction`2 reducer, TPartitionState[]& buffer, TGlobalState& result)
at Microsoft.ML.Trainers.KMeansBarBarInitialization.Initialize(IHost host, Int32 numThreads, IChannel ch, Factory cursorFactory, Int32 k, Int32 dimensionality, VBuffer`1[] centroids, Int64 accelMemBudgetMb, Int64& missingFeatureCount, Int64& totalTrainingInstances)
at Microsoft.ML.Trainers.KMeansTrainer.TrainCore(IChannel ch, RoleMappedData data, Int32 dimensionality)
at Microsoft.ML.Trainers.KMeansTrainer.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.Fit(IDataView input)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at ClusteringNewsArticles.Train.Program.Main(String[] args) in C:\Users\maxim\Source\Repos\machinelearning-samples\samples\csharp\getting-started\Clustering_NewsArticles\ClusteringNewsArticles.Train\Program.cs:line 54
---> (Inner Exception #1) System.IndexOutOfRangeException: Index was outside the bounds of the array.
at Microsoft.ML.Trainers.KMeansBarBarInitialization.<>c__DisplayClass3_1.b__2(VBuffer`1& point, Int32 pointRowIndex, Single[] weights, Random rand)
at Microsoft.ML.Trainers.KMeansUtils.<>c__DisplayClass8_1`2.b__0()
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)<---

---> (Inner Exception #2) System.IndexOutOfRangeException: Index was outside the bounds of the array.
at Microsoft.ML.Trainers.KMeansBarBarInitialization.<>c__DisplayClass3_1.b__2(VBuffer`1& point, Int32 pointRowIndex, Single[] weights, Random rand)
at Microsoft.ML.Trainers.KMeansUtils.<>c__DisplayClass8_1`2.b__0()
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)<---
