TextCatalog.ApplyWordEmbedding to KMeans Trainer generates IndexOutOfRangeException #4397

Closed

Description

System information

  • OS version/distro: Windows 10 PRO 10.0.18362
  • .NET Version (e.g., dotnet --info): 3.1.100-preview1-014459

Issue

I am trying to cluster a group of documents. For this sample, I used the short descriptions of news articles. If I run the sample with FeaturizeText, it builds a model. If I try to apply TextCatalog.ApplyWordEmbedding, I get a System.IndexOutOfRangeException.

  • What did you do? Fed the output of ApplyWordEmbedding to the KMeans trainer
  • What happened? IndexOutOfRangeException
  • What did you expect? For ML.NET to build my model

Source code / logs

Sample code to reproduce the problem can be found here.

StackTrace:
System.AggregateException: One or more errors occurred. (Index was outside the bounds of the array.) (Index was outside the bounds of the array.) (Index was outside the bounds of the array.)
---> System.IndexOutOfRangeException: Index was outside the bounds of the array.
at Microsoft.ML.Trainers.KMeansBarBarInitialization.<>c__DisplayClass3_1.b__2(VBuffer`1& point, Int32 pointRowIndex, Single[] weights, Random rand)
at Microsoft.ML.Trainers.KMeansUtils.<>c__DisplayClass8_1`2.b__0()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.WaitAllCore(Task[] tasks, Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Task.WaitAll(Task[] tasks)
at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw(Exception source)
at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection exceptions, CancellationToken cancelToken, Exception otherException)
at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
at Microsoft.ML.Trainers.KMeansUtils.ParallelMapReduce[TPartitionState,TGlobalState](Int32 numThreads, IHost baseHost, Factory factory, RowIndexGetter rowIndexGetter, InitAction`1 initChunk, MapAction`1 mapper, ReduceAction`2 reducer, TPartitionState[]& buffer, TGlobalState& result)
at Microsoft.ML.Trainers.KMeansBarBarInitialization.Initialize(IHost host, Int32 numThreads, IChannel ch, Factory cursorFactory, Int32 k, Int32 dimensionality, VBuffer`1[] centroids, Int64 accelMemBudgetMb, Int64& missingFeatureCount, Int64& totalTrainingInstances)
at Microsoft.ML.Trainers.KMeansTrainer.TrainCore(IChannel ch, RoleMappedData data, Int32 dimensionality)
at Microsoft.ML.Trainers.KMeansTrainer.TrainModelCore(TrainContext context)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.TrainTransformer(IDataView trainSet, IDataView validationSet, IPredictor initPredictor)
at Microsoft.ML.Trainers.TrainerEstimatorBase`2.Fit(IDataView input)
at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
at ClusteringNewsArticles.Train.Program.Main(String[] args) in C:\Users\maxim\Source\Repos\machinelearning-samples\samples\csharp\getting-started\Clustering_NewsArticles\ClusteringNewsArticles.Train\Program.cs:line 54
---> (Inner Exception #1) System.IndexOutOfRangeException: Index was outside the bounds of the array.
at Microsoft.ML.Trainers.KMeansBarBarInitialization.<>c__DisplayClass3_1.b__2(VBuffer`1& point, Int32 pointRowIndex, Single[] weights, Random rand)
at Microsoft.ML.Trainers.KMeansUtils.<>c__DisplayClass8_1`2.b__0()
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)<---

---> (Inner Exception #2) System.IndexOutOfRangeException: Index was outside the bounds of the array.
at Microsoft.ML.Trainers.KMeansBarBarInitialization.<>c__DisplayClass3_1.b__2(VBuffer`1& point, Int32 pointRowIndex, Single[] weights, Random rand)
at Microsoft.ML.Trainers.KMeansUtils.<>c__DisplayClass8_1`2.b__0()
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)<---