Closed
Description
Many of the exception messages thrown are unclear. As a result, when an exception occurs, it's challenging to identify whether the issue is with the ML.NET code, with the underlying data, with how the algorithm is being applied, etc. Often it takes stepping through the ML.NET framework in an attempt to get further context.
I logged this as a single issue because I think there would be benefit in looking at all places where exceptions are thrown or rethrown, to ensure that default exception messages aren't used and that the messages are as clear and rich as possible. Let me know if you would like these broken into separate issues rather than combined in one.
Here are some specific examples:
| # | Trainer | Scenario | Actual Message | Suggested Message |
|---|---|---|---|---|
| 1 | N/A | Occurs when an invalid field index is provided to the LoadColumn attribute. For example: `[LoadColumn(100)] public uint Label { get; set; }` Here the value 100 is an invalid index because the underlying data has fewer than 100 columns. | System.ArgumentNullException: 'Value cannot be null. Parameter name: items' | Message should indicate which column has the issue; the reference to parameter ‘items’ is unclear. |
| 2 | N/A | Occurs when a Feature column is of some type other than float\single. For example: `[ColumnName("Test"), LoadColumn(135)] public uint Test { get; set; }` | System.InvalidOperationException: 'Column ‘Test’ has values of UInt32, which is not the same as earlier observed type of Single.' | It’s unclear what “same as earlier observed type” means. Consider rewording to state that the Feature columns must all be of a certain type (e.g. Single). |
| 3 | LightGbm | Occurs when custom gains are specified without providing a group id column. For example: `var customGains = new LightGbmRankingTrainer.Options();` `customGains.CustomGains = new int[] { 0, 1, 2, 3 };` `IEstimator<ITransformer> trainer = mlContext.Ranking.Trainers.LightGbm(customGains);` `IEstimator<ITransformer> trainerPipeline = dataPipeline.Append(trainer);` Notice that the Group Id isn’t explicitly set in this code, which would be done as follows: `customGains.RowGroupColumnName = "GroupId";` | System.ArgumentOutOfRangeException: 'Need a group column. Parameter name: data' | ArgumentOutOfRangeException is confusing; instead, throw ArgumentNullException or InvalidOperationException. Message should also indicate that the ‘Group Id’ column is missing\null; the reference to parameter ‘data’ is unclear. |
| 4 | LightGbm | Occurs when the custom gains cardinality doesn’t match the cardinality of the relevance label values. For example: `var customGains = new LightGbmRankingTrainer.Options();` `customGains.CustomGains = new int[] { 0, 1, 2 };` `customGains.RowGroupColumnName = "GroupId";` In the underlying data, the relevance label values are {0, 1, 2, 3, 4}; in other words, the cardinality of the relevance label values is greater than that of the specified custom gains. | System.InvalidOperationException: 'LightGBM Error, code is -1, error message is 'label (0) excel the max range 3'.' | There appears to be a typo: “excel” should say “exceeds”. Also, the message should state that the cardinality of the relevance label values must be less than or equal to the cardinality of the custom gains. Note: Refer to the similar issue logged directly against LightGBM: microsoft/LightGBM#1090 |
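
To make the suggestions in the table more concrete, here is a rough sketch of the kind of validation and messages I have in mind for scenarios 1, 3 and 4. This is plain C# and does not reflect how ML.NET is actually implemented: the `ColumnChecks` class, its method names, and their parameters are all hypothetical and exist only for illustration.

```csharp
using System;

// Hypothetical helper for illustration only; none of these names exist in ML.NET.
public static class ColumnChecks
{
    // Scenario 1: name the offending member and index instead of "Parameter name: items".
    public static void CheckLoadColumnIndex(string memberName, int index, int columnCount)
    {
        if (index < 0 || index >= columnCount)
        {
            throw new ArgumentOutOfRangeException(
                nameof(index),
                index,
                $"LoadColumn({index}) on member '{memberName}' is out of range: " +
                $"the data has only {columnCount} column(s).");
        }
    }

    // Scenario 3: a missing group id column is a configuration problem,
    // so InvalidOperationException reads more naturally than ArgumentOutOfRangeException.
    public static void CheckRowGroupColumn(string rowGroupColumnName)
    {
        if (string.IsNullOrWhiteSpace(rowGroupColumnName))
        {
            throw new InvalidOperationException(
                "Ranking requires a group id column, but RowGroupColumnName was not set. " +
                "Set it to the name of the column that identifies each group, e.g. \"GroupId\".");
        }
    }

    // Scenario 4: state the cardinality rule up front rather than surfacing the native error.
    public static void CheckCustomGains(int customGainCount, int distinctLabelCount)
    {
        if (distinctLabelCount > customGainCount)
        {
            throw new InvalidOperationException(
                $"The data contains {distinctLabelCount} distinct relevance label values, " +
                $"but only {customGainCount} custom gains were specified. " +
                "The number of distinct label values must be less than or equal to the number of custom gains.");
        }
    }
}
```

With checks along these lines, `ColumnChecks.CheckLoadColumnIndex("Label", 100, 50)` would fail with a message that names both the `Label` member and the index 100, which is the information that was missing when I hit scenario 1.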