Improve VectorTypeAttribute(dims) docs #5301
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Addresses #5273 : On the confusion of users about how to use the VectorTypeAttribute.
As confirmed offline by @yaeldekel , users currently can't provide a multidimensional array with a
VectorTypeAttribute(dims)
. They should flatten their arrays into 1-D arrays and then use theVectorTypeAttribute(dims)
to describe the shape ML.NET will use internally.Note: Originally I was planning on throwing a readable exception if the user added the
VectorTypeAttribute
to a multidimensional array, but some discussion would be needed to decide where to do this (either inInternalSchemaDefinition.Create
, or somewhere else onInternalSchemaDefinition
orSchemaDefinition
itself). Furthermore, I'm thinking that maybe there's a bug on these classes, where if a user adds a customDataViewType
(see #3263 ) and adds the attribute of this type to an array, then some unexpected behavior will happen asInternalSchemaDefinition.GetMappedType()
assumes that any array (multidimensional or otherwise) isisVector = true
, and then elsewhere in the code we assume that if a typeisVector == true
then it should be aVectorDataViewType
. More discussion and inspection would be needed on this regard and it might, or not, affect the decision of where to throw an exception for multidimensionals array withVectorTypeAttribute
. And these may be addressed on a separate PR.So, for the time being, and the fact that this might not be a common problem anyway, I think that simply changing slightly the docs is the best option for the next release.