Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve VectorTypeAttribute(dims) docs #5301

Merged

Conversation

antoniovs1029
Copy link
Member

@antoniovs1029 antoniovs1029 commented Jul 10, 2020

Addresses #5273 : On the confusion of users about how to use the VectorTypeAttribute.

As confirmed offline by @yaeldekel , users currently can't provide a multidimensional array with a VectorTypeAttribute(dims) . They should flatten their arrays into 1-D arrays and then use the VectorTypeAttribute(dims) to describe the shape ML.NET will use internally.

Note: Originally I was planning on throwing a readable exception if the user added the VectorTypeAttribute to a multidimensional array, but some discussion would be needed to decide where to do this (either in InternalSchemaDefinition.Create, or somewhere else on InternalSchemaDefinition or SchemaDefinition itself). Furthermore, I'm thinking that maybe there's a bug on these classes, where if a user adds a custom DataViewType (see #3263 ) and adds the attribute of this type to an array, then some unexpected behavior will happen as InternalSchemaDefinition.GetMappedType() assumes that any array (multidimensional or otherwise) is isVector = true, and then elsewhere in the code we assume that if a type isVector == true then it should be a VectorDataViewType. More discussion and inspection would be needed on this regard and it might, or not, affect the decision of where to throw an exception for multidimensionals array with VectorTypeAttribute. And these may be addressed on a separate PR.

So, for the time being, and the fact that this might not be a common problem anyway, I think that simply changing slightly the docs is the best option for the next release.

@antoniovs1029 antoniovs1029 requested a review from a team as a code owner July 10, 2020 22:02
@codecov
Copy link

codecov bot commented Jul 10, 2020

Codecov Report

Merging #5301 into master will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #5301      +/-   ##
==========================================
+ Coverage   73.68%   73.69%   +0.01%     
==========================================
  Files        1022     1022              
  Lines      190366   190366              
  Branches    20474    20474              
==========================================
+ Hits       140265   140295      +30     
+ Misses      44568    44545      -23     
+ Partials     5533     5526       -7     
Flag Coverage Δ
#Debug 73.69% <ø> (+0.01%) ⬆️
#production 69.44% <ø> (+0.02%) ⬆️
#test 87.62% <ø> (ø)
Impacted Files Coverage Δ
src/Microsoft.ML.Data/Data/SchemaDefinition.cs 71.79% <ø> (ø)
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 85.94% <0.00%> (+0.15%) ⬆️
src/Microsoft.ML.AutoML/Sweepers/Parameters.cs 85.16% <0.00%> (+0.84%) ⬆️
src/Microsoft.ML.Sweeper/AsyncSweeper.cs 72.78% <0.00%> (+1.36%) ⬆️
src/Microsoft.ML.Maml/MAML.cs 26.21% <0.00%> (+1.45%) ⬆️
...L.AutoML/TrainerExtensions/TrainerExtensionUtil.cs 86.36% <0.00%> (+1.65%) ⬆️
...rosoft.ML.AutoML/ColumnInference/TextFileSample.cs 62.91% <0.00%> (+3.31%) ⬆️
...icrosoft.ML.AutoML/Experiment/SuggestedPipeline.cs 92.78% <0.00%> (+4.12%) ⬆️
....ML.AutoML/PipelineSuggesters/PipelineSuggester.cs 87.39% <0.00%> (+7.56%) ⬆️

@antoniovs1029 antoniovs1029 merged commit 1b44be7 into dotnet:master Jul 11, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants