add image featurizer to AutoFeaturizer #6261

Merged

@LittleLittleCloud LittleLittleCloud commented Jul 26, 2022

This PR adds a featurizer for image-path columns. When one or more columns are designated as image paths, a set of estimators with an associated search space is added for those columns. These estimators featurize each image using one of the DNN featurizers (ResNet18, ResNet50, AlexNet...).

The initial idea comes from @justinormont. It is a cross-platform way to apply AutoML to image classification, and it can be more efficient than training a deep learning model, especially on small datasets.

The estimators used to featurize images are:

LoadImage -> ResizeImage(224, 224) -> ExtractPixels -> DnnFeaturizer (one of ResNet18, ResNet50, ResNet101, AlexNet)

This chain transforms an image into a numeric feature array that downstream classifiers can train on and predict from.
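For reference, the chain above can be written by hand with the public ML.NET transforms. This is only a sketch of the equivalent manual pipeline, not the code this PR adds: the `ImageInput` class, the column names, and the sample file path are hypothetical, and passing `null` as `imageFolder` is assumed to make `LoadImages` treat each value as a full path. It needs the `Microsoft.ML`, `Microsoft.ML.ImageAnalytics`, `Microsoft.ML.OnnxTransformer`, and `Microsoft.ML.DnnImageFeaturizer.ResNet18` packages.

```csharp
using System;
using Microsoft.ML;

// Hypothetical input row: one column holding the full path to an image file.
public class ImageInput
{
    public string ImagePath { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Hypothetical sample data; replace with real image paths.
        var data = mlContext.Data.LoadFromEnumerable(new[]
        {
            new ImageInput { ImagePath = "/data/images/sample.jpg" },
        });

        // LoadImage -> ResizeImage(224, 224) -> ExtractPixels -> DnnFeaturizer
        var pipeline = mlContext.Transforms
            .LoadImages(outputColumnName: "Image", imageFolder: null, inputColumnName: "ImagePath")
            .Append(mlContext.Transforms.ResizeImages("Image", imageWidth: 224, imageHeight: 224, inputColumnName: "Image"))
            .Append(mlContext.Transforms.ExtractPixels("Pixels", "Image"))
            .Append(mlContext.Transforms.DnnFeaturizeImage(
                "Features",
                // ResNet18 is one choice; the other featurizer packages expose
                // analogous extension methods on ModelSelector.
                m => m.ModelSelector.ResNet18(mlContext, m.OutputColumn, m.InputColumn),
                "Pixels"));

        var model = pipeline.Fit(data);
        var transformed = model.Transform(data);

        // Transform is lazy, so inspecting the schema does not read any image.
        Console.WriteLine(transformed.Schema["Features"].Type);
    }
}
```

The resulting `Features` column is what a classifier would consume. In the AutoFeaturizer the choice of DNN model is left to the tuner rather than fixed as it is here.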

During training, the search space of these estimators is added to the global search space and optimized by the selected tuner.
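As a rough illustration of that idea, the sketch below merges a featurizer's search space into a global one and lets a random-search stand-in sample trial configurations. Everything here is hypothetical: `ChoiceSpace`, the keys, and the trainer options are invented for the example and are not the ML.NET AutoML types or the tuner used by this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a categorical search-space dimension.
public sealed class ChoiceSpace
{
    public ChoiceSpace(params string[] options) => Options = options;
    public IReadOnlyList<string> Options { get; }
}

public static class Program
{
    // Random-search stand-in for a tuner: pick one option per dimension.
    public static Dictionary<string, string> Sample(
        IReadOnlyDictionary<string, ChoiceSpace> space, Random rng) =>
        space.ToDictionary(
            kv => kv.Key,
            kv => kv.Value.Options[rng.Next(kv.Value.Options.Count)]);

    public static void Main()
    {
        // Global search space before the image featurizer is added (hypothetical entries).
        var global = new Dictionary<string, ChoiceSpace>
        {
            ["trainer"] = new ChoiceSpace("LightGbm", "FastTree"),
        };

        // Search space contributed by the image featurizer estimators.
        var imageFeaturizer = new Dictionary<string, ChoiceSpace>
        {
            ["imageFeaturizer.model"] = new ChoiceSpace("ResNet18", "ResNet50", "ResNet101", "AlexNet"),
        };

        // Merge: the featurizer's dimensions become part of the global space.
        foreach (var kv in imageFeaturizer)
        {
            global.Add(kv.Key, kv.Value);
        }

        var rng = new Random(0);
        for (var trial = 0; trial < 3; trial++)
        {
            var config = Sample(global, rng);
            Console.WriteLine(string.Join(", ", config.Select(kv => $"{kv.Key}={kv.Value}")));
        }
    }
}
```

The point of the merge is that the featurizer model is tuned jointly with every other dimension, so a single trial fixes both the image model and the downstream trainer settings.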

codecov bot commented Jul 26, 2022

Codecov Report

Merging #6261 (db2d198) into main (de9afb5) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #6261      +/-   ##
==========================================
- Coverage   68.39%   68.38%   -0.02%     
==========================================
  Files        1141     1144       +3     
  Lines      244820   244885      +65     
  Branches    25405    25405              
==========================================
+ Hits       167444   167460      +16     
- Misses      70722    70772      +50     
+ Partials     6654     6653       -1     
| Flag | Coverage Δ |
| --- | --- |
| Debug | 68.38% <100.00%> (-0.02%) ⬇️ |
| production | 62.83% <ø> (-0.02%) ⬇️ |
| test | 88.99% <100.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| ...t/Microsoft.ML.AutoML.Tests/AutoFeaturizerTests.cs | 90.24% <100.00%> (+5.05%) ⬆️ |
| ...DnnImageFeaturizer.ResNet101/ResNet101Extension.cs | 0.00% <0.00%> (ø) |
| ....ML.DnnImageFeaturizer.AlexNet/AlexNetExtension.cs | 0.00% <0.00%> (ø) |
| ...L.DnnImageFeaturizer.ResNet50/ResNet50Extension.cs | 0.00% <0.00%> (ø) |
| src/Microsoft.ML.Sweeper/AsyncSweeper.cs | 72.78% <0.00%> (+1.36%) ⬆️ |

@@ -587,6 +587,44 @@ internal SweepableEstimator[] CatalogFeaturizer(string[] outputColumnNames, stri
return new SweepableEstimator[] { SweepableEstimatorFactory.CreateOneHotEncoding(option), SweepableEstimatorFactory.CreateOneHotHashEncoding(option) };
}

internal MultiModelPipeline ImagePathFeaturizer(string outputColumnName, string inputColumnName)
Member

Are you planning on adding support for having a folder and not just an image column?

Contributor Author

Not for now; this only supports full image paths.

Member

@michaelgsharp michaelgsharp left a comment

LGTM.

@michaelgsharp michaelgsharp merged commit c30a63e into dotnet:main Aug 2, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Sep 1, 2022