Skip to content

Samples and unit test for image-related transform estimators #3165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 3, 2019

Conversation

abgoswam
Copy link
Member

@abgoswam abgoswam commented Apr 2, 2019

Towards #1209

The PR makes the following changes

  • Adds unit test and sample for the ConvertToImage transform estimator.

  • Fixes the info presented to the user for the 4 existing image samples {ConvertToGrayscale, LoadImages, ExtractPixels, ResizeImages}

@abgoswam abgoswam requested review from sfilipi and Ivanidzo4ka April 2, 2019 01:30
@abgoswam abgoswam changed the title Image samples Samples and unit test for image-related transform estimators Apr 2, 2019
@abgoswam abgoswam requested a review from shmoradims April 2, 2019 01:32
// ImagePath : tomato.bmp
// Name : tomato
// ImageObject : System.Drawing.Bitmap
// Grayscale : System.Drawing.Bitmap
Copy link
Member

@sfilipi sfilipi Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe transpose, to give the column/row idea more accurately. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 271135652 [](ancestors = 271135652)

// Preview 1 row of the transformedData.
var transformedDataPreview = transformedData.Preview(1);
foreach (var kvPair in transformedDataPreview.RowView[0].Values)
{
Copy link
Member

@sfilipi sfilipi Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ [](start = 12, length = 1)

omit :) #Resolved

{
public static class ConvertToImage
{
private const int inputSize = 3 * 224 * 224;
Copy link
Member

@sfilipi sfilipi Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • 224 * 224; [](start = 39, length = 13)

might want to add a comment about what does each number represent #Resolved

Features = Enumerable.Repeat(0, inputSize).Select(x => random.NextDouble() * 100).ToArray()
};
}
}
Copy link
Member

@sfilipi sfilipi Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd move this below the example. If the users want to scroll to it they can. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 271135840 [](ancestors = 271135840)

// Preview 1 row of the transformedData.
var transformedDataPreview = transformedData.Preview(1);
foreach (var kvPair in transformedDataPreview.RowView[0].Values)
{
Copy link
Member

@sfilipi sfilipi Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ [](start = 11, length = 2)

omit #Resolved


// The transformedData IDataView contains the loaded and resized raw values

// Preview 1 row of the transformedData.
Copy link
Member

@sfilipi sfilipi Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for space in between. #Resolved

// Pixels : Dense vector of size 30000

Console.WriteLine("--------------------------------------------------");

Copy link
Member

@sfilipi sfilipi Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for it. Users can add it #Resolved

Console.WriteLine("--------------------------------------------------");

// Using schema comprehension to display raw pixels for each row.
// Display extracted pixels in column 'Pixels'.
Copy link
Member

@sfilipi sfilipi Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be easier to understand if put as:

Convert the transformedData into an IEnumerable #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified the pretty printing method


In reply to: 271136133 [](ancestors = 271136133)


// Preview of the content of the images.tsv file
//
// ImagePath Name ImageObject "Pixels"
Copy link
Member

@sfilipi sfilipi Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// ImagePath Name ImageObject [](start = 11, length = 41)

maybe keep the headers/column names, even if they don't print out. Or print them separately. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


In reply to: 271136252 [](ancestors = 271136252)


var transformedData = pipeline.Fit(data).Transform(data);
// The transformedData IDataView contains the loaded images now
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now [](start = 72, length = 3)

dot in the end of sentence :) #Resolved


var transformedData = pipeline.Fit(data).Transform(data);
// The transformedData IDataView contains the resized images now
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dot here is well #Resolved

Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@codecov
Copy link

codecov bot commented Apr 2, 2019

Codecov Report

Merging #3165 into master will increase coverage by 0.01%.
The diff coverage is 94.73%.

@@            Coverage Diff             @@
##           master    #3165      +/-   ##
==========================================
+ Coverage   72.53%   72.55%   +0.01%     
==========================================
  Files         808      807       -1     
  Lines      144775   144793      +18     
  Branches    16209    16211       +2     
==========================================
+ Hits       105012   105054      +42     
+ Misses      35348    35323      -25     
- Partials     4415     4416       +1
Flag Coverage Δ
#Debug 72.55% <94.73%> (+0.01%) ⬆️
#production 68.14% <ø> (+0.02%) ⬆️
#test 88.83% <94.73%> (ø) ⬆️
Impacted Files Coverage Δ
...c/Microsoft.ML.ImageAnalytics/ExtensionsCatalog.cs 16.66% <ø> (+5.55%) ⬆️
test/Microsoft.ML.Tests/ImagesTests.cs 98.69% <94.73%> (-0.13%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.1% <0%> (-0.16%) ⬇️
src/Microsoft.ML.Data/Transforms/Normalizer.cs 86.03% <0%> (ø) ⬆️
...icrosoft.ML.Functional.Tests/DataTransformation.cs 100% <0%> (ø) ⬆️
...s/Api/CookbookSamples/CookbookSamplesDynamicApi.cs 93.49% <0%> (ø) ⬆️
...s/Scenarios/Api/CookbookSamples/CookbookSamples.cs 99.49% <0%> (ø) ⬆️
test/Microsoft.ML.Functional.Tests/Training.cs 100% <0%> (ø) ⬆️
...est/Microsoft.ML.Tests/FeatureContributionTests.cs 98.55% <0%> (ø) ⬆️
...oft.ML.Experimental/TransformsCatalogExtensions.cs
... and 5 more

@codecov
Copy link

codecov bot commented Apr 2, 2019

Codecov Report

Merging #3165 into master will increase coverage by 0.01%.
The diff coverage is 94.73%.

@@            Coverage Diff             @@
##           master    #3165      +/-   ##
==========================================
+ Coverage   72.53%   72.55%   +0.01%     
==========================================
  Files         808      807       -1     
  Lines      144775   144793      +18     
  Branches    16209    16211       +2     
==========================================
+ Hits       105012   105054      +42     
+ Misses      35348    35322      -26     
- Partials     4415     4417       +2
Flag Coverage Δ
#Debug 72.55% <94.73%> (+0.01%) ⬆️
#production 68.14% <ø> (+0.02%) ⬆️
#test 88.83% <94.73%> (ø) ⬆️
Impacted Files Coverage Δ
...c/Microsoft.ML.ImageAnalytics/ExtensionsCatalog.cs 16.66% <ø> (+5.55%) ⬆️
test/Microsoft.ML.Tests/ImagesTests.cs 98.69% <94.73%> (-0.13%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.1% <0%> (-0.16%) ⬇️
src/Microsoft.ML.Data/Transforms/Normalizer.cs 86.03% <0%> (ø) ⬆️
...icrosoft.ML.Functional.Tests/DataTransformation.cs 100% <0%> (ø) ⬆️
...s/Api/CookbookSamples/CookbookSamplesDynamicApi.cs 93.49% <0%> (ø) ⬆️
...s/Scenarios/Api/CookbookSamples/CookbookSamples.cs 99.49% <0%> (ø) ⬆️
test/Microsoft.ML.Functional.Tests/Training.cs 100% <0%> (ø) ⬆️
...est/Microsoft.ML.Tests/FeatureContributionTests.cs 98.55% <0%> (ø) ⬆️
...oft.ML.Experimental/TransformsCatalogExtensions.cs
... and 5 more

foreach (var row in data.RowView)
{
foreach (var kvPair in row.Values)
{
Copy link

@shmoradims shmoradims Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ [](start = 16, length = 1)

for 1-liner blocks, let's not use {} to keep the samples short. similar to foreach loop on line 58 #Resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually these are 2-liners. may use Linq to shorten it.


In reply to: 271452481 [](ancestors = 271452481)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keeping 2 for loops to keep logic simple


In reply to: 271453121 [](ancestors = 271453121,271452481)

private static void PrintPreview(DataDebuggerPreview data)
{
foreach (var colInfo in data.ColumnView)
Console.Write("{0, -25}", colInfo.Column.Name);
Copy link

@shmoradims shmoradims Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"{0, -25}" [](start = 30, length = 10)

let's just define this as:
var format = "{0,-25}" (they usually don't use space after comma)

and use 'format' in the rest of the code

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not getting you .. whats meant by using 'format' ? is the suggestion to use string.Format somehow ..


In reply to: 271455734 [](ancestors = 271455734)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used control+k+d to fix format (if thats what u meant)


In reply to: 271462084 [](ancestors = 271462084,271455734)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ivan meant to define the "{0,-25}" as a variable, rather than repeating it, and reuse the variable below.


In reply to: 271465497 [](ancestors = 271465497,271462084,271455734)

var random = new Random(seed);

for (int i = 0; i < count; i++)
{
Copy link

@shmoradims shmoradims Apr 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ [](start = 12, length = 1)

omit #Resolved


private class DataPoint
{
[VectorType(inputSize)]
Copy link
Member

@sfilipi sfilipi Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[VectorType(inputSize)] [](start = 11, length = 24)

is this annotation needed? #Resolved

Copy link
Member Author

@abgoswam abgoswam Apr 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this is indeed needed.

when fitting a model, we do some checks e.g. schema validation / size checks / whether the estimator works with fixed size or variable size etc

the VectorToImageConvertingEstimator works on fixed-size vector . without the annotation, the assumtion is that its a variable size vector , and the following check fails :

if (col.Kind != SchemaShape.Column.VectorKind.Vector || (col.ItemType != NumberDataViewType.Single && col.ItemType != NumberDataViewType.Double && col.ItemType != NumberDataViewType.Byte))

Unhandled Exception: System.ArgumentOutOfRangeException: Schema mismatch for input column 'Features': expected known-size vector of type Single, Double or Byte, got VarVector


In reply to: 271595813 [](ancestors = 271595813)

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@abgoswam abgoswam merged commit ac53748 into dotnet:master Apr 3, 2019
abgoswam added a commit to abgoswam/machinelearning that referenced this pull request Apr 5, 2019
…3165)

* updating image samples

* fix review comments (print preview)

* fix comments (minor nits)

* fix review comments (whitespaces)
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants