LoadImages not warning that input column is empty and ignoring imageFolder parameter in such a case #4429

Closed
@antoniovs1029

Description

I'm not sure whether this is actually a bug, but I noticed it while working with the LoadImages method, and it may be worth warning users that it can happen, either at runtime or in the documentation.

Issue

As the source code I provide below shows, if every value in the input column of a LoadImages transform is empty when a pipeline is fitted, the code still runs without any warning, even though no image is actually loaded to train the model. The transform also appears to work when transforming an input data view whose LoadImages input column is empty: in my example, the pipeline still assigns a predicted label to each row of the input data view being transformed, even though no image was loaded.

I demonstrate this with two main cases in which it can happen:

  1. When the user loads their data through a method such as LoadFromTextFile from an input file that has only 2 columns, but specifies that the ImagePath column (the input column of the LoadImages method) is the 3rd column of the file. This can happen if the user makes a typo in the ModelInput class, or (perhaps mistakenly) passes an input file that doesn't contain an image path column.

  2. When the user loads their data through a method such as LoadFromEnumerable from an array in which every object provides either a null or an empty string value for the ImagePath column.

Also notice in my code that in both cases the LoadImages transform ignores whatever is passed as the imageFolder parameter: because there are no ImagePaths, it never tries to load an image. If there were at least one ImagePath in the input data view, LoadImages would try to load that image using the imageFolder parameter, and an exception would correctly be thrown if the folder doesn't exist.

Why is this a problem?

  • I would understand this behavior if the input data view lacked an image path for only some of the rows, especially when working with big datasets. But I think it becomes a problem when no image is loaded at all and the whole thing appears to work without a warning, as in the example I provided. A user who unknowingly makes a mistake that leads to this situation might believe that the model was correctly trained with actual images, or that it correctly transformed an input for which no ImagePath was provided. This problem might be harder to spot in more complex pipelines or input files.

  • Also, the fact that the imageFolder parameter gets ignored in this case seems odd to me, as I would have expected an exception to be thrown if a user passes a nonexistent folder path to the LoadImages transformer, regardless of the content of the ImagePath column.
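Until the transform itself warns about this, a caller could guard against both situations before building the pipeline. The following is only a minimal sketch of such a check, not anything ML.NET provides: the `ImagePathGuard` class and its method names are hypothetical, and it assumes the ImagePath values have already been extracted into an `IEnumerable<string>`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration; it is not part of ML.NET.
public static class ImagePathGuard
{
    // Counts the entries that are not null, empty, or whitespace-only.
    public static int CountUsablePaths(IEnumerable<string> imagePaths) =>
        imagePaths.Count(p => !string.IsNullOrWhiteSpace(p));

    // Throws if the image folder is missing, regardless of the column content,
    // or if no row at all supplies an image path.
    public static void Validate(string imageFolder, IEnumerable<string> imagePaths)
    {
        if (!Directory.Exists(imageFolder))
            throw new DirectoryNotFoundException(
                $"Image folder not found: {imageFolder}");

        if (CountUsablePaths(imagePaths) == 0)
            throw new InvalidOperationException(
                "No row provides an image path, so no image would be loaded.");
    }
}
```

Calling `ImagePathGuard.Validate(imageFolder, paths)` before fitting would surface both mistakes early. Rows with a missing path are still tolerated as long as at least one row has one, which matches the partial-data scenario described above. In ML.NET the column values could presumably be read from the data view with the `GetColumn<string>` extension method, but that step is an assumption and is left out of the sketch.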

Source code and input file

Download solution:
solution.zip

Or look into the code in here:
https://gist.github.com/antoniovs1029/997ca183411f173e81a131f09722b092

Labels

P1 (Priority of the issue for triage purpose: Needs to be fixed soon.)
bug (Something isn't working)