Skip to content

Conversation

@augustoolucas
Copy link

@augustoolucas augustoolucas commented Mar 27, 2021

SVHN is transposed into HWC when converting it into PIL Image, on the getitem method. This is inconsistent with other datasets, like CIFAR10 and MNIST, where the image is already on the right shape at the moment of the PIL conversion. This inconsistency makes it needed to do two transpositions instead of only one, and can make it troublesome to use this dataset together with others, like the aforementioned ones.

This PR fix this behaviour, transposing the data into HWC on class init.

@facebook-github-bot
Copy link
Contributor

Hi @augustoolucas!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot
Copy link
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@augustoolucas augustoolucas marked this pull request as ready for review March 27, 2021 23:44
@facebook-github-bot
Copy link
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@codecov
Copy link

codecov bot commented Mar 30, 2021

Codecov Report

Merging #3611 (5ce2758) into master (07fb8ba) will decrease coverage by 0.03%.
The diff coverage is 100.00%.

❗ Current head 5ce2758 differs from pull request most recent head 04ad207. Consider uploading reports for the commit 04ad207 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3611      +/-   ##
==========================================
- Coverage   79.73%   79.69%   -0.04%     
==========================================
  Files         105      105              
  Lines        9818     9824       +6     
  Branches     1579     1583       +4     
==========================================
+ Hits         7828     7829       +1     
- Misses       1513     1517       +4     
- Partials      477      478       +1     
Impacted Files Coverage Δ
torchvision/datasets/svhn.py 66.66% <100.00%> (ø)
torchvision/transforms/functional_pil.py 68.12% <0.00%> (-1.14%) ⬇️
torchvision/datasets/voc.py 94.44% <0.00%> (-0.24%) ⬇️
torchvision/__init__.py 61.11% <0.00%> (ø)
torchvision/ops/boxes.py 91.26% <0.00%> (ø)
torchvision/ops/roi_align.py 70.96% <0.00%> (ø)
torchvision/transforms/transforms.py 84.30% <0.00%> (ø)
torchvision/models/detection/retinanet.py 90.23% <0.00%> (ø)
torchvision/transforms/functional.py 81.85% <0.00%> (+0.04%) ⬆️
torchvision/datasets/folder.py 77.55% <0.00%> (+0.23%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 07fb8ba...04ad207. Read the comment docs.

Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

Thanks for the PR!

The change looks good to me, although it could be considered BC-breaking if a user were to access self.data directly (which I would hope it's not the case, but given that you are sending this PR then there might be users doing it :-) )

From this perspective, and given that we don't currently have a good way of checking if this is being used by others, I'd maybe prefer to restrain from merging this PR.

Thoughts?

@NicolasHug
Copy link
Member

it could be considered BC-breaking if a user were to access self.data directly

I was thinking the same, and I was wondering: from now on, should we make it a habit of always introducing attributes as privates (e.g. with a leading _), unless that's something we explicitly want to expose, with corresponding docs etc.? We could do the same with functions and classes. This way, we would only be constrained by BC compat on the explicitly public-facing API, and we'd be able to merge such PRs which likely introduce a small performance boost.

@fmassa
Copy link
Member

fmassa commented Mar 30, 2021

@NicolasHug yes, I think from now on we might want to start prefixing the attributes of new datasets / etc with an underscore, so that we can have more flexibility there.

@datumbox
Copy link
Contributor

What if we pay the debt now, make the self.data private and at the same time accept the proposed change of this PR? This will be BC-breaking but it will be a start of cleaning things up. Also to play devil's advocate, why would anyone access self.data when the main target of the class is to index them for you and provide extra functionality?

@augustoolucas
Copy link
Author

augustoolucas commented Mar 31, 2021

Well, don't know if I have much more to add to this discussion but...
Although I think it's a very good idea to start setting these attributes as private for new datasets, this leads to the problem of inconsistency... imo, it would be better if all datasets had a standard way of implementing things. I know that the way that things are implemented under the hood should not be a problem, since we have standard methods to get access to the data. But having datasets which we can access it directly while having others which this isn't possible doesn't sound good, for me. I suggested the changes on this pull request exactly to solve an inconsistency and was not thinking about creating more ones, haha. About setting the attributes as private by now for the existing datasets, well... the BC issue would be even worse... But I really think that it should be considered for future versions.

@NicolasHug NicolasHug mentioned this pull request Apr 8, 2021
@NicolasHug
Copy link
Member

Thanks again for the PR @augustoolucas . Unfortunately our hands are tied with the backward compatibility changes here, so I'll close the PR if you don't mind.

However, we're currently working on re-implementing the datasets in a more unified and flexible fashion, so we can be confident that this issue will be fixed in the future!

CC @pmeier : this will probably be something to keep in mind when implementing the equivalent DataPipe class :)

@NicolasHug NicolasHug closed this Jun 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants