Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[discussion] [wip] Move/adopt to torchvision frequent utils from detectron2 (and potentially other frameworks) #7070

Open
vadimkantorov opened this issue Jan 9, 2023 · 4 comments

Comments

@vadimkantorov
Copy link

vadimkantorov commented Jan 9, 2023

🚀 The feature

I propose to open the discussion and collect in this issue some discrepancies or duplicate functionalities I found between detectron2 and torchvision.

detecton2 / mmdet / kornia are now quite old and established, so one can discuss of adopting well-designed / widely-used util functions and Modules from these frameworks to torchvision, so that there's less reimplementation and more established idioms.

Motivation, pitch

N/A

Alternatives

No response

Additional context

No response

@vadimkantorov vadimkantorov changed the title [discussion] [wip] Move to torchvision frequent utils from detectron2 (and potentially other frameworks) [discussion] [wip] Move/adopt to torchvision frequent utils from detectron2 (and potentially other frameworks) Jan 10, 2023
@oke-aditya
Copy link
Contributor

Okay, let's discuss each of these.

  • Rotated formats CUDA Ops: -

Rotated operations need additional support as well as Transforms API etc. See #2761 It will also need to be compatible with new Transforms API etc.

  • Fast evaluators: -
    Can you please elaborate how would it benefit and where we could keep in torchvision.

  • MasOps: -
    torchvision already has paste_masks_in_image. What other Mask ops do you have in mind?

  • Color Map Visualization: -

torchvision has few colormap utils in torchvision. Please see https://github.com/pytorch/vision/blob/main/torchvision/utils.py#L486
Albeit these are private, as torchvision is not in general a visualization library so we were hesitant to make it public.

  • Instance and Box Visualizer: -

Again torchvision has few visualization utils https://github.com/pytorch/vision/blob/main/torchvision/utils.py#L13
Are there any additional methods you suggesting? Panopatic segmentation etc is not supported by torchvision so it might not be great to have visualization util for same. Can you elaborate what you want from detectron here?

  • Exif parser: -

I have no idea. cc @vfdev-5 @pmeier

  • Focal Loss: -

Torchvision already have focal loss. https://github.com/pytorch/vision/blob/main/torchvision/ops/focal_loss.py

Let's discuss a bit more so we know requirements clearly. :)

@pmeier
Copy link
Collaborator

pmeier commented Jan 11, 2023

Re exif: I have no knowledge about that, but the function inside detectron two is over 2 years old and links to a now closed Pillow issue. According to the comments in the thread there, the patch landed in Pillow==7.0.0. Although we haven't acted on it yet, we have agreed that we should support Pillow starting from current major minus 1 in #5898 (comment). Given that the current version is Pillow==9.4.0, we can limit support to Pillow>=8 and thus can depend on the fix.

@vadimkantorov
Copy link
Author

vadimkantorov commented Jan 11, 2023

On Can you please elaborate how would it benefit and where we could keep in torchvision.:
I think it would make COCO / COCO-format evaluation of detection/instance segmentation faster and reduce the number of dependencies for simple/standard setups for the users. And now COCO-format can be reasonably thought as one of dominant dataset formats for instance/panoptic detection and segmentation. So it would be good to have the metric evaluation (fast, parallelized, tested, RLE mask functions in core) included as well. About where - no idea :) A new namespace torchvision.eval or torchvision.evalutils?

About exif: also, are libjpg/libpng supporting parsing EXIF attributes from image files? or maybe can be done with a very simple python parser? I guess pillow is needed only for reading these attributes here. So without pillow, some sort of EXIF parser is needed. OpenCV also has some information about exif parsing in their changelog: https://github.com/opencv/opencv/wiki/ChangeLog#version452. This may be more related to I/O component.

about focal loss: can we ask fvcore authors if their focal loss matches exactly the focal loss in torchvision? currently detectron2 uses some fvcore stuffs. I guess one could check all functions from fvcore used by detectron2 and check their usefulness for inclusion in core

About paste_masks_in_image: why then small difference in API and why doesn't detectron2 use the torchvision's function if they are identical?

about rotated boxes ops: converging on object-oriented structures can take a long time (i think transforms/prototype v2 is already lasting for 1 year? 2 years?), but having simple purely functional functions (accepting pure tensors with well-described format of docs) should require less of design and can even happen before full support of transforms - this would allow using these ops in other frameworks. other frameworks have their own modeling for "structures" / transforms anyway. but importing those transforms on "ops" level is also a good idea IMO.

In general, my motivation for this discussion opening is that deduplicating common/frequently-used/useful well-contained/small-api idioms from other libraries will help them reduce the API surface, and these idioms would be useful for all downstream libraries and users and reduce the amount of boiler-plate. And that a systematic discussion with those library authors would be fruitful, they would also have interesting things to say which idioms/functions (especially well-contained and small API-surface) turned out to be well-designed and would be good to include in core for reusability

Above I just scratched some utils that might be good to have in core while browsing the current detectron2 codebase, so the list is quite arbitrary. So this is not in stone, I just propose to look at those codebases with a fresh eye and ask whether some utils have become standard enough

@vadimkantorov
Copy link
Author

@oke-aditya for other mask ops/utils, I had posted some my ideas in #4415

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants