-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] simpler BoundingBox2D #5
Conversation
Thanks for the PR. I agree that if we use a Pose2D for the center, we should specify the size in a float. I can think of a few reasons to keep using BoundingBox2D instead of RegionOfInterest, however. Some object detectors, such as YOLO, return floats for their bounding boxes (example). You could argue that this is information that doesn't need to be sent over a ROS message in most cases, but it does represent a loss of information. If we allow Detection3D to have a rotated bounding box, I think Detection2D should do the same for consistency. Some pipelines (such as cnn grasp detection, while not detectors in the strictest sense, predict rotated 2D bounding boxes. One more point is that the |
In that case we should make it clear in the comments of the .msg file how to interpret this exactly. If I have a bounding box of size 10.123 x 20.456 with a 23° rotation, which pixels are included in this bounding box?
Good point; I agree it would be good for the message to be general enough to capture that use case. That said, I don't think it holds that BoundingBox3D and 2D should be treated equally; they are fundamentally different. The reference frame of BoundingBox3D is arbitrary (such as the starting pose of the robot), and removing the orientation would force each bounding box to be axis-aligned with that frame. OTOH, the 2D bounding box doesn't talk about coordinates in the world wrt. some reference frame, but specifies a set of pixels in an image.
Yes, I mentioned that. So we shouldn't use RegionOfInterest, but copy + adjust it if we want to use it. To further complicate matters, here's one more way we could specify the bounding box: as (x, y) coordinates of the four corners. That would allow not only rotation, but also allow skew
I don't have an opinion at the moment if it's a good idea, I just thought I'd bring up the possibility. |
Great points! If we allow specifying the 4 corners, then the bounding box is either 1) imbued with some 3D information, or 2) not actually a rectilinear box. I think a generic polygon would be better to use in either case. I think using a Pose2D and float sizes are the best balance between ease of use and expressive power. We can definitely advocate for the Bresenham algo for when pixel-perfect cropping is required. This can be added in this commit or in a later documentation improving commit. Any other thoughts before I merge? |
I also want to point out that there's not much consensus on whether or not to specify the upper-left corner of a bounding box or the center. Upper left:
Center: So I guess we should simply choose one. |
After thinking it over and comparing the use cases, I've decided we should keep the rotated BoundingBox2D available in this package, but make the default message types support only a corner/size box without rotation, which I'll call BoundingRect. I'm going to merge this PR to confirm the other changes, then create the new BoundingRect in a new PR. |
This PR is to continue the discussion from #2 about bounding boxes.
I would strongly prefer to use something like a
sensor_msgs/RegionOfInterest
for the 2D bounding box:uint32
) specify the top left corner and the size in x and y.This would be much easier to implement correctly than a rotated bounding box with floats. With that approach, we would have to specify things like "use the Bresenham algorithm to figure out which pixels are inside the rotated box". I'm afraid since most people don't use the rotation, they'd just ignore it or implement it incorrectly.
Unfortunately,
sensor_msgs/RegionOfInterest
has a weirddo_rectify
field which doesn't make sense here. If that's a show stopper, we should create our own copy of RegionOfInterest without that field, otherwise we can use RegionOfInterest directly.If we do use the current BoundingBox2D, it should use
float
everywhere; as agreed upon in #2, it doesn't make sense to specify the center infloat
and the size inuint32
.If that message moves to
geometry_msgs
, it shouldn't be in pixel coordinates, but in meters. All the other measures of length in that package are in meters, and assume that some people would want to usegeometry_msgs/BoundingBox2D
for stuff like specifying a rectangular region in a map (where meters is the obvious unit). But meters would be useless for us. All this leads me to believe that a BoundingBox2D that would live happily ingeometry_msgs
is different from what we need.Note that this doesn't apply to
BoundingBox3D
: It's already in meters, so it could go togeometry_msgs
.