RetinaNet predict() is too slow #2014
Thanks for reporting the issue. |
On the above error, you could wrap it in
This will resolve the error. |
Thanks for the help, the trick with the dataset worked fine. Regarding the slow prediction: it becomes much faster after assigning a custom NMS decoder with any thresholds:

```python
model.prediction_decoder = keras_cv.layers.MultiClassNonMaxSuppression(
    bounding_box_format="xywh",
    from_logits=True,
    iou_threshold=1.0,
    confidence_threshold=0.0,
)
```

However, it cannot be used for training, because the COCO metrics validation callback raises an exception. I'm new to object detection, so I'm not sure whether this is expected behavior.

I have this validation callback in my code:

```python
keras_cv.callbacks.PyCOCOCallback(validation_data=eval_ds, bounding_box_format="xywh", cache=False)
``` |
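A toy, pure-Python sketch (not keras_cv code; the corner box format, `iou`, and `decode` names are illustrative) of why these particular thresholds effectively disable the decoder's filtering: `confidence_threshold=0.0` drops nothing, and with `iou_threshold=1.0` no pair of boxes exceeds the suppression threshold, so every box passes through:

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) corner format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def decode(boxes, scores, confidence_threshold, iou_threshold):
    """Confidence filter followed by greedy NMS, highest score first.

    A box is kept only if its IoU with every already-kept box is at or
    below iou_threshold, so iou_threshold=1.0 suppresses nothing.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if scores[i] < confidence_threshold:
            continue
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (5, 5, 15, 15)]
scores = [0.9, 0.8, 0.1]
decode(boxes, scores, 0.0, 1.0)  # pass-through: all three boxes survive
decode(boxes, scores, 0.5, 0.5)  # normal settings: overlaps and low scores removed
```

So the thresholds above amount to "decode everything"; whether that is acceptable depends on whether downstream code (like the COCO callback) expects suppressed output.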
@axelusarov is the slower prediction just for the first step (e.g. due to graph tracing), or is it also that every step is slower? I suppose that the single-class NMS is the likely issue here in terms of performance. I don't think this error is expected behavior -- this looks like a bug. I believe that #2030 should fix it. Sorry for the regression and thanks for the issue report! |
@ianstenbit prediction is slow for each step. Here is
After using the code from master, validation step is much faster now when using custom NMS and I didn't notice errors this time. I didn't check training quality though. |
Yeah, this looks like there's some significant slowdown with the single-class NMS for TensorFlow, and it's probably not just due to graph tracing. This is something we should look into further, but I can't prioritize it right now. |
Was about to write a GitHub issue about this, but yes, single-class NMS is much slower than multi-class. Not just prediction; training is 10x slower. I haven't spent much time looking at the source code, graph tracing, etc., but I can take a look to see what is going on. |
After looking into this, it seems that NonMaxSuppression calls tf.image.non_max_suppression_padded() in image_ops_impl.py. This calls non_max_suppression_padded_v2 in the same file, which proceeds to conduct the NMS in pure Python. This is in contrast to MultiClassNonMaxSuppression, which calls tf.image.combined_non_max_suppression() in image_ops_impl.py; that in turn calls gen_image_ops.combined_non_max_suppression(), which I believe runs the NMS in C++. This makes the 10x speedup possible. Anyone have any idea why this was done? For reference, here is tf.image.non_max_suppression_padded():
Note that the non-max suppression is done totally manually. Interestingly enough, non_max_suppression_v1 did refer to a C++ implementation; only v2 does it in Python. And here is tf.image.combined_non_max_suppression():
Anyone know why regular non-max suppression is being done in Python? This issue seems to imply that the solution will have to come from updating TensorFlow, not keras-cv; let me know if I am wrong on this. |
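For comparison, here is a rough pure-Python rendering of what the padded variant computes (a sketch of the algorithm only, not the TensorFlow source; `nms_padded` is an illustrative name): a greedy per-box loop whose kept indices are padded to a static length so the output shape stays fixed for graph mode. Each iteration compares the candidate box against every box already kept, which is exactly the kind of Python-level work a fused C++ kernel avoids:

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) corner format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms_padded(boxes, scores, max_output_size, iou_threshold):
    """Greedy NMS returning (indices padded with -1, number of valid entries)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) == max_output_size:
            break
        # Keep the candidate only if it overlaps no already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    num_valid = len(keep)
    # Pad to a static length so callers always see max_output_size entries.
    return keep + [-1] * (max_output_size - num_valid), num_valid
```

Traced into a TensorFlow graph, a loop like this becomes many small ops instead of one fused kernel, which is consistent with the slowdown reported above.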
And that is still the issue. :) |
After upgrading keras-cv to v0.6.1, I noticed that the predict() method of the RetinaNet model became much slower compared with v0.5.1, which complicates COCO metrics evaluation. When getting predictions for a single image in 0.6.1:
1/1 [==============================] - 42s 42s/step
And in 0.5.1:
1/1 [==============================] - 6s 6s/step
Another problem with predict() is that it throws an exception when a generator is passed as an argument. Again, in 0.5.1 this worked perfectly.
I'd stick with the older version until the issue is resolved, but unfortunately COCO metrics seem to be broken in 0.5.1, as reported in issue #1994.
Colab for reproducing the issue: https://colab.research.google.com/drive/1dzJFiVIxXtJCoj-ShjRyu-ZPkf6ClCdj?usp=sharing