
Implement vectorized base image augmentation layer w/ Grayscale() example #1331

Closed
LukeWood wants to merge 19 commits from the kplperf branch

Conversation

@LukeWood (Contributor) commented Jan 30, 2023

We have a ton of performance issues with KPLs because auto-vectorization does not work with tf.image ops.

Basically my proposed approach is as follows:

  • replace KPLs that can be fully vectorized with VectorizedBaseImageAugmentation subclasses (a rough sketch of what such a subclass could look like follows the caveats below)

Fixes: #581


Caveats of this approach:

  • a bit less flexibility (e.g. cases where the image augmentation can be vectorized but the bounding box augmentation cannot)
  • loss of ragged support
  • having two base layers is perhaps a bit convoluted implementation-wise
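
As a rough sketch of what one of these subclasses could look like (the import path and the augment_images/augment_labels hook names are assumptions for illustration, not necessarily the final API):

```python
import tensorflow as tf
from keras_cv.layers import VectorizedBaseImageAugmentationLayer  # assumed import path

class Grayscale(VectorizedBaseImageAugmentationLayer):
    """Converts a whole batch of RGB images to grayscale in one batched op."""

    def __init__(self, output_channels=3, **kwargs):
        super().__init__(**kwargs)
        self.output_channels = output_channels

    def augment_images(self, images, transformations=None, **kwargs):
        # tf.image.rgb_to_grayscale already accepts a batch dimension, so the
        # whole batch is processed without a per-image loop or pfor fallback.
        grayscale = tf.image.rgb_to_grayscale(images)
        if self.output_channels == 3:
            grayscale = tf.repeat(grayscale, repeats=3, axis=-1)
        return grayscale

    def augment_labels(self, labels, transformations=None, **kwargs):
        # Labels are untouched by a color-space change.
        return labels
```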

@bhack (Contributor) commented Jan 30, 2023

What about the product decision at tensorflow/tensorflow#55639 (comment)?

@LukeWood (Contributor Author)

What about the product decision at tensorflow/tensorflow#55639 (comment)?

Curious about @fchollet's thoughts on this. It's impossible to vectorize this one in TensorFlow while keeping transformations independent from one another. Do you think a slowdown is acceptable in that case?

@bhack (Contributor) commented Jan 30, 2023

The bottleneck is with any API that doesn't have a batch dimension on the parameter you want to randomize or change per element.

So it is as if you need to loop over the batch, calling the same op with a different parameter each time.

I don't know whether this could change in the near future with the new XLA/MLIR bridge design and its generated kernels, but right now it does not work with vectorized_map, and I think not with the current bridge either.

Edit (our old thread about tf.image.* lowerings):
https://discuss.tensorflow.org/t/lowering-for-tf-raw-ops-imageprojectivetransform/8462
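
As an illustration of that bottleneck (tf.image.adjust_contrast is only a stand-in example here; any op whose randomized parameter is a per-call scalar behaves the same way):

```python
import tensorflow as tf

images = tf.random.uniform((8, 64, 64, 3))

# Fine: one contrast factor for the whole batch, the op is already batched.
same_factor = tf.image.adjust_contrast(images, contrast_factor=1.2)

# The bottleneck: adjust_contrast has no batch dimension on contrast_factor,
# so a different factor per image forces a per-element loop over the batch.
factors = tf.random.uniform((8,), 0.5, 1.5)
per_image = tf.map_fn(
    lambda args: tf.image.adjust_contrast(args[0], args[1]),
    (images, factors),
    fn_output_signature=tf.float32,
)
```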

@bhack (Contributor) commented Jan 31, 2023

tensorflow/tensorflow#55335 (comment)

@jpienaar Can ops like this one be fused in a loop without breaking the raw ops API?

@LukeWood changed the title from "(Draft) Implement vectorized base image augmentation layer" to "Implement vectorized base image augmentation layer w/ Grayscale() example" on Feb 7, 2023
@LukeWood (Contributor Author) commented Feb 7, 2023

@jbischof @ianstenbit this is ready for a real review now. We can incrementally migrate layers to vectorized layers using this approach.

@bhack (Contributor) commented Feb 7, 2023

Before we merge, I hope we can clarify the point about the product decision at:

#1256 (comment)

I have been talking about the performance overhead of this design with the tf.image API since the first keras-cv commits.

@LukeWood (Contributor Author) commented Feb 7, 2023

Before we merge, I hope we can clarify the point about the product decision at:

#1256 (comment)

I have been talking about the performance overhead of this design with the tf.image API since the first keras-cv commits.

What are we hoping to have clarified? This change solves the Grayscale performance issue with no regression in functionality. Item-wise augmentation is still deemed important.

@LukeWood (Contributor Author) commented Feb 7, 2023

/gcbrun

@LukeWood (Contributor Author) commented Feb 7, 2023

/gcbrun

@bhack (Contributor) commented Feb 7, 2023

What are we hoping to have clarified? This change solves the Grayscale performance issue with no regression in functionality.

It is not only about Grayscale, as this is related to many ops. A list (not updated since August) is at #291 (comment)

@LukeWood (Contributor Author) commented Feb 8, 2023

What are we hoping to have clarified? This change solves the Grayscale performance issue with no regression in functionality.

It is not only about Grayscale, as this is related to many ops. A list (not updated since August) is at #291 (comment)

My goal in this PR is to provide an abstraction that lets us optimize the performance of the layers that are not being converted by pfor: #291 (comment)

Does that answer the question?
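
For context, the pattern that fails is roughly the following (a minimal sketch; adjust_contrast is only a stand-in for whichever op lacks a pfor converter):

```python
import tensorflow as tf

images = tf.random.uniform((8, 64, 64, 3))

def augment_one(image):
    # Independent random parameter per image, as the current KPLs do.
    factor = tf.random.uniform((), 0.5, 1.5)
    return tf.image.adjust_contrast(image, factor)

# The existing base layer relies on auto-vectorization (pfor) of a per-image
# function. When an op inside has no registered pfor converter, TensorFlow
# logs "Using a while_loop for converting ..." and silently falls back to a
# per-image loop, which is the slowdown benchmarked in #291.
augmented = tf.vectorized_map(augment_one, images)
```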

@LukeWood (Contributor Author) commented Feb 8, 2023

/gcbrun

@bhack (Contributor) commented Feb 8, 2023

Does that answer the question?

This was the same topic one year ago, but we didn't address it even though we were still in the design phase:
#455 (comment)

So what has changed in one year, given that we chose not to address this during the design phase despite all the explicit feedback and benchmarks I sent?

not being converted by pfor

As you can read in many places, and also in that comment, that is not the only cause of pfor conversion failures/fallbacks; we also have missing converters.

As an extra note:
after all these months it seems we don't have tf.image API owners, and probably no pfor API owner either, as all the related tickets have had no further comments or actions.

@LukeWood (Contributor Author) commented Feb 8, 2023

Does that answer the question?

This was the same topic one year ago, but we didn't address it even though we were still in the design phase: #455 (comment)

So what has changed in one year, given that we chose not to address this during the design phase despite all the explicit feedback and benchmarks I sent?

not being converted by pfor

As you can read in many places, and also in that comment, that is not the only cause of pfor conversion failures/fallbacks; we also have missing converters.

As an extra note: after all these months it seems we don't have tf.image API owners, and probably no pfor API owner either, as all the related tickets have had no further comments or actions.

We do care about performance, and as you say there is no tf.image API owner or pfor API owner, so instead of waiting for those issues to be fixed we'd like to offer vectorized implementations of these layers in the meantime. Does that answer your question?

And to be clear, it was never that we didn't care about performance; it was simply that getting RandAugment working, getting OD augmentations working, and other product goals took priority. Now that those are working, I'm attempting to revisit these layers and optimize them without major regressions in functionality.

@bhack (Contributor) commented Feb 8, 2023

it was never that we didn't care about performance

Honestly, for a year the impression was that we did not care about performance. I never collected any comment acknowledging the problem, even though it was reported and benchmarked in multiple tickets, or saying that we just needed to postpone the design work due to resource constraints.

In any case, better late than never; we will see the residual fallbacks after this migration.

About the PR:
Do we need to subclass tf.keras.__internal__.layers.BaseRandomLayer for every new policy?

@ianstenbit (Contributor) left a comment

Thanks Luke!

Lots of comments because this is a foundational piece and I want to make sure we get it really right.

inputs = {IMAGES: inputs}

metadata[BATCHED] = inputs["images"].shape.rank == 4
if inputs["images"].shape.rank == 3:
@LukeWood (Contributor Author) replied:

@ianstenbit see here: we expand dims if it's operating in unbatched mode.
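
A minimal sketch of that batched/unbatched handling (illustrative only; the helper name ensure_batched is hypothetical, not the PR's code):

```python
import tensorflow as tf

def ensure_batched(images):
    # Remember whether the caller passed a batch; a rank-3 input (a single
    # image) gets a leading batch dimension so the layer only ever sees
    # rank-4 tensors internally.
    batched = images.shape.rank == 4
    if images.shape.rank == 3:
        images = tf.expand_dims(images, axis=0)
    return images, batched

images = tf.random.uniform((32, 32, 3))          # unbatched input
images, was_batched = ensure_batched(images)     # -> shape (1, 32, 32, 3)
# ... run the vectorized augmentation on `images` ...
if not was_batched:
    images = tf.squeeze(images, axis=0)          # restore the original rank
```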

@LukeWood (Contributor Author) commented Feb 8, 2023

@ianstenbit all comments addressed.

Closed in favor of #1373

@LukeWood closed this Feb 8, 2023
@LukeWood (Contributor Author) commented Feb 8, 2023

it was never that we didn't care about performance

Honestly, for a year the impression was that we did not care about performance. I never collected any comment acknowledging the problem, even though it was reported and benchmarked in multiple tickets, or saying that we just needed to postpone the design work due to resource constraints.

In any case, better late than never; we will see the residual fallbacks after this migration.

About the PR: Do we need to subclass tf.keras.__internal__.layers.BaseRandomLayer for every new policy?

Honestly, for a year the impression was that we did not care about performance. I never collected any comment acknowledging the problem, even though it was reported and benchmarked in multiple tickets, or saying that we just needed to postpone the design work due to resource constraints.

Resource constraints are real, and we were hoping pfor would either be:

  • fixed
  • sufficiently fast

Neither turned out to be the case, so we are now prioritizing this. Please keep in mind that resource constraints are a real issue; they don't mean other issues aren't important, they simply mean we have other things that are more pressing at the time.

@bhack (Contributor) commented Feb 8, 2023

Neither turned out to be the case, so we are now prioritizing this. Please keep in mind that resource constraints are a real issue; they don't mean other issues aren't important, they simply mean we have other things that are more pressing at the time.

Generally I agree, but it is a little different when we were still in the design phase and there was real-time feedback about how this was going to fail.
Otherwise it will be hard to interact clearly and stay on the same page (mainly on PRs) if priority/attention is given to face-to-face meetings.

pfor may never be fixable if the underlying API design of the ops we want to use in this library is not batch-ready.

I think our approach was really outside pfor's intended design from the early days.

The missing converters, on the other hand, could still be fixed.

We cared about something (the within-batch randomness policy) with some theoretical validity, but without empirical verification of the faster convergence versus the per-epoch computational overhead.

Also, as we have seen in another recent PR, torchvision does not randomize within the batch, so many reference models that we also train/replicate, and the related reference papers, were not trained with the within-batch policy at all.

@LukeWood (Contributor Author) commented Feb 9, 2023

We cared about something (the within-batch randomness policy) with some theoretical validity, but without empirical verification of the faster convergence versus the per-epoch computational overhead.

This already matters in contrastive learning and in any case where samples are compared within a batch. This is a case where a silent error could really waste a user's time someday, and we should avoid this if possible.

In the current structure we will likely be able to have both performance and numerical correctness. I think this is the best of all worlds.
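
A toy illustration of why per-sample randomness can still be vectorized (brightness is used as an assumed example; the actual vectorized layers may differ):

```python
import tensorflow as tf

images = tf.random.uniform((4, 32, 32, 3))

# One delta shared by the whole batch: cheap, but every sample receives the
# same transform, so samples compared within a contrastive batch are
# augmented identically.
shared = tf.image.adjust_brightness(images, delta=0.2)

# Independent delta per sample, still a single batched expression:
# brightness is additive, so broadcasting a (batch, 1, 1, 1) delta keeps
# per-sample randomness without any per-image loop.
deltas = tf.random.uniform((4, 1, 1, 1), minval=-0.3, maxval=0.3)
independent = tf.clip_by_value(images + deltas, 0.0, 1.0)
```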

@LukeWood deleted the kplperf branch February 9, 2023
@bhack (Contributor) commented Feb 9, 2023

This already matters in contrastive learning and in any case where samples are compared within a batch. This is a case where a silent error could really waste a user's time someday, and we should avoid this if possible.

In the current structure we will likely be able to have both performance and numerical correctness. I think this is the best of all worlds.

Yes but:

Linked issue: BUG: RandomContrast slows down training 6-fold