Add DeepLabV3Plus segmentation #1799
Conversation
sachinprasadhs commented on Aug 27, 2024
- Added a segmentation base class with the `compile` method.
- Added a new test case for the segmentation task.
- The default output for the segmentation task will now be logits, which lets users apply an activation specific to their use case (see the sketch below).
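Since the task returns logits, the choice of activation is left to the caller. A minimal sketch of what that looks like; the logits array here is a stand-in for the task's output, not a real model call:

```python
import numpy as np
from keras import ops

# Stand-in for the logits a segmentation task would return:
# shape (batch, height, width, num_classes).
logits = np.random.uniform(size=(1, 96, 96, 3)).astype("float32")

# Multi-class use case: per-pixel class probabilities.
probs = ops.softmax(logits, axis=-1)

# Or skip the activation entirely and take hard per-pixel labels.
labels = ops.argmax(logits, axis=-1)  # shape (1, 96, 96)
```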
Commits:
- Add VGG16 backbone and image classifier
- Add ResNetV1 and ResNetV2
- Add CSP DarkNet
- Add `FeaturePyramidBackbone` and update `ResNetBackbone` (keras-team#1769)
- Add DenseNet
- Add ViTDet backbone (`vit_det_backbone`)
- Add MixTransformer
- Update `input_image_shape` -> `image_shape`
- Add missing `__init__` file to `vit_det`
- Add a temporary way to test out the keras-hub branch: globally rename all symbols during package build, register the "old" name on symbol export for saving compat, add a GitHub action to publish every commit to keras-hub as a new package, and temporarily remove our descriptions on PyPI until we want to message this more broadly.
- Add `CLIPTokenizer`, `T5XXLTokenizer`, `CLIPTextEncoder` and `T5XXLTextEncoder`; make `CLIPTextEncoder` a Backbone; add `CLIPPreprocessor` and `T5XXLPreprocessor` (removing `T5XXLTokenizer`)
Review comment on keras_nlp/src/models/deeplab_v3_plus/deeplab_v3_plus_segmenter.py (outdated; resolved).
- Add bounding box utils
- Add MobileNetV3 (plus MobileNetV1 and V2) in keras-nlp (co-authored by ushareng)
- Migrate EfficientNet models to keras-hub
- Add ResNet_vd to ResNet backbone
- Add `VAEImageDecoder` for StableDiffusionV3
- Add pyramid outputs
Thanks! Overall this looks good. Left some small comments and one big comment re class design.
Review comment on keras_nlp/src/models/deeplab_v3_plus/deeplab_v3_plus_segmenter.py (outdated; resolved).
dilation_rates: (Optional) A `list` of integers for parallel dilated conv.
    Applied only when the default `SpatialPyramidPooling` is used. A typical
    choice of rates is `[6, 12, 18]`.
segmentation_head: (Optional) A `keras.layers.Layer`. If provided, the
I think we actually don't want to do this. We are somewhat messing with the definition of a CV backbone, but in general we have a pattern where...
- A backbone contains all the weights from a pretrained model.
- A task wraps the backbone for a specific task.

So with that approach, I think what we probably need to do here...
- Create a new `DeeplabV3PlusBackbone` class which has almost all the arguments here. Rename the `backbone` argument it takes in to a `feature_pyramid_model` or something like that.
- The `ImageSegmenter` wraps this backbone with a head (that is not configurable!). The arguments that this class takes should be the same as our classification tasks: `backbone`, `num_classes`, and `activation`.
- Anyone wanting a custom `segmentation_head` in the current design would instead create a `DeeplabV3PlusBackbone` and attach their own head to it.

So basically, the new `DeeplabV3PlusBackbone` will have weights from the pooling layers. On disk we will save something like this...

config.json # DeeplabV3PlusBackbone config
model.weights # DeeplabV3PlusBackbone weights (feature pyramid + pooling weights)
task.json # DeeplabV3PlusImageSegmenter config
task.weights # Head weights only

We will need to add a way to allow loading either the segmentation head with its 19 classes, or a new head with random weights, but that does not need to be in this PR.
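A minimal functional-API sketch of the split described above: the backbone holds the feature-extraction and pooling weights, and the segmenter task wraps it with a fixed head. The class names, layer choices, and toy encoder are illustrative assumptions, not the final keras-hub implementation:

```python
import keras
from keras import layers

# Toy stand-in for the feature pyramid model + pooling layers that would make
# up a `DeeplabV3PlusBackbone` (real presets would use e.g. a ResNet encoder).
inputs = keras.Input(shape=(None, None, 3))
x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(inputs)
features = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)
backbone = keras.Model(inputs, features, name="deeplab_v3_plus_backbone")

# Task = backbone + a non-configurable segmentation head, so the task only
# needs `backbone`, `num_classes`, and `activation`-style arguments.
num_classes = 19
head = keras.Sequential(
    [
        layers.Conv2D(num_classes, 1),                           # per-pixel logits
        layers.UpSampling2D(size=4, interpolation="bilinear"),   # back to input res
    ],
    name="segmentation_head",
)
segmenter = keras.Model(
    backbone.input, head(backbone.output), name="deeplab_v3_plus_image_segmenter"
)

# The backbone weights and the head weights can then be saved separately,
# matching the config/weights layout sketched above.
```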
Made changes according to this
- Add `MMDiT`
- Add formats, IoU, and utils for bounding boxes; add `AnchorGenerator`, `BoxMatcher` and `NonMaxSupression` layers
Force-pushed from 753047d to a5e5d8f.
Looking good! I'd kill some of the configurability of the backbone to start simple.
Had some questions on what we should stick in the task head vs. the backbone (depends on what users will want to start from pretrained weights when doing their own segmentation).
https://arxiv.org/abs/1706.05587) (CVPR 2017)

Args:
image_encoder: `keras.Model`. The backbone network for the model that is
indent args
if not isinstance(image_encoder, keras.Model):
    raise ValueError(
        "Argument `image_encoder` must be a `keras.Model` instance. Received instead "
        f"backbone={image_encoder} (of type {type(image_encoder)})."
Remove `backbone=`; in this case we don't even know what got passed in.
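A hedged sketch of the adjusted check (the wrapper function name is just for illustration); the message no longer claims the bad value was passed as `backbone`:

```python
import keras

def check_image_encoder(image_encoder):
    # Report only what was received; we can't know which keyword it came from.
    if not isinstance(image_encoder, keras.Model):
        raise ValueError(
            "Argument `image_encoder` must be a `keras.Model` instance. "
            f"Received instead {image_encoder} (of type {type(image_encoder)})."
        )
```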
data_format = keras.config.image_data_format()
channel_axis = -1 if data_format == "channels_last" else 1
# === Functional Model ===
inputs = keras.layers.Input((None, None, 3))
I think we are taking in `image_size` generally, and defaulting it to `(None, None, 3)`, primarily so that users can pass a different number of channels.
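A sketch of that suggestion; the argument name (`image_shape`) and default are assumptions about the constructor, shown here as a small helper:

```python
import keras

def make_inputs(image_shape=(None, None, 3)):
    # Defaulting to (None, None, 3) keeps arbitrary image sizes working while
    # still letting callers request a different channel count.
    return keras.layers.Input(shape=image_shape)

rgb_inputs = make_inputs()                              # 3-channel default
gray_inputs = make_inputs(image_shape=(None, None, 1))  # single channel
```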
fpn_outputs = fpn_model(inputs)

if spatial_pyramid_pooling is None:
Are there particular reasons to do different pooling for DeepLabV3 models specifically? If not, let's leave this argument out for now and see if anyone asks for it.
Atrous spatial pyramid pooling is what makes the main difference in the DeepLab architecture, and it is the standard way of doing it. The only reason users might provide their own pooling would be to make variations of ASPP. We can make it the default and remove it from the config.
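For reference, a compact sketch of the ASPP pattern being discussed: parallel dilated 3x3 convolutions at the docstring's example rates plus a 1x1 branch, concatenated and projected. This is a generic, channels-last Keras sketch, not the keras-hub `SpatialPyramidPooling` layer, and it omits the image-level pooling branch of the full block:

```python
import keras
from keras import layers

def aspp_block(features, dilation_rates=(6, 12, 18), filters=256):
    # One 1x1 branch plus one dilated 3x3 branch per rate (atrous convolution).
    branches = [layers.Conv2D(filters, 1, padding="same", activation="relu")(features)]
    for rate in dilation_rates:
        branches.append(
            layers.Conv2D(
                filters, 3, padding="same", dilation_rate=rate, activation="relu"
            )(features)
        )
    # Concatenate all branches along channels and project with a 1x1 conv.
    x = layers.Concatenate()(branches)
    return layers.Conv2D(filters, 1, padding="same", activation="relu")(x)

# Usage on an encoder feature map:
feature_map = keras.Input(shape=(None, None, 256))
pooled = aspp_block(feature_map)
aspp = keras.Model(feature_map, pooled)
```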
if isinstance(upsampling_size, tuple)
else upsampling_size
)
if segmentation_head is None:
I was thinking that the segmentation head is exactly what lives in the `DeepLabV3ImageSegmenter` class. That way we can kill this argument. If you want a custom head, just instantiate the backbone and add your own with the functional API. WDYT?
I guess one sub-question here: if you were training from pretrained weights, but with a different set of segmentation classes than the pretrained model, would you generally want these head weights randomly initialized, or is starting with the pretrained weights important?
If we do choose to leave this in the backbone, I would probably leave it unconfigurable and see if anyone asks. We'd probably be better off calling it an "upscaler" as it currently stands, but the simplest option is just killing the arg.
If we move it to `DeepLabV3ImageSegmenter`, the user has to provide segmentation head weights as well when `num_classes` is provided; that would make it more complicated, I think.

> or is starting with the pretrained weights important?

It is better to have the pretrained weights. I think, as you suggested, it is better to get rid of the argument.
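For the "attach your own head" path mentioned above, a user-side sketch with the functional API; the placeholder backbone stands in for a pretrained DeepLabV3-style backbone loaded from a preset, and the head layout is illustrative:

```python
import keras
from keras import layers

# Placeholder for a pretrained backbone, e.g. something like
# `keras_hub.models.DeepLabV3Backbone.from_preset(...)` once available.
backbone = keras.Sequential(
    [
        keras.Input(shape=(None, None, 3)),
        layers.Conv2D(64, 3, strides=4, padding="same", activation="relu"),
        layers.Conv2D(256, 3, padding="same", activation="relu"),
    ],
    name="placeholder_backbone",
)

# Custom head with randomly initialized weights for the user's own classes.
num_classes = 5
x = layers.Conv2D(256, 3, padding="same", activation="relu")(backbone.output)
x = layers.Conv2D(num_classes, 1)(x)                          # per-pixel logits
x = layers.UpSampling2D(size=4, interpolation="bilinear")(x)  # back to input res
custom_segmenter = keras.Model(backbone.input, x, name="custom_segmenter")
```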
class ImageSegmenter(Task):
    """Base class for all segmentation tasks.

    `Segmenter` tasks wrap a `keras_nlp.models.Backbone` to create a model
ImageSegmenter
model = keras_hub.models.DeepLabV3ImageSegmenter(
    num_classes=3,
    projection_filters=48,
    low_level_feature_key="P2",
These aren't valid args, right?
Yes, I will correct it.
```python
images = np.ones(shape=(1, 96, 96, 3))
labels = np.zeros(shape=(1, 96, 96, 1))
feature_pyramid_model = keras_nlp.models.DeepLabV3Backbone.from_preset("deeplabv3_resnet50")
```
We should actually show the shorter version, likely: directly create the segmenter from a preset...
segmenter = keras_hub.models.DeepLabV3ImageSegmenter.from_preset(
    "deeplabv3_resnet50",
)
segmenter.predict(images)  # 19 class, pretrained classification head.

segmenter = keras_hub.models.DeepLabV3ImageSegmenter.from_preset(
    "deeplabv3_resnet50",
    num_classes=2,
)
segmenter.fit(images, labels, epochs=3)
segmenter.predict(images)  # Trained 2 class segmentation.
Okay, this makes more sense.
)

def test_classifier_basics(self):
    pytest.skip(
I think we can just follow the same preprocessing (i.e., resizing) that ResNet does now. Take a look.
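A sketch of the resize-only preprocessing being suggested (the sizes and layer choices here are illustrative): bilinear resizing for images, nearest-neighbor for integer masks so class ids aren't blended:

```python
import numpy as np
from keras import layers

image_size = (96, 96)
resize_images = layers.Resizing(*image_size, interpolation="bilinear")
resize_masks = layers.Resizing(*image_size, interpolation="nearest")

images = np.ones(shape=(1, 512, 512, 3), dtype="float32")
masks = np.zeros(shape=(1, 512, 512, 1), dtype="int32")

images = resize_images(images)  # (1, 96, 96, 3)
masks = resize_masks(masks)     # (1, 96, 96, 1); nearest keeps class ids intact
```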
)(result[self.channel_axis])

result = ops.concatenate(result, axis=self.channel_axis)
result = self.projection(result, training=training)
return immediately here
@sachinprasadhs you should update this to target master and do a rebase to the latest (post rename). Or you could just recreate the PR if you'd like.
I will create a new PR.
Sounds good! Let's close this one when you do. Thanks!