
TF 2.X & Keras 2.3.1 compatibility #2278

Open
wants to merge 2 commits into master
Conversation

IgnacioAmat

No description provided.

@VictorAtPL

@IgnacioAmat
Are you able to train this model in Tensorflow 2.2.0, or are you still struggling with some incompatibilities?

@IgnacioAmat
Author

@VictorAtPL With the changes I proposed I was able to run the training without any incompatibilities; the model trained well and showed good results.

@VictorAtPL

@IgnacioAmat
Ok, is the training done in eager mode, or is the graph compiled?
If I am eventually able to train on Tensorflow 2.2, I will have to extend the model with new heads. What I will need is good debugging support, and in Tensorflow 2.x that is easier than in Tensorflow 1.x, isn't it?

@IgnacioAmat
Author

@VictorAtPL
As Tensorflow 2.X has eager mode enabled by default, I just trained the model with eager mode enabled.
Yes, Tensorflow 2.X offers far better debugging possibilities than Tensorflow 1.X; it allows debugging normal Python code, for example with pdb.
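For reference, a minimal sketch (not code from this PR) of how to confirm from a Python or pdb prompt that TF 2.x is actually running eagerly and producing EagerTensors:

```python
import tensorflow as tf

# TF 2.x executes eagerly by default, so ops return EagerTensors
# whose values can be inspected directly (e.g. at a pdb breakpoint).
print(tf.executing_eagerly())   # True unless eager execution was disabled

x = tf.constant([1.0, 2.0]) * 2.0
print(type(x).__name__)         # the concrete tensor type
print(x.numpy())                # values are available immediately
```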

@VictorAtPL

VictorAtPL commented Jul 20, 2020

@IgnacioAmat
Okay, but did you use pdb to verify that it's really in eager mode, i.e. you saw EagerTensors with values inside them, or did you just assume it should be eager because nothing in Keras is graph by default?

@IgnacioAmat
Author

@VictorAtPL
Oh sorry, there was a misunderstanding on my part! No, I actually executed the code based on the graph definition, the Keras mode. I assumed it was eager mode since that is enabled by default, but I didn't modify the code to actually execute it eagerly.

@ivanlen

ivanlen commented Aug 10, 2020

I am having some issues with this PR.
Have you checked PR #2115?
Using both PRs together I am able to run everything on TF 2.x and Keras 2.3.1.

@IgnacioAmat IgnacioAmat changed the title TF 2.2.0 & Keras 2.3.1 compatibility TF 2.X & Keras 2.3.1 compatibility Aug 31, 2020
@IgnacioAmat
Author

You were right @ivanlen, I forgot to add those changes for TF 2.X compatibility too. Thanks for the remark!

@RishikMani

You were right @ivanlen, I forgot to add those changes for TF 2.X compatibility too. Thanks for the remark!

Thank you very much. I was finally able to execute the code. Could you also let me know, were you able to start the training? As of now I have 350 training images and 50 validation images of size 256x256 on a 4GB NVIDIA GTX 960, but it cannot train and throws an out-of-memory exception.

@IgnacioAmat
Author

Hi @RishikMani, you have to reduce your batch size to prevent running out of memory, or reduce the image size to a lower resolution.

@ivanlen

ivanlen commented Sep 2, 2020

Hey @RishikMani, ideally you can use a generator; check out:

https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence
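A minimal Sequence sketch (illustrative names, not tied to this repo's data loading): batches are loaded on demand instead of holding the whole dataset in memory.

```python
import math
import numpy as np
from tensorflow.keras.utils import Sequence

class ImageSequence(Sequence):
    """Yields (images, masks) one batch at a time."""
    def __init__(self, images, masks, batch_size=2):
        self.images, self.masks = images, masks
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return math.ceil(len(self.images) / self.batch_size)

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        return np.array(self.images[lo:hi]), np.array(self.masks[lo:hi])
```

An instance of this class can be passed directly to model.fit in place of an in-memory array.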

@dsalnikov

Hi, I get an error when trying to start training with this PR:

ValueError: The following Variables were created within a Lambda layer (anchors) but are not tracked by said layer: <tf.Variable 'anchors/Variable:0' shape=(1, 36720, 4) dtype=float32> The layer cannot safely ensure proper Variable reuse across multiple calls, and consquently this behavior is disallowed for safety. Lambda layers are not well suited to stateful computation; instead, writing a subclassed Layer is the recommend way to define layers with Variables.

How can I fix this?

@Shakesbeer333

@dsalnikov there is a solution in the issue section
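The fix discussed in the issue section is typically along these lines: replace the Lambda wrapper around the anchors variable in model.py with a subclassed layer, so Keras tracks the variable. A hedged sketch (AnchorsLayer is an illustrative name, not code from this PR):

```python
import tensorflow as tf
from tensorflow.keras import layers as KL

class AnchorsLayer(KL.Layer):
    """Holds the anchors in a tracked, non-trainable variable instead of
    creating it inside a Lambda layer."""
    def __init__(self, anchors, **kwargs):
        super().__init__(**kwargs)
        self.anchors = tf.Variable(anchors, trainable=False)

    def call(self, inputs):
        # The input is ignored; the layer just exposes the stored anchors.
        return self.anchors
```

In model.py the `KL.Lambda(...)` call producing the anchors would then become something like `anchors = AnchorsLayer(anchors, name="anchors")(input_image)`.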

@kimile599

Hi @RishikMani, you have to reduce your batch size to prevent running out of memory, or reduce the image size to a lower resolution.

Hi, I am using Mask R-CNN and lowered the loss to 0.1, sometimes 0.09, but the val_loss is not converging. Any suggestions on that?

@Shakesbeer333

Is this working with CUDA 10.2?

@IgnacioAmat
Author

Yes @Shakesbeer333

@IgnacioAmat
Author

IgnacioAmat commented Sep 16, 2020

Hi @kimile599, your problem may be that you are overfitting your data during training. Have you tried increasing your dataset size or using data augmentation? Check issues #281 and #527 to see if they help with your problem.

@kimile599

Hi @kimile599, your problem may be that you are overfitting your data during training. Have you tried increasing your dataset size or using data augmentation? Check issues #281 and #527 to see if they help with your problem.

Thank you for your reply. I am now doing the augmentation and trying to flatten the loss.

@jvdavim

jvdavim commented Sep 24, 2020

What about updating requirements.txt?

@IgnacioAmat
Author

@jvdavim Are you thinking about raising the minimal versions of both TF and Keras, to something like tensorflow>=2.0 and keras>=2.3.1?

@jvdavim

jvdavim commented Sep 25, 2020

Yes. But I just realized that there is no difference; it will install the latest version anyway.

@jvdavim

jvdavim commented Sep 28, 2020

I got this error:
AttributeError: module 'tensorflow.python.framework.ops' has no attribute '_TensorLike'

tensorflow==2.3.1
keras==2.3.1

@IgnacioAmat
Author

Please check @NMazzatenta's comment in this issue.
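For context, this '_TensorLike' AttributeError typically comes from mixing the standalone keras 2.3.x package with a newer TF release. One commonly suggested fix (a hedged sketch, not the change made in this PR) is to import Keras through TensorFlow instead:

```python
# Before (standalone Keras, breaks against newer TF internals):
#   import keras
#   import keras.backend as K
#   import keras.layers as KL

# After (bundled tf.keras, versioned together with TF):
from tensorflow import keras
import tensorflow.keras.backend as K
import tensorflow.keras.layers as KL
```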

@sirbastiano

@VictorAtPL With the changes I proposed I was able to run the training without any incompatibilities; the model trained well and showed good results.
Got this error running in Colab:
ValueError:
The following Variables were created within a Lambda layer (anchors)
but are not tracked by said layer:
<tf.Variable 'anchors/Variable:0' shape=(8, 4092, 4) dtype=float32>
The layer cannot safely ensure proper Variable reuse across multiple
calls, and consquently this behavior is disallowed for safety. Lambda
layers are not well suited to stateful computation; instead, writing a
subclassed Layer is the recommend way to define layers with
Variables.


@IgnacioAmat
Author

@sirbastiano you seem to be having the same problem as @dsalnikov. You could check this issue to see if it helps, but as @Shakesbeer333 said, the solution is in the issue section.

@sirbastiano

sirbastiano commented Oct 5, 2020 via email

@IgnacioAmat
Author

No, sorry @sirbastiano. Maybe @dsalnikov can give you a hint on how he managed to solve that problem.

@sirbastiano

sirbastiano commented Oct 5, 2020 via email

@IgnacioAmat
Author

IgnacioAmat commented Oct 5, 2020

You have to delete that parameter from the model.py file, as it is enabled by default in config.py, as you can check in this issue.

@sirbastiano

sirbastiano commented Oct 5, 2020 via email

@sirbastiano

sirbastiano commented Oct 8, 2020 via email

@Ademord

Ademord commented Oct 8, 2020

My basic test was to run the /samples notebooks.
The first one, demo, doesn't run.
The other ones throw a lot of errors after the mini-masks section and data_generator.

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-aa019bc2cbcd> in <module>
     16 sys.path.append(ROOT_DIR)  # To find local version of the library
     17 from mrcnn import utils
---> 18 import mrcnn.model as modellib
     19 from mrcnn import visualize
     20 # Import COCO config

~/src/maskrcnn-amatt/mrcnn/model.py in <module>
     18 import numpy as np
     19 import tensorflow as tf
---> 20 import keras
     21 import keras.backend as K
     22 import keras.layers as KL

ModuleNotFoundError: No module named 'keras'

@IgnacioAmat
Author

error: 'Dataset' object non subscriptable

Hi @sirbastiano, what I think you are doing while getting this error is indexing an object that doesn't support it. Check that your Dataset object is subscriptable (i.e. implements __getitem__) to avoid this problem. Can you provide some more code and the error traceback?
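Illustrative only (this Dataset is a hypothetical stand-in, not the mrcnn class): an object becomes subscriptable by implementing __getitem__.

```python
class Dataset:
    """Minimal subscriptable container: ds[i] works because of __getitem__."""
    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, idx):
        return self._items[idx]

    def __len__(self):
        return len(self._items)
```

Without __getitem__, `ds[0]` raises exactly a "'Dataset' object is not subscriptable" TypeError.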

@sirbastiano

sirbastiano commented Oct 14, 2020 via email

@IgnacioAmat
Author

My basic test was to run the /samples notebooks.
The first one, demo, doesn't run.
The other ones throw a lot of errors after the mini-masks section and data_generator.

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-aa019bc2cbcd> in <module>
     16 sys.path.append(ROOT_DIR)  # To find local version of the library
     17 from mrcnn import utils
---> 18 import mrcnn.model as modellib
     19 from mrcnn import visualize
     20 # Import COCO config

~/src/maskrcnn-amatt/mrcnn/model.py in <module>
     18 import numpy as np
     19 import tensorflow as tf
---> 20 import keras
     21 import keras.backend as K
     22 import keras.layers as KL

ModuleNotFoundError: No module named 'keras'

Hi @Ademord, this error means that the keras module is not installed in your current environment. Install it with conda (conda install -c conda-forge keras) or with pip (pip install keras).

@wiktor-jurek

I think I may have found another incompatibility in the utils module. Calling utils.compute_ap() ends up performing an np.dot() calculation, but the two given mask arrays have a shape mismatch. I've looked through the utils code, but I can't find a reason why downgrading TF results in a successful operation. This is briefly mentioned in #960, and the suggested advice is to downgrade TF.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-1731946196d5> in <module>
     22     #precision_, recall_, AP_
     23     AP_, precision_, recall_, overlap_ = utils.compute_ap(gt_bbox, gt_class_id, gt_mask,
---> 24                                           r['rois'], r['class_ids'], r['scores'], r['masks'])
     25     #check if the vectors len are equal
     26     print("the actual len of the gt vect is : ", len(gt_tot))

~/project/2_MaskRCNN/mrcnn/utils.py in compute_ap(gt_boxes, gt_class_ids, gt_masks, pred_boxes, pred_class_ids, pred_scores, pred_masks, iou_threshold)
    716         gt_boxes, gt_class_ids, gt_masks,
    717         pred_boxes, pred_class_ids, pred_scores, pred_masks,
--> 718         iou_threshold)
    719 
    720     # Compute precision and recall at each prediction box step

~/project/2_MaskRCNN/mrcnn/utils.py in compute_matches(gt_boxes, gt_class_ids, gt_masks, pred_boxes, pred_class_ids, pred_scores, pred_masks, iou_threshold, score_threshold)
    669 
    670     # Compute IoU overlaps [pred_masks, gt_masks]
--> 671     overlaps = compute_overlaps_masks(pred_masks, gt_masks)
    672 
    673     # Loop through predictions and find matching ground truth boxes

~/project/2_MaskRCNN/mrcnn/utils.py in compute_overlaps_masks(masks1, masks2)
    118 
    119     # intersections and union
--> 120     intersections = np.dot(masks1.T, masks2)
    121     union = area1[:, None] + area2[None, :] - intersections
    122     overlaps = intersections / union

<__array_function__ internals> in dot(*args, **kwargs)

ValueError: shapes (2,65536) and (3136,43) not aligned: 65536 (dim 1) != 3136 (dim 0)
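The mismatch in the traceback is informative: 65536 = 256*256 while 3136 = 56*56, so one mask set is full-resolution and the other is likely still at the 56x56 mini-mask scale. compute_overlaps_masks flattens each [H, W, N] stack and takes a dot product, which only works when both stacks share the same H x W. A simplified paraphrase of the mrcnn.utils function (not a drop-in fix):

```python
import numpy as np

def compute_overlaps_masks(masks1, masks2):
    """IoU overlaps between two mask stacks; both must be [H, W, N]
    at the SAME H x W, otherwise the flattened dot product fails
    exactly as in the traceback above."""
    m1 = np.reshape(masks1 > 0.5, (-1, masks1.shape[-1])).astype(np.float32)
    m2 = np.reshape(masks2 > 0.5, (-1, masks2.shape[-1])).astype(np.float32)
    area1 = np.sum(m1, axis=0)
    area2 = np.sum(m2, axis=0)
    intersections = np.dot(m1.T, m2)                       # [N1, N2]
    union = area1[:, None] + area2[None, :] - intersections
    return intersections / union
```

So the likely remedy is to ensure predicted masks are unmolded back to image resolution (or ground-truth masks resized to match) before calling utils.compute_ap.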

@Ademord

Ademord commented Mar 24, 2021

If anyone is looking for an alternative to this repo: I moved to Detectron2 and saw even better performance in terms of speed and accuracy, so I would recommend it.

@guilhermemarim

Hi everyone! I'm using TF 2.1.0 and Keras 2.3.1 and I got this error:

AttributeError: module 'tensorflow' has no attribute 'random_shuffle'

Can someone help me?
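For reference, tf.random_shuffle was removed in TF 2.x; the op lives at tf.random.shuffle there. A hedged sketch of the rename and a version-tolerant lookup:

```python
import tensorflow as tf

x = tf.constant([1, 2, 3, 4])
shuffled = tf.random.shuffle(x)      # TF 2.x name for the old tf.random_shuffle

# Version-tolerant lookup (the fallback branch only runs on TF 1.x):
shuffle_fn = getattr(tf.random, "shuffle", None)
if shuffle_fn is None:
    shuffle_fn = tf.random_shuffle   # TF 1.x name
```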
