OpRegularizerManager could not handle ops #111

Open
mengdong opened this issue Sep 4, 2019 · 7 comments

@mengdong

mengdong commented Sep 4, 2019

Hello,

I have tried a few examples from tensorflow/models with MorphNet (LeNet and ResNet). A simple MNIST model (https://github.com/mengdong/morph-net/blob/master/morph_net/examples/mnist/mnist-tutorial.py) works. However, I ran into problems with some more complex models under the TensorFlow Estimator interface.

Is there a recommended way to use MorphNet with the tf Estimator interface? I know the Estimator adds quite some overhead to the graph. Detailed information below:

Regarding LeNet (https://github.com/mengdong/morph-net/blob/master/morph_net/examples/mnist/mnist.py) from https://github.com/tensorflow/models/tree/master/official/mnist, I observe that:

    I0904 13:14:27.240477 140031449261888 op_regularizer_manager.py:125] 
    OpRegularizerManager found 63 ops and 4 sources.
    ......
    File "/home/dongm/workspace/laptop_mapping/morph-net/morph_net/framework/op_regularizer_manager.py", line 137, in __init__
    ['%s (%s)' % (o.name, o.type) for o in self._op_deque])
    RuntimeError: OpRegularizerManager could not handle ops: ['sequential/conv2d/BiasAdd (BiasAdd)', 'sequential/max_pooling2d_1/MaxPool (MaxPool)', 'sequential/conv2d_1/BiasAdd (BiasAdd)', 'sequential/max_pooling2d/MaxPool (MaxPool)', 'sequential/conv2d/BiasAdd/ReadVariableOp (ReadVariableOp)'] 

Regarding ResNet (https://github.com/mengdong/morph-net/blob/master/morph_net/examples/resnet/imagenet_main.py), I observe:

    I0904 11:27:34.397989 139699288442688 op_regularizer_manager.py:125] 
    OpRegularizerManager found 629 ops and 53 sources.
    .....
    RuntimeError: OpRegularizerManager could not handle ops: 
    ['resnet_model/batch_normalization_45/FusedBatchNormV3 (FusedBatchNormV3)', 
    'resnet_model/Pad_6 (Pad)', 'resnet_model/batch_normalization_44/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_49/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_48/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_47/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_52/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_51/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/Squeeze (Squeeze)', 'resnet_model/final_reduce_mean (Identity)', 'resnet_model/Mean (Mean)', 'resnet_model/batch_normalization_50/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_43/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_24/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_11/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_1/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/Pad (Pad)', 
    'resnet_model/batch_normalization/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_4/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_3/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/max_pooling2d/MaxPool (MaxPool)', 'resnet_model/initial_max_pool (Identity)', 'resnet_model/batch_normalization_2/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_7/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_6/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_5/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_10/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_9/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_8/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_14/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_13/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/Pad_2 (Pad)', 'resnet_model/batch_normalization_12/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_17/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_16/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_15/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_20/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_19/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_18/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_23/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_22/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_21/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_27/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_26/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/Pad_4 (Pad)', 
    'resnet_model/batch_normalization_25/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_30/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_29/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_28/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_33/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_32/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_31/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_36/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_35/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_34/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_39/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_38/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_37/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_42/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_41/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_40/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_46/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_45/ReadVariableOp (ReadVariableOp)', 'resnet_model/batch_normalization_45/ReadVariableOp_1 (ReadVariableOp)']
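
Lists like the ones above are easier to read when grouped by op type. The following is a minimal sketch, not part of MorphNet: the helper name `summarize_unhandled_ops` and the regular expression are assumptions that rely only on the `'name (type)'` entry format visible in the error messages above.

```python
import re
from collections import Counter

# Matches entries such as 'resnet_model/Pad_6 (Pad)' inside the error message.
# \s+ also tolerates a line wrap between the op name and its type.
_OP_ENTRY = re.compile(r"'([^']+?)\s+\((\w+)\)'")


def summarize_unhandled_ops(message):
    """Count unhandled ops per op type (hypothetical helper, not a MorphNet API)."""
    return Counter(op_type for _name, op_type in _OP_ENTRY.findall(message))


# The LeNet error message quoted above.
message = (
    "RuntimeError: OpRegularizerManager could not handle ops: "
    "['sequential/conv2d/BiasAdd (BiasAdd)', "
    "'sequential/max_pooling2d_1/MaxPool (MaxPool)', "
    "'sequential/conv2d_1/BiasAdd (BiasAdd)', "
    "'sequential/max_pooling2d/MaxPool (MaxPool)', "
    "'sequential/conv2d/BiasAdd/ReadVariableOp (ReadVariableOp)']"
)

summary = summarize_unhandled_ops(message)
for op_type, count in summary.most_common():
    print(op_type, count)
```

Applied to the ResNet message, the same helper would show at a glance that almost every unhandled op is a FusedBatchNormV3, which points at a systematic cause rather than one unsupported layer.
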
@eladeban
Contributor

eladeban commented Sep 5, 2019

@mengdong Thanks for raising this issue.

We routinely work with ResNet and other complicated models so I don't think that the complexity is the issue.

Are you using the encoding where the channels are in dim=3?

Could you reproduce this behavior with a single ResNet unit and print the entire trace?

@mengdong
Author

mengdong commented Sep 9, 2019

Hello @eladeban,

Thanks for the prompt response. I don't think complexity is the issue, as LeNet also hits a similar error. I suspect the additional nodes/ops created by the TensorFlow Estimator interface. I will try TensorFlow Slim to see how it works, since it seems like you have had more success with it.

@eladeban
Contributor

Took another look. It might be the reduce_mean in line 542.
Can you apply the regularizer so that it takes inputs prior to that?

QQ: are you using channels_first?

@mengdong
Author

Sorry for the late reply. Yes, I am using channels_first. Let me modify the regularizer and give it a try.

@mengdong
Author

mengdong commented Sep 11, 2019

Hello, thank you for looking into the code. I have tried to modify the output_boundary to:

    name: "resnet_model/block_layer4"
    op: "Identity"
    input: "resnet_model/Relu_48"
    device: "/replica:0/task:0/device:GPU:0"
    attr {
      key: "T"
      value {
        type: DT_FLOAT
      }
    }

The entire trace is here:

I0910 17:29:56.384171 139780389660480 op_regularizer_manager.py:122] OpRegularizerManager starting analysis from: [<tf.Operation 'resnet_model/block_layer4' type=Identity>].
I0910 17:29:56.385807 139780389660480 op_regularizer_manager.py:125] OpRegularizerManager found 618 ops and 53 sources.
Traceback (most recent call last):
  File "imagenet_main.py", line 391, in <module>
    absl_app.run(main)
  File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "imagenet_main.py", line 385, in main
    run_imagenet(flags.FLAGS)
  File "imagenet_main.py", line 378, in run_imagenet
    shape=[DEFAULT_IMAGE_SIZE, DEFAULT_IMAGE_SIZE, NUM_CHANNELS])
  File "/home/dongm/workspace/laptop_mapping/morph-net/morph_net/examples/resnet/resnet_run_loop.py", line 705, in resnet_main
    max_steps=flags_obj.max_train_steps)
  File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1156, in _train_model
    return self._train_model_distributed(input_fn, hooks, saving_listeners)
  File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1219, in _train_model_distributed
    self._config._train_distribute, input_fn, hooks, saving_listeners)
  File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1299, in _actual_train_model_distributed
    self.config))
  File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1810, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/tensorflow_core/python/distribute/one_device_strategy.py", line 356, in _call_for_each_replica
    return fn(*args, **kwargs)
  File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "imagenet_main.py", line 347, in imagenet_model_fn
    label_smoothing=flags.FLAGS.label_smoothing
  File "/home/dongm/workspace/laptop_mapping/morph-net/morph_net/examples/resnet/resnet_run_loop.py", line 398, in resnet_model_fn
    gamma_threshold=1e-3
  File "/home/dongm/workspace/laptop_mapping/morph-net/morph_net/network_regularizers/flop_regularizer.py", line 72, in __init__
    regularizer_blacklist=regularizer_blacklist)
  File "/home/dongm/workspace/laptop_mapping/morph-net/morph_net/framework/op_regularizer_manager.py", line 137, in __init__
    ['%s (%s)' % (o.name, o.type) for o in self._op_deque])
RuntimeError: OpRegularizerManager could not handle ops: ['resnet_model/batch_normalization_31/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_36/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_35/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_34/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_39/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_38/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_37/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_42/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_41/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_40/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_46/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_45/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/Pad_6 (Pad)', 'resnet_model/batch_normalization_44/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_49/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_48/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_47/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_52/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_51/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_50/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_43/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_24/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_11/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_1/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/Pad (Pad)', 'resnet_model/batch_normalization/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_4/FusedBatchNormV3 (FusedBatchNormV3)', 
'resnet_model/batch_normalization_3/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/max_pooling2d/MaxPool (MaxPool)', 'resnet_model/initial_max_pool (Identity)', 'resnet_model/batch_normalization_2/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_7/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_6/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_5/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_10/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_9/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_8/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_14/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_13/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/Pad_2 (Pad)', 'resnet_model/batch_normalization_12/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_17/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_16/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_15/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_20/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_19/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_18/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_23/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_22/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_21/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_27/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_26/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/Pad_4 (Pad)', 'resnet_model/batch_normalization_25/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_30/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_29/FusedBatchNormV3 
(FusedBatchNormV3)', 'resnet_model/batch_normalization_28/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_33/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_32/FusedBatchNormV3 (FusedBatchNormV3)']

@eladeban
Contributor

channels_first is the problem. We assume channels_last...
Note that you need to use channels_last only during structure learning; later you could revert back to the (faster?) channels_first.
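
As a rough illustration of that suggestion, the sketch below picks the layout per training phase and shows where the channel dimension ends up. All names here (`pick_data_format`, `channel_axis`, `nchw_to_nhwc`) are hypothetical and not part of MorphNet or TensorFlow; how the data format is actually passed to the model depends on the training script.

```python
def pick_data_format(structure_learning, fast_format="channels_first"):
    """Use channels_last while learning structure, otherwise the faster layout.

    Hypothetical helper: `fast_format` is whatever layout the final
    training run prefers (assumed here to be channels_first).
    """
    return "channels_last" if structure_learning else fast_format


def channel_axis(data_format):
    """Index of the channel dimension for a 4-D activation tensor."""
    return 3 if data_format == "channels_last" else 1


def nchw_to_nhwc(shape):
    """Permute a (batch, channels, height, width) shape to channels_last."""
    n, c, h, w = shape
    return (n, h, w, c)


phase_format = pick_data_format(structure_learning=True)
print(phase_format, channel_axis(phase_format))
print(nchw_to_nhwc((32, 3, 224, 224)))
```

With channels_last the channel dimension is dim=3, which matches the earlier question about the encoding.
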

@mengdong
Author

I see. Let me try this again. Thanks for clarifying.
