Error in tinyyolo conversion from CoreML to Caffe (MXNet): different way of padding

Jiahao Yao edited this page May 10, 2018 · 6 revisions

Model: Tiny YOLO network from the paper 'YOLO9000: Better, Faster, Stronger' (2016), arXiv:1612.08242

Source: CoreML

Destination: Caffe / MXNet

Author: Jiahao


How we found this problem

We tested the CoreML parser and the Caffe / MXNet emitters, using the same weights in every layer.

The output of the CoreML model has shape 13x13x125.

However, the output of the Caffe / MXNet model has shape 12x14x125, 14x14x125, or 14x12x125.

This is a serious error, even though the converted model runs without complaint and the generated code appears to be correct.

Convert the CoreML model to MXNet code and check the code

First, download the CoreML tinyyolo model.

$ mmdownload -f coreml -n tinyyolo 

Second, convert the CoreML model to the IR structure.

$ mmtoir -f coreml -d tinyyolo -n TinyYOLO.mlmodel --dstNodeName MMdnn_Output

You will get the following output:

IR network structure is saved as [tinyyolo.json].
IR network structure is saved as [tinyyolo.pb].
IR weights are saved as [tinyyolo.npy].

Finally, convert the IR to MXNet code.

$ mmtocode -f mxnet --IRModelPath tinyyolo.pb --IRWeightPath tinyyolo.npy --dstModelPath mx_tinyyolo.py --dstWeightPath mx_tinyyolo-0000.param

The MXNet network code is then saved as [mx_tinyyolo.py].

Line 33 of mx_tinyyolo.py is the max-pooling layer with kernel size 2 and stride 1:

    maxpooling2d_6  = mx.sym.Pooling(data = leakyrelu_6, global_pool = False, kernel=(2L, 2L), pool_type = 'max', stride=(1L, 1L), pad=(0L, 0L), name = 'maxpooling2d_6')

In the original CoreML model, the input of this layer is 13x13, and its output is supposed to be 13x13 as well. More specifically, the padding of this layer is padding_left=1, padding_right=0, padding_top=1, padding_bottom=0. However, in Caffe and MXNet the padding has to be symmetric: padding_left must equal padding_right, and padding_top must equal padding_bottom. Therefore, the output of this layer can never be 13x13; it is 12x14, 14x12, or 14x14 instead, which can be seen more clearly from the images below.

The inconsistent shapes are caused by symmetric padding in MXNet / Caffe.

Different padding schemes produce different shapes after pooling.
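A minimal sketch of the shape arithmetic, assuming the usual pooling output-size formula `(n + pad_before + pad_after - k) // s + 1`. `pool_out` is a hypothetical helper, not part of MMdnn or any framework. It enumerates symmetric padding of 0 or 1 per axis, which covers the shapes reported above (plus 12x12), and compares them with the one-sided CoreML padding.

```python
def pool_out(n, k, s, pad_before, pad_after):
    """Output length along one axis for pooling with explicit padding."""
    return (n + pad_before + pad_after - k) // s + 1


N, K, S = 13, 2, 1  # input size, kernel size and stride of maxpooling2d_6

# CoreML-style one-sided padding: 1 pixel on top/left, 0 on bottom/right.
coreml_shape = (pool_out(N, K, S, 1, 0), pool_out(N, K, S, 1, 0))

# Symmetric padding (before == after), as Caffe/MXNet pooling requires.
symmetric_shapes = {
    (ph, pw): (pool_out(N, K, S, ph, ph), pool_out(N, K, S, pw, pw))
    for ph in (0, 1)
    for pw in (0, 1)
}

print("CoreML (asymmetric):", coreml_shape)  # (13, 13)
for (ph, pw), shape in sorted(symmetric_shapes.items()):
    print(f"symmetric pad_h={ph}, pad_w={pw}: {shape}")
```

No symmetric choice produces 13 on either axis, which matches the 12/14 shapes seen after conversion.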

Because padding in Caffe and MXNet is symmetric, the output of this pooling layer (kernel size 2, stride 1) always has an even side length (12 or 14), never the odd length 13.
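The parity argument can be written out with the usual pooling output-size formula, taking H_in = 13, k = 2, s = 1 as for maxpooling2d_6. Caffe rounds this fraction up and MXNet's default convention rounds it down, but with stride 1 the fraction is an integer, so the rounding does not matter here:

```latex
\begin{aligned}
H_{\text{out}} &= \left\lfloor \frac{H_{\text{in}} + p_{\text{top}} + p_{\text{bottom}} - k}{s} \right\rfloor + 1 \\
\text{CoreML } (p_{\text{top}} = 1,\ p_{\text{bottom}} = 0):\quad H_{\text{out}} &= 13 + 1 + 0 - 2 + 1 = 13 \\
\text{symmetric } (p_{\text{top}} = p_{\text{bottom}} = p):\quad H_{\text{out}} &= 13 + 2p - 2 + 1 = 12 + 2p
\end{aligned}
```

So p = 0 gives 12, p = 1 gives 14, and no integer p gives 13. The same reasoning applies to the width.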

There are two possible solutions: either add a padding layer before the pooling layer when converting to MXNet, or crop the pooling output so that it matches the expected output shape of the pooling layer.
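Both proposed fixes can be checked with a small pure-Python sketch (no MXNet or Caffe needed). It assumes, following the description above, that CoreML pads one row/column on the top/left only, and it pads with -inf so that padded cells never win the max; zero padding could change results because leaky-ReLU outputs may be negative. `pad2d`, `max_pool2d`, and `reference_pool` are hypothetical helpers, not MMdnn or MXNet functions, and this is not the code MMdnn actually generates.

```python
import random

NEG_INF = float("-inf")  # padded cells must never win the max


def pad2d(x, top, bottom, left, right, value=NEG_INF):
    """Pad a 2D list-of-lists feature map with a constant value."""
    width = len(x[0]) + left + right
    out = [[value] * width for _ in range(top)]
    out += [[value] * left + list(row) + [value] * right for row in x]
    out += [[value] * width for _ in range(bottom)]
    return out


def max_pool2d(x, k=2, s=1):
    """Max pooling with no implicit padding."""
    h = (len(x) - k) // s + 1
    w = (len(x[0]) - k) // s + 1
    return [
        [max(x[i * s + di][j * s + dj] for di in range(k) for dj in range(k))
         for j in range(w)]
        for i in range(h)
    ]


def reference_pool(x):
    """Direct definition of 2x2/stride-1 pooling with top/left padding:
    output (i, j) is the max over rows i-1..i and cols j-1..j that exist."""
    return [
        [max(x[r][c] for r in (i - 1, i) if r >= 0 for c in (j - 1, j) if c >= 0)
         for j in range(len(x[0]))]
        for i in range(len(x))
    ]


random.seed(0)
# Random 13x13 map with negative values, like a leaky-ReLU output.
fmap = [[random.uniform(-1.0, 1.0) for _ in range(13)] for _ in range(13)]
reference = reference_pool(fmap)

# Fix 1: a separate padding layer supplies the asymmetric padding,
# and the pooling layer itself uses pad=0.
fix_pad_layer = max_pool2d(pad2d(fmap, top=1, bottom=0, left=1, right=0))

# Fix 2: symmetric padding of 1 gives 14x14; keep the first 13 rows/columns.
symmetric = max_pool2d(pad2d(fmap, top=1, bottom=1, left=1, right=1))
fix_crop = [row[:13] for row in symmetric[:13]]

print(len(reference), len(reference[0]))                  # 13 13
print(len(symmetric), len(symmetric[0]))                  # 14 14
print(fix_pad_layer == reference, fix_crop == reference)  # True True
```

The crop offset depends on where the extra padding sits: if the CoreML layer padded on the bottom/right instead, the crop would keep rows and columns 1 to 13 (`[row[1:] for row in symmetric[1:]]`) rather than 0 to 12.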

  • The MXNet problem was solved by adding a padding layer before the pooling layer. (2018.5.10)