
MobileFaceNet training pipeline #214

Closed
nttstar opened this issue May 15, 2018 · 44 comments

@nttstar
Collaborator

nttstar commented May 15, 2018

No description provided.

@nttstar
Collaborator Author

nttstar commented May 16, 2018

My 2-stage pipeline:

  1. Train softmax with lr=0.1 for 120K iterations.
LRSTEPS='240000,360000,440000'
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 0 --prefix "$PREFIX" --per-batch-size 128 --lr-steps "$LRSTEPS" --margin-s 32.0 --margin-m 0.1 --ckpt 2 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 --max-steps 140002
  2. Switch to ArcFace loss to do normal training with '100K,140K,160K' iterations.
LRSTEPS='100000,140000,160000'
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 4 --prefix "$PREFIX" --per-batch-size 128 --lr-steps "$LRSTEPS" --margin-s 64.0 --margin-m 0.5 --ckpt 1 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 --pretrained '../models2/model-y1-test/model,70'

Pretrained model: baiduyun
training dataset: ms1m
LFW: 99.50, CFP_FP: 88.94, AgeDB30: 95.91
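
(Note: each value in --lr-steps marks the iteration at which the learning rate is divided by 10. A minimal sketch of the stage-2 schedule this implies, assuming the usual 0.1 drop factor used by train_softmax.py:)

```python
# Piecewise-constant schedule implied by --lr-steps (assumes a 0.1
# drop factor at each listed global iteration).
def lr_at(step, base_lr=0.1, steps=(100000, 140000, 160000)):
    lr = base_lr
    for s in steps:
        if step >= s:
            lr *= 0.1
    return lr

print(lr_at(50000))   # 0.1
print(lr_at(120000))  # 0.01
print(lr_at(150000))  # 0.001
print(lr_at(170000))  # 0.0001
```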

@tianxingyzxq

Can you share the MobileNet v2 training pipeline?

@AllenMas

What is the accuracy on LFW and AgeDB after training with softmax? Can you share the training log?

@AleximusOrloff

Hi, can I ask in this thread?
Which type of weight initialization (filler) did you use during network creation? Xavier or something else?
I'm a newbie to MXNet, trying to reproduce your result in Torch7.

@youyicloud

I used MXNet to calculate the cosine distance between fc1 outputs, but the result looks wrong. The model was downloaded from the Baidu cloud link above, and the two test pictures are of two different people (a man and a woman), already aligned with MTCNN in the same way as the LFW pictures.

```python
# coding=utf-8
import mxnet as mx
import numpy as np
import cv2
from collections import namedtuple

Batch = namedtuple('Batch', ['data'])

image_size = (112, 112)
batch_size = 2

def load_model(model_prefix):
    sym, arg_params, aux_params = mx.model.load_checkpoint(model_prefix, 0)
    all_layers = sym.get_internals()
    sym = all_layers['fc1_output']  # take the embedding output
    model = mx.mod.Module(symbol=sym, label_names=None)
    model.bind(data_shapes=[('data', (batch_size, 3, image_size[0], image_size[1]))])
    model.set_params(arg_params, aux_params)
    return model

def dis(x, y):
    # cosine similarity between two embedding vectors
    return np.dot(x, y) / np.linalg.norm(x) / np.linalg.norm(y)

def test(model_prefix):
    img_path_1 = "./img_test/41.jpg"
    img_path_2 = "./img_test/31.jpg"
    model = load_model(model_prefix)
    img1 = cv2.cvtColor(cv2.imread(img_path_1), cv2.COLOR_BGR2RGB)
    img1 = cv2.resize(img1, (112, 112), interpolation=cv2.INTER_CUBIC)
    img2 = cv2.cvtColor(cv2.imread(img_path_2), cv2.COLOR_BGR2RGB)
    img2 = cv2.resize(img2, (112, 112), interpolation=cv2.INTER_CUBIC)
    img1 = np.transpose(img1, axes=(2, 0, 1))  # HWC -> CHW
    img2 = np.transpose(img2, axes=(2, 0, 1))
    data_batch = np.array([img1, img2])
    print(data_batch.shape)
    print(img2.shape)
    model.forward(Batch([mx.nd.array(data_batch)]))
    prob = model.get_outputs()[0].asnumpy()
    print(dis(prob[0], prob[1]))

model_prefix = "../../models/model"
test(model_prefix)
```

Here is the output:

[00:19:53] src/nnvm/legacy_json_util.cc:190: Loading symbol saved by previous version v1.0.0. Attempting to upgrade...
[00:19:53] src/nnvm/legacy_json_util.cc:198: Symbol successfully upgraded!
(2, 3, 112, 112)
(3, 112, 112)
-0.9996472

could you tell me what have I missed? @nttstar

@nttstar
Collaborator Author

nttstar commented May 26, 2018

Why did you think the result was wrong?

@youyicloud

youyicloud commented May 26, 2018 via email

@nttstar
Collaborator Author

nttstar commented May 27, 2018

If the images were already aligned, why did you resize them again in your code?

@youyicloud

I had just cropped the images by the bounding boxes, so I needed to resize them to the input shape. I have found your code in the deploy dir and am analyzing my mistakes by comparing my code with yours. Thank you a lot!

@BUAA-21Li

The model I got is too big
I used the command:
CUDA_VISIBLE_DEVICES='0' python -u train_softmax.py --network y1 --ckpt 2 --loss-type 0 --lr-steps 120000,140000 --wd 0.00004 --fc7-wd-mult 10 --per-batch-size 512 --emb-size 128 --data-dir ../datasets/faces_ms1m_112x112 --prefix ../models/MobileFaceNet/model-y1-softmax
to get my model, but the model I got is almost 40M. I have no idea why my model is so much bigger than yours. PLEASE HELP ME

@AleximusOrloff

@BUAA-21Li
Your model is too big because of the last FC layer before the softmax layer.

@wayen820

wayen820 commented Jun 2, 2018

@BUAA-21Li use deploy/model_slim.py to delete the last layer.
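
Conceptually the script keeps only the graph up to the embedding output and drops the large fc7 classification weights. A minimal sketch of that idea (not the actual model_slim.py; prefix and epoch are placeholders):

```python
import mxnet as mx

prefix, epoch = '../models/MobileFaceNet/model-y1-softmax', 0
sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
sym = sym.get_internals()['fc1_output']  # embedding output only
# keep only the parameters the slimmed graph still references
needed = set(sym.list_arguments()) | set(sym.list_auxiliary_states())
arg_params = {k: v for k, v in arg_params.items() if k in needed}
aux_params = {k: v for k, v in aux_params.items() if k in needed}
mx.model.save_checkpoint(prefix + '-slim', epoch, sym, arg_params, aux_params)
```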

@Audi16

Audi16 commented Jun 6, 2018

Why did you pre-train a model with softmax loss before training MobileFaceNet with ArcFace loss, but train the other networks from scratch?

@BUAA-21Li

@wayen820 THANKS! I have solved it!

@qidiso

qidiso commented Jun 10, 2018

Now we get higher accuracy using my modified MobileNet network:

[lfw][12000]Accuracy-Flip: 0.99617+-0.00358
[agedb_30][12000]Accuracy-Flip: 0.96017+-0.00893

@BUAA-21Li

@youyicloud Is your problem solved? My code is similar to yours, and the cosine distances between samples are all around -0.99, no matter whether the pairs are positive or negative.

@youyicloud

@BUAA-21Li You can use deploy/test.py and load the MobileFaceNet model; then you can use the cosine distance or the Euclidean distance. It outputs the right answer~
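
For L2-normalized embeddings the two metrics are interchangeable, since ||a - b||^2 = 2 - 2*cos(a, b). A minimal sketch of the comparison, with emb1 and emb2 standing in for the fc1 outputs of two images:

```python
import numpy as np

def compare(emb1, emb2):
    # L2-normalize both embeddings first
    a = emb1 / np.linalg.norm(emb1)
    b = emb2 / np.linalg.norm(emb2)
    cos_sim = float(np.dot(a, b))           # higher = more similar
    l2_dist = float(np.linalg.norm(a - b))  # lower  = more similar
    return cos_sim, l2_dist
```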

@BUAA-21Li

@youyicloud Thank you for your reply. Have you analyzed why your code failed to get the correct result?

@rmaria

rmaria commented Jun 28, 2018

In the article, you reported the following results for LResNet100E-IR (for m=0.5):
LFW: 99.83, CFP-FP: 94.04, AgeDB-30: 98.08

With the MobileFaceNet (m=?) you report the accuracies:
LFW: 99.50, CFP_FP: 88.94, AgeDB30: 95.91

What is the expected accuracy drop of this model on MegaFace Challenge 1 (Table 9 from the article)?

@EdwardChou

Thanks for your code. Recently I was trying to reproduce the MobileFaceNet model following your instructions, yet I encountered the following problem; would you please give me some hints? (P.S. the training dataset combined faces_ms1m_112x112 with my private dataset, using scripts like im2rec.py, face2rec2.py and dataset_merge.py.)


root@656688c713aa:/proj/insightface/src# CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --data-dir ../datasets/xl_marked --network y1 --loss-type 0 --prefix ../mobile_facenet --per-batch-size 128 --lr-steps "240000,360000,440000" --margin-s 32.0 --margin-m 0.1 --ckpt 2 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 --max-steps 140002
gpu num: 4
num_layers 1
image_size [112, 112]
num_classes 381
Called with argument: Namespace(batch_size=512, beta=1000.0, beta_freeze=0, beta_min=5.0, bn_mom=0.9, ckpt=2, ctx_num=4, cutoff=0, data_dir='../datasets/xl_marked', easy_margin=0, emb_size=128, end_epoch=100000, fc7_lr_mult=1.0, fc7_no_bias=False, fc7_wd_mult=10.0, gamma=0.12, image_channel=3, image_h=112, image_w=112, loss_type=0, lr=0.1, lr_steps='240000,360000,440000', margin=4, margin_a=1.0, margin_b=0.0, margin_m=0.1, margin_s=32.0, max_steps=140002, mom=0.9, network='y1', num_classes=381, num_layers=1, per_batch_size=128, power=1.0, prefix='../mobile_facenet', pretrained='', rand_mirror=1, rescale_threshold=0, scale=0.9993, target='lfw,cfp_fp,agedb_30', use_deformable=0, verbose=2000, version_act='prelu', version_input=1, version_output='E', version_se=0, version_unit=3, wd=4e-05)
init mobilefacenet 1
('version_output:', 'E')
Traceback (most recent call last):
  File "train_softmax.py", line 488, in <module>
    main()
  File "train_softmax.py", line 485, in main
    train_net(args)
  File "train_softmax.py", line 334, in train_net
    sym, arg_params, aux_params = get_symbol(args, arg_params, aux_params)
  File "train_softmax.py", line 170, in get_symbol
    embedding = fmobilefacenet.get_symbol(args.emb_size, bn_mom = args.bn_mom, version_output=args.version_output)
  File "symbols/fmobilefacenet.py", line 51, in get_symbol
    assert version_output=='GDC' or version_output=='GNAP'
AssertionError


@shangleyi

@EdwardChou Add "--version-output GNAP" to the arguments.

@EdwardChou

@shangleyi Thanks for the reply. After appending "--version-output GNAP" to the arguments and running again, another error popped up, even though I am using the correct input size, namely 112*112 input images. This is pretty weird.

expected [3,160,160], got [3,112,112]

The complete log is as following:

root@656688c713aa:/proj/insightface/src# CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --data-dir ../datasets/marked_face_crop --network y1 --loss-type 0 --prefix ../mobile_facenet --per-batch-size 128 --lr-steps "240000,360000,440000" --margin-s 32.0 --margin-m 0.1 --ckpt 2 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 --max-steps 140002 --version-output GNAP
gpu num: 4
num_layers 1
image_size [112, 112]
num_classes 381
Called with argument: Namespace(batch_size=512, beta=1000.0, beta_freeze=0, beta_min=5.0, bn_mom=0.9, ckpt=2, ctx_num=4, cutoff=0, data_dir='../datasets/marked_face_crop', easy_margin=0, emb_size=128, end_epoch=100000, fc7_lr_mult=1.0, fc7_no_bias=False, fc7_wd_mult=10.0, gamma=0.12, image_channel=3, image_h=112, image_w=112, loss_type=0, lr=0.1, lr_steps='240000,360000,440000', margin=4, margin_a=1.0, margin_b=0.0, margin_m=0.1, margin_s=32.0, max_steps=140002, mom=0.9, network='y1', num_classes=381, num_layers=1, per_batch_size=128, power=1.0, prefix='../mobile_facenet', pretrained='', rand_mirror=1, rescale_threshold=0, scale=0.9993, target='lfw,cfp_fp,agedb_30', use_deformable=0, verbose=2000, version_act='prelu', version_input=1, version_output='GNAP', version_se=0, version_unit=3, wd=4e-05)
init mobilefacenet 1
('version_output:', 'GNAP')
INFO:root:loading recordio ../datasets/marked_face_crop/train.rec...
header0 label [  9369.  18696.]
id2range 9327
9368
rand_mirror 1
lr_steps [240000, 360000, 440000]
call reset()
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/mxnet/python/mxnet/io.py", line 396, in prefetch_func
    self.next_batch[i] = self.iters[i].next()
  File "/proj/insightface/src/image_iter.py", line 215, in next
    batch_data[i][:] = self.postprocess_data(datum)
  File "/mxnet/python/mxnet/ndarray/ndarray.py", line 437, in __setitem__
    self._set_nd_basic_indexing(key, value)
  File "/mxnet/python/mxnet/ndarray/ndarray.py", line 691, in _set_nd_basic_indexing
    value.copyto(self)
  File "/mxnet/python/mxnet/ndarray/ndarray.py", line 1876, in copyto
    return _internal._copyto(self, out=other)
  File "<string>", line 25, in _copyto
  File "/mxnet/python/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/mxnet/python/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
MXNetError: [13:43:04] src/operator/nn/./../tensor/../elemwise_op_common.h:123: Check failed: assign(&dattr, (*vec)[i]) Incompatible attr in node at 0-th output: expected [3,160,160], got [3,112,112]

Stack trace returned 10 entries:
[bt] (0) /mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5a) [0x7f5416c1559a]
[bt] (1) /mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f5416c16138]
[bt] (2) /mxnet/python/mxnet/../../lib/libmxnet.so(bool mxnet::op::ElemwiseAttr<nnvm::TShape, &mxnet::op::shape_is_none, &mxnet::op::shape_assign, true, &mxnet::op::shape_string[abi:cxx11], -1, -1>(nnvm::NodeAttrs const&, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >*, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >*, nnvm::TShape const&)::{lambda(std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >*, unsigned long, char const*)#1}::operator()(std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >*, unsigned long, char const*) const+0xbf1) [0x7f5416e6da61]
[bt] (3) /mxnet/python/mxnet/../../lib/libmxnet.so(bool mxnet::op::ElemwiseShape<1, 1>(nnvm::NodeAttrs const&, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >*, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >*)+0x24a) [0x7f5416e6ff7a]
[bt] (4) /mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, mxnet::DispatchMode*)+0xb4d) [0x7f54191c0e1d]
[bt] (5) /mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::Imperative::Invoke(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0x35f) [0x7f5419198d8f]
[bt] (6) /mxnet/python/mxnet/../../lib/libmxnet.so(MXImperativeInvokeImpl(void*, int, void**, int*, void***, int, char const**, char const**)+0xe7b) [0x7f541968d4eb]
[bt] (7) /mxnet/python/mxnet/../../lib/libmxnet.so(MXImperativeInvokeEx+0x3ff) [0x7f541968ecaf]
[bt] (8) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f5494337e40]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7f54943378ab]



[13:43:06] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
/mxnet/python/mxnet/module/base_module.py:466: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (0.25 vs. 0.001953125). Is this intended?
  optimizer_params=optimizer_params)
Killed

@shangleyi
Copy link

@EdwardChou How did you prepare train.rec?

@EdwardChou

Hi, @shangleyi
This is the way I generate train.rec:

cd PROJ_DIR/src/data

Download im2rec.py and modify the script following #265.

# 160*160*3 -> 112*112*3
python im2rec.py --list --resize 112 --recursive ./my_data IMG_DIR

echo "100,112,112" > property

Modify the open line to "with open('IMG_DIR' + fullpath, 'rb') as fin:"

python face2rec2.py .

# Move the generated dataset to PROJ_DIR/datasets/MY_DATASET
python dataset_merge.py --include "../../datasets/faces_ms1m_112x112/,../../datasets/MY_DATASET/" --output "../../datasets/MY_MERGE_DATASET/"

@shangleyi

@EdwardChou I used face2rec2.py directly without using im2rec.py and it worked. Maybe you should write a script that resizes the images and then use face2rec2.py directly; I'm not so sure about im2rec.py.
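
A minimal sketch of such a resize pass (SRC and DST are placeholder paths; assumes OpenCV):

```python
import os
import cv2

SRC, DST = './my_data', './my_data_112'

for root, _, files in os.walk(SRC):
    for name in files:
        if not name.lower().endswith(('.jpg', '.jpeg', '.png')):
            continue
        out_dir = os.path.join(DST, os.path.relpath(root, SRC))
        os.makedirs(out_dir, exist_ok=True)
        img = cv2.imread(os.path.join(root, name))
        if img is None:
            continue  # skip unreadable files
        img = cv2.resize(img, (112, 112), interpolation=cv2.INTER_CUBIC)
        cv2.imwrite(os.path.join(out_dir, name), img)
```

Then run face2rec2.py on the resized tree.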

@shangleyi

training dataset: ms1m, ms1m-v2, private dataset
lfw: 99.583, cfp_fp: 95.357, agedb_30: 96.533
training process: https://github.com/shangleyi/insightface-training-note/blob/master/README.md

@EdwardChou

@shangleyi Thank you so much. My problem was exactly that the resize function in im2rec.py doesn't work, so I resized the images with another script. The training procedure following the instructions above now looks good. You saved my day!

@sunjunlishi

Is there any training file corresponding to Caffe? I want to train with Caffe.

@erichouyi

dataset: emore
network backbone: mobilefacenet + GNAP block
loss function: arcface(m=0.5)
training pipeline: finetune (lr drop at 100K, 140K, 160K), batch-size:512
epoch 52: LFW-99.60%, CFP-FP-93.46%, AgeDB-95.45%

@EdwardChou

EdwardChou commented Oct 10, 2018

Hi @nttstar, I encountered something strange when finetuning the MobileFaceNet model (the 2nd step of the 2-step pipeline) and would like to ask for your help. My training acc got stuck at 0.51~0.53 while the accuracy on lfw and agedb-30 reached 95%. Similar to #187

My finetune params are:

Called with argument: Namespace(batch_size=512, beta=1000.0, beta_freeze=0, beta_min=5.0, bn_mom=0.9, ckpt=2, ctx_num=4, cutoff=0, data_dir='../datasets/x', easy_margin=0, emb_size=128, end_epoch=100000, fc7_lr_mult=1.0, fc7_no_bias=False, fc7_wd_mult=10.0, gamma=0.12, image_channel=3, image_h=112, image_w=112, loss_type=4, lr=0.1, lr_steps='100000,140000,160000', margin=4, margin_a=1.0, margin_b=0.0, margin_m=0.5, margin_s=64.0, max_steps=0, mom=0.9, network='y1', num_classes=94491, num_layers=1, per_batch_size=128, power=1.0, prefix='../xz/xz_mobile_facenet', pretrained='../xz_mobile_facenet,70', rand_mirror=1, rescale_threshold=0, scale=0.9993, target='lfw,cfp_fp,agedb_30', use_deformable=0, verbose=2000, version_act='prelu', version_input=1, version_output='GNAP', version_se=0, version_unit=3, wd=4e-05)

and the result is like:

 INFO:root:Epoch[145] Batch [1780]   Speed: 851.07 samples/sec   acc=0.529687
 INFO:root:Epoch[145] Batch [1800]   Speed: 866.48 samples/sec   acc=0.529980
 INFO:root:Epoch[145] Batch [1820]   Speed: 725.38 samples/sec   acc=0.519043
 INFO:root:Epoch[145] Batch [1840]   Speed: 919.19 samples/sec   acc=0.527051
 INFO:root:Epoch[145] Batch [1860]   Speed: 996.87 samples/sec   acc=0.525586
 INFO:root:Epoch[145] Batch [1880]   Speed: 1021.45 samples/sec  acc=0.521094
 lr-batch-epoch: 0.0001 1894 145
 testing verification..
 (12000, 128)
 infer time 39.693939
 [lfw][1082000]XNorm: 11.132285
 [lfw][1082000]Accuracy-Flip: 0.99517+-0.00398
 testing verification..
 (14000, 128)
 infer time 42.053231
 [cfp_fp][1082000]XNorm: 9.771846
 [cfp_fp][1082000]Accuracy-Flip: 0.88900+-0.02205
 testing verification..
 (12000, 128)
 infer time 34.666512
 [agedb_30][1082000]XNorm: 11.260081
 [agedb_30][1082000]Accuracy-Flip: 0.95383+-0.00796
 saving 541

I have seen the training log you attached on baiduyun; it shows your model's acc reaching 0.5 after 15 epochs, which matches my experiment. Yet your log stops at epoch 24, when the highest acc reached 0.55. Did you conduct further experiments to reach higher accuracy? Or is there something wrong with the calculation of the training acc? Looking forward to your help, thanks.

@jiankang1991

Hi guys,
for the first step in the training pipeline, how many epochs do you usually need to get a reasonable accuracy on LFW, such as 99%?
I trained for a long time, but the accuracy is always around 91%.

@clhne

clhne commented Feb 18, 2019

My 2-stage pipeline:

  1. Train softmax with lr=0.1 for 120K iterations.
LRSTEPS='240000,360000,440000'
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 0 --prefix "$PREFIX" --per-batch-size 128 --lr-steps "$LRSTEPS" --margin-s 32.0 --margin-m 0.1 --ckpt 2 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 --max-steps 140002
  2. Switch to ArcFace loss to do normal training with '100K,140K,160K' iterations.
LRSTEPS='100000,140000,160000'
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 4 --prefix "$PREFIX" --per-batch-size 128 --lr-steps "$LRSTEPS" --margin-s 64.0 --margin-m 0.5 --ckpt 1 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 --pretrained '../models2/model-y1-test/model,70'

Pretrained model: baiduyun
training dataset: ms1m
LFW: 99.50, CFP_FP: 88.94, AgeDB30: 95.91

@nttstar
With your configuration, how long did the training take to reach LFW: 99.50, CFP_FP: 88.94, AgeDB30: 95.91?

@clhne

clhne commented Feb 18, 2019

Hi guys,
for the first step in the training pipeline, how many epochs do you usually need to get a reasonable accuracy on LFW, such as 99%?
I trained for a long time, but the accuracy is always around 91%.

@karlTUM
CPU: E5-2650 v4
GPU: 2x RTX 2080 Ti
Epoch 15, batch_size 32, lr 0.001

INFO:root:Epoch[15] Batch [32040-32060] Speed: 274.12 samples/sec acc=0.865625
INFO:root:Epoch[15] Batch [32060-32080] Speed: 272.38 samples/sec acc=0.839063
INFO:root:Epoch[15] Batch [32080-32100] Speed: 272.94 samples/sec acc=0.855469
INFO:root:Epoch[15] Batch [32100-32120] Speed: 272.41 samples/sec acc=0.839063
INFO:root:Epoch[15] Batch [32120-32140] Speed: 272.01 samples/sec acc=0.852344
INFO:root:Epoch[15] Batch [32140-32160] Speed: 267.44 samples/sec acc=0.855469
INFO:root:Epoch[15] Batch [32160-32180] Speed: 273.78 samples/sec acc=0.853125
INFO:root:Epoch[15] Batch [32180-32200] Speed: 274.96 samples/sec acc=0.851562
INFO:root:Epoch[15] Batch [32200-32220] Speed: 273.08 samples/sec acc=0.842187
INFO:root:Epoch[15] Batch [32220-32240] Speed: 273.76 samples/sec acc=0.849219
lr-batch-epoch: 0.0001 32249 15
testing verification..
(12000, 512)
infer time 25.010638999999994
[lfw][924000]XNorm: 23.051082
[lfw][924000]Accuracy-Flip: 0.99700+-0.00296
testing verification..
(14000, 512)
infer time 29.09600100000001
[cfp_fp][924000]XNorm: 23.878208
[cfp_fp][924000]Accuracy-Flip: 0.92786+-0.01553
testing verification..
(12000, 512)
infer time 24.954134000000025
[agedb_30][924000]XNorm: 23.627240
[agedb_30][924000]Accuracy-Flip: 0.97650+-0.01031
saving 462

@clhne

clhne commented Feb 18, 2019

Similar issue. On my side the acc already reached 0.9 at epoch 17, but after that it improves very slowly.
May I ask:

  1. Which CPU model, and how many?
  2. Which GPU model, and how many cards?

My 2-stage pipeline:

  1. Train softmax with lr=0.1 for 120K iterations.
LRSTEPS='240000,360000,440000'
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 0 --prefix "$PREFIX" --per-batch-size 128 --lr-steps "$LRSTEPS" --margin-s 32.0 --margin-m 0.1 --ckpt 2 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 --max-steps 140002
  2. Switch to ArcFace loss to do normal training with '100K,140K,160K' iterations.
LRSTEPS='100000,140000,160000'
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 4 --prefix "$PREFIX" --per-batch-size 128 --lr-steps "$LRSTEPS" --margin-s 64.0 --margin-m 0.5 --ckpt 1 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 --pretrained '../models2/model-y1-test/model,70'

Pretrained model: baiduyun
training dataset: ms1m
LFW: 99.50, CFP_FP: 88.94, AgeDB30: 95.91

@nttstar
How is the log file generated automatically? Thanks!

@Talgin

Talgin commented Jul 4, 2019

Hi, @shangleyi
This is the way I generate train.rec:

cd PROJ_DIR/src/data

Download im2rec.py and modify the script following #265.

# 160*160*3 -> 112*112*3
python im2rec.py --list --resize 112 --recursive ./my_data IMG_DIR

echo "100,112,112" > property

Modify the open line to "with open('IMG_DIR' + fullpath, 'rb') as fin:"

python face2rec2.py .

# Move the generated dataset to PROJ_DIR/datasets/MY_DATASET
python dataset_merge.py --include "../../datasets/faces_ms1m_112x112/,../../datasets/MY_DATASET/" --output "../../datasets/MY_MERGE_DATASET/"

Hi, have you managed to get a correct merged dataset?
We also tried to merge two datasets, faces_emore and faces_glint, with dataset_merge.py using the following command:
python dataset_merge.py --include /home/ti/Downloads/DATASETS/faces_emore,/home/ti/Downloads/DATASETS/faces_glint --output /home/ti/Downloads/DATASETS/merge --model /home/ti/Downloads/insightface/models/model-r100-ii/model,0
But after the merge completed, the resulting dataset had the same property file and .rec/.idx sizes as the faces_emore dataset.
What is wrong with our parameters?

Thank you!

@shangleyi

It has been a year and I can hardly remember what I did, but did you try adding the quotation marks?

@jinwu07

jinwu07 commented Jul 4, 2019

Trained MobileFaceNet on emore; here is the result:

Called with argument: Namespace(batch_size=224, beta=1000.0, beta_freeze=0, beta_min=5.0, bn_mom=0.9, ce_loss=False, ckpt=1, color=0, ctx_num=1, cutoff=0, data_dir='../datasets/faces_emore', easy_margin=0, emb_size=128, end_epoch=100000, fc7_lr_mult=1.0, fc7_no_bias=False, fc7_wd_mult=10.0, gamma=0.12, image_channel=3, image_h=112, image_size='112,112', image_w=112, images_filter=0, loss_type=4, lr=0.1, lr_steps='200000,280000,320000', margin=4, margin_a=1.0, margin_b=0.0, margin_m=0.5, margin_s=64.0, max_steps=0, mom=0.9, network='y1', num_classes=85742, num_layers=1, per_batch_size=224, power=1.0, prefix='../models/y1-arcface-emore/model', pretrained='../models/y1-softmax-emore/model,234', rand_mirror=1, rescale_threshold=0, scale=0.9993, target='lfw,cfp_fp,agedb_30', use_deformable=0, verbose=2000, version_act='prelu', version_input=1, version_multiplier=1.0, version_output='E', version_se=0, version_unit=3, wd=4e-05)

testing verification..
(12000, 128)
infer time 5.607243
[lfw][346000]XNorm: 11.406996
[lfw][346000]Accuracy-Flip: 0.99600+-0.00442
testing verification..
(14000, 128)
infer time 6.47071
[cfp_fp][346000]XNorm: 9.418514
[cfp_fp][346000]Accuracy-Flip: 0.94729+-0.01445
testing verification..
(12000, 128)
infer time 5.542683
[agedb_30][346000]XNorm: 11.237676
[agedb_30][346000]Accuracy-Flip: 0.96300+-0.00942

@capilano

capilano commented Jul 10, 2019

What does Accuracy-Flip mean? Does it have to do with using features of flipped images during training (as described in one of the MobileFaceNet papers)?
Or with flipping during post-processing while calculating embedding distances?
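
If the evaluation follows the common flip-test scheme (an assumption about the verification code, not a quote from it), each image's embedding is summed with the embedding of its horizontal flip and then L2-normalized before computing distances. A minimal sketch, with embed() as a hypothetical embedding function:

```python
import numpy as np

def flip_embedding(img, embed):
    # sum the embeddings of the image and its horizontal flip (axis=1
    # flips width for an HxWxC image), then L2-normalize
    e = embed(img) + embed(np.flip(img, axis=1))
    return e / np.linalg.norm(e)
```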

@NOON47

NOON47 commented Jul 23, 2019

Hello @nttstar, how can I finetune on my own data? Could you provide a pretrained model that keeps the fc7 layer?

@EdwardVincentMa

My 2-stage pipeline:

  1. Train softmax with lr=0.1 for 120K iterations.
LRSTEPS='240000,360000,440000'
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 0 --prefix "$PREFIX" --per-batch-size 128 --lr-steps "$LRSTEPS" --margin-s 32.0 --margin-m 0.1 --ckpt 2 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 --max-steps 140002
  2. Switch to ArcFace loss to do normal training with '100K,140K,160K' iterations.
LRSTEPS='100000,140000,160000'
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 4 --prefix "$PREFIX" --per-batch-size 128 --lr-steps "$LRSTEPS" --margin-s 64.0 --margin-m 0.5 --ckpt 1 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 --pretrained '../models2/model-y1-test/model,70'

Pretrained model: baiduyun
training dataset: ms1m
LFW: 99.50, CFP_FP: 88.94, AgeDB30: 95.91

Your max-steps is 140002 (140K), but you said 120K, and your lr-steps are 240000 (240K), 360000 (360K), ...; which is right?

@yichaojin

yichaojin commented May 18, 2020

dataset: emore
network backbone: mobilefacenet + GNAP block
loss function: arcface(m=0.5)
training pipeline: finetune (lr drop at 100K, 140K, 160K), batch-size:512
epoch 52: LFW-99.60%, CFP-FP-93.46%, AgeDB-95.45%
@erichouyi

What is your acc on the training data?

@bahar3474
Contributor

bahar3474 commented Nov 4, 2020

My 2-stage pipeline:

  1. Train softmax with lr=0.1 for 120K iterations.
LRSTEPS='240000,360000,440000'
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 0 --prefix "$PREFIX" --per-batch-size 128 --lr-steps "$LRSTEPS" --margin-s 32.0 --margin-m 0.1 --ckpt 2 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 --max-steps 140002
  2. Switch to ArcFace loss to do normal training with '100K,140K,160K' iterations.
LRSTEPS='100000,140000,160000'
CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --data-dir $DATA_DIR --network "$NETWORK" --loss-type 4 --prefix "$PREFIX" --per-batch-size 128 --lr-steps "$LRSTEPS" --margin-s 64.0 --margin-m 0.5 --ckpt 1 --emb-size 128 --fc7-wd-mult 10.0 --wd 0.00004 --pretrained '../models2/model-y1-test/model,70'

Pretrained model: baiduyun
training dataset: ms1m
LFW: 99.50, CFP_FP: 88.94, AgeDB30: 95.91

Which version of ms1m did you use? I trained MobileFaceNet with the ms1m-refine-v1 dataset and the same config (except that I used 2 GPUs with per_batch_size=256), but the maximum accuracy on LFW in 180K iterations was 0.99400.

@CasonTsai

@bahar3474 Hello, excuse me, where is the train_softmax file? There is no such file in the new version of the repo.

@bahar3474
Contributor

@CasonTsai
Hi. I used this version of the code:
https://github.com/deepinsight/insightface/blob/08265c749a7af6f1d7e9057df55a3eb2b171ddcb/src/train_softmax.py
Two months ago they refactored the repo structure, and I don't know where you can find it in the new version.
