SGDFusion function issue #256
Description
Hi, I'm an Intel Caffe user.
I think I found an incorrect order of operations in the SGDFusion function (/sgd_solver.cpp).
With the GCC compiler, or when "iter_size" is not used, there is no problem. But with the Intel compiler and "iter_size" in use, LARS computes the wrong local rate.
As far as I know, the SGD_FUSION option is turned on when the Intel compiler is used.
In the "SGD_FUSION" flow, the steps run in this order: "GetLocalRate" (which includes LARS), "normalize", "regularization & update".
In this flow, "normalize" divides "diff_data" (mutable_cpu_diff or mutable_prv_diff) by "iter_size". But LARS depends on sumsq_diff and sumsq_data, so it computes the local rate from a diff that has not yet been divided by "iter_size".
So I think "GetLocalRate" should be executed after "normalize".
After changing the SGD_FUSION flow to "normalize" -> "GetLocalRate" -> "regularization & update", LARS works fine.
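
To illustrate why the order matters, here is a small sketch. It is not the Intel Caffe code: `lars_local_rate` and `normalize` are hypothetical helpers, and the LARS formula is a simplified form (trust ratio times the weight norm over the gradient norm) that I am assuming only for illustration. It shows that a rate derived from the norm of the diff changes depending on whether the diff was divided by "iter_size" first.

```python
import math


def l2_norm(values):
    """Square root of the sum of squares (what sumsq_data / sumsq_diff feed into)."""
    return math.sqrt(sum(v * v for v in values))


def lars_local_rate(data, diff, trust_ratio=0.001, weight_decay=0.0):
    """Simplified, assumed LARS rate: depends on both the weight and gradient norms."""
    w_norm = l2_norm(data)
    g_norm = l2_norm(diff)
    if w_norm > 0 and g_norm > 0:
        return trust_ratio * w_norm / (g_norm + weight_decay * w_norm)
    return 1.0


def normalize(diff, iter_size):
    """Divide the accumulated diff by iter_size, as the "normalize" step does."""
    return [g / iter_size for g in diff]


iter_size = 4
data = [3.0, 4.0]  # weights
diff = [6.0, 8.0]  # diff accumulated over iter_size passes

# Current fused order: the rate is computed from the un-normalized diff.
rate_before_normalize = lars_local_rate(data, diff)

# Proposed order: normalize first, then compute the rate.
rate_after_normalize = lars_local_rate(data, normalize(diff, iter_size))

print(rate_before_normalize, rate_after_normalize)
```

In this sketch, with weight_decay set to 0, the rate computed before "normalize" is smaller than the rate computed after it by a factor of "iter_size", so the two orders only agree when "iter_size" is 1.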
Could you check the SGD_FUSION flow?