In the author's original Lua code, I couldn't find an explicit batch normalization layer. Maybe I just don't know Lua and Torch well enough.
During optimization the gradients always explode, so training cannot continue, and I don't know why. I see two possible fixes. First, apply gradient clipping in the training code (I commented that part out after switching to BN, since once BN is used the optimization problem goes away). Second, use BN.
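To make the two fixes concrete, here is a minimal Python sketch of both ideas: gradient clipping by global norm, and the normalization step at the core of batch norm. This is not the author's Lua/Torch code; the function names and the `max_norm` threshold are hypothetical, just to illustrate the technique.

```python
import math

def clip_gradients(grads, max_norm):
    """Gradient clipping by global L2 norm: if the norm of the
    gradient vector exceeds max_norm, rescale every component so
    the norm equals max_norm. This bounds the update step and
    prevents gradient explosion from derailing training."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

def batch_norm(xs, eps=1e-5):
    """The core of batch normalization: shift a batch of
    activations to zero mean and scale to unit variance.
    (A real BN layer also has learnable scale/shift parameters
    gamma and beta, and tracks running statistics for inference.)"""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

# Gradients [3, 4] have norm 5; clipping to norm 1 rescales by 0.2.
print(clip_gradients([3.0, 4.0], 1.0))
# A batch is recentered around 0 with unit spread.
print(batch_norm([1.0, 2.0, 3.0]))
```

Both approaches keep activations and gradients in a reasonable range; BN does it implicitly at every layer, which is why adding it often removes the need for explicit clipping.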