The default variable aggregation policy, aggregation='mean', is incorrect; the default should be aggregation='none'. In distributed contexts, the backend already handles gradient reduction and variable updates, so mean aggregation is unnecessary. It also disrupts the optimizer's moment estimates. For reference, in keras==2.15.0 the default policy was effectively equivalent to aggregation='none'.
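
To illustrate the "unnecessary" part of this claim, here is a minimal plain-Python sketch. It does not use Keras or any backend API; `all_reduce_mean`, `adam_moments`, and `simulate` are hypothetical helpers, and the Adam-style update is only an assumed example of an optimizer that keeps moment estimates. It shows that once gradients have been reduced across replicas, every replica computes the same moments, so a further mean over replicas changes nothing. It does not attempt to reproduce how aggregation='mean' disrupts moment estimates, since that depends on backend details not described here.

```python
# Illustrative only: plain Python, no Keras or backend APIs involved.

def all_reduce_mean(values):
    """Stand-in for a backend reduction across replicas (assumed to be a mean)."""
    return sum(values) / len(values)


def adam_moments(m, v, grad, beta1=0.9, beta2=0.999):
    """One step of Adam-style first and second moment updates."""
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    return m, v


def simulate(per_replica_grads, m0=0.0, v0=0.0):
    # The backend reduces gradients first, so each replica sees the same value.
    reduced = all_reduce_mean(per_replica_grads)
    per_replica = [adam_moments(m0, v0, reduced) for _ in per_replica_grads]
    # A further mean over the replicas' identical moments is a no-op.
    m_mean = all_reduce_mean([m for m, _ in per_replica])
    v_mean = all_reduce_mean([v for _, v in per_replica])
    return per_replica[0], (m_mean, v_mean)


local, aggregated = simulate([0.5, 1.5, -1.0, 3.0])
print("moments on one replica:     ", local)
print("moments after mean over all:", aggregated)
```

Under these assumptions the two printed pairs match, which is why an extra mean aggregation on top of the backend's own reduction adds nothing.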