Fix multi-GPU training.
A previous fix to let validation run across more
than one batch caused an issue with multi-GPU
training. The issue seems to be in how Keras
averages loss and metric values, where it expects
them to be scalars rather than arrays. This fix
causes scalar outputs from a model to remain
scalar in multi-GPU training.
waleedka committed Apr 21, 2018
1 parent 2a7bcfc commit 9cea282
Showing 1 changed file with 12 additions and 10 deletions.

mrcnn/parallel_model.py
@@ -89,16 +89,18 @@ def make_parallel(self):
         with tf.device('/cpu:0'):
             merged = []
             for outputs, name in zip(outputs_all, output_names):
-                # If outputs are numbers without dimensions, add a batch dim.
-                def add_dim(tensor):
-                    """Add a dimension to tensors that don't have any."""
-                    if K.int_shape(tensor) == ():
-                        return KL.Lambda(lambda t: K.reshape(t, [1, 1]))(tensor)
-                    return tensor
-                outputs = list(map(add_dim, outputs))
-
-                # Concatenate
-                merged.append(KL.Concatenate(axis=0, name=name)(outputs))
+                # Concatenate or average outputs?
+                # Outputs usually have a batch dimension and we concatenate
+                # across it. If they don't, then the output is likely a loss
+                # or a metric value that gets averaged across the batch.
+                # Keras expects losses and metrics to be scalars.
+                if K.int_shape(outputs[0]) == ():
+                    # Average
+                    m = KL.Lambda(lambda o: tf.add_n(o) / len(outputs), name=name)(outputs)
+                else:
+                    # Concatenate
+                    m = KL.Concatenate(axis=0, name=name)(outputs)
+                merged.append(m)
         return merged
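
For readers skimming the change, the rule the new code applies can be read as a small standalone helper. The sketch below is not the repository's code; it restates the diff's merge logic as a function using the same calls (K.int_shape, KL.Lambda, KL.Concatenate, tf.add_n). The helper name merge_replica_outputs is made up for illustration.

# A minimal sketch of the merge rule applied per output, assuming Keras 2.x
# with the TensorFlow backend. Not part of mrcnn/parallel_model.py.
import tensorflow as tf
import keras.backend as K
import keras.layers as KL

def merge_replica_outputs(outputs, name):
    """Merge one model output across GPU replicas.

    `outputs` is a list of Keras tensors, one per replica, all with the
    same shape. `name` becomes the name of the merged output layer.
    """
    if K.int_shape(outputs[0]) == ():
        # Scalar output (a loss or metric): average across replicas so the
        # merged value stays a scalar, which is what Keras expects.
        return KL.Lambda(lambda o: tf.add_n(o) / len(outputs),
                         name=name)(outputs)
    # Output with a batch dimension: stitch the per-GPU slices back
    # together along the batch axis.
    return KL.Concatenate(axis=0, name=name)(outputs)

Averaging rather than concatenating the per-GPU scalars keeps each loss and metric a 0-d tensor, which matches how Keras aggregates these values during training; the old approach of reshaping scalars to [1, 1] produced arrays where scalars were expected.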



1 comment on commit 9cea282

@JonathanCMitchell

Something is still broken here. Unable to train on the latest Keras version, 2.1.6.
