Description
Currently the mini-batch size N is limited by memory. For example, when training a large model I cannot use a large mini-batch size, because my GPU cannot hold N training samples at once.
Could Caffe support an effective mini-batch size that is a multiple of the input data batch size? My understanding is that it only needs to accumulate the gradients over several input batches before performing a model update step. Right? (A rough sketch of what I mean is below.)
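To be concrete, here is a minimal sketch of the accumulation idea in plain NumPy, not Caffe code; names like `compute_gradient`, `accum_steps`, and `input_batch_size` are made up for illustration. Dividing the accumulated gradient by `accum_steps` makes the update equivalent to averaging over the larger effective batch of `accum_steps * input_batch_size` samples.

```python
import numpy as np

def compute_gradient(weights, batch_x, batch_y):
    # Toy example: gradient of mean squared error for a linear model y = X @ w.
    preds = batch_x @ weights
    return 2.0 * batch_x.T @ (preds - batch_y) / len(batch_y)

def train(weights, data_x, data_y, input_batch_size=8, accum_steps=4, lr=0.01):
    # Accumulate gradients over `accum_steps` small input batches, then apply
    # one SGD update, so only `input_batch_size` samples sit in memory at once.
    grad_accum = np.zeros_like(weights)
    steps_taken = 0
    for start in range(0, len(data_x), input_batch_size):
        bx = data_x[start:start + input_batch_size]
        by = data_y[start:start + input_batch_size]
        grad_accum += compute_gradient(weights, bx, by)
        steps_taken += 1
        if steps_taken == accum_steps:
            # One model update for an effective batch of
            # accum_steps * input_batch_size samples.
            weights -= lr * (grad_accum / accum_steps)
            grad_accum[:] = 0.0
            steps_taken = 0
    return weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 5))
    true_w = np.arange(5, dtype=float)
    y = X @ true_w
    w = train(np.zeros(5), X, y)
    print(w)  # should approach true_w
```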
I wonder whether Caffe plans to support this functionality, or whether it already does (I am new to Caffe, so I may have missed something). Or is there some difficulty I have overlooked in implementing it?