Training the BERT large extractive model

Hello,

Are the batch size and accum count for BERT large exactly the same as for the base model? I have been trying to reproduce the results, but my BERT large model has consistently performed worse than the base model (by about 3-4 ROUGE points), and I have no idea why.