BERT-large model does not reach ~65% accuracy even after training for 52k steps!

We are training the BERT-large model on a P100 GPU with 25 GB of RAM.
When we ran the default code with bs=6 and num_batch_accumulated=4, we got a CUDA out-of-memory error.
So we changed it to bs=2 and num_batch_accumulated=8 (2 × 8 = 16 examples per update), since you said that anything between 16 and 24 would perform similarly.
However, after training for 52000 steps, the best accuracy we have seen is ~59.6%, at step 44000.
Is it taking longer to converge because we changed the batch size? Or is there something else we are missing?
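
For reference, here is a minimal sketch of the arithmetic behind our question. It assumes that one "timestep" corresponds to one optimizer update after gradient accumulation (we have not verified this against the training code), and the helper name `effective_batch_size` is ours, not something from the repository.

```python
def effective_batch_size(bs: int, num_batch_accumulated: int) -> int:
    """Number of examples contributing to one optimizer update
    when gradients are accumulated over several mini-batches."""
    return bs * num_batch_accumulated


# Default configuration vs. the one that fits on our GPU.
default_eff = effective_batch_size(bs=6, num_batch_accumulated=4)
ours_eff = effective_batch_size(bs=2, num_batch_accumulated=8)

# ASSUMPTION: each logged step is one optimizer update.
steps = 52000
examples_default = default_eff * steps
examples_ours = ours_eff * steps

# Fraction of training examples we have processed relative to the
# default configuration at the same step count.
ratio = examples_ours / examples_default

print(f"default effective batch size: {default_eff}")
print(f"our effective batch size:     {ours_eff}")
print(f"examples seen relative to default at step {steps}: {ratio:.2f}")
```

If that assumption holds, our run has processed about two thirds as many examples as the default configuration would have at the same step, which might explain part of the gap.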

**Results at steps 48000 and 52000:**

Loading model from logdir/bert_run/bs=2,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1/model_checkpoint-00048000
DB connections: 100% 166/166 [02:31<00:00,  1.10it/s]
100% 1034/1034 [05:45<00:00,  2.99it/s]
DB connections: 100% 166/166 [00:00<00:00, 448.81it/s]
Wrote eval results to logdir/bert_run/bs=2,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1/ie_dirs/bert_run_true_1-step48000.eval
48000 0.5638297872340425

Loading model from logdir/bert_run/bs=2,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1/model_checkpoint-00052000
DB connections: 100% 166/166 [00:00<00:00, 443.91it/s]
100% 1034/1034 [05:31<00:00,  3.12it/s]
DB connections: 100% 166/166 [00:00<00:00, 467.06it/s]
Wrote eval results to logdir/bert_run/bs=2,lr=7.4e-04,bert_lr=3.0e-06,end_lr=0e0,att=1/ie_dirs/bert_run_true_1-step52000.eval
52000 0.586073500967118
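
In case it helps to compare checkpoints, this is a small sketch of how the `<step> <accuracy>` lines can be pulled out of output like the above to find the best checkpoint. The function name `best_checkpoint` is hypothetical, and the sketch only assumes that result lines consist of exactly two whitespace-separated fields, an integer step and a float accuracy, as in the log shown here.

```python
from typing import Iterable, Optional, Tuple


def best_checkpoint(lines: Iterable[str]) -> Optional[Tuple[int, float]]:
    """Return (step, accuracy) for the highest accuracy found, or None.

    Lines that are not of the form '<int> <float>' (progress bars,
    'Loading model from ...', etc.) are skipped.
    """
    results = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            step, acc = int(parts[0]), float(parts[1])
        except ValueError:
            continue
        results.append((step, acc))
    return max(results, key=lambda r: r[1]) if results else None


# The two result lines from the evaluation output above,
# plus one non-result line to show that it is ignored.
log = [
    "DB connections: 100% 166/166 [00:00<00:00, 448.81it/s]",
    "48000 0.5638297872340425",
    "52000 0.586073500967118",
]
best = best_checkpoint(log)
print(best)
```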

Issue #10, opened by karthikj11 on Jul 29, 2020