[Benchmark] HF Trainer on A100

# 🖥 Benchmarking `transformers` w/ HF Trainer on a single A100 40GB

We use a dedicated benchmarking tool that does all the work for us: https://github.com/huggingface/transformers/pull/14934

This is the index post and specific benchmarks are in their own posts below:

1. [fp16 vs bf16 vs tf32 vs fp32](https://github.com/huggingface/transformers/issues/15026#issuecomment-1004543189)
2. [gradient accumulation steps](https://github.com/huggingface/transformers/issues/15026#issuecomment-1004592231)
3. [batch size](https://github.com/huggingface/transformers/issues/15026#issuecomment-1005033957)
4. [gradient checkpointing](https://github.com/huggingface/transformers/issues/15026#issuecomment-1005034578)
5. [optimizers](https://github.com/huggingface/transformers/issues/15026#issuecomment-1005220263)
6. [combining winning strategies](https://github.com/huggingface/transformers/issues/15026#issuecomment-1005227577) **~3x speed improvement!**
7. [RTX-3090 vs A100](https://github.com/huggingface/transformers/issues/15026#issuecomment-1005235845)

Note that each benchmark was run only once, so multiple runs with averaging would probably give slightly different results. The purpose here, though, is to show rough relative differences, not exact numbers.
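If you want to repeat a benchmark several times and average, a minimal sketch of the bookkeeping might look like the following. It is not part of the benchmarking tool linked above; the function names `summarize` and `relative_to_baseline` are hypothetical, and the sample values are placeholders, not measurements from these benchmarks.

```python
from statistics import mean, stdev


def summarize(runs):
    """Return (mean, sample standard deviation) for a list of throughput samples."""
    # stdev needs at least two samples; report 0.0 spread for a single run
    spread = stdev(runs) if len(runs) > 1 else 0.0
    return mean(runs), spread


def relative_to_baseline(results, baseline):
    """Map each variant to its mean throughput as a percentage of the baseline mean."""
    base_mean, _ = summarize(results[baseline])
    return {
        name: round(100.0 * summarize(samples)[0] / base_mean, 1)
        for name, samples in results.items()
    }


# Placeholder throughput samples (e.g. samples per second), NOT real results
results = {
    "fp32": [100.0, 102.0, 98.0],
    "fp16": [150.0, 153.0, 147.0],
}

relative = relative_to_baseline(results, "fp32")
for name, pct in relative.items():
    avg, spread = summarize(results[name])
    print(f"{name}: mean={avg:.1f} stdev={spread:.1f} relative={pct}%")
```

Reporting the spread next to the mean makes it easier to judge whether a relative difference between two settings is larger than the run-to-run noise.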

See also the [same benchmarks for RTX-3090](https://github.com/huggingface/transformers/issues/14608).

