[Benchmark] HF Trainer on A100 #15026

Description

@stas00

🖥 Benchmarking transformers w/ HF Trainer on a single A100 40GB

We are going to use a special benchmarking tool that will do all the work for us. #14934

This is the index post; the specific benchmarks are in their own posts below:

  1. fp16 vs bf16 vs tf32 vs fp32
  2. gradient accumulation steps
  3. batch size
  4. gradient checkpointing
  5. optimizers
  6. combining winning strategies: ~3x speed improvement!
  7. RTX-3090 vs A100
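The dimensions listed above map onto HF Trainer knobs (dtype flags, gradient accumulation steps, batch size, gradient checkpointing, optimizer choice). As a hedged sketch of how one might enumerate such a benchmark grid — the flag names mirror `transformers.TrainingArguments`, but this snippet only builds the combinations and launches nothing:

```python
from itertools import product

# Hypothetical benchmark grid over a few of the dimensions above.
# Key names mirror transformers.TrainingArguments flags; the actual
# benchmarking tool from #14934 drives the runs -- this is only a sketch.
dtypes = ["fp32", "tf32", "fp16", "bf16"]
grad_accum_steps = [1, 4, 16]
batch_sizes = [8, 16, 32]

grid = [
    dict(
        dtype=d,
        gradient_accumulation_steps=g,
        per_device_train_batch_size=b,
    )
    for d, g, b in product(dtypes, grad_accum_steps, batch_sizes)
]

print(len(grid))  # 4 * 3 * 3 = 36 configurations
```

Each resulting dict would then be translated into command-line flags for a single benchmark run, so relative differences can be compared across the grid.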

Note that each benchmark was run only once, so multiple runs with averaging would probably give slightly different results. The purpose here, though, is to show rough relative differences, not exact numbers.

See also the same benchmarks for the RTX-3090.

Labels: Benchmarks, WIP
