Tags: VectorInstitute/vectorlm
Add revised benchmarking logic and results (#9)

* Revised the estimation of batch count by retrieving it directly from `len(train_dataloader)`. Deleted the unused `timer_handle` argument in `Trainer`. Revised handling of the `max_seq_len` override in benchmarking. Added support for automatically switching between the LoRA and full-rank sharding schemes in benchmarking.
* Revised handling of an unspecified `max_seq_length`. Added Llama-3 to the benchmark `model_list`.
* Benchmarking: revised the benchmark script to ensure a consistent per-device train batch size.
* Benchmarking: replaced `trainer.step` with `trainer.train_step` to avoid eval overhead during benchmarking. Revised the benchmark parsing logic to display the optimal batch size for each context width value.
* Benchmarking: updated the reference throughput based on the updated logic.
* Benchmarking: updated the reference throughput descriptions.
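The revised parsing logic that reports an optimal batch size per context width could be sketched roughly as follows. This is a minimal illustration, not the repo's actual parser: the `records` structure and the `optimal_batch_sizes` helper are hypothetical, assuming each benchmark run yields a (context width, batch size, throughput) triple.

```python
def optimal_batch_sizes(records):
    """For each context width, pick the batch size with the highest throughput.

    `records` is an iterable of (max_seq_len, batch_size, throughput) triples;
    this shape is an assumption for illustration.
    """
    best = {}
    for seq_len, batch_size, throughput in records:
        # Keep the highest-throughput configuration seen for this context width.
        if seq_len not in best or throughput > best[seq_len][1]:
            best[seq_len] = (batch_size, throughput)
    return {seq_len: bs for seq_len, (bs, _) in best.items()}

# Example with made-up throughput numbers (tokens/sec):
records = [
    (1024, 8, 1500.0),
    (1024, 16, 1750.0),
    (2048, 4, 900.0),
    (2048, 8, 870.0),
]
print(optimal_batch_sizes(records))  # -> {1024: 16, 2048: 4}
```

Grouping by context width first, then maximizing throughput within each group, matches the described output format: one optimal batch size per context width value.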