
Clarification on effective batch size in TabFlex training setup #23

@schnurrd

Description


Hello,

Thank you for the very interesting paper. I have a question regarding the training setup of TabFlex-S100, TabFlex-L100, and TabFlex-H1K.

In Appendix C.2 (Model Training), it is stated that the models were trained with batch sizes of 1210, 110, and 1410 for 8, 4, and 4 epochs, respectively. While experimenting with pre-training, I found that batch sizes of this magnitude appear to require significantly more GPU memory than the single 80 GB A100 reported in the paper.

Am I missing something, or do the reported values correspond to the effective batch size, i.e. including gradient accumulation (batch_size × aggregate_k_gradients)? If so, I would be very interested in the concrete values used for batch_size and aggregate_k_gradients, and in the reasoning behind such a large overall batch size.
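
For clarity, this is what I mean by "effective batch size": one optimizer step is taken only after accumulating gradients over aggregate_k_gradients forward/backward passes, so the samples per update equal batch_size × aggregate_k_gradients while peak memory scales only with the per-pass batch_size. A minimal PyTorch sketch (placeholder model, data, and values; not the actual TabFlex training code):

```python
import torch

# Placeholder model and loss; the TabFlex architecture is not reproduced here.
model = torch.nn.Linear(100, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

batch_size = 11              # per-forward-pass batch that actually fits in GPU memory (assumed value)
aggregate_k_gradients = 110  # number of accumulation steps (assumed value)
# effective batch size = batch_size * aggregate_k_gradients = 1210

def get_batch():
    # Placeholder: one synthetic mini-batch of features and class labels.
    return torch.randn(batch_size, 100), torch.randint(0, 10, (batch_size,))

optimizer.zero_grad()
for step in range(aggregate_k_gradients):
    x, y = get_batch()
    # Divide by the accumulation count so the accumulated gradient is an average,
    # matching what a single large batch of 1210 samples would produce.
    loss = loss_fn(model(x), y) / aggregate_k_gradients
    loss.backward()          # gradients accumulate in .grad across iterations
optimizer.step()             # one parameter update per effective batch
optimizer.zero_grad()
```

Is this roughly the setup used, and if so, which split between batch_size and aggregate_k_gradients did you use?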

Thank you very much in advance.
