Slowdown and Higher Memory Consumption for GPTQ-LoRA with Bfloat16 #84

Open · achew010 opened this issue Sep 12, 2024 · 1 comment
Labels: question (Further information is requested)

achew010 commented Sep 12, 2024

Description

Regression Test for Loss, Memory, Throughput

Comparisons on loss, memory, and throughput for Full-FT and PEFT:

  • QLoRA: status quo on the switch from torch_dtype=float16 (Reference) to torch_dtype=bfloat16 (New).
  • GPTQ-LoRA: higher memory consumption and lower throughput with the switch to bfloat16 (a hedged loading sketch follows this list).
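For context, a minimal sketch (not the repository's actual benchmark harness) of how the torch_dtype switch is applied when loading a GPTQ-quantized base model for LoRA tuning; the model ID and LoRA hyperparameters below are placeholders.

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Hypothetical pre-quantized GPTQ checkpoint; substitute the model under test.
MODEL_ID = "TheBloke/Llama-2-7B-GPTQ"

# The "Reference" run uses torch_dtype=torch.float16; the "New" run switches to bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # the dtype under test
    device_map="auto",
)

# Attach LoRA adapters to the quantized base model (placeholder settings).
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)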

See Outliers

Subset of the outliers, processed into the table below:

import pandas as pd

# Load the outlier rows exported from the regression run.
A = pd.read_csv('outliers.1.csv', index_col=None)

As = []
for tag, G in A.groupby('scenario'):
    # A regression is a metric that got worse: higher for loss and memory,
    # lower for throughput.
    reg = G.reference < G.new        # rows where the new value is higher (worse)
    if tag == 'train_tokens_per_second':
        reg = ~reg                   # for throughput, lower is worse
    As.append(G.loc[reg])

A = pd.concat(As)
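As a possible follow-up (assuming the same reference/new columns), the size of each regression could be summarized as a percentage change per scenario:

# Percentage change of the regressed rows, grouped by scenario.
A['pct_change'] = (A.new - A.reference) / A.reference * 100
print(A.groupby('scenario')['pct_change'].describe())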

(screenshot: table of regressed outlier rows by scenario)

fabianlim commented Sep 12, 2024

@achew010 are we positive the slowdown only affects GPTQ-LoRA and nothing else (e.g., full fine-tuning, regular PEFT)? I remember you used to print out a table; can we check that as well?
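A minimal sketch of the kind of per-framework comparison table mentioned above, assuming the raw results also carry a framework_config column (e.g., Full-FT, PEFT, QLoRA, GPTQ-LoRA) alongside scenario, reference, and new; that column name is an assumption, not the repo's actual schema.

import pandas as pd

raw = pd.read_csv('outliers.1.csv', index_col=None)
table = raw.pivot_table(
    index='framework_config',   # assumed column distinguishing Full-FT / PEFT / QLoRA / GPTQ-LoRA
    columns='scenario',
    values=['reference', 'new'],
    aggfunc='mean',
)
print(table.round(2))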
