
fused_mha default to false for nets with relu^2 activation #2378

Open

borg323 wants to merge 1 commit into LeelaChessZero:master from borg323:cutlass_relu_2

Conversation

@borg323 (Member) commented Jan 28, 2026

No description provided.

@Menkib64 (Contributor)

I suspect that other networks will have NaN problems too. I suspect that all networks are fp16 unsafe because they were trained with fp32.
I see a few potential solutions: we either need to fix old networks to be fp16 safe (I don't know how feasible this is), we can add guards against infinities in the softmax kernels (like you did for cuda), or we could add quantization scalers that adjust values to avoid overflows.
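
For illustration only, here is a minimal CPU-side sketch of the "guard against infinities in softmax" idea, not the actual lc0 CUDA kernel; the name `SafeSoftmax` and the fp32 accumulation are assumptions about how such a guard is typically written:

```cpp
// Minimal sketch (not the lc0 CUDA kernel): subtract the row maximum before
// exponentiating and accumulate in fp32, so large-magnitude inputs coming
// from fp16 tensors cannot push exp() to +inf.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

void SafeSoftmax(const std::vector<float>& in, std::vector<float>& out) {
  const float row_max = *std::max_element(in.begin(), in.end());
  float sum = 0.0f;  // fp32 accumulator even if inputs came from fp16
  out.resize(in.size());
  for (size_t i = 0; i < in.size(); ++i) {
    out[i] = std::exp(in[i] - row_max);  // exponent is <= 0, so no overflow
    sum += out[i];
  }
  for (float& v : out) v /= sum;
}

int main() {
  std::vector<float> logits = {300.0f, 250.0f, -40.0f};  // naive exp() would overflow
  std::vector<float> probs;
  SafeSoftmax(logits, probs);
  for (float p : probs) std::printf("%g\n", p);
}
```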

@borg323 (Member, Author) commented Jan 31, 2026

However, the cutlass code has been robust so far. It may be that updating the code to the latest upstream version is to blame. Even so, ReLU^2 is a likely source of overflows in fp16, so avoiding unsafe code still makes sense to me.
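
As a rough illustration of why ReLU^2 is risky in fp16 (plain C++ emulating the fp16 limit; `kFp16Max` and `Relu2Fp16` are made-up names, not backend code): fp16 tops out at 65504, so squaring overflows for any pre-activation above sqrt(65504) ≈ 255.9, even though the input itself is comfortably representable.

```cpp
// Emulate fp16 overflow of ReLU^2 in plain C++ for illustration.
#include <cmath>
#include <cstdio>

constexpr float kFp16Max = 65504.0f;  // largest finite fp16 value

float Relu2Fp16(float x) {
  float y = x > 0.0f ? x * x : 0.0f;   // ReLU followed by squaring
  return y > kFp16Max ? INFINITY : y;  // emulate fp16 overflow to +inf
}

int main() {
  for (float x : {100.0f, 255.0f, 256.0f, 300.0f}) {
    std::printf("relu^2(%g) in fp16 -> %g\n", x, Relu2Fp16(x));
  }
}
```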

@Menkib64 (Contributor)

It makes sense to disable it for a network which have clear problems. I'm worried that BT4 might have a rare NaN problem. I was also thinking how to make a generic fix for overflows. I raised it here in case you might come up with a simple generic solution. I think this should be merged, if none comes up with a simple solution soon.
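
One hypothetical shape such a generic fix could take is the "quantization scaler" idea mentioned earlier in this thread; the sketch below is not something lc0 implements, and `ScaledRelu2Sum` and the scale `s` are invented for illustration: divide activations by a per-tensor scale before squaring so the fp16 intermediate stays in range, then fold the scale back in while accumulating in fp32.

```cpp
// Hypothetical activation-scaling sketch: keep the squared intermediate
// below the fp16 limit, undo the scale in an fp32 accumulator.
#include <algorithm>
#include <cstdio>
#include <vector>

constexpr float kFp16Max = 65504.0f;

float ScaledRelu2Sum(const std::vector<float>& x, float s) {
  float sum = 0.0f;  // fp32 accumulator
  for (float v : x) {
    float scaled = std::max(v, 0.0f) / s;  // fits in fp16 if s is chosen well
    float sq = scaled * scaled;            // stays below kFp16Max
    sum += sq * s * s;                     // undo the scale in fp32
  }
  return sum;
}

int main() {
  std::vector<float> acts = {300.0f, 512.0f, 10.0f};
  float s = 4.0f;  // chosen so (max(acts)/s)^2 stays under the fp16 limit
  std::printf("sum of relu^2 = %g\n", ScaledRelu2Sum(acts, s));
}
```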

