2 parents f006b7a + f25f6ef, commit 72172d6
csrc/flash_api.cpp
@@ -255,6 +255,14 @@ std::tuple<at::Tensor, at::Tensor> set_params_splitkv(
         TORCH_CHECK(params.num_splits <= 128, "num_splits > 128 not supported");
     }

+    // Temporarily disable Split-KV, because some bugs are still being fixed.
+    // See: https://github.com/SmallDoges/flash-dmattn/issues/47
+    // Regardless of how it is set externally, always set num_splits back to 1.
+    // This is to avoid the extra memory overhead of Split-KV.
+    params.num_splits = 1;
+    softmax_lse_accum.reset();
+    out_accum.reset();
+
     return std::make_tuple(softmax_lse_accum, out_accum);