Skip naive conv testing to speed up #3383

RobQuistNL · 2024-11-14T20:25:10Z

Hi,

Looking at running various models with various inputs, it seems that a lot of time in the initial runs is spent benchmarking candidate kernels, including the naive ones (e.g. naive_conv_nonpacked_fwd_nchw_float_double_float).

The solution that ends up being selected is usually not the naive one but one of the other kernels. Running with MIOPEN_DEBUG_CONV_DIRECT=0 significantly speeds up the initial runs of such a model at varying resolutions.
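A minimal sketch of applying this workaround from Python, assuming the variable has to be present in the process environment before MIOpen is loaded. The helper name `env_without_direct_conv` is hypothetical; only `MIOPEN_DEBUG_CONV_DIRECT` comes from the text above. Note that this switches off the direct convolution solvers as a group, not just the naive ones.

```python
import os


def env_without_direct_conv(base_env=None):
    """Return a copy of the environment with MIOPEN_DEBUG_CONV_DIRECT set to "0".

    base_env defaults to the current process environment. The copy can be
    passed to subprocess.run(..., env=...) when launching the model script.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MIOPEN_DEBUG_CONV_DIRECT"] = "0"
    return env


# In-process use: set the variable before importing the framework, on the
# assumption that MIOpen reads it when it is first loaded.
os.environ["MIOPEN_DEBUG_CONV_DIRECT"] = "0"
```

Exporting the variable in the shell or in the container definition would have the same effect; the Python form is only convenient when the launch is scripted.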

Would it be an option to make this benchmarking dynamic, without excluding the naive kernels completely? The naive kernel would be the least preferred: if any other applicable kernel is found, it would be a safe bet that the other implementation is faster, so benchmarking the naive kernel could be skipped altogether.
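To make the request concrete, here is a rough sketch of the selection policy I have in mind. This is not MIOpen code: `find_best_solver`, `benchmark` and `is_naive` are hypothetical names used only for illustration. Non-naive solvers are benchmarked first, and the naive ones are timed only when nothing else is applicable.

```python
from typing import Callable, List, Optional, Tuple


def find_best_solver(
    solvers: List[str],
    benchmark: Callable[[str], float],
    is_naive: Callable[[str], bool],
) -> Optional[Tuple[str, float]]:
    """Return (solver, time) of the fastest solver, timing naive ones last.

    Naive solvers are benchmarked only if no non-naive solver is available.
    """
    preferred = [s for s in solvers if not is_naive(s)]
    fallback = [s for s in solvers if is_naive(s)]
    for group in (preferred, fallback):
        # Only the solvers in the current group are ever executed.
        timings = {s: benchmark(s) for s in group}
        if timings:
            best = min(timings, key=timings.get)
            return best, timings[best]
    return None
```

With a policy like this the naive kernel stays available as a safety net, but it costs nothing during the search whenever a specialised kernel exists.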

If this is not the desired default behaviour, maybe it could be added behind a feature flag.

I'm quite sure that people running this without knowing about it would see major speedups in initial runs (the test case here is various VAE models being run).

RobQuistNL · 2024-11-14T20:30:16Z

Here's a snippet from the ufdb in question. I'm not 100% sure, but I think it shows that some of those ConvDirectNaive kernels take a lot of time:

HIP.3_2_0.ufdb.txt

1920-26-32-1x1-1280-26-32-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.25571,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.736573,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.289919,15564800,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.116864,19169280,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.266399,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:16.9739,0,miopenConvolutionFwdAlgoDirect
1280-52-64-3x3-1280-52-64-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:3.34123,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:6.71735,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:2.64133,63569920,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:2.03775,97648640,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:2.92911,76677120,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:240.954,0,miopenConvolutionFwdAlgoDirect
1920-52-64-3x3-640-52-64-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:2.55705,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:5.0626,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:2.24214,56197120,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:2.34646,73236480,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:2.36562,115015680,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:200.26,0,miopenConvolutionFwdAlgoDirect
1920-52-64-1x1-640-52-64-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:2.49522,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:1.22867,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.406174,36536320,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.242687,51118080,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.269183,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:35.5401,0,miopenConvolutionFwdAlgoDirect
1280-52-64-3x3-640-52-64-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.7151,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:3.38824,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:1.42153,40304640,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:1.02272,57344000,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:2.19766,76677120,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:129.973,0,miopenConvolutionFwdAlgoDirect
1280-52-64-1x1-640-52-64-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.67263,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.831037,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.327743,27197440,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.176575,42598400,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.192191,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:23.6178,0,miopenConvolutionFwdAlgoDirect
960-52-64-3x3-640-52-64-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.29372,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:2.53945,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.967996,32358400,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.697597,49397760,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:1.53158,57507840,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:96.8891,0,miopenConvolutionFwdAlgoDirect
960-52-64-1x1-640-52-64-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.26147,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.632862,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.283551,22528000,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.135807,38338560,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.219487,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:17.7857,0,miopenConvolutionFwdAlgoDirect
640-104-128-3x3-640-104-128-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:3.39595,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:6.30868,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:2.36338,75530240,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:1.94018,143687680,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:2.62095,153354240,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:279.665,0,miopenConvolutionFwdAlgoDirect
960-104-128-3x3-320-104-128-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:2.58604,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:4.89873,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:1.72639,73687040,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:2.27314,107765760,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:4.93534,230031360,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:256.787,0,miopenConvolutionFwdAlgoDirect
960-104-128-1x1-320-104-128-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:2.52303,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:1.25126,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.471806,68771840,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.334623,102236160,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.346175,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:43.6982,0,miopenConvolutionFwdAlgoDirect
640-104-128-3x3-320-104-128-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.74319,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:3.34469,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:1.4166,54804480,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:1.47539,88883200,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:1.90972,153354240,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:165.897,0,miopenConvolutionFwdAlgoDirect
640-104-128-1x1-320-104-128-2-0x0-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:1.70063,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.853245,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.39635,51527680,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.251679,85196800,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.289215,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:28.6757,0,miopenConvolutionFwdAlgoDirect
320-104-128-3x3-4-104-128-2-1x1-1x1-1x1-0-NCHW-FP16-F=ConvBinWinogradRxSf2x3g1:0.0983036,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.173375,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.315071,17275392,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.211327,17701376,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:0.734749,76677120,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:40.4575,0,miopenConvolutionFwdAlgoDirect
4-104-128-1x1-4-104-128-1-0x0-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:0.0151358,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.0191038,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.142335,426240,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.0238398,425984,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.0715196,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:0.149664,0,miopenConvolutionFwdAlgoDirect
4-104-128-3x3-512-104-128-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:0.0611198,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.0808954,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.287711,27549696,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.102943,27549696,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:0.111072,1916928,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:0.749117,0,miopenConvolutionFwdAlgoDirect
512-104-128-3x3-512-104-128-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:1.84121,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:3.48776,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:2.2487,63963136,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:1.93567,63963136,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:2.23257,245366784,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:101.823,0,miopenConvolutionFwdAlgoDirect
512-104-128-1x1-512-104-128-1-0x0-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:1.79679,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:0.900925,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:0.531326,55574528,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.365215,54525952,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:0.273055,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:18.3887,0,miopenConvolutionFwdAlgoDirect
512-208-256-3x3-512-208-256-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:7.17329,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:12.9223,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:8.36243,227540992,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:7.62704,227540992,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:8.17904,981467136,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:477.672,0,miopenConvolutionFwdAlgoDirect
512-416-512-3x3-512-416-512-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:28.9988,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:49.5818,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:33.8475,881852416,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:32.3395,881852416,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:41.5719,3925868544,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:2055.18,0,miopenConvolutionFwdAlgoDirect
512-416-512-3x3-256-416-512-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:14.8489,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:25.3045,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:17.5976,659030016,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:16.8068,659030016,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:21.3744,3925868544,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:1877.97,0,miopenConvolutionFwdAlgoDirect
256-416-512-3x3-256-416-512-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:7.64682,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:12.8598,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:10.3321,438566912,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:8.93561,438566912,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:10.1123,1962934272,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:873.538,0,miopenConvolutionFwdAlgoDirect
512-416-512-1x1-256-416-512-1-0x0-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:14.3863,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:6.59623,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:5.1019,654835712,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:4.32277,654311424,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwd1x1_0_1:1.92758,0,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:286.65,0,miopenConvolutionFwdAlgoDirect
256-832-1024-3x3-256-832-1024-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:32.2882,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:50.3876,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:44.9249,1747189760,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:39.9932,1747189760,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:52.2714,7851737088,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:3816.86,0,miopenConvolutionFwdAlgoDirect
256-832-1024-3x3-128-832-1024-1-1x1-1x1-1x1-0-NCHW-FP32-F=ConvBinWinogradRxSf2x3g1:16.3588,0,miopenConvolutionFwdAlgoWinograd;ConvBinWinogradRxSf3x2:25.5977,0,miopenConvolutionFwdAlgoWinograd;ConvHipImplicitGemmGroupFwdXdlops:22.7511,1309802496,miopenConvolutionFwdAlgoImplicitGEMM;ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:20.9185,1309802496,miopenConvolutionFwdAlgoImplicitGEMM;GemmFwdRest:27.0331,7851737088,miopenConvolutionFwdAlgoGEMM;ConvDirectNaiveConvFwd:3346.55,0,miopenConvolutionFwdAlgoDirect
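The records above can be tallied with a short script. This is a sketch under assumptions: it treats each record as `key=solver:time,workspace,algorithm;...`, which is inferred from the snippet itself, and it only sums the recorded per-solver times (compilation and other search overhead are not in the file). The function names are hypothetical.

```python
def parse_ufdb_line(line):
    """Split one find-db record into (problem_key, {solver_name: time})."""
    key, _, records = line.strip().partition("=")
    timings = {}
    for record in records.split(";"):
        if not record:
            continue  # tolerate a trailing ';'
        solver, _, fields = record.partition(":")
        # The first comma-separated field is taken to be the measured time.
        timings[solver] = float(fields.split(",")[0])
    return key, timings


def naive_share(timings, naive_solver="ConvDirectNaiveConvFwd"):
    """Fraction of the summed solver times that belongs to the naive solver."""
    total = sum(timings.values())
    return timings.get(naive_solver, 0.0) / total if total else 0.0


# First record of the snippet above, copied verbatim.
SAMPLE_ROW = (
    "1920-26-32-1x1-1280-26-32-2-0x0-1x1-1x1-0-NCHW-FP16-F="
    "ConvBinWinogradRxSf2x3g1:1.25571,0,miopenConvolutionFwdAlgoWinograd;"
    "ConvBinWinogradRxSf3x2:0.736573,0,miopenConvolutionFwdAlgoWinograd;"
    "ConvHipImplicitGemmGroupFwdXdlops:0.289919,15564800,miopenConvolutionFwdAlgoImplicitGEMM;"
    "ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC:0.116864,19169280,miopenConvolutionFwdAlgoImplicitGEMM;"
    "GemmFwd1x1_0_1:0.266399,0,miopenConvolutionFwdAlgoGEMM;"
    "ConvDirectNaiveConvFwd:16.9739,0,miopenConvolutionFwdAlgoDirect"
)

if __name__ == "__main__":
    key, timings = parse_ufdb_line(SAMPLE_ROW)
    print(key, min(timings, key=timings.get), naive_share(timings))
```

For that first record the naive solver accounts for roughly 86% of the summed times, while the fastest entry is ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC.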

ppanchad-amd · 2024-11-15T15:03:48Z

Hi @RobQuistNL. An internal ticket has been created to assist with your issue. Thanks!

huanrwan-amd · 2024-11-18T17:10:01Z

Hi @RobQuistNL, can you please provide more info on your hardware and software version (ROCm version and OS version)? Thanks.

RobQuistNL · 2024-11-18T21:33:32Z

Hey @huanrwan-amd,

I'm using rocm/pytorch:rocm6.2.3_ubuntu22.04_py3.10_pytorch_release_2.3.0 as the base image, with pip3 install --pre --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.2 and:

git clone --recursive https://github.com/ROCm/flash-attention /tmp/flash-attention
cd /tmp/flash-attention; export GPU_ARCHS="gfx90a"; pip3 install .

huanrwan-amd · 2024-11-18T21:35:58Z

Hi @RobQuistNL, thanks for the info. This looks more like a feature enhancement request. I will contact the internal team first.

ppanchad-amd added the Under Investigation label Nov 15, 2024

huanrwan-amd added the enhancement label Nov 18, 2024
