
Conversation

@BruceDai
Contributor

@BruceDai BruceDai commented Jun 2, 2022

This PR adds some float32 conformance tests for WebNN API operations used by first-wave models:

  1. Add tests for the WebNN API concat / reshape / slice / split / squeeze / transpose operations, which only rearrange input tensors

  2. Add tests for the WebNN API clamp / relu operations, which perform simple element-wise computations on input tensors

The eight operations above have a 0 ULP (unit of least precision) distance between actual and expected data across different hardware devices.

@BruceDai
Contributor Author

BruceDai commented Sep 7, 2022

@dontcallmedom @Honry I've updated this PR, please take another look, thanks.

// https://webmachinelearning.github.io/webnn/#enumdef-mldevicetype
const DeviceTypeArray = ['cpu', 'gpu'];

// unit of least precision (ULP) is the spacing between two consecutive floating-point numbers.
Contributor Author

@wchao1115 Would you please help review these ULP tolerances? Thanks.

-1.5487962, 0.1265688, 0.7930071, 0.63802403, 0.3400246,
];
// expected data by clamping input data within a range [-1, 1]
const expected = [
Contributor Author

@BruceDai BruceDai Oct 12, 2022

Hi @fdwr, based on the last discussion in the WebML CG meeting, input values of Number type (float64) are cast to float32 before computing; see line#75 below

const inputs = {'x': new Float32Array(inputValue)};

Since each element of this expected array is also of Number type (float64), and we should use float32 expected data for float32 test cases, the output values need to be cast from float64 to float32. For this purpose, I need to update line#87

assert_array_approx_equals_ulp(outputs.y, expected, ULPTolerance.float32.clamp, 'float32');

to

assert_array_approx_equals_ulp(outputs.y, new Float32Array(expected), ULPTolerance.float32.clamp, 'float32');

@fdwr fdwr Oct 12, 2022

Yes, it's important for the inputs to be the same data type, so both reference baseline engine and GPU execution provider implementations start out from the same input value. Otherwise additional error would immediately be introduced that isn't fair to the GPU implementation because precision was already lost before the GPU even saw the input.

Since each element of this expected array is also of Number type (float64), and we should use float32 expected data for float32 test cases, the output values need to be cast from float64 to float32

Yes, for output, we can compare the GPU's float32 output to the reference engine's downcasted (float64 -> float32) output. Note, it should also be okay to compare the upcasted GPU output (float32 -> float64) with the reference engine, so long as you shift the ULP values accordingly (float64 has 52 fraction bits, while float32 has 23 fraction bits, yielding a shift of <<29). One advantage to the latter approach is that you could see more precise final output values, without losing any additional precision from rounding down the baseline output float64->float32. However, the former is simpler and close enough, and it's what I've always done.
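For reference, a ULP distance between two float32 values can be computed by reinterpreting their bit patterns; here is a minimal sketch (the function name is illustrative, and the tests' actual assert_array_approx_equals_ulp helper may differ):

```javascript
// Sketch: ULP distance between two float32 values via bit reinterpretation.
// Adjacent float32 values differ by exactly 1 on the ordered integer scale.
function ulpDistanceFloat32(a, b) {
  const floats = new Float32Array([a, b]);      // rounds both values to float32
  const bits = new Uint32Array(floats.buffer);  // raw IEEE 754 binary32 bits
  // Map sign-magnitude bit patterns onto a monotonically increasing scale.
  const ordered = (u) =>
      (u & 0x80000000) ? 0x80000000 - (u & 0x7fffffff) : 0x80000000 + u;
  return Math.abs(ordered(bits[0]) - ordered(bits[1]));
}

ulpDistanceFloat32(1.0, 1.0);           // 0
ulpDistanceFloat32(1.0, 1 + 2 ** -23);  // 1 (next representable float32 above 1)
```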

Note the "expected" output should be regenerated from the downcasted float32 inputs, not the original float64 input values.

Do you know if the float64 to Float32Array cast rounds to nearest even or toward zero? I couldn't tell in my quick search.

Contributor Author

Sorry, the above solution for the expected test data seems incorrect. From the beginning we need to pass float32 input to webnn-baseline and generate float32 output data, then use those float32 inputs/outputs for the float32 WPT test cases. Is that right? Thanks.


🤔 How are you generating the arrays of expected output data?

Contributor Author

How are you generating the arrays of expected output data?

We've already implemented some operation functions in the webnn-baseline repo in pure JavaScript, without depending on third-party libs. We could (1) generate and save input data by invoking Math.random() in JavaScript, (2) run the webnn-baseline operation on these input data, then (3) get and save the output data as expected data. These input and output values are float64; for float32 purposes, we need to cast the input data to float32 in step (2), right?

Contributor Author

@huningxin @wchao1115 Would you please take a look at the float32 baseline? Thanks.

Contributor

@fdwr

Do you know if the float64 to Float32Array cast rounds to nearest even or toward zero? I couldn't tell in my quick search.

It should round to nearest even. According to NumericToRawBytes for the Float32 type in ECMA-262, "the result of converting value to IEEE 754-2019 binary32 format using roundTiesToEven mode".
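This can be checked directly with Math.fround, which performs the same binary32 conversion; a small demonstration of the ties-to-even behavior:

```javascript
// 1 + 2^-24 lies exactly halfway between the float32 values 1 (even mantissa)
// and 1 + 2^-23 (odd mantissa); roundTiesToEven picks the even one.
console.log(Math.fround(1 + 2 ** -24) === 1);                 // true
// 1 + 3*2^-24 is halfway between 1 + 2^-23 (odd) and 1 + 2^-22 (even).
console.log(Math.fround(1 + 3 * 2 ** -24) === 1 + 2 ** -22);  // true
// Storing into a Float32Array applies the identical conversion.
console.log(new Float32Array([1 + 2 ** -24])[0] === 1);       // true
```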

Contributor

@BruceDai

Would you please take a look at the float32 baseline? Thanks.

As @fdwr suggested, we probably need to convert the inputs to float32 before feeding them to webnn-baseline, e.g. by using Float32Array. webnn-baseline would still calculate in double precision and generate double-precision outputs. Those outputs should then be converted to float32 again as the expected results.
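A minimal sketch of that flow, using a hypothetical stand-in for a webnn-baseline operation (clampBaseline is illustrative, not the real webnn-baseline API):

```javascript
// Hypothetical stand-in for a webnn-baseline op: computes in double precision.
function clampBaseline(values, minValue, maxValue) {
  return values.map((v) => Math.min(Math.max(v, minValue), maxValue));
}

const inputFloat64 = [-8.591504561715722, 5.071455429211573];
// 1. Downcast the inputs to float32 first, so the baseline and the browser
//    implementation start from exactly the same values.
const inputFloat32 = Array.from(new Float32Array(inputFloat64));
// 2. Compute in double precision on the float32-valued inputs.
const outputFloat64 = clampBaseline(inputFloat32, -5, 5);
// 3. Downcast the double-precision results to float32 expected data.
const expectedFloat32 = new Float32Array(outputFloat64);
// Both inputs here land on the clamp bounds, so expectedFloat32 is [-5, 5].
```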

          concat / reshape / slice / split / squeeze / transpose
// 5D
[[2, 3, 2, 1, 2], expectedMinMax, {minValue: -5, maxValue: 7}],
[[2, 3, 2, 1, 2], expectedRelu6, {minValue: 0, maxValue: 6}],
],
@fdwr fdwr Oct 17, 2022

Should we have at least one case where minValue > 0 and one where maxValue < 0?

Contributor Author

@BruceDai BruceDai Oct 18, 2022

Do you mean adding exception tests? It's expected that an exception should be thrown for the minValue > 0 and maxValue < 0 case.

@fdwr fdwr Oct 18, 2022

e.g.

[[2, 3, 2, 1, 2], expectedMinMaxBothPositive, {minValue: 5, maxValue: 10}],
[[2, 3, 2, 1, 2], expectedMinMaxBothNegative, {minValue: -10, maxValue: -5}],

Contributor Author

I updated the clamp tests by adding such cases; please take another look, thanks.

0, 0, 6, 0,
],
};
// expected data by clamping input data with specified minValu=-5

Minor typo minValue.

Contributor Author

Thanks, I will update it.

Contributor Author

Fixed the typo in an updated commit, thanks.

-8.275078596251655, -1.3557080837143296, 7.348269585030259, -5.530012756488021,
];
// expected data by clamping input data with default options
const expectedDefault = {
Contributor

Could you please share how you got the expected data by running https://github.com/webmachinelearning/webnn-baseline against the inputData? And could you share the script so that others can reproduce the expected results?

@fdwr fdwr Oct 27, 2022

It's also important (even more so) that the input be typed, not just the output. e.g.

const inputData = {
    float32: [
        -3.4356448650360107421875, -6.53094577789306640625,   -8.1757602691650390625,    2.08796405792236328125, 
        -4.4801502227783203125,    -8.59150409698486328125,    5.071455478668212890625, -6.618697643280029296875, 
         4.22457790374755859375,    6.45027256011962890625,   -8.79992389678955078125,  -3.3445966243743896484375, 
         5.550524234771728515625,   1.2788677215576171875,     9.33370304107666015625,   9.2261638641357421875, 
        -7.30272006988525390625,    1.78659021854400634765625, 5.5649814605712890625,    3.145101070404052734375, 
        -8.27507877349853515625,   -1.35570812225341796875,    7.34826946258544921875,  -5.530012607574462890625, 
    ],
    float64: [
        -3.4356449350874696, -6.530945988411405, -8.175760663838268,  2.0879641317522726,
        -4.480150236948526,  -8.591504561715722,  5.071455429211573, -6.618697702258771,
         4.224577823136105,   6.450272349350044, -8.799923845835664, -3.3445965406946643,
         5.550524270215341,   1.2788677438688012, 9.333702625514768,  9.2261637863086,
        -7.302720212371034,   1.7865902395032585, 5.564981581526375,  3.145101011211482,
        -8.275078596251655,  -1.3557080837143296, 7.348269585030259, -5.530012756488021,
    ],
};

That way both the reference engine baseline and the browser implementation start from the same value. It's totally fine for the reference engine to immediately upcast the float32 input to float64, then continue all computations in float64, but we just want to avoid any downcast from float64 to float32 before providing input to the browser implementation.

];
// expected data by clamping input data with default options
const expectedDefault = {
float32: [
Contributor

Could you please share how you convert the double precision results of webnn-baseline to float32?

await context.computeAsync(graph, inputs, outputs);
}

const expectedData = expected instanceof Array ? new TestTypedArray(expected) : expected[operandType];
Contributor

Why not just use TestTypedArray(expected) if the expected results are double precision results generated by webnn-baseline? At line 14, the input is new TestTypedArray(inputValue). I suppose they should be aligned.

Contributor Author

Hi @huningxin, the current webnn-baseline can generate a float64 baseline (expected data) from float64 input data, which @wchao1115 and the CG agreed on. However, @fdwr commented above that simply casting the float64 baseline to a float32 baseline is unfair, since the actual cast float32 input data has already lost precision relative to the original float64 input data.
So I modified webnn-baseline to support generating a float32 baseline; please review the generation code here, thanks.

Also pinging @wchao1115 and @fdwr to review this baseline generation code, thanks.

@fdwr fdwr Oct 19, 2022

Yeah, it matters that the JS baseline and the C++ implementation start with the same value. Otherwise, if the JS implementation receives the more accurate 3.14159265 input, but the C++ implementation sees an already truncated value of 3.14159274, then data loss has already occurred outside its control, and we'd be measuring the C++ implementation against an unfair standard (which would be like asking a function to reverse the string "CAT" <-> "TAC", but handing it "CA" with the last character chopped off, yet still expecting it to return "TAC" instead of "AC").

For float16 (which is probably a later target), you probably need some casting helper function, since I don't see a Float16Array in ES.

Contributor Author

For float16 (which is probably a later target), you probably need some casting helper function, since I don't see a Float16Array in ES.

@wchao1115 commented previously on webmachinelearning/webnn#127 :

"Although the lack of FP16 support in the ArrayBufferView could possibly be worked around by just using Uint16Array to appropriately offset the tensor data and deal with the float-casting inside the webnn implementation itself, it could still be confusing to some at the model builder level."

There is a workaround by 森建 on https://esdiscuss.org/topic/float16array :

"WebGL 2.0 supports half-precision float (16bit float) by default.
But now, we must use the following dirty hack using Uint16Array."

// ref: http://stackoverflow.com/questions/32633585/how-do-you-convert-to-half-floats-in-javascript
var toHalf = (function() {

   var floatView = new Float32Array(1);
   var int32View = new Int32Array(floatView.buffer);

   /* This method is faster than the OpenEXR implementation (very often
    * used, eg. in Ogre), with the additional benefit of rounding, inspired
    * by James Tursa's half-precision code. */
   return function toHalf(val) {

     floatView[0] = val;
     var x = int32View[0];

     var bits = (x >> 16) & 0x8000; /* Get the sign */
     var m = (x >> 12) & 0x07ff; /* Keep one extra bit for rounding */
     var e = (x >> 23) & 0xff; /* Using int is faster here */

     /* If zero, or denormal, or exponent underflows too much for a denormal
      * half, return signed zero. */
     if (e < 103) {
       return bits;
     }

     /* If NaN, return NaN. If Inf or exponent overflow, return Inf. */
     if (e > 142) {
       bits |= 0x7c00;
       /* If exponent was 0xff and one mantissa bit was set, it means NaN,
        * not Inf, so make sure we set one mantissa bit too. */
       bits |= ((e == 255) ? 0 : 1) && (x & 0x007fffff);
       return bits;
     }

     /* If exponent underflows but not too much, return a denormal */
     if (e < 113) {
       m |= 0x0800;
       /* Extra rounding may overflow and set mantissa to 0 and exponent
        * to 1, which is OK. */
       bits |= (m >> (114 - e)) + ((m >> (113 - e)) & 1);
       return bits;
     }

     bits |= ((e - 112) << 10) | (m >> 1);
     /* Extra rounding. An overflow will set mantissa to 0 and increment
      * the exponent, which is OK. */
     bits += m & 1;
     return bits;
   };

}());

var tex = new Uint16Array(4);
tex[0] = toHalf(0.5);
tex[1] = toHalf(1);
tex[2] = toHalf(123);
tex[3] = toHalf(-13);
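For round-trip testing, a hypothetical fromHalf() decoder (not part of the snippet above) could convert the packed binary16 bits back to a Number:

```javascript
// Decode an IEEE 754 binary16 bit pattern into a JavaScript Number.
function fromHalf(bits) {
  const sign = (bits & 0x8000) ? -1 : 1;
  const exponent = (bits >> 10) & 0x1f;
  const mantissa = bits & 0x03ff;
  if (exponent === 0) {
    return sign * mantissa * 2 ** -24;        // zero or subnormal
  }
  if (exponent === 31) {
    return mantissa ? NaN : sign * Infinity;  // NaN or infinity
  }
  return sign * (1 + mantissa / 1024) * 2 ** (exponent - 15);
}

fromHalf(0x3800); // 0.5
fromHalf(0xca80); // -13
```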

* @param {Boolean} [options.bTranspose]
* @returns {Number}
*/
function gemmULPTolerances(shapeA, options) {
Contributor Author

@fdwr Please review this implementation of the gemm ULP tolerances, thanks.


A comment somewhere would be helpful. e.g.

// GEMM : alpha * (A x B) + beta * C
// An upper bound for the worst serial ordering is bounded by
// the number of lossy operations, where matrix multiplication
// is a dot product (mul and add times the number of elements)
// plus bias operations.

gemmOptions[key] = options[key];
}

const width = gemmOptions.aTranspose ? shapeA[0] : shapeA[1];
@fdwr fdwr Oct 24, 2022

Should we make this relative to the end, so it still works when/if the spec adds leading dimensions (e.g. batch and channel)?

shapeA[rank(shapeA) - 1] // rank == sizeOfShape

Then you could probably reuse this logic for matmul too (the default gemmOptions values would not be overwritten, since matmul doesn't have alpha and beta anyway).
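Under the assumption that the tolerance simply counts the lossy float operations along one output element's path (width multiplies plus width - 1 adds for the dot product, plus the alpha and beta * C operations), a shape-relative version might be sketched like this (gemmULPTolerance and its operation accounting are illustrative, not the PR's exact implementation):

```javascript
// Illustrative sketch: ULP tolerance for alpha * (A x B) + beta * C, indexing
// the reduction dimension relative to the end of the shape so that any future
// leading (batch) dimensions would not break it. Reusable for matmul, which
// has no alpha/beta/C options at all.
function gemmULPTolerance(shapeA, options = {}) {
  const rank = shapeA.length;
  const width = options.aTranspose ? shapeA[rank - 2] : shapeA[rank - 1];
  let tolerance = 2 * width - 1;  // dot product: width muls + (width - 1) adds
  if (options.alpha !== undefined && options.alpha !== 1) {
    tolerance += 1;               // scaling the product by alpha
  }
  if (options.c !== undefined) {
    // beta * C costs a multiply (when beta != 1) plus the final add.
    tolerance += (options.beta !== undefined && options.beta !== 1) ? 2 : 1;
  }
  return tolerance;
}

gemmULPTolerance([3, 4]);                     // 7
gemmULPTolerance([3, 4], {aTranspose: true}); // 5
```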


// https://webmachinelearning.github.io/webnn/#api-mlgraphbuilder-concat

const testConcat = async (operandType, syncFlag, ins, axis, expected) => {

ins?

Contributor Author

ins is the parameter holding the inputs array, like [[inputData, inputShape],].

Contributor

I'd recommend using a more explicit name.

@fdwr fdwr left a comment

Thanks Bruce for the updates.

[[24], expectedMinMax, {minValue: -5, maxValue: 7}],
[[24], expectedRelu6, {minValue: 0, maxValue: 6}],
[[24], expectedMinMaxBothPositive, {minValue: 1, maxValue: 8}],
[[24], expectedMinMaxBothNegative, {minValue: -6, maxValue: -1}],

(minor) I see for any expected value, the same options always correspond. e.g.

[[24], expectedMinMaxBothNegative, {minValue: -6, maxValue: -1}],
...
[[4, 6], expectedMinMaxBothNegative, {minValue: -6, maxValue: -1}],
...
[[2, 3, 4], expectedMinMaxBothNegative, {minValue: -6, maxValue: -1}],

How about putting the options adjacent to their corresponding expected output? e.g.

// expected data by clamping input data within [0, 6]
const relu6options = {minValue: 0, maxValue: 6};
const expectedRelu6 = {
  float32: [
    0,                  0,                  0,                 2.0879640579223633,
    0,                  0,                  5.071455478668213, 0,
    4.224577903747559,  6,                  0,                 0,
    5.5505242347717285, 1.2788677215576172, 6,                 6,
    0,                  1.7865902185440063, 5.564981460571289, 3.1451010704040527,
    0,                  0,                  6,                 0,
  ],
};

...
[[24], expectedRelu6, relu6options],
...
[[4, 6], expectedRelu6, relu6options],
...
[[2, 3, 4], expectedRelu6, relu6options],

It would make the tests a little easier to update, because you would only have to change one place instead of doing a search and replace. The comment "expected data by clamping input data within [0, 6]" also becomes mostly moot, since the options are visible nearby.

namedInputs['input' + i] = new TestTypedArray(ins[i][0]);
}

const output = builder.concat(inputs, axis);

Since this creates an instance of the concat operator, I propose naming this variable "operator". The name "output" is confusing because it's not the concat tensor output, which isn't known until later. The same applies for clamp, const operator = builder.clamp(x, options) instead of y.

await context.computeAsync(graph, namedInputs, outputs);
}

assert_array_approx_equals_ulp(outputs.output, expected[0], PrecisionMetrics.ULP[operandType].concat, operandType);

I guess all 3 of these are viable:

PrecisionMetrics.ULP[operandType].concat
PrecisionMetrics.concat.ULP[operandType]
PrecisionMetrics.concat[operandType].ULP

Though I tend to think in terms of the operator first, and then within that operator what tolerance it has. It feels kinda weird/backwards to think in terms of the kind of precision, and then ask what operator 🤔. e.g.

assert_array_approx_equals_ulp(outputs.output, expected[0], PrecisionMetrics.concat.ULP[operandType], operandType);

const PrecisionMetrics = {
  clamp:     {ULP:  {float32: 0, float16: 0}},
  concat:    {ULP:  {float32: 0, float16: 0}},
  relu:      {ULP:  {float32: 0, float16: 0}},
  reshape:   {ULP:  {float32: 0, float16: 0}},
  slice:     {ULP:  {float32: 0, float16: 0}},
  split:     {ULP:  {float32: 0, float16: 0}},
  squeeze:   {ULP:  {float32: 0, float16: 0}},
  transpose: {ULP:  {float32: 0, float16: 0}},
  tanh:      {ATOL: {float32: 1/1024, float16: 1/1024}},
  gemm:      {ULP:  {float32: gemmULPFunction, float16: gemmULPFunction}},
};

I also feel that when a test case fails, finding the corresponding tolerance value by visually scanning a single vertical list rather than three possible lists is easier/faster (and more like a table/spreadsheet). The current approach saves some typing (fewer characters), but are there other advantages?

tanh: 1/1024,
},
},
// IEPOE - input elements per output element (depends on individual operator)

We should use a different term here based on how you're using it, because IEPOE is just the number of elements per input, but your callback function is actually returning a ULP (which is partly based on the IEPOE and also based on other non-tensor calculations like the scale and bias passed in the options). So maybe computedULP / ULPfunction / UlpFunction?

[
[[[1, 2, 3, 4], [2, 2]], [[5, 6, 7, 8], [2, 2]], [[9, 10, 11, 12], [2, 2]]],
[
[0, [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], [6, 2]]],

I feel putting the shape first is more intuitive, rather than hiding it at the end. e.g.

      [0, [6, 2], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],

And then it would be consistent with clamp and slice which put the shape first then data:

    // 4D
    [[2, 3, 2, 2], expectedMin, {minValue: -5}],

Alternately, named parameters could help, as it's currently challenging to visually parse the square braces...

      {axis:0, shape:[6, 2], data:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]},

...like you did in GEMM ...alpha: alphaPositive, beta: betaNegative, aTranspose: true, bTranspose: false....


Though, I see you have a number of other tests below that put the shape 2nd, which makes it a lot of work to change. Well, shrug 🤷.

0.905762209983811, -0.7598234598671647, -0.20336496018610095,
];
// C is scalar
const inputCPositiveScalar = 0.6817309442438475;

If you use a value for C that is losslessly compatible with float16 (0.681640625), then you don't need to worry in the future about rounding issues of the option parameters.

I just use my little command line tool (https://github.com/fdwr/BiNums) to cast it.

binums float16 0.6817309442438475 -0.38121576376524624 0.6520302295684814 0.3881920576095581
...
       float16 0.681640625 (0x3974)
       float16 -0.381103515625 (0xB619)
       float16 0.65185546875 (0x3937)
       float16 0.38818359375 (0x3636)

[[1, 2, 4, 3], [0, 1, 2, 1], [1, 1, 2, 2], undefined, [1, 1, 2, 2], [20, 21, 23, 24],],
// 5D
[[1, 3, 2, 2, 2], [0, 1, 0, 1, 0], [1, 2, 2, 1, 2], undefined, [1, 2, 2, 1, 2], [11, 12, 15, 16, 19, 20, 23, 24,]],
],

We should include one no-op case, since we've encountered this edge case in models.

    'identity no operation': [
      // 1D
      [[24], [], [], [], [24], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,...]],
    ]

45, 46, 61, 62, 77, 78, 15, 16, 31, 32, 47, 48, 63, 64, 79, 80], [2, 2, 2, 10]],
],
],
],

Most of your other test cases have 5D, but not this one?

@BruceDai
Contributor Author

BruceDai commented Dec 1, 2022

Closing it. Thanks, everyone, for reviewing this big PR. It has now been split into smaller PRs according to your feedback and suggestions; please continue to help review those smaller PRs, thanks.

@BruceDai BruceDai closed this Dec 1, 2022
