Modeling the logic utilization of Haddoc2-generated mappings. We rely on linear models to predict the logic resources (reported in ALMs) consumed by the SCM (multipliers) and MOA (adders) parts. The inputs of these linear models are metrics that are computed directly from the topology and weights of a given CNN. These metrics are:
- `nb_null`: Number of null values in a given 3D convolution kernel.
- `nb_pow2`: Number of weights that are equal to a power of two in a given 3D convolution kernel. Multiplications by these weights are implemented with shift registers, which consume fewer resources than multipliers.
- `nb_bit1`: Number of bits that are set to one in a given 3D convolution kernel. Intuitively, the higher this number, the higher the resource utilization.
- `nb_efbw`: The metrics above were not sufficient to accurately predict the hardware resources, especially for the adder parts, so we introduced this additional metric. In Haddoc2, the accumulation of partial products is performed by a MOA whose operands have variable bitwidths. The complexity of such an adder is correlated with the number of inputs, but also with the numerical dynamic of the partial sums. To illustrate this, consider the dot product of a vector `x` with a weight vector `w` such that `w = [2 0 18 256]`, and suppose that weights and inputs are represented in an 8-bit fixed-point format.
  - The multiplication by the first coefficient can be implemented with a shift register, and the resulting partial product `p[0] = x[0] * w[0]` requires `8+mcl(2) = 9 bits` to be represented, where `mcl(x) = max(ceil(log2(x)))`.
  - The multiplication by the second coefficient is skipped and does not generate any partial product.
  - The multiplication by the third coefficient produces a partial product that requires `8+mcl(18) = 13 bits` to be represented.
  - The multiplication by the last coefficient is implemented with a shift register, and the partial product requires `8+mcl(256) = 16 bits` to be represented.
  - Finally, the accumulation of these partial products is performed by a MOA whose inputs are 9, 13 and 16 bits wide, respectively. The complexity of this adder is thus correlated with the number of partial products and with their numerical dynamic, which in turn depends on the numerical dynamic of the 3D convolution kernel weights.

  The `nb_efbw` of a given kernel is defined as: `nb_efbw = sum(bw_in + mcl(bw_theta))`.
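The four metrics can be sketched in a few lines of Python. This is only an illustration of the definitions above, not the Haddoc2 implementation: the function names (`mcl`, `is_pow2`, `kernel_metrics`) are hypothetical, the kernel is assumed to be flattened into a list of quantized integer weights, and several details are assumptions, namely that the sign is ignored (magnitudes are used), that `1 = 2^0` counts as a power of two, and that null weights contribute nothing to `nb_efbw` because they generate no partial product.

```python
import math


def mcl(w):
    """ceil(log2(|w|)) for a nonzero integer weight, as in the worked example."""
    return math.ceil(math.log2(abs(w)))


def is_pow2(w):
    """True if |w| is a power of two (assumption: 1 = 2**0 is counted)."""
    a = abs(w)
    return a != 0 and (a & (a - 1)) == 0


def kernel_metrics(weights, bw_in=8):
    """Compute the four metrics for a flattened kernel of integer weights.

    bw_in is the input bitwidth (8 bits in the worked example).
    """
    nonzero = [w for w in weights if w != 0]
    return {
        "nb_null": len(weights) - len(nonzero),
        "nb_pow2": sum(is_pow2(w) for w in nonzero),
        # Assumption: bits are counted on the magnitude of each weight.
        "nb_bit1": sum(bin(abs(w)).count("1") for w in nonzero),
        # Assumption: null weights are skipped, as they yield no partial product.
        "nb_efbw": sum(bw_in + mcl(w) for w in nonzero),
    }


# Weight vector from the worked example above.
metrics = kernel_metrics([2, 0, 18, 256])
print(metrics)
```

For the example vector, the `nb_efbw` value is the sum of the three partial-product bitwidths derived above, `9 + 13 + 16 = 38`.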
We found that the `nb_efbw` metric is the most relevant single feature for modeling the hardware resources, as shown in the following tables, where the `R_squared` scores of the models built with different features are reported. `GLM` stands for the Generalized Linear Model, in which all four previous features are combined to model the resource usage.

| MOA | Alexnet | Squeezenet | Alexnet-Comp. |
|---|---|---|---|
| nb_null | 0.7345 | | |
| nb_pow2 | 0.3722 | 0.3851 | |
| nb_bit1 | 0.6589 | 0.5779 | 0.6744 |
| nb_efbw | 0.7759 | 0.7109 | 0.7784 |
| GLM | 0.8139 | 0.7372 | 0.8105 |

| SCM | Alexnet | Squeezenet | Alexnet-Comp. |
|---|---|---|---|
| nb_null | 0.7345 | | |
| nb_pow2 | 0.2230 | 0.2250 | |
| nb_bit1 | 0.5884 | 0.5070 | 0.6098 |
| nb_efbw | 0.7262 | 0.5906 | 0.7481 |
| GLM | 0.8010 | 0.6902 | 0.8328 |
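
As a rough illustration of how an `R_squared` score like those in the tables can be obtained, the sketch below fits a single-feature linear model by ordinary least squares and scores it. It is a minimal example under stated assumptions, not the procedure used to produce the tables: the function names are hypothetical, only one feature is used (the `GLM` rows combine all four), and the numbers are made-up placeholders, not measurements from Haddoc2.

```python
def fit_linear(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Placeholder data (NOT real measurements): one nb_efbw value and one
# ALM count per kernel.
efbw = [38, 52, 61, 75, 90]
alms = [21, 30, 33, 44, 50]

slope, intercept = fit_linear(efbw, alms)
print("R_squared:", r_squared(efbw, alms, slope, intercept))
```

A score close to 1 means that the feature explains most of the variance in the resource usage, which is how the rows of the tables above can be compared.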