Initial input for backend selection #2396

Merged
92 commits
bd08406
Initial input for backend selection
May 30, 2023
32f71af
Update dev/make/cmplr.gnu.mkl.mk
amgrigoriev Jun 2, 2023
4173888
Update dev/make/cmplr.gnu.ref.mk
amgrigoriev Jun 2, 2023
3b8d239
Merge branch 'master' into dev/agrigorev-backend-selection
Jun 2, 2023
ed7ce18
Merge branch 'dev/agrigorev-backend-selection' of https://github.com/…
Jun 2, 2023
aa55677
Changed default backend to 'mkl'
Jun 2, 2023
193bb58
Buildable onedal_c
Jun 6, 2023
cc3b2fd
Added clang support
Jun 6, 2023
18225a8
Update dev/make/cmplr.gnu.mkl.mk
amgrigoriev Jun 6, 2023
39049aa
Update dev/make/cmplr.gnu.ref.mk
amgrigoriev Jun 6, 2023
a531c0f
Compiler fixes for icc, icx, vc plus clang-format
Jun 6, 2023
137771c
Merge branch 'dev/agrigorev-backend-selection' of https://github.com/…
Jun 6, 2023
8f366b7
Update dev/make/cmplr.clang.ref.mk
amgrigoriev Jun 6, 2023
3ef7d08
adding support for selecting different math/rng/service backends in c…
Jun 22, 2023
34f6708
fixed build issue with kmeans serialization
Jun 22, 2023
1224a3f
Merge pull request #1 from makart19/dev/amk_bazel_backend_selection
amgrigoriev Jun 22, 2023
336e772
Merge branch 'master' into dev/agrigorev-backend-selection
napetrov Jun 27, 2023
d7b46ea
Introducing backend_config param selection to build.sh
napetrov Jun 27, 2023
f2ae2cf
Create openblas.sh
napetrov Jun 27, 2023
4c95148
Introduce CI build for BLAS backend
napetrov Jun 27, 2023
7463322
adding execute permission on openblas.sh
napetrov Jun 28, 2023
fd38304
Update cpp/daal/src/externals/config_ref.h
amgrigoriev Jul 10, 2023
2ed525f
Update cpp/daal/src/externals/config_ref.h
amgrigoriev Jul 10, 2023
1ccbe05
Addressed part of the comments
Jul 11, 2023
6f294aa
Clang-format
Jul 11, 2023
6fb3617
Turned off hyperthreading for ref config in order to use TBB default n…
Jul 11, 2023
7049297
Addressed more comments
Jul 12, 2023
3565f22
Macro fixed (APPLE)
Jul 12, 2023
126a48b
More changes in REF RNG
Jul 13, 2023
3614c75
Update build.sh
napetrov Jul 13, 2023
0657a7e
Removed 'sed' from Makefile
Jul 13, 2023
2154d93
Merge branch 'master' into dev/agrigorev-backend-selection
Jul 13, 2023
8e7b69e
Merge branch 'dev/agrigorev-backend-selection' of https://github.com/…
Jul 13, 2023
1f53810
Update openblas.sh
napetrov Jul 13, 2023
6eae7d6
Removed config_template from BAZEL
Jul 13, 2023
2470da8
Merge branch 'dev/agrigorev-backend-selection' of https://github.com/…
Jul 13, 2023
22fcdb8
Removed backend_config_header from BAZEL
Jul 13, 2023
559c440
Update openblas.sh
napetrov Jul 13, 2023
438710b
Replaced safe function not supported by GNU
Jul 14, 2023
55d1d60
Merge branch 'dev/agrigorev-backend-selection' of https://github.com/…
Jul 14, 2023
62b8ece
Fixed bugs in ref backend for OpenBLAS build
Jul 14, 2023
c7b5906
Fixed bugs in ref backend for OpenBLAS build #2
Jul 14, 2023
d18b453
Reduced header file dependencies in REF backend; removed << operator …
Jul 14, 2023
a4b7f2f
Update ci.yml
napetrov Jul 14, 2023
518fc92
Fixed export on symbols for OpenBLAS build
Jul 15, 2023
29b8314
Merge branch 'dev/agrigorev-backend-selection' of https://github.com/…
Jul 15, 2023
5066010
export.def handling in bazel
Jul 15, 2023
409c945
Update cpp/daal/src/externals/service_math_ref.h
amgrigoriev Jul 29, 2023
41dd3c8
Added libgfortran to REF build
Jul 31, 2023
6b19917
Merge branch 'dev/agrigorev-backend-selection' of https://github.com/…
Jul 31, 2023
4b994fe
removed config_template auxiliary func
Aug 1, 2023
1149bf3
Merge pull request #2 from makart19/dev/upd_bazel_cfg_templ_appr
amgrigoriev Aug 1, 2023
94b5b88
Merge branch 'dev/agrigorev-backend-selection' of https://github.com/…
Aug 1, 2023
d94ae6a
Removed libgfortran for REF backend
Aug 1, 2023
dc06a4c
Merge branch 'master' into dev/agrigorev-backend-selection
Aug 1, 2023
240d4a0
Fixed BACKEND incdirs for oneAPI; addressed some comments
Aug 8, 2023
f539a75
Removed unnecessary includes; fixed ifdef in _DECLAR_ files
Aug 8, 2023
3e6fdeb
fixed omitted ifdef in _DECLAR_ files
Aug 8, 2023
6461eed
Clang-format
Aug 8, 2023
f45f762
Clang-format fix
Aug 9, 2023
44be116
Added an option to move compression to exclude list for examples
Aug 9, 2023
61b30fb
Excluded compression examples for all configurations
Aug 14, 2023
79bc570
revert some macros to __intel_compiler
Aug 15, 2023
a8395ab
replace some more macros
Aug 15, 2023
2f8fb63
add NO_FORTRAN=1 to openblas.sh script
Aug 15, 2023
49bd0eb
Merge pull request #3 from amgrigoriev/dev/pyakovlev-fix-macros
amgrigoriev Aug 15, 2023
e0d3dcc
Update cpp/daal/src/externals/service_service_ref.h
napetrov Aug 15, 2023
67d6a43
Fixed missed fpk symbols for oneapi examples (REF backend)
Aug 15, 2023
fb37075
Merge branch 'dev/agrigorev-backend-selection' of https://github.com/…
Aug 15, 2023
865e128
Update cpp/daal/src/externals/service_service_ref.h
amgrigoriev Aug 16, 2023
3dca371
Update cpp/daal/src/externals/service_service_ref.h
amgrigoriev Aug 16, 2023
693235f
Merge branch 'master' into dev/agrigorev-backend-selection
Aug 17, 2023
8d5ee02
Fixed error message in oneapi
Aug 17, 2023
be4e3f9
exclude failed examples with ref backend
Aug 17, 2023
5e5d405
Merge pull request #4 from amgrigoriev/dev/pyakovlev-add-exclude-exam…
amgrigoriev Aug 17, 2023
f73e50c
exclude examples for oneapi/cpp ifaces
Aug 18, 2023
2f9334a
Merge pull request #5 from amgrigoriev/dev/pyakovlev-exclude-oneapi-e…
amgrigoriev Aug 18, 2023
85e22d0
exclude mpi examples for ref backend
Aug 18, 2023
8c817a6
Merge pull request #6 from amgrigoriev/dev/pyakovlev-add-exclude-exam…
amgrigoriev Aug 18, 2023
1315182
Apply suggestions from code review
napetrov Aug 18, 2023
7ecf32c
AVX512_MIC cleanup
Aug 20, 2023
55de110
Merge branch 'master' into dev/agrigorev-backend-selection
Aug 20, 2023
9e6ee5a
Fixed CI pipeline
Aug 20, 2023
0c7c3e0
Added more includes for ONEAPI
Aug 20, 2023
a09c3ef
Update .ci/pipeline/ci.yml
napetrov Aug 21, 2023
243b094
Switch to core count for blas build
napetrov Aug 21, 2023
8fe15f0
Update .ci/env/openblas.sh
napetrov Aug 21, 2023
2858360
Adding _MKL suffix for job name
napetrov Aug 21, 2023
ee5458f
Fixing daal4py job dependency
napetrov Aug 21, 2023
7f127df
Merge branch 'master' into dev/agrigorev-backend-selection
Aug 24, 2023
d642dac
Merge branch 'master' into dev/agrigorev-backend-selection
Aug 25, 2023
31416b2
Attempt to fix warnings
Aug 25, 2023
17 changes: 17 additions & 0 deletions cpp/daal/include/services/daal_defines.h
@@ -491,13 +491,30 @@ const int SERIALIZATION_DBSCAN_DISTRIBUTED_PARTIAL_RESULT_STEP13_ID = 121310;
} \
}

#define DAAL_OVERFLOW_CHECK_BY_MULTIPLICATION_THROW_IF_POSSIBLE(type, op1, op2) \
{ \
if (!(0 == (op1)) && !(0 == (op2))) \
{ \
volatile type r = (op1) * (op2); \
r /= (op1); \
if (!(r == (op2))) services::throwIfPossible(services::Status(services::ErrorBufferSizeIntegerOverflow)); \
} \
}

#define DAAL_OVERFLOW_CHECK_BY_ADDING(type, op1, op2) \
{ \
volatile type r = (op1) + (op2); \
r -= (op1); \
if (!(r == (op2))) return services::Status(services::ErrorBufferSizeIntegerOverflow); \
}

#define DAAL_OVERFLOW_CHECK_BY_ADDING_THROW_IF_POSSIBLE(type, op1, op2) \
{ \
volatile type r = (op1) + (op2); \
r -= (op1); \
if (!(r == (op2))) services::throwIfPossible(services::Status(services::ErrorBufferSizeIntegerOverflow)); \
}

#define DAAL_CHECK_STATUS_RETURN_IF_FAIL(statVal, returnObj) \
{ \
if (!(statVal)) return returnObj; \
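The multiplication check in the hunk above multiplies, divides the result back by the first operand, and compares it with the second. A minimal standalone sketch of that idea, assuming unsigned `std::size_t` operands (where wrap-around is well defined), might look like the following. The function names are hypothetical and return a flag instead of a `services::Status`.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper mirroring the macro's logic: multiply, divide back,
// compare. Returns true when op1 * op2 does not fit in std::size_t.
bool multiplicationOverflows(std::size_t op1, std::size_t op2)
{
    if (op1 == 0 || op2 == 0) return false; // a zero factor cannot overflow
    volatile std::size_t r = op1 * op2;     // wraps modulo 2^N on overflow
    r = r / op1;                            // recovers op2 only if nothing wrapped
    return !(r == op2);
}

// Hypothetical helper for unsigned addition: compares against the maximum
// value instead of round-tripping through subtraction.
bool additionOverflows(std::size_t op1, std::size_t op2)
{
    return op1 > std::numeric_limits<std::size_t>::max() - op2;
}
```

For unsigned addition the sketch compares against the maximum value because `(a + b) - a` always reduces to `b` under modulo arithmetic, so a subtract-and-compare round trip cannot observe the wrap.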
2 changes: 1 addition & 1 deletion cpp/daal/src/algorithms/adaboost/adaboost_predict_impl.i
@@ -146,7 +146,7 @@ services::Status AdaBoostPredictKernel<method, algorithmFPType, cpu>::processBlo
}
}

Math<algorithmFPType, cpu>::vLog(nRowsInCurrentBlock * nClasses, p_block, pLog); // inplace
MathInst<algorithmFPType, cpu>::vLog(nRowsInCurrentBlock * nClasses, p_block, pLog); // inplace

service_memset<algorithmFPType, cpu>(pSumLog, 0.0, nRowsInCurrentBlock);

6 changes: 3 additions & 3 deletions cpp/daal/src/algorithms/adaboost/adaboost_train_impl.i
@@ -181,14 +181,14 @@ services::Status AdaBoostTrainKernel<method, algorithmFPType, cpu>::adaboostSAMM
}

algorithmFPType cM =
learningRate * (Math<algorithmFPType, cpu>::sLog((one - errM) / errM) + Math<algorithmFPType, cpu>::sLog(nClasses - one));
learningRate * (MathInst<algorithmFPType, cpu>::sLog((one - errM) / errM) + MathInst<algorithmFPType, cpu>::sLog(nClasses - one));

/* Update weights */
for (size_t i = 0; i < nVectors; i++)
{
errFlag[i] *= cM;
}
Math<algorithmFPType, cpu>::vExp(nVectors, errFlag, errFlag);
MathInst<algorithmFPType, cpu>::vExp(nVectors, errFlag, errFlag);
algorithmFPType wSum = zero;
for (size_t i = 0; i < nVectors; i++)
{
@@ -337,7 +337,7 @@ services::Status AdaBoostTrainKernel<method, algorithmFPType, cpu>::adaboostSAMM
}
t[i] *= scaling;
}
Math<algorithmFPType, cpu>::vExp(nVectors, t, t);
MathInst<algorithmFPType, cpu>::vExp(nVectors, t, t);
for (size_t i = 0; i < nVectors; i++)
{
w[i] *= t[i];
@@ -70,7 +70,7 @@ Status AssociationRulesKernel<apriori, algorithmFPType, cpu>::compute(const Nume
/* Find "large" itemsets */
size_t L_size = 0;
size_t maxItemsetSize = ((parameter->maxItemsetSize == 0) ? (size_t)-1 : parameter->maxItemsetSize);
double ceil = daal::internal::Math<double, cpu>::sCeil(minSupport * data.numOfTransactions);
double ceil = daal::internal::MathInst<double, cpu>::sCeil(minSupport * data.numOfTransactions);
DAAL_ASSERT(ceil >= 0)
services::Status statLargeItemset = findLargeItemsets((size_t)ceil, maxItemsetSize, data, L.get(), L_size);
DAAL_CHECK_STATUS_OK(statLargeItemset.ok(), statLargeItemset);
@@ -249,7 +249,7 @@ struct assocrules_dataset
supportVals[itemID[i]]++;
}
numOfUniqueItems = 0;
double ceil = daal::internal::Math<double, cpu>::sCeil(minSupport * numOfTransactions);
double ceil = daal::internal::MathInst<double, cpu>::sCeil(minSupport * numOfTransactions);
DAAL_ASSERT(ceil >= 0)

size_t iMinSupport = (size_t)ceil;
4 changes: 2 additions & 2 deletions cpp/daal/src/algorithms/brownboost/brownboost_predict_impl.i
@@ -116,14 +116,14 @@ services::Status BrownBoostPredictKernel<method, algorithmFPType, cpu>::compute(
const algorithmFPType zero = (algorithmFPType)0.0;
if (error != zero)
{
algorithmFPType sqrtC = daal::internal::Math<algorithmFPType, cpu>::sErfInv(algorithmFPType(1.0) - error);
algorithmFPType sqrtC = daal::internal::MathInst<algorithmFPType, cpu>::sErfInv(algorithmFPType(1.0) - error);
algorithmFPType invSqrtC = algorithmFPType(1.0) / sqrtC;
for (size_t j = 0; j < nVectors; j++)
{
r[j] *= invSqrtC;
}
}
daal::internal::Math<algorithmFPType, cpu>::vErf(nVectors, r, r);
daal::internal::MathInst<algorithmFPType, cpu>::vErf(nVectors, r, r);
return s;
}

18 changes: 9 additions & 9 deletions cpp/daal/src/algorithms/brownboost/brownboost_train_impl.i
@@ -254,8 +254,8 @@ void BrownBoostTrainKernel<method, algorithmFPType, cpu>::updateWeights(size_t n
nre2[j] = nra[j] * invSqrtC;
w[j] = -nra[j] * nra[j] / c;
}
daal::internal::Math<algorithmFPType, cpu>::vExp(nVectors, w, w);
daal::internal::Math<algorithmFPType, cpu>::vErf(nVectors, nre2, nre2);
daal::internal::MathInst<algorithmFPType, cpu>::vExp(nVectors, w, w);
daal::internal::MathInst<algorithmFPType, cpu>::vErf(nVectors, nre2, nre2);
algorithmFPType wSum = (algorithmFPType)0.0;
for (size_t j = 0; j < nVectors; j++)
{
@@ -309,11 +309,11 @@ NewtonRaphsonKernel<method, algorithmFPType, cpu>::NewtonRaphsonKernel(size_t nV
{
const algorithmFPType one = (algorithmFPType)1.0;
const algorithmFPType pi = (algorithmFPType)3.1415926535897932384626433832795;
sqrtC = daal::internal::Math<algorithmFPType, cpu>::sErfInv(one - error);
sqrtC = daal::internal::MathInst<algorithmFPType, cpu>::sErfInv(one - error);
c = sqrtC * sqrtC;
invC = one / c;
invSqrtC = one / sqrtC;
sqrtPiC = daal::internal::Math<algorithmFPType, cpu>::sSqrt(pi * c);
sqrtPiC = daal::internal::MathInst<algorithmFPType, cpu>::sSqrt(pi * c);
}

template <Method method, typename algorithmFPType, CpuType cpu>
@@ -360,8 +360,8 @@ void NewtonRaphsonKernel<method, algorithmFPType, cpu>::compute(algorithmFPType
nrw[j] = -invC * nrd[j] * nrd[j];
nre1[j] = nrd[j] * invSqrtC;
}
daal::internal::Math<algorithmFPType, cpu>::vExp(nVectors, nrw, nrw);
daal::internal::Math<algorithmFPType, cpu>::vErf(nVectors, nre1, nre1);
daal::internal::MathInst<algorithmFPType, cpu>::vExp(nVectors, nrw, nrw);
daal::internal::MathInst<algorithmFPType, cpu>::vErf(nVectors, nre1, nre1);
algorithmFPType nrW(0.0);
algorithmFPType nrU(0.0);
algorithmFPType nrB(0.0);
@@ -383,9 +383,9 @@ void NewtonRaphsonKernel<method, algorithmFPType, cpu>::compute(algorithmFPType
nrAlpha += invDenom * (c * nrW * nrB + sqrtPiC * nrU * nrE);
nrT += invDenom * (c * nrB * nrB + sqrtPiC * nrV * nrE);

if ((daal::internal::Math<algorithmFPType, cpu>::sFabs(nrB / nrW) <= nu)
|| (daal::internal::Math<algorithmFPType, cpu>::sFabs(nrB) <= nrAccuracy
&& daal::internal::Math<algorithmFPType, cpu>::sFabs(nrE) <= nrAccuracy))
if ((daal::internal::MathInst<algorithmFPType, cpu>::sFabs(nrB / nrW) <= nu)
|| (daal::internal::MathInst<algorithmFPType, cpu>::sFabs(nrB) <= nrAccuracy
&& daal::internal::MathInst<algorithmFPType, cpu>::sFabs(nrE) <= nrAccuracy))
break;
}
nrAlpha *= alphaSign;
4 changes: 2 additions & 2 deletions cpp/daal/src/algorithms/cholesky/cholesky_impl.i
@@ -108,11 +108,11 @@ Status CholeskyKernel<algorithmFPType, method, cpu>::performCholesky(NumericTabl

if (isFull<algorithmFPType, cpu>(rLayout))
{
Lapack<algorithmFPType, cpu>::xpotrf(&uplo, &dims, pL, &dims, &info);
LapackInst<algorithmFPType, cpu>::xpotrf(&uplo, &dims, pL, &dims, &info);
}
else if (rLayout == NumericTableIface::lowerPackedTriangularMatrix)
{
Lapack<algorithmFPType, cpu>::xpptrf(&uplo, &dims, pL, &info);
LapackInst<algorithmFPType, cpu>::xpptrf(&uplo, &dims, pL, &info);
}
else
{
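For readers unfamiliar with the LAPACK routine behind `xpotrf`: `potrf` computes the Cholesky factorization of a symmetric positive-definite matrix and reports failure through `info`. The sketch below is a plain, unblocked illustration of that computation, assuming column-major storage and the lower triangle. The name `choleskyLower` is hypothetical, and this is not the code any backend in this PR uses.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: in-place lower Cholesky factorization (A = L * L^T)
// of a column-major n x n matrix with leading dimension lda.
// Only the lower triangle of `a` is read and written.
// Returns 0 on success, or the 1-based order of the first leading minor
// that is not positive definite (the LAPACK `info` convention).
int choleskyLower(std::size_t n, double * a, std::size_t lda)
{
    for (std::size_t j = 0; j < n; ++j)
    {
        // Diagonal entry: subtract squares of the already computed row j of L.
        double d = a[j + j * lda];
        for (std::size_t k = 0; k < j; ++k) d -= a[j + k * lda] * a[j + k * lda];
        if (!(d > 0.0)) return static_cast<int>(j + 1);

        const double ljj = std::sqrt(d);
        a[j + j * lda]   = ljj;

        // Entries below the diagonal in column j.
        for (std::size_t i = j + 1; i < n; ++i)
        {
            double s = a[i + j * lda];
            for (std::size_t k = 0; k < j; ++k) s -= a[i + k * lda] * a[j + k * lda];
            a[i + j * lda] = s / ljj;
        }
    }
    return 0;
}
```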
10 changes: 5 additions & 5 deletions cpp/daal/src/algorithms/cordistance/cordistance_full_impl.i
@@ -85,7 +85,7 @@ services::Status corDistanceFull(const NumericTable * xTable, NumericTable * rTa
DAAL_INT m = blockSize1, k = 1, nn = blockSize1;
DAAL_INT lda = m, ldb = nn, ldc = m;

Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, sum, &lda, sum, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, sum, &lda, sum, &ldb, &beta, buf, &ldc);

/* calculate x * x^t - 1/p * sum^t * sum */
alpha = one;
@@ -95,14 +95,14 @@ services::Status corDistanceFull(const NumericTable * xTable, NumericTable * rTa
m = blockSize1, k = p, nn = blockSize1;
lda = k, ldb = k, ldc = m;

Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x, &lda, x, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x, &lda, x, &ldb, &beta, buf, &ldc);

PRAGMA_VECTOR_ALWAYS
for (size_t i = 0; i < blockSize1; i++)
{
if (buf[i * blockSize1 + i] > (algorithmFPType)0.0)
{
buf[i * blockSize1 + i] = (algorithmFPType)1.0 / daal::internal::Math<algorithmFPType, cpu>::sSqrt(buf[i * blockSize1 + i]);
buf[i * blockSize1 + i] = (algorithmFPType)1.0 / daal::internal::MathInst<algorithmFPType, cpu>::sSqrt(buf[i * blockSize1 + i]);
}
}

@@ -212,7 +212,7 @@ services::Status corDistanceFull(const NumericTable * xTable, NumericTable * rTa
DAAL_INT m = blockSize2, k = 1, nn = blockSize1;
DAAL_INT lda = m, ldb = nn, ldc = m;

Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, sum2, &lda, sum1l, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, sum2, &lda, sum1l, &ldb, &beta, buf, &ldc);

/* calculate x1 * x2^t - 1/p * sum1^t * sum2 */
alpha = one;
@@ -226,7 +226,7 @@ services::Status corDistanceFull(const NumericTable * xTable, NumericTable * rTa
ldb = k;
ldc = m;

Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x2, &lda, x1, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x2, &lda, x1, &ldb, &beta, buf, &ldc);

for (size_t i = 0; i < blockSize1; i++)
{
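The `xxgemm` calls above pass every argument by pointer, following the Fortran BLAS convention for `gemm`: `C = alpha * op(A) * op(B) + beta * C` on column-major data. To make that argument list easier to read, here is a naive sketch with the same shape. `refGemm` is a hypothetical name for illustration, not a function from this PR, and it handles only the 'N' and 'T' cases.

```cpp
// Hypothetical naive GEMM with a Fortran-style, pointer-based argument list.
// Computes C = alpha * op(A) * op(B) + beta * C for column-major matrices,
// where op(X) is X for 'N' and X^T for 'T'. op(A) is m x k, op(B) is k x n.
template <typename T>
void refGemm(const char * transa, const char * transb, const int * m, const int * n, const int * k,
             const T * alpha, const T * a, const int * lda, const T * b, const int * ldb,
             const T * beta, T * c, const int * ldc)
{
    const bool ta = (*transa == 'T' || *transa == 't');
    const bool tb = (*transb == 'T' || *transb == 't');

    for (int j = 0; j < *n; ++j)
    {
        for (int i = 0; i < *m; ++i)
        {
            T acc = T(0);
            for (int l = 0; l < *k; ++l)
            {
                // Element (i, l) of op(A) and element (l, j) of op(B).
                const T av = ta ? a[l + i * (*lda)] : a[i + l * (*lda)];
                const T bv = tb ? b[j + l * (*ldb)] : b[l + j * (*ldb)];
                acc += av * bv;
            }
            T & out = c[i + j * (*ldc)];
            // As in BLAS, C is not read when beta is zero.
            out = (*beta == T(0)) ? (*alpha) * acc : (*alpha) * acc + (*beta) * out;
        }
    }
}
```

A production backend would dispatch to an optimized library instead. The sketch only documents what each positional argument in the diff's calls means.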
10 changes: 5 additions & 5 deletions cpp/daal/src/algorithms/cordistance/cordistance_lp_impl.i
@@ -81,7 +81,7 @@ services::Status corDistanceLowerPacked(const NumericTable * xTable, NumericTabl
DAAL_INT m = blockSize1, k = 1, nn = blockSize1;
DAAL_INT lda = m, ldb = nn, ldc = m;

Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, sum, &lda, sum, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, sum, &lda, sum, &ldb, &beta, buf, &ldc);

/* calculate x * x^t - 1/p * sum^t * sum */
alpha = one;
@@ -93,15 +93,15 @@ services::Status corDistanceLowerPacked(const NumericTable * xTable, NumericTabl
ldb = k;
ldc = m;

Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x, &lda, x, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x, &lda, x, &ldb, &beta, buf, &ldc);

/* compute inverse of sqrt of gemm result and save for use in computation off-diagonal blocks */
PRAGMA_VECTOR_ALWAYS
for (size_t i = 0; i < blockSize1; i++)
{
if (buf[i * blockSize1 + i] > (algorithmFPType)0.0)
{
buf[i * blockSize1 + i] = (algorithmFPType)1.0 / daal::internal::Math<algorithmFPType, cpu>::sSqrt(buf[i * blockSize1 + i]);
buf[i * blockSize1 + i] = (algorithmFPType)1.0 / daal::internal::MathInst<algorithmFPType, cpu>::sSqrt(buf[i * blockSize1 + i]);
}
}

@@ -226,7 +226,7 @@ services::Status corDistanceLowerPacked(const NumericTable * xTable, NumericTabl
DAAL_INT m = blockSize2, k = 1, nn = blockSize1;
DAAL_INT lda = m, ldb = nn, ldc = m;

Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, sum2, &lda, sum1l, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, sum2, &lda, sum1l, &ldb, &beta, buf, &ldc);

/* calculate x1 * x2^t - 1/p * sum1^t * sum2 */
alpha = one;
@@ -241,7 +241,7 @@ services::Status corDistanceLowerPacked(const NumericTable * xTable, NumericTabl
ldc = m;

/* compute the distance between k1 and k2 blocks of rows in the input dataset */
Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x2, &lda, x1, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x2, &lda, x1, &ldb, &beta, buf, &ldc);

for (size_t i = 0; i < blockSize1; i++)
{
10 changes: 5 additions & 5 deletions cpp/daal/src/algorithms/cordistance/cordistance_up_impl.i
@@ -81,7 +81,7 @@ services::Status corDistanceUpperPacked(const NumericTable * xTable, NumericTabl
DAAL_INT m = blockSize1, k = 1, nn = blockSize1;
DAAL_INT lda = m, ldb = nn, ldc = m;

Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, sum, &lda, sum, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, sum, &lda, sum, &ldb, &beta, buf, &ldc);

/* calculate x * x^t - 1/p * sum^t * sum */
alpha = one;
@@ -93,15 +93,15 @@ services::Status corDistanceUpperPacked(const NumericTable * xTable, NumericTabl
ldb = k;
ldc = m;

Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x, &lda, x, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x, &lda, x, &ldb, &beta, buf, &ldc);

/* compute inverse of sqrt of gemm result and save for use in computation off-diagonal blocks */
PRAGMA_VECTOR_ALWAYS
for (size_t i = 0; i < blockSize1; i++)
{
if (buf[i * blockSize1 + i] > (algorithmFPType)0.0)
{
buf[i * blockSize1 + i] = (algorithmFPType)1.0 / daal::internal::Math<algorithmFPType, cpu>::sSqrt(buf[i * blockSize1 + i]);
buf[i * blockSize1 + i] = (algorithmFPType)1.0 / daal::internal::MathInst<algorithmFPType, cpu>::sSqrt(buf[i * blockSize1 + i]);
}
}

@@ -225,7 +225,7 @@ services::Status corDistanceUpperPacked(const NumericTable * xTable, NumericTabl
DAAL_INT m = blockSize2, k = 1, nn = blockSize1;
DAAL_INT lda = m, ldb = nn, ldc = m;

Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, sum2, &lda, sum1l, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, sum2, &lda, sum1l, &ldb, &beta, buf, &ldc);

/* calculate x1 * x2^t - 1/p * sum1^t * sum2 */
alpha = one;
@@ -240,7 +240,7 @@ services::Status corDistanceUpperPacked(const NumericTable * xTable, NumericTabl
ldc = m;

/* compute the distance between k1 and k2 blocks of rows in the input dataset */
Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x2, &lda, x1, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x2, &lda, x1, &ldb, &beta, buf, &ldc);

for (size_t i = 0; i < blockSize1; i++)
{
6 changes: 3 additions & 3 deletions cpp/daal/src/algorithms/cosdistance/cosdistance_full_impl.i
@@ -69,14 +69,14 @@ services::Status cosDistanceFull(const NumericTable * xTable, NumericTable * rTa
DAAL_INT m = blockSize1, k = p, nn = blockSize1;
DAAL_INT lda = k, ldb = p, ldc = m;

Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x, &lda, x, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x, &lda, x, &ldb, &beta, buf, &ldc);

PRAGMA_VECTOR_ALWAYS
for (size_t i = 0; i < blockSize1; i++)
{
if (buf[i * blockSize1 + i] > (algorithmFPType)0.0)
{
buf[i * blockSize1 + i] = (algorithmFPType)1.0 / daal::internal::Math<algorithmFPType, cpu>::sSqrt(buf[i * blockSize1 + i]);
buf[i * blockSize1 + i] = (algorithmFPType)1.0 / daal::internal::MathInst<algorithmFPType, cpu>::sSqrt(buf[i * blockSize1 + i]);
}
}

@@ -154,7 +154,7 @@ services::Status cosDistanceFull(const NumericTable * xTable, NumericTable * rTa
DAAL_INT m = blockSize2, k = p, nn = blockSize1;
DAAL_INT lda = k, ldb = p, ldc = m;

Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x2, &lda, x1, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x2, &lda, x1, &ldb, &beta, buf, &ldc);

for (size_t i = 0; i < blockSize1; i++)
{
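The cosine-distance hunks above form a Gram matrix with `xxgemm` and then replace each positive diagonal entry with its inverse square root. The rest of the kernel is not visible in this diff, so the sketch below is an assumption: it applies the textbook definition, distance = 1 - cosine similarity, to show how those stored inverse norms would typically be used. `cosineDistance` is a hypothetical name, and the plain loops stand in for the GEMM call.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: full n x n cosine-distance matrix (row-major) for
// n observations of length p stored row by row in `x`.
std::vector<double> cosineDistance(const std::vector<double> & x, std::size_t n, std::size_t p)
{
    std::vector<double> buf(n * n, 0.0);

    // Gram matrix: buf[i][j] = <x_i, x_j> (the step done by GEMM in the diff).
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < p; ++k) buf[i * n + j] += x[i * p + k] * x[j * p + k];

    // Replace each positive diagonal entry by 1 / sqrt(entry), as in the diff.
    for (std::size_t i = 0; i < n; ++i)
        if (buf[i * n + i] > 0.0) buf[i * n + i] = 1.0 / std::sqrt(buf[i * n + i]);

    // Assumed final step: 1 - <x_i, x_j> / (|x_i| * |x_j|) off the diagonal.
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            if (i != j) buf[i * n + j] = 1.0 - buf[i * n + j] * buf[i * n + i] * buf[j * n + j];

    // The distance from an observation to itself is zero.
    for (std::size_t i = 0; i < n; ++i) buf[i * n + i] = 0.0;

    return buf;
}
```

Storing the inverse norms on the diagonal first means the scaling of each off-diagonal entry needs only two multiplications and no further square roots.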
6 changes: 3 additions & 3 deletions cpp/daal/src/algorithms/cosdistance/cosdistance_lp_impl.i
@@ -65,15 +65,15 @@ services::Status cosDistanceLowerPacked(const NumericTable * xTable, NumericTabl
DAAL_INT m = blockSize1, k = p, nn = blockSize1;
DAAL_INT lda = k, ldb = p, ldc = m;

Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x, &lda, x, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x, &lda, x, &ldb, &beta, buf, &ldc);

/* compute inverse of sqrt of gemm result and save for use in computation off-diagonal blocks */
PRAGMA_VECTOR_ALWAYS
for (DAAL_INT i = 0; i < blockSize1; i++)
{
if (buf[i * blockSize1 + i] > (algorithmFPType)0.0)
{
buf[i * blockSize1 + i] = (algorithmFPType)1.0 / daal::internal::Math<algorithmFPType, cpu>::sSqrt(buf[i * blockSize1 + i]);
buf[i * blockSize1 + i] = (algorithmFPType)1.0 / daal::internal::MathInst<algorithmFPType, cpu>::sSqrt(buf[i * blockSize1 + i]);
}
}

@@ -167,7 +167,7 @@ services::Status cosDistanceLowerPacked(const NumericTable * xTable, NumericTabl
DAAL_INT lda = k, ldb = p, ldc = m;

/* compute the distance between k1 and k2 blocks of rows in the input dataset */
Blas<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x2, &lda, x1, &ldb, &beta, buf, &ldc);
BlasInst<algorithmFPType, cpu>::xxgemm(&transa, &transb, &m, &nn, &k, &alpha, x2, &lda, x1, &ldb, &beta, buf, &ldc);

for (size_t i = 0; i < blockSize1; i++)
{