[BUG] SVR performance problem with linear model #1664

Closed
@tfeher

Describe the bug
Support vector regression has a performance problem if:

  • we use linear kernel, and
  • it is possible to fit the data with a linear model (e.g. created by make_regression)

The execution time can become very long in these cases, longer than that of Sklearn's single-core LIBSVM-based SVR.

Steps/Code to reproduce bug

import cuml.svm
from sklearn.datasets import make_regression
from timeit import default_timer

X, y = make_regression(n_samples=1300, n_features=200, n_informative=200, random_state=1378)
cumlSVR = cuml.svm.SVR(kernel='linear', gamma='scale', verbose=True)
start = default_timer()
cumlSVR.fit(X, y)
print('Time to fit {:4.1f} s'.format(default_timer()-start))

Output:

SMO solver finished after 237 outer iterations, 2304255 total inner iterations, and diff 0.00099707
Time to fit 27.9 s

The execution time was measured on a V100. The iteration counts show that most inner loops run until they hit the max_inner_iter limit in SmoBlockSolve.
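The iteration counts in the solver summary can be turned into an average number of inner iterations per outer iteration, which makes the observation above easier to check. The following is a minimal sketch: `parse_smo_summary` is a hypothetical helper, not part of cuML, and the regular expression assumes the log line has exactly the format shown in the output above. The value of max_inner_iter itself is not stated in this report, so the sketch only computes the average.

```python
import re

# Solver summary copied verbatim from the output above.
LOG_LINE = ("SMO solver finished after 237 outer iterations, "
            "2304255 total inner iterations, and diff 0.00099707")


def parse_smo_summary(line):
    """Hypothetical helper: extract (outer, inner, diff) from the summary line.

    Assumes the exact wording shown in the verbose output above.
    """
    pattern = (r"after (\d+) outer iterations, "
               r"(\d+) total inner iterations, and diff ([0-9.eE+-]+)")
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("unrecognized solver summary: " + line)
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


outer, inner, diff = parse_smo_summary(LOG_LINE)
avg_inner = inner / outer  # average inner iterations per outer iteration
print("{:.0f} inner iterations per outer iteration on average".format(avg_inner))
# prints "9723 inner iterations per outer iteration on average"
```

If that average sits close to the configured inner iteration limit, most block solves are being cut off by the limit rather than by convergence.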

Expected behavior
For the given problem size, a significantly faster execution time is expected. For example, fitting the following dataset

from sklearn.datasets import make_friedman1
X, y = make_friedman1(n_samples=1250, n_features=200, random_state=13745)

with the 'poly' kernel takes around 0.1 s; with the linear kernel it takes around 1 s. (Note, however, that this dataset has only 5 informative features.)

Environment details (please complete the following information):

  • Environment location: Bare-metal
  • Linux Distro/Architecture: Ubuntu 18.04 amd64
  • GPU Model/Driver: V100 and driver 440.33.01
  • CUDA: 10.1
  • Method of cuDF & cuML install: source
    • If method of install is from source, using cmake 3.14.5 & gcc/g++ 7.3.0 and commit hash 1b6f141 (branch-0.13)

Additional context

  • Comparing the accuracy with Sklearn's SVR, the results agree up to (or close to) machine precision.
  • Even with the linear kernel, fitting can be fast if the data cannot be fit with a linear model.
  • SVC seems to work fine: all the examples that I tested ran fast, even with a large number of features.
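For reference, the Sklearn side of the timing and accuracy comparison could be set up roughly as follows. This is only a sketch: `fit_sklearn_baseline` and `max_abs_diff` are hypothetical helper names, the SVR hyperparameters are left at Sklearn's defaults (an assumption, since the exact comparison settings are not given here), and the cuML predictions would have to be produced separately on a GPU machine and passed in as a host array.

```python
from timeit import default_timer

import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import SVR


def fit_sklearn_baseline(n_samples=1300, n_features=200, random_state=1378):
    """Fit Sklearn's LIBSVM-based SVR with a linear kernel on the same kind
    of data as in the reproducer; return (predictions, fit time in seconds)."""
    X, y = make_regression(n_samples=n_samples, n_features=n_features,
                           n_informative=n_features,
                           random_state=random_state)
    model = SVR(kernel='linear')  # other hyperparameters: Sklearn defaults
    start = default_timer()
    model.fit(X, y)
    elapsed = default_timer() - start
    return model.predict(X), elapsed


def max_abs_diff(pred_a, pred_b):
    """Largest absolute difference between two prediction vectors."""
    return float(np.max(np.abs(np.asarray(pred_a) - np.asarray(pred_b))))
```

Calling `fit_sklearn_baseline()` with its defaults reproduces the data from the steps above; `max_abs_diff(sk_pred, cuml_pred)` then gives a single number to compare against machine precision, where `cuml_pred` stands for the cuML predictions on the same X.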


Labels: ? - Needs Triage (Need team to review and classify), bug (Something isn't working)
