Description
Describe the bug
Support vector regression has a performance problem if:
- we use a linear kernel, and
- it is possible to fit the data with a linear model (e.g. created by make_regression)
The execution time can become very long in these cases, even longer than that of Sklearn's single-core LIBSVM implementation.
Steps/Code to reproduce bug
import cuml.svm
from sklearn.datasets import make_regression
from timeit import default_timer
X, y = make_regression(n_samples=1300, n_features=200, n_informative=200, random_state=1378)
cumlSVR = cuml.svm.SVR(kernel='linear', gamma='scale', verbose=True)
start = default_timer()
cumlSVR.fit(X, y)
print('Time to fit {:4.1f} s'.format(default_timer()-start))
Output:
SMO solver finished after 237 outer iterations, 2304255 total inner iterations, and diff 0.00099707
Time to fit 27.9 s
The execution time was measured on a V100. The iteration counts show that most of the inner iterations run until they reach the max_inner_iter limit in SmoBlockSolve.
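To make that reading of the log explicit, here is a small arithmetic sketch using only the two counts printed by the solver above. The value of max_inner_iter itself is not stated in this report, so the sketch only computes the average number of inner iterations per outer iteration.

```python
# Counts copied from the solver log in this report.
outer_iterations = 237
total_inner_iterations = 2_304_255

# Average number of inner iterations spent in each outer iteration.
avg_inner = total_inner_iterations / outer_iterations
print(f"Average inner iterations per outer iteration: {avg_inner:.1f}")
# prints: Average inner iterations per outer iteration: 9722.6
```

An average this high and this uniform is what one would expect if nearly every call stops at a fixed cap rather than at the convergence tolerance.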
Expected behavior
For the given problem size, a significantly faster execution time is expected. For example, fitting the following dataset
from sklearn.datasets import make_friedman1
X, y = make_friedman1(n_samples=1250, n_features=200, random_state=13745)
with the 'poly' kernel takes around 0.1 s, and with the linear kernel around 1 s. (Note, however, that this dataset has only 5 informative features.)
Environment details (please complete the following information):
- Environment location: Bare-metal
- Linux Distro/Architecture: Ubuntu 18.04 amd64
- GPU Model/Driver: V100 and driver 440.33.01
- CUDA: 10.1
- Method of cuDF & cuML install: source
- If method of install is from source, using cmake 3.14.5 & gcc/g++ 7.3.0 and commit hash 1b6f141 (branch-0.13)
Additional context
- Accuracy: the results agree with Sklearn's SVR up to (or close to) machine precision.
- Even with a linear kernel we can be fast if the data cannot be fit with a linear model.
- SVC seems to work fine: all the examples that I tested ran fast, even with a large number of features.
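For reference, a Sklearn baseline for the timing and accuracy comparisons mentioned above could be sketched as follows. This is only an illustration: the helper name fit_sklearn_svr is made up here, default SVR hyperparameters are assumed because the report does not list the ones used for the Sklearn run, and the problem size is reduced so the sketch finishes quickly.

```python
from timeit import default_timer

from sklearn.datasets import make_regression
from sklearn.svm import SVR


def fit_sklearn_svr(n_samples, n_features, random_state=1378):
    """Fit Sklearn's LIBSVM-based SVR with a linear kernel and time the fit.

    Hypothetical helper for illustration; default hyperparameters are assumed.
    """
    # All features are informative, as in the reproducer in this report.
    X, y = make_regression(n_samples=n_samples, n_features=n_features,
                           n_informative=n_features, random_state=random_state)
    model = SVR(kernel='linear')
    start = default_timer()
    model.fit(X, y)
    elapsed = default_timer() - start
    return model, X, y, elapsed


# Reduced size so the sketch runs quickly; the report uses
# n_samples=1300 and n_features=200.
model, X, y, elapsed = fit_sklearn_svr(n_samples=200, n_features=20)
print('Sklearn time to fit {:4.1f} s'.format(elapsed))

# Predictions on the training data, e.g. for an element-wise comparison
# against the predictions of the cuML model fitted on the same X, y.
pred = model.predict(X)
```

Fitting cuml.svm.SVR on the same X and y and comparing its predictions with pred would give the accuracy comparison; timing both fits at the report's size would give the speed comparison.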