
KNN Imputer class and dependency functionalities #4820

Open

wants to merge 10,000 commits into base: branch-23.02
Conversation

SreekiranprasadV
Contributor

@SreekiranprasadV SreekiranprasadV commented Jul 18, 2022

Merge PR #4797 before merging this one; the functionalities required here are in #4797.

Created a draft PR and added the KNN Imputer class and dependency functionalities for imputation of missing values.

Supported inputs: NumPy arrays, pandas DataFrames, CuPy arrays, cuDF DataFrames

Tested on: Tesla T4 Single GPU
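For reference, the idea behind KNN imputation can be sketched in plain NumPy. This is a naive O(n²) illustration, not the cuML implementation; the `knn_impute` helper is hypothetical, and its NaN-aware distance scaling is modeled on scikit-learn's `nan_euclidean` metric:

```python
import numpy as np

def knn_impute(X, n_neighbors=2):
    """Fill NaNs with the mean of the n_neighbors nearest rows.

    Distances use only coordinates observed in both rows, scaled the
    way scikit-learn's nan_euclidean metric scales them.
    """
    X = X.astype(float).copy()
    out = X.copy()
    n, d = X.shape
    for i in range(n):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        dists = np.full(n, np.inf)
        for j in range(n):
            if j == i:
                continue
            both = ~np.isnan(X[i]) & ~np.isnan(X[j])
            if not both.any():
                continue
            diff = X[i, both] - X[j, both]
            # nan_euclidean scaling: sqrt(d / #observed * sum of sq. diffs)
            dists[j] = np.sqrt(d / both.sum() * (diff ** 2).sum())
        for col in np.where(miss)[0]:
            donors = np.where(~np.isnan(X[:, col]) & np.isfinite(dists))[0]
            nearest = donors[np.argsort(dists[donors])][:n_neighbors]
            if nearest.size:
                out[i, col] = X[nearest, col].mean()
    return out

X = np.array([[1.0, 2.0], [3.0, 4.0], [np.nan, 6.0]])
# row 2, column 0 becomes 2.0, the mean of the two nearest rows
print(knn_impute(X, n_neighbors=2))
```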

Time latency:

Tested on NumPy arrays with 25% of the data masked, the distance metric averaged over neighbors, and the number of columns set to 100.

| Data Points | cuML | scikit-learn |
|-------------|--------|--------------|
| 100,000     | 0.513 s | 0.383 s |
| 1M          | 10.5 s  | 36.1 s  |
| 10M         | 105 s   | 373 s   |

Tested on NumPy arrays with 1% of the data masked, the distance metric averaged over neighbors, and the number of columns set to 100.

| Data Points | cuML | scikit-learn |
|-------------|--------|--------------|
| 100,000     | 0.217 s | 0.208 s |
| 1M          | 2.86 s  | 7.73 s  |
| 10M         | 10.2 s  | 122 s   |
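Timings of this shape can be reproduced with a small harness along these lines. The `make_masked_data` and `time_imputer` helpers are hypothetical, and the imputer object is whatever class is being measured (e.g. scikit-learn's `KNNImputer` for the CPU column):

```python
import time
import numpy as np

def make_masked_data(n_rows, n_cols=100, frac_missing=0.25, seed=0):
    """Random matrix with roughly frac_missing of its entries set to NaN."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_rows, n_cols))
    mask = rng.random((n_rows, n_cols)) < frac_missing
    X[mask] = np.nan
    return X

def time_imputer(imputer, X):
    """Wall-clock a single fit_transform call."""
    start = time.perf_counter()
    imputer.fit_transform(X)
    return time.perf_counter() - start

X = make_masked_data(1000, frac_missing=0.25)
assert abs(np.isnan(X).mean() - 0.25) < 0.02
```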

Profiling on 1 million records:

```
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.491    0.005    0.570    0.006 {method 'argpartition' of 'cupy._core.core.ndarray' objects}
     3561    0.197    0.000    0.213    0.000 /nvme/1/svadaga/miniconda3/envs/cuml_dev/lib/python3.9/site-packages/rmm/rmm.py:212(rmm_cupy_allocator)
        1    0.149    0.149    1.078    1.078 /nvme/1/svadaga/miniconda3/envs/cuml_dev/lib/python3.9/site-packages/cuml/_thirdparty/sklearn/preprocessing/_imputation.py:951(transform)
      2/1    0.087    0.044    0.161    0.161 /nvme/1/svadaga/miniconda3/envs/cuml_dev/lib/python3.9/site-packages/cuml/internals/api_decorators.py:453(inner_with_getters)
        3    0.056    0.019    0.064    0.021 {method 'dot' of 'cupy._core.core.ndarray' objects}
      201    0.024    0.000    0.039    0.000 {method 'nonzero' of 'cupy._core.core.ndarray' objects}
      100    0.014    0.000    0.621    0.006 /nvme/1/svadaga/miniconda3/envs/cuml_dev/lib/python3.9/site-packages/cuml/_thirdparty/sklearn/preprocessing/_imputation.py:863(_calc_impute)
      200    0.005    0.000    0.009    0.000 {built-in method cupy._core._routines_math._nansum}
     3562    0.005    0.000    0.010    0.000 cuda/cudart.pyx:10521(cudaGetDevice)
      101    0.004    0.000    0.009    0.000 /nvme/1/svadaga/miniconda3/envs/cuml_dev/lib/python3.9/site-packages/cupy/_creation/ranges.py:9(arange)
     3562    0.004    0.000    0.014    0.000 /nvme/1/svadaga/miniconda3/envs/cuml_dev/lib/python3.9/site-packages/rmm/_cuda/gpu.py:53(getDevice)
      200    0.004    0.000    0.008    0.000 {method 'take' of 'cupy._core.core.ndarray' objects}
      100    0.003    0.000    0.006    0.000 {method 'all' of 'cupy._core.core.ndarray' objects}
     3562    0.003    0.000    0.005    0.000 /nvme/1/svadaga/miniconda3/envs/cuml_dev/lib/python3.9/enum.py:358(__call__)
      103    0.003    0.000    0.005    0.000 {method 'any' of 'cupy._core.core.ndarray' objects}
      616    0.002    0.000    0.014    0.000 /nvme/1/svadaga/miniconda3/envs/cuml_dev/lib/python3.9/site-packages/cupy/_creation/basic.py:7(empty)
     3561    0.002    0.000    0.002    0.000 {built-in method cupy.cuda.stream.get_current_stream}
     3562    0.002    0.000    0.002    0.000 /nvme/1/svadaga/miniconda3/envs/cuml_dev/lib/python3.9/enum.py:670(__new__)
     2107    0.002    0.000    0.004    0.000 /nvme/1/svadaga/miniconda3/envs/cuml_dev/lib/python3.9/site-packages/numpy/core/numeric.py:1858(isscalar)
      7/1    0.002    0.000    1.080    1.080 /nvme/1/svadaga/miniconda3/envs/cuml_dev/lib/python3.9/site-packages/cuml/internals/api_decorators.py:357(inner)
```

CuPy's built-in functionalities account for most of the time.
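A listing like the one above comes from Python's built-in `cProfile`/`pstats`; a minimal sketch, with a stand-in workload in place of the actual `transform` call:

```python
import cProfile
import io
import pstats

def work():
    # stand-in for imputer.transform(X) in the numbers above
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

buf = io.StringIO()
# same columns as the listing above: ncalls, tottime, percall, cumtime
pstats.Stats(profiler, stream=buf).sort_stats("tottime").print_stats(5)
report = buf.getvalue()
print(report)
```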

ajschmidt8 and others added 30 commits November 17, 2021 13:41
Implementing LinearSVM using the existing QN solvers.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Robert Maynard (https://github.com/robertmaynard)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4268
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
Closes rapidsai#3846 

Adds support for exogenous variables to ARIMA.
All series in the batch must have the same number of exogenous variables, and exogenous variables are not shared across the batch (`exog` therefore has `n_exog * batch_size` columns).

Example:
```python
model = ARIMA(endog=df_endog, exog=df_exog_past, order=(1,0,1),
              seasonal_order=(1,1,1,12), fit_intercept=True,
              simple_differencing=False)
model.fit()
fc, lower, upper = model.forecast(40, exog=df_exog_future, level=0.95)
```

![2021-09-22_exog_fc](https://user-images.githubusercontent.com/17441062/134339807-f815a7a3-98dc-49e5-8599-9607e660597a.png)

Authors:
  - Louis Sugy (https://github.com/Nyrio)
  - Tamas Bela Feher (https://github.com/tfeher)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Tamas Bela Feher (https://github.com/tfeher)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4221
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
Addresses rapidsai#4110

This is an experimental prototype. For now, it supports:
* XGBoost models with numerical splits
* cuML RF regressors with numerical splits

cuML RF classifiers are not supported.

Authors:
  - Philip Hyunsu Cho (https://github.com/hcho3)

Approvers:
  - Rory Mitchell (https://github.com/RAMitchell)
  - William Hicks (https://github.com/wphicks)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4351
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
This upgrade is required to be in-line with: rapidsai/cudf#9716

Depends on: rapidsai/integration#390

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#4372
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
Fix Changelog Merge Conflicts for `branch-21.12`
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
Changes to be in-line with: rapidsai/cudf#9734

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: rapidsai#4390
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
…idsai#4400)

PR uses project flash to build the cuML Python package mirroring what the C++ flow looks like.

Note: currently this is only changed for the CUDA 11.0 GPU test, since that one uses Python 3.7; to cover the other jobs we would need to build the Python package twice in the CPU job.
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4396
…#4382)

Suggest using LinearSVM when the user chooses to use the linear kernel in SVM. The reason is that LinearSVM uses a specialized faster solver.

Closes rapidsai#1664
Also partially addresses rapidsai#2857

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4382
…ai#4405)

There were actually two minor issues that prevented `UMAPAlgo::Optimize::find_params_ab()` from being ASAN-clean at the moment:

- One is the memory leaks, of course
- The other is the `malloc()`/`delete` mismatch -- only memory allocated using `new` or equivalent should be freed with operator `delete` or `delete[]`

Another issue that was also addressed here: exception safety (i.e., by using `make_unique` from C++14)

Signed-off-by: Yitao Li <yitao@rstudio.com>

Authors:
  - Yitao Li (https://github.com/yitao-li)

Approvers:
  - Zach Bjornson (https://github.com/zbjornson)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4405
P_sum is equal to n. See rapidsai#2622 where I made this change once before. rapidsai#4208 changed it back while consolidating code.

Authors:
  - Zach Bjornson (https://github.com/zbjornson)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4425
Member

@ajschmidt8 ajschmidt8 left a comment


Approving ops-codeowner file changes

@SreekiranprasadV
Contributor Author

rerun tests

Fixes issue rapidsai#2387.

For large data sizes, the batch size of the DBSCAN algorithm is small in order to fit the distance matrix in memory.

This results in a matrix that has dimensions num_points x batch_size, both for the distance and adjacency matrix.

The conversion of the boolean adjacency matrix to CSR format is performed in the 'adjgraph' step. This step was slow when the batch size was small, as described in issue rapidsai#2387.

In this commit, the adjgraph step is sped up. This is done in two ways:

1. The adjacency matrix is now stored in row-major batch_size x num_points format --- it was transposed before. This required changes in the vertexdeg step.

2. The csr_row_op kernel has been replaced by the adj_to_csr kernel. This kernel can divide the work over multiple blocks even when the number of rows (batch size) is small. It makes optimal use of memory bandwidth because rows of the matrix are laid out contiguously in memory.
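The conversion itself can be sketched in NumPy. This is an illustration of the CSR layout the kernel produces, not the CUDA code; `adj_to_csr` here is a single-threaded stand-in for the kernel, which splits each row's scan across many thread blocks:

```python
import numpy as np

def adj_to_csr(adj):
    """Convert a row-major boolean adjacency matrix (batch_size x
    num_points) to CSR form: row offsets plus column indices.

    Because rows are contiguous in memory, each row scan is a
    sequential read, which is what makes the GPU version coalesced.
    """
    degrees = adj.sum(axis=1)                     # the vertexdeg step
    indptr = np.concatenate(([0], np.cumsum(degrees)))
    indices = np.flatnonzero(adj) % adj.shape[1]  # column index per True
    return indptr, indices

adj = np.array([[True, False, True],
                [False, True, False]])
indptr, indices = adj_to_csr(adj)
print(indptr.tolist(), indices.tolist())  # → [0, 2, 3] [0, 2, 1]
```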

Authors:
  - Allard Hendriksen (https://github.com/ahendriksen)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#4803
@SreekiranprasadV
Contributor Author

rerun tests

Allard Hendriksen added 2 commits July 25, 2022 19:31
This functionality has been moved to RAFT.

Authors:
  - Allard Hendriksen (https://github.com/ahendriksen)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4829
…4804)

This PR removes the naive versions of the DBSCAN algorithms. They were not used anymore and were largely incorrect, as described in rapidsai#3414. 

This fixes issue rapidsai#3414.

Authors:
  - Allard Hendriksen (https://github.com/ahendriksen)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4804
@beckernick beckernick added feature request New feature or request non-breaking Non-breaking change labels Jul 26, 2022
@SreekiranprasadV
Contributor Author

rerun tests

SreekiranprasadV and others added 8 commits July 26, 2022 11:25
Pass the `NVTX` option to raft in a way more similar to the other arguments, and make sure the `RAFT_NVTX` option is set in the installed `raft-config.cmake`.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Robert Maynard (https://github.com/robertmaynard)

URL: rapidsai#4825
The conda recipe was updated to UCX 1.13.0 in rapidsai#4809, but the conda environment files were not updated there.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Jordan Jacobelli (https://github.com/Ethyling)

URL: rapidsai#4813
Allows cuML to be installed with CuPy 11.

xref: rapidsai/integration#508

Authors:
  - https://github.com/jakirkham

Approvers:
  - Sevag H (https://github.com/sevagh)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4837
@SreekiranprasadV
Contributor Author

rerun tests

1 similar comment
@SreekiranprasadV
Contributor Author

rerun tests

@dantegd dantegd changed the base branch from branch-22.08 to branch-22.10 August 31, 2022 17:59
@codecov-commenter

Codecov Report

Base: 77.62% // Head: 78.24% // Increases project coverage by +0.61% 🎉

Coverage data is based on head (e629e77) compared to base (dc77d6b).
Patch coverage: 81.81% of modified lines in pull request are covered.

Additional details and impacted files
```
@@               Coverage Diff                @@
##           branch-22.10    #4820      +/-   ##
================================================
+ Coverage         77.62%   78.24%   +0.61%
================================================
  Files               180      181       +1
  Lines             11384    11610     +226
================================================
+ Hits               8837     9084     +247
+ Misses             2547     2526      -21
```
| Flag | Coverage Δ |
|------|------------|
| dask | 46.27% <14.39%> (+0.75%) ⬆️ |
| non-dask | 67.70% <81.81%> (+0.43%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|----------------|------------|
| python/cuml/_thirdparty/sklearn/neighbors/_base.py | 66.66% <66.66%> (ø) |
| ...l/_thirdparty/sklearn/preprocessing/_imputation.py | 85.71% <84.90%> (-0.62%) ⬇️ |
| ...cuml/_thirdparty/sklearn/preprocessing/__init__.py | 100.00% <100.00%> (ø) |
| python/cuml/metrics/__init__.py | 100.00% <100.00%> (ø) |
| python/cuml/common/array.py | 97.21% <0.00%> (-0.78%) ⬇️ |
| python/cuml/cluster/__init__.py | 100.00% <0.00%> (ø) |
| python/cuml/feature_extraction/_vectorizers.py | 89.93% <0.00%> (+0.37%) ⬆️ |
| python/cuml/common/import_utils.py | 59.82% <0.00%> (+0.85%) ⬆️ |
| python/cuml/thirdparty_adapters/adapters.py | 92.99% <0.00%> (+1.50%) ⬆️ |
| .../dask/extended/linear_model/logistic_regression.py | 92.00% <0.00%> (+57.33%) ⬆️ |


☔ View full report at Codecov.

@github-actions

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

@dantegd dantegd changed the base branch from branch-22.10 to branch-23.02 December 8, 2022 11:35
@ajschmidt8 ajschmidt8 requested review from a team as code owners February 13, 2023 18:56
Labels

- Cython / Python (Cython or Python issue)
- feature request (New feature or request)
- gpuCI (gpuCI issue)
- inactive-30d
- non-breaking (Non-breaking change)