KNN Imputer class and dependency functionalities #4820
base: branch-23.02
Conversation
Fix forward merge rapidsai#4357 [skip-ci]
Implementing LinearSVM using the existing QN solvers. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Robert Maynard (https://github.com/robertmaynard) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4268
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
Authors: - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - William Hicks (https://github.com/wphicks) URL: rapidsai#4293
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
Closes rapidsai#3846 Adds support for exogenous variables to ARIMA. All series in the batch must have the same number of exogenous variables, and exogenous variables are not shared across the batch (`exog` therefore has `n_exog * batch_size` columns). Example: ```python model = ARIMA(endog=df_endog, exog=df_exog_past, order=(1,0,1), seasonal_order=(1,1,1,12), fit_intercept=True, simple_differencing=False) model.fit() fc, lower, upper = model.forecast(40, exog=df_exog_future, level=0.95) ```  Authors: - Louis Sugy (https://github.com/Nyrio) - Tamas Bela Feher (https://github.com/tfeher) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Tamas Bela Feher (https://github.com/tfeher) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4221
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
Addresses rapidsai#4110 This is an experimental prototype. For now, it supports: * XGBoost models with numerical splits * cuML RF regressors with numerical splits cuML RF classifiers are not supported. Authors: - Philip Hyunsu Cho (https://github.com/hcho3) Approvers: - Rory Mitchell (https://github.com/RAMitchell) - William Hicks (https://github.com/wphicks) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4351
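For context, a minimal sketch of the kind of usage this enables, written against the stable `ForestInference` API; the experimental prototype added by this commit may expose a different import path or options, and the model file and feature count below are hypothetical.

```python
# Minimal sketch, assuming a pre-trained XGBoost regression model with
# numerical splits saved as JSON; names and paths are illustrative only.
import numpy as np
from cuml import ForestInference

fil_model = ForestInference.load(
    filename="xgboost_model.json",   # hypothetical model file
    model_type="xgboost_json",
    output_class=False,              # regression-style output
)
X_test = np.random.rand(100, 16).astype(np.float32)  # must match model features
preds = fil_model.predict(X_test)
```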
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
Closes rapidsai#3805 Authors: - Micka (https://github.com/lowener) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4361
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
This upgrade is required to be in-line with: rapidsai/cudf#9716 Depends on: rapidsai/integration#390 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) - Ray Douglass (https://github.com/raydouglass) URL: rapidsai#4372
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
Fix Changelog Merge Conflicts for `branch-21.12`
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
Changes to be in-line with: rapidsai/cudf#9734 Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#4390
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
cc @robertmaynard @quasiben @raydouglass Authors: - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) URL: rapidsai#4392
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
Authors: - Philip Hyunsu Cho (https://github.com/hcho3) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4398
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
…idsai#4400) This PR uses Project Flash to build the cuML Python package, mirroring the C++ flow. Note: currently only the CUDA 11.0 GPU test is changed, since that one uses Python 3.7; to cover the other jobs we would need to build the Python package twice in the CPU job.
[gpuCI] Forward-merge branch-21.12 to branch-22.02 [skip gpuci]
Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - AJ Schmidt (https://github.com/ajschmidt8) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4396
…#4382) Suggest using LinearSVM when the user chooses to use the linear kernel in SVM. The reason is that LinearSVM uses a specialized faster solver. Closes rapidsai#1664 Also partially addresses rapidsai#2857 Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4382
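For illustration, a minimal sketch of the substitution this change points users toward (data and parameters below are illustrative, not part of the original commit):

```python
# Minimal sketch: cuml.svm.SVC with a linear kernel vs. the specialized
# cuml.svm.LinearSVC solver that the warning suggests instead.
import cupy as cp
from cuml.svm import SVC, LinearSVC

X = cp.random.rand(1000, 20, dtype=cp.float32)
y = (cp.random.rand(1000) > 0.5).astype(cp.float32)

general = SVC(kernel="linear").fit(X, y)   # general SVM solver
linear = LinearSVC().fit(X, y)             # specialized, QN-based linear solver
```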
Authors: - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4373
…ai#4405) There were actually 2 minor issues that prevented `UMAPAlgo::Optimize::find_params_ab()` from being ASAN-clean at the moment: - One is the memory leaks, of course - Another is the `malloc()`-`delete` mismatch -- only memory allocated using `new` or equivalent should be freed with `delete` or `delete[]`. Another issue that was also addressed here: exception safety (i.e., by using `make_unique` from C++14). Signed-off-by: Yitao Li <yitao@rstudio.com> Authors: - Yitao Li (https://github.com/yitao-li) Approvers: - Zach Bjornson (https://github.com/zbjornson) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4405
P_sum is equal to n. See rapidsai#2622 where I made this change once before. rapidsai#4208 changed it back while consolidating code. Authors: - Zach Bjornson (https://github.com/zbjornson) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4425
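A short sketch of why that sum equals n, assuming P here is the symmetrized affinity matrix built from the t-SNE conditional probabilities (the usual construction):

$$\sum_{j} p_{j|i} = 1 \ \text{for each } i = 1,\dots,n \quad\Rightarrow\quad \sum_{i,j} p_{j|i} = n,$$

and symmetrizing without renormalizing preserves the total:

$$P_{ij} = \tfrac{1}{2}\left(p_{j|i} + p_{i|j}\right) \quad\Rightarrow\quad \sum_{i,j} P_{ij} = \sum_{i,j} p_{j|i} = n.$$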
Approving ops-codeowner file changes
rerun tests
Fixes issue rapidsai#2387. For large data sizes, the batch size of the DBSCAN algorithm is small in order to fit the distance matrix in memory. This results in a matrix that has dimensions num_points x batch_size, both for the distance and adjacency matrix. The conversion of the boolean adjacency matrix to CSR format is performed in the 'adjgraph' step. This step was slow when the batch size was small, as described in issue rapidsai#2387. In this commit, the adjgraph step is sped up. This is done in two ways: 1. The adjacency matrix is now stored in row-major batch_size x num_points format --- it was transposed before. This required changes in the vertexdeg step. 2. The csr_row_op kernel has been replaced by the adj_to_csr kernel. This kernel can divide the work over multiple blocks even when the number of rows (batch size) is small. It makes optimal use of memory bandwidth because rows of the matrix are laid out contiguously in memory. Authors: - Allard Hendriksen (https://github.com/ahendriksen) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Tamas Bela Feher (https://github.com/tfeher) URL: rapidsai#4803
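For illustration, a small host-side sketch of the conversion the new adj_to_csr kernel performs; NumPy/SciPy stand in for the CUDA implementation, and the array sizes are made up:

```python
# Minimal sketch: convert a row-major boolean adjacency matrix of shape
# (batch_size, num_points) to CSR, mirroring what adj_to_csr does on the GPU.
import numpy as np
from scipy.sparse import csr_matrix

batch_size, num_points = 4, 10
adj = np.random.rand(batch_size, num_points) < 0.3    # boolean adjacency, row-major

row_counts = adj.sum(axis=1)                           # vertex degree per batch row
indptr = np.concatenate(([0], np.cumsum(row_counts)))  # CSR row offsets
indices = np.nonzero(adj)[1]                           # column indices, row by row
csr = csr_matrix((np.ones(indices.size, dtype=bool), indices, indptr),
                 shape=adj.shape)
assert (csr.toarray() == adj).all()
```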
rerun tests
This functionality has been moved to RAFT. Authors: - Allard Hendriksen (https://github.com/ahendriksen) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4829
…4804) This PR removes the naive versions of the DBSCAN algorithms. They were not used anymore and were largely incorrect, as described in rapidsai#3414. This fixes issue rapidsai#3414. Authors: - Allard Hendriksen (https://github.com/ahendriksen) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4804
rerun tests
Pass the `NVTX` option to raft in a way more similar to the other arguments, and make sure the `RAFT_NVTX` option appears in the installed `raft-config.cmake`. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Robert Maynard (https://github.com/robertmaynard) URL: rapidsai#4825
The conda recipe was updated to UCX 1.13.0 in rapidsai#4809 , but updating conda environment files was missing there. Authors: - Peter Andreas Entschev (https://github.com/pentschev) Approvers: - Jordan Jacobelli (https://github.com/Ethyling) URL: rapidsai#4813
Allows cuML to be installed with CuPy 11. xref: rapidsai/integration#508 Authors: - https://github.com/jakirkham Approvers: - Sevag H (https://github.com/sevagh) - Dante Gama Dessavre (https://github.com/dantegd) URL: rapidsai#4837
rerun tests

1 similar comment

rerun tests
Codecov Report: Base 77.62% // Head 78.24% // Increases project coverage by +0.61%.
Additional details and impacted files

Coverage diff (base branch-22.10 vs. this PR #4820):

| | branch-22.10 | #4820 | +/- |
| --- | --- | --- | --- |
| Coverage | 77.62% | 78.24% | +0.61% |
| Files | 180 | 181 | +1 |
| Lines | 11384 | 11610 | +226 |
| Hits | 8837 | 9084 | +247 |
| Misses | 2547 | 2526 | -21 |
Flags with carried forward coverage won't be shown.
☔ View full report at Codecov.
This PR has been labeled
Force-pushed from 3bc1de0 to e7fd6cc
Merge PR #4797 before merging this one; the functionalities required here are implemented in #4797.
Created a draft PR that adds the KNN Imputer class and the dependency functionalities needed for imputing missing values.
Supported inputs: NumPy arrays, pandas DataFrames, CuPy arrays, cuDF DataFrames
Tested on: Tesla T4, single GPU
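As an illustration, a minimal usage sketch of the proposed class; the import path, class name, and constructor arguments below are assumptions modeled on sklearn.impute.KNNImputer and may differ from what this draft finally implements:

```python
# Hypothetical usage sketch of the KNNImputer proposed in this draft PR.
# Import path and parameters are assumed, not taken from the actual code.
import cupy as cp
from cuml.experimental.preprocessing import KNNImputer  # assumed location

X = cp.array([[1.0, 2.0, cp.nan],
              [3.0, 4.0, 3.0],
              [cp.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]], dtype=cp.float32)

imputer = KNNImputer(n_neighbors=2, weights="uniform")  # assumed defaults
X_filled = imputer.fit_transform(X)  # missing entries filled from nearest neighbors
```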
Timing (latency):
Tested on NumPy arrays with 25% of the data masked, using the averaged distance metric and 100 columns.

| Data points | cuML | scikit-learn |
| --- | --- | --- |
| 100,000 | 0.513 s | 0.383 s |
| 1M | 10.5 s | 36.1 s |
| 10M | 105 s | 373 s |
Tested on NumPy arrays with 1% of the data masked, using the averaged distance metric and 100 columns.

| Data points | cuML | scikit-learn |
| --- | --- | --- |
| 100,000 | 0.217 s | 0.208 s |
| 1M | 2.86 s | 7.73 s |
| 10M | 10.2 s | 122 s |
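A sketch of how such a comparison might be reproduced; the masking fraction and column count come from the description above, but the helper function, sizes, and timing approach are illustrative and may differ from the original benchmark (the cuML side would mirror this with the proposed KNNImputer on the GPU):

```python
# Minimal benchmark sketch: mask a fraction of a random (n_rows x 100) array
# with NaNs and time scikit-learn's KNNImputer on it.
import time
import numpy as np
from sklearn.impute import KNNImputer as SkKNNImputer

def make_masked(n_rows, n_cols=100, frac=0.25, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.random((n_rows, n_cols), dtype=np.float32)
    mask = rng.random(X.shape) < frac   # mask `frac` of the entries
    X[mask] = np.nan
    return X

X = make_masked(100_000, frac=0.25)
t0 = time.perf_counter()
SkKNNImputer(n_neighbors=5).fit_transform(X)
print(f"sklearn: {time.perf_counter() - t0:.3f}s")
```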
Profiling on 1 million records:
The built-in CuPy functionalities are taking up most of the time.
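One way to attribute that time on the GPU is CuPy's benchmark helper, which reports CPU and GPU time separately; a minimal sketch, where the profiled function is an illustrative stand-in rather than the actual imputer internals:

```python
# Minimal profiling sketch: time a CuPy-heavy step with cupyx.profiler.benchmark.
import cupy as cp
from cupyx.profiler import benchmark

def distance_step(X):
    # illustrative stand-in for a CuPy-based pairwise distance computation
    return cp.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

X = cp.random.rand(1000, 100, dtype=cp.float32)
print(benchmark(distance_step, (X,), n_repeat=10))
```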