[REVIEW] Optimizations for `cudf.concat` when `axis=1` #9333

galipremsagar · 2021-09-29T03:57:41Z

This PR:

- Reduces memory pressure by avoiding index materialization in the case of a `RangeIndex` when `axis=1`.
- Fixes the correctness of all `axis=1` cases in `cudf.concat`, thus enabling stricter index type checks in the associated pytests.
- Caches the `distinct_count` value of a `Column` in `_distinct_count` to improve performance.
- Introduces `Column._clear_cache`, a single method that clears all cached values related to a `Column`.
- Implements `Index.union`, `Index.intersection` & `Index.has_duplicates`.
- Implements the `is_numeric`, `is_boolean`, `is_integer`, `is_floating`, `is_object`, `is_categorical` & `is_interval` APIs in `Index`.
- Optimizes `cudf.concat` for `axis=1` using the changes above. Benchmarks comparing this PR (`THIS-PR`) against `branch-21.12` are listed below.
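
For readers unfamiliar with the two main ideas, here is a minimal plain-Python sketch. It is not the code in this PR and uses no cudf APIs. A step-1 Python `range` stands in for a `RangeIndex`, and `join_indexes` and `ToyColumn` are hypothetical names. `join_indexes` shows how an `axis=1` join (`inner` = intersection, `outer` = union of row labels) can be computed arithmetically on two ranges without materializing any values, falling back to materialized values otherwise; for simplicity it assumes unique labels. `ToyColumn` shows the memoize-then-invalidate pattern behind caching `distinct_count` and clearing it from a single `_clear_cache` method.

```python
from typing import Optional, Sequence, Union

# Hypothetical stand-in: a step-1 `range` plays the role of a RangeIndex
# (only start/stop are stored, never per-row values).
IndexLike = Union[range, Sequence[int]]


def join_indexes(left: IndexLike, right: IndexLike, how: str) -> IndexLike:
    """Combine two row indexes for an axis=1 join ('inner' or 'outer').

    Fast path: two step-1 ranges are combined arithmetically and a new range
    is returned, so no values are materialized. Slow path: materialize the
    labels (assumed unique) and return their sorted intersection/union.
    """
    if how not in ("inner", "outer"):
        raise ValueError(f"unsupported join: {how!r}")
    if isinstance(left, range) and isinstance(right, range) and left.step == right.step == 1:
        if how == "inner":
            start, stop = max(left.start, right.start), min(left.stop, right.stop)
            return range(start, max(start, stop))  # empty range if disjoint
        # The union is a single range only if the ranges overlap or touch.
        if left.start <= right.stop and right.start <= left.stop:
            return range(min(left.start, right.start), max(left.stop, right.stop))
    # Slow path: build sets from the materialized labels.
    lset, rset = set(left), set(right)
    combined = lset & rset if how == "inner" else lset | rset
    return sorted(combined)


class ToyColumn:
    """Hypothetical column that memoizes its distinct count."""

    def __init__(self, values):
        self._values = list(values)
        self._distinct_count: Optional[int] = None

    def distinct_count(self) -> int:
        # Compute once, then serve the cached value.
        if self._distinct_count is None:
            self._distinct_count = len(set(self._values))
        return self._distinct_count

    def _clear_cache(self) -> None:
        # Single place that drops every value derived from the data.
        self._distinct_count = None

    def set_values(self, values) -> None:
        self._values = list(values)
        self._clear_cache()  # cached results are stale after mutation

    @property
    def has_duplicates(self) -> bool:
        return self.distinct_count() < len(self._values)
```

For example, `join_indexes(range(0, 10), range(5, 15), "inner")` returns `range(5, 10)` without allocating any labels, while two disjoint ranges joined with `"outer"` fall back to a materialized sorted list.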

------------------------------------------------------------------------------ benchmark 'bench_concat.py::test_concat_axis_1[False-inner-1-objs0]': 2 tests -------------------------------------------------------------------------------
Name (time in us)                                                 Min                   Max                  Mean              StdDev                Median                IQR            Outliers         OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-inner-1-objs0] (THIS-PR)            209.9802 (1.0)      2,429.9941 (1.0)        222.9479 (1.0)       41.3467 (1.0)        224.5191 (1.0)      12.1914 (1.81)        12;32  4,485.3529 (1.0)        2985           1
test_concat_axis_1[False-inner-1-objs0] (branch-21.12)     1,807.7570 (8.61)     5,023.1239 (2.07)     1,868.9510 (8.38)     246.0487 (5.95)     1,830.1200 (8.15)      6.7296 (1.0)         20;74    535.0595 (0.12)        520           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[False-inner-1-objs1]': 2 tests ----------------------------------------------------------------------
Name (time in ms)                                              Min                Max               Mean            StdDev             Median               IQR            Outliers      OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-inner-1-objs1] (THIS-PR)          19.3856 (1.0)      25.1846 (1.0)      19.7466 (1.0)      0.9687 (13.33)    19.5381 (1.0)      0.2784 (6.09)          2;2  50.6416 (1.0)          50           1
test_concat_axis_1[False-inner-1-objs1] (branch-21.12)     30.7169 (1.58)     31.1239 (1.24)     30.7672 (1.56)     0.0727 (1.0)      30.7480 (1.57)     0.0457 (1.0)           2;1  32.5021 (0.64)         33           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[False-inner-1-objs2]': 2 tests ----------------------------------------------------------------------
Name (time in ms)                                              Min                Max               Mean            StdDev             Median               IQR            Outliers      OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-inner-1-objs2] (THIS-PR)          19.4794 (1.0)      20.0249 (1.0)      19.5933 (1.0)      0.1462 (1.0)      19.5117 (1.0)      0.1412 (1.07)         10;4  51.0378 (1.0)          51           1
test_concat_axis_1[False-inner-1-objs2] (branch-21.12)     30.8203 (1.58)     31.9644 (1.60)     30.9485 (1.58)     0.1959 (1.34)     30.9026 (1.58)     0.1319 (1.0)           1;1  32.3118 (0.63)         33           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[False-inner-1-objs3]': 2 tests ----------------------------------------------------------------------
Name (time in ms)                                              Min                Max               Mean            StdDev             Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-inner-1-objs3] (THIS-PR)           1.2168 (1.0)       3.3944 (1.0)       1.2505 (1.0)      0.0893 (1.0)       1.2349 (1.0)      0.0388 (1.0)         15;23  799.6555 (1.0)         707           1
test_concat_axis_1[False-inner-1-objs3] (branch-21.12)     44.4625 (36.54)    45.9180 (13.53)    45.1017 (36.07)    0.3472 (3.89)     45.1007 (36.52)    0.4618 (11.90)         7;0   22.1721 (0.03)         23           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[False-inner-1-objs4]': 2 tests ------------------------------------------------------------------------
Name (time in ms)                                               Min                 Max                Mean            StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-inner-1-objs4] (branch-21.12)      95.7450 (1.0)       97.5205 (1.0)       96.5405 (1.0)      0.5931 (1.13)      96.5431 (1.0)      1.0256 (1.17)          4;0  10.3583 (1.0)          11           1
test_concat_axis_1[False-inner-1-objs4] (THIS-PR)          106.3069 (1.11)     107.8606 (1.11)     107.0745 (1.11)     0.5239 (1.0)      107.0633 (1.11)     0.8757 (1.0)           3;0   9.3393 (0.90)         10           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[False-inner-1-objs5]': 2 tests -----------------------------------------------------------------------
Name (time in ms)                                               Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-inner-1-objs5] (branch-21.12)     276.2022 (1.0)      278.3065 (1.0)      277.3080 (1.0)      0.9845 (1.0)      277.5305 (1.0)      1.8682 (1.03)          2;0  3.6061 (1.0)           5           1
test_concat_axis_1[False-inner-1-objs5] (THIS-PR)          304.1699 (1.10)     307.0704 (1.10)     305.4101 (1.10)     1.1629 (1.18)     305.2463 (1.10)     1.8148 (1.0)           2;0  3.2743 (0.91)          5           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------ benchmark 'bench_concat.py::test_concat_axis_1[False-inner-1-objs6]': 2 tests ------------------------------------------------------------------------------
Name (time in us)                                                 Min                   Max                  Mean             StdDev                Median                IQR            Outliers         OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-inner-1-objs6] (THIS-PR)            554.7500 (1.0)        669.7820 (1.0)        566.2571 (1.0)      13.3221 (1.0)        561.7749 (1.0)       5.2570 (1.0)         85;94  1,765.9823 (1.0)         748           1
test_concat_axis_1[False-inner-1-objs6] (branch-21.12)     3,956.2921 (7.13)     4,395.6251 (6.56)     4,015.7610 (7.09)     66.9272 (5.02)     3,993.7040 (7.11)     76.8616 (14.62)        28;8    249.0188 (0.14)        241           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[False-inner-1-objs7]': 2 tests ----------------------------------------------------------------------
Name (time in ms)                                              Min                 Max               Mean            StdDev             Median               IQR            Outliers      OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-inner-1-objs7] (THIS-PR)          72.6492 (1.0)       74.1472 (1.0)      73.3672 (1.0)      0.4783 (1.0)      73.4728 (1.0)      0.7316 (1.0)           5;0  13.6301 (1.0)          14           1
test_concat_axis_1[False-inner-1-objs7] (branch-21.12)     98.6850 (1.36)     100.1399 (1.35)     99.5267 (1.36)     0.6551 (1.37)     99.9600 (1.36)     1.1940 (1.63)          4;0  10.0476 (0.74)         10           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[False-outer-1-objs0]': 2 tests ---------------------------------------------------------------------------
Name (time in us)                                               Min                 Max                Mean            StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-outer-1-objs0] (branch-21.12)     213.2710 (1.0)      275.2030 (1.0)      223.5803 (1.01)     7.3814 (1.15)     222.9400 (1.02)     12.9229 (5.86)       719;17        4.4727 (0.99)       2875           1
test_concat_axis_1[False-outer-1-objs0] (THIS-PR)          214.6652 (1.01)     290.9640 (1.06)     220.4459 (1.0)      6.4177 (1.0)      218.0159 (1.0)       2.2046 (1.0)       419;512        4.5363 (1.0)        2731           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[False-outer-1-objs1]': 2 tests -----------------------------------------------------------------------
Name (time in ms)                                               Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-outer-1-objs1] (THIS-PR)          140.9027 (1.0)      141.7782 (1.0)      141.4213 (1.0)      0.3324 (1.0)      141.4934 (1.0)      0.5372 (1.0)           4;0  7.0711 (1.0)           8           1
test_concat_axis_1[False-outer-1-objs1] (branch-21.12)     174.4978 (1.24)     175.9156 (1.24)     174.9014 (1.24)     0.5408 (1.63)     174.6700 (1.23)     0.5511 (1.03)          1;0  5.7175 (0.81)          6           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[False-outer-1-objs2]': 2 tests -----------------------------------------------------------------------
Name (time in ms)                                               Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-outer-1-objs2] (THIS-PR)          149.0907 (1.0)      151.3939 (1.0)      149.6573 (1.0)      0.8207 (5.56)     149.2920 (1.0)      0.6782 (3.85)          1;1  6.6819 (1.0)           7           1
test_concat_axis_1[False-outer-1-objs2] (branch-21.12)     183.9202 (1.23)     184.3218 (1.22)     184.0712 (1.23)     0.1477 (1.0)      184.0646 (1.23)     0.1760 (1.0)           2;0  5.4327 (0.81)          6           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[False-outer-1-objs3]': 2 tests --------------------------------------------------------------------
Name (time in ms)                                             Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-outer-1-objs3] (THIS-PR)          1.1996 (1.0)      1.6017 (1.0)      1.2270 (1.00)     0.0297 (1.45)     1.2170 (1.0)      0.0374 (3.69)        29;13  815.0022 (1.00)        719           1
test_concat_axis_1[False-outer-1-objs3] (branch-21.12)     1.2096 (1.01)     1.6363 (1.02)     1.2259 (1.0)      0.0205 (1.0)      1.2199 (1.00)     0.0102 (1.0)        88;106  815.7473 (1.0)         762           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[False-outer-1-objs4]': 2 tests -----------------------------------------------------------------------
Name (time in ms)                                               Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-outer-1-objs4] (THIS-PR)          582.8973 (1.0)      586.0131 (1.0)      583.9782 (1.0)      1.2053 (1.0)      583.5081 (1.0)      1.2076 (1.0)           1;0  1.7124 (1.0)           5           1
test_concat_axis_1[False-outer-1-objs4] (branch-21.12)     785.9871 (1.35)     790.6360 (1.35)     787.4976 (1.35)     1.8293 (1.52)     786.8087 (1.35)     1.7791 (1.47)          1;0  1.2698 (0.74)          5           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[False-outer-1-objs5]': 2 tests -------------------------------------------------------------------
Name (time in s)                                              Min               Max              Mean            StdDev            Median               IQR            Outliers     OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-outer-1-objs5] (THIS-PR)          1.9260 (1.0)      1.9343 (1.0)      1.9299 (1.0)      0.0031 (1.0)      1.9299 (1.0)      0.0038 (1.0)           2;0  0.5182 (1.0)           5           1
test_concat_axis_1[False-outer-1-objs5] (branch-21.12)     2.1733 (1.13)     2.1830 (1.13)     2.1777 (1.13)     0.0039 (1.26)     2.1784 (1.13)     0.0058 (1.53)          2;0  0.4592 (0.89)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[False-outer-1-objs6]': 2 tests ---------------------------------------------------------------------------
Name (time in us)                                               Min                 Max                Mean             StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-outer-1-objs6] (THIS-PR)          554.3760 (1.0)      632.9010 (1.02)     575.7529 (1.02)     16.1334 (2.40)     566.0525 (1.00)     31.2359 (7.54)        545;0        1.7369 (0.98)       1442           1
test_concat_axis_1[False-outer-1-objs6] (branch-21.12)     556.5900 (1.00)     622.5759 (1.0)      566.3433 (1.0)       6.7226 (1.0)      564.7328 (1.0)       4.1408 (1.0)        114;89        1.7657 (1.0)        1497           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[False-outer-1-objs7]': 2 tests -----------------------------------------------------------------------
Name (time in ms)                                               Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[False-outer-1-objs7] (THIS-PR)          596.3256 (1.0)      600.5619 (1.0)      597.9632 (1.0)      1.6437 (1.0)      597.7408 (1.0)      2.1454 (1.0)           1;0  1.6723 (1.0)           5           1
test_concat_axis_1[False-outer-1-objs7] (branch-21.12)     654.1722 (1.10)     666.8746 (1.11)     657.2377 (1.10)     5.4777 (3.33)     654.3422 (1.09)     4.8897 (2.28)          1;1  1.5215 (0.91)          5           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[True-inner-1-objs0]': 2 tests -------------------------------------------------------------------------------
Name (time in us)                                                Min                   Max                  Mean              StdDev                Median                 IQR            Outliers         OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-inner-1-objs0] (THIS-PR)            222.4192 (1.0)        312.4340 (1.0)        233.9587 (1.0)       12.3266 (1.0)        226.9410 (1.0)       17.2716 (1.0)        150;17  4,274.2589 (1.0)         896           1
test_concat_axis_1[True-inner-1-objs0] (branch-21.12)     1,831.1338 (8.23)     5,528.5210 (17.70)    2,174.9929 (9.30)     411.6862 (33.40)    2,110.6380 (9.30)     890.1195 (51.54)        77;1    459.7716 (0.11)        293           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[True-inner-1-objs1]': 2 tests ---------------------------------------------------------------------
Name (time in ms)                                             Min                Max               Mean            StdDev             Median               IQR            Outliers      OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-inner-1-objs1] (THIS-PR)          19.3491 (1.0)      23.9291 (1.0)      20.4857 (1.0)      1.4031 (13.72)    19.5300 (1.0)      2.5649 (19.47)        14;0  48.8145 (1.0)          40           1
test_concat_axis_1[True-inner-1-objs1] (branch-21.12)     30.9140 (1.60)     31.3545 (1.31)     31.0313 (1.51)     0.1023 (1.0)      31.0049 (1.59)     0.1318 (1.0)           6;1  32.2255 (0.66)         30           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[True-inner-1-objs2]': 2 tests ---------------------------------------------------------------------
Name (time in ms)                                             Min                Max               Mean            StdDev             Median               IQR            Outliers      OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-inner-1-objs2] (THIS-PR)          19.3977 (1.0)      22.6105 (1.0)      19.6793 (1.0)      0.6127 (1.0)      19.5005 (1.0)      0.2517 (1.0)           3;3  50.8148 (1.0)          49           1
test_concat_axis_1[True-inner-1-objs2] (branch-21.12)     31.0002 (1.60)     37.2946 (1.65)     31.4314 (1.60)     1.1519 (1.88)     31.1185 (1.60)     0.2629 (1.04)          2;3  31.8153 (0.63)         32           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[True-inner-1-objs3]': 2 tests ----------------------------------------------------------------------
Name (time in ms)                                             Min                Max               Mean            StdDev             Median               IQR            Outliers       OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-inner-1-objs3] (THIS-PR)           1.2086 (1.0)       3.2895 (1.0)       1.2670 (1.0)      0.0809 (1.0)       1.2712 (1.0)      0.0247 (1.0)          6;42  789.2781 (1.0)         685           1
test_concat_axis_1[True-inner-1-objs3] (branch-21.12)     44.0268 (36.43)    45.0905 (13.71)    44.4070 (35.05)    0.2370 (2.93)     44.3967 (34.92)    0.2955 (11.95)         6;1   22.5190 (0.03)         24           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------ benchmark 'bench_concat.py::test_concat_axis_1[True-inner-1-objs4]': 2 tests -----------------------------------------------------------------------
Name (time in ms)                                              Min                 Max                Mean            StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-inner-1-objs4] (branch-21.12)      94.6051 (1.0)       96.7158 (1.0)       95.3382 (1.0)      0.5723 (1.59)      95.1666 (1.0)      0.5416 (1.0)           3;1  10.4890 (1.0)          11           1
test_concat_axis_1[True-inner-1-objs4] (THIS-PR)          104.9262 (1.11)     105.8423 (1.09)     105.3436 (1.10)     0.3590 (1.0)      105.2455 (1.11)     0.5744 (1.06)          2;0   9.4927 (0.91)          6           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[True-inner-1-objs5]': 2 tests -----------------------------------------------------------------------
Name (time in ms)                                              Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-inner-1-objs5] (branch-21.12)     273.0914 (1.0)      273.9324 (1.0)      273.4240 (1.0)      0.3789 (1.0)      273.1949 (1.0)      0.6226 (1.0)           1;0  3.6573 (1.0)           5           1
test_concat_axis_1[True-inner-1-objs5] (THIS-PR)          298.2814 (1.09)     300.4248 (1.10)     299.5427 (1.10)     0.8678 (2.29)     299.7728 (1.10)     1.3431 (2.16)          2;0  3.3384 (0.91)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[True-inner-1-objs6]': 2 tests -------------------------------------------------------------------------------
Name (time in us)                                                Min                   Max                  Mean              StdDev                Median                 IQR            Outliers         OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-inner-1-objs6] (THIS-PR)            560.6860 (1.0)        664.2400 (1.0)        586.7618 (1.0)       17.0820 (1.0)        596.3098 (1.0)       31.9778 (1.0)         605;3  1,704.2692 (1.0)        1399           1
test_concat_axis_1[True-inner-1-objs6] (branch-21.12)     3,963.3820 (7.07)     7,186.5108 (10.82)    4,081.5076 (6.96)     322.1541 (18.86)    4,015.4392 (6.73)     120.8268 (3.78)          5;9    245.0075 (0.14)        229           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------ benchmark 'bench_concat.py::test_concat_axis_1[True-inner-1-objs7]': 2 tests -----------------------------------------------------------------------
Name (time in ms)                                              Min                 Max                Mean            StdDev              Median               IQR            Outliers      OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-inner-1-objs7] (THIS-PR)           72.7404 (1.0)       74.9232 (1.0)       73.5822 (1.0)      0.7077 (4.98)      73.7620 (1.0)      1.0375 (9.03)          5;0  13.5903 (1.0)          13           1
test_concat_axis_1[True-inner-1-objs7] (branch-21.12)     100.0205 (1.38)     100.4437 (1.34)     100.1622 (1.36)     0.1422 (1.0)      100.1149 (1.36)     0.1149 (1.0)           2;2   9.9838 (0.73)         10           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[True-outer-1-objs0]': 2 tests ----------------------------------------------------------------------------
Name (time in us)                                              Min                   Max                Mean             StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-outer-1-objs0] (THIS-PR)          227.8399 (1.0)      3,832.8250 (13.42)    250.7014 (1.05)     70.9327 (17.37)    255.2045 (1.07)     23.8101 (10.25)         5;8        3.9888 (0.96)       2684           1
test_concat_axis_1[True-outer-1-objs0] (branch-21.12)     235.2530 (1.03)       285.5239 (1.0)      239.8447 (1.0)       4.0831 (1.0)      238.6939 (1.0)       2.3230 (1.0)       243;256        4.1694 (1.0)        2670           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[True-outer-1-objs1]': 2 tests -----------------------------------------------------------------------
Name (time in ms)                                              Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-outer-1-objs1] (THIS-PR)          141.7261 (1.0)      145.9042 (1.0)      142.7198 (1.0)      1.7906 (18.95)    141.9498 (1.0)      1.3718 (10.05)         1;1  7.0067 (1.0)           5           1
test_concat_axis_1[True-outer-1-objs1] (branch-21.12)     175.0254 (1.23)     175.2591 (1.20)     175.1198 (1.23)     0.0945 (1.0)      175.0752 (1.23)     0.1364 (1.0)           1;0  5.7104 (0.81)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[True-outer-1-objs2]': 2 tests -----------------------------------------------------------------------
Name (time in ms)                                              Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-outer-1-objs2] (THIS-PR)          149.5332 (1.0)      150.3494 (1.0)      149.9293 (1.0)      0.2652 (1.0)      149.8476 (1.0)      0.3105 (1.0)           2;0  6.6698 (1.0)           7           1
test_concat_axis_1[True-outer-1-objs2] (branch-21.12)     183.8074 (1.23)     184.6288 (1.23)     184.2467 (1.23)     0.3398 (1.28)     184.2170 (1.23)     0.5747 (1.85)          3;0  5.4275 (0.81)          6           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[True-outer-1-objs3]': 2 tests --------------------------------------------------------------------
Name (time in ms)                                            Min               Max              Mean            StdDev            Median               IQR            Outliers       OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-outer-1-objs3] (THIS-PR)          1.2082 (1.0)      1.9830 (1.44)     1.2325 (1.0)      0.0367 (2.09)     1.2202 (1.0)      0.0377 (1.70)         21;5  811.3756 (1.0)         696           1
test_concat_axis_1[True-outer-1-objs3] (branch-21.12)     1.2231 (1.01)     1.3767 (1.0)      1.2394 (1.01)     0.0176 (1.0)      1.2321 (1.01)     0.0221 (1.0)        160;12  806.8524 (0.99)        727           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[True-outer-1-objs4]': 2 tests -----------------------------------------------------------------------
Name (time in ms)                                              Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-outer-1-objs4] (THIS-PR)          574.2238 (1.0)      576.4085 (1.0)      575.5754 (1.0)      0.8308 (1.0)      575.7421 (1.0)      0.9577 (1.0)           2;0  1.7374 (1.0)           5           1
test_concat_axis_1[True-outer-1-objs4] (branch-21.12)     770.7027 (1.34)     772.6688 (1.34)     771.6322 (1.34)     0.9549 (1.15)     771.0687 (1.34)     1.6949 (1.77)          2;0  1.2960 (0.75)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[True-outer-1-objs5]': 2 tests -------------------------------------------------------------------
Name (time in s)                                             Min               Max              Mean            StdDev            Median               IQR            Outliers     OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-outer-1-objs5] (THIS-PR)          1.9025 (1.0)      1.9095 (1.0)      1.9074 (1.0)      0.0028 (1.0)      1.9082 (1.0)      0.0023 (1.0)           1;1  0.5243 (1.0)           5           1
test_concat_axis_1[True-outer-1-objs5] (branch-21.12)     2.1330 (1.12)     2.1428 (1.12)     2.1374 (1.12)     0.0039 (1.42)     2.1375 (1.12)     0.0062 (2.75)          2;0  0.4679 (0.89)          5           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[True-outer-1-objs6]': 2 tests --------------------------------------------------------------------------
Name (time in us)                                              Min                 Max                Mean             StdDev              Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-outer-1-objs6] (branch-21.12)     558.6550 (1.0)      641.9669 (1.0)      570.2701 (1.0)      11.1347 (1.0)      566.8140 (1.0)      5.0180 (1.0)       141;153        1.7536 (1.0)        1498           1
test_concat_axis_1[True-outer-1-objs6] (THIS-PR)          563.2618 (1.01)     663.0530 (1.03)     594.9855 (1.04)     15.4747 (1.39)     600.2941 (1.06)     8.7381 (1.74)      399;373        1.6807 (0.96)       1394           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------- benchmark 'bench_concat.py::test_concat_axis_1[True-outer-1-objs7]': 2 tests -----------------------------------------------------------------------
Name (time in ms)                                              Min                 Max                Mean            StdDev              Median               IQR            Outliers     OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_concat_axis_1[True-outer-1-objs7] (THIS-PR)          597.2443 (1.0)      600.4502 (1.0)      598.6500 (1.0)      1.4581 (3.99)     598.3555 (1.0)      2.6978 (4.06)          1;0  1.6704 (1.0)           5           1
test_concat_axis_1[True-outer-1-objs7] (branch-21.12)     653.1495 (1.09)     653.9721 (1.09)     653.5529 (1.09)     0.3653 (1.0)      653.4739 (1.09)     0.6643 (1.0)           2;0  1.5301 (0.92)          5           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Associated benchmarks are being added here: vyasr/cudf_benchmarks#1


codecov · 2021-09-29T19:57:54Z

Codecov Report

Merging #9333 (45aa26a) into branch-21.12 (ab4bfaa) will decrease coverage by 0.21%.
The diff coverage is 0.00%.

❗ Current head 45aa26a differs from pull request most recent head 47ce5d1. Consider uploading reports for the commit 47ce5d1 to get more accurate results

@@               Coverage Diff                @@
##           branch-21.12    #9333      +/-   ##
================================================
- Coverage         10.79%   10.57%   -0.22%     
================================================
  Files               116      116              
  Lines             18869    19388     +519     
================================================
+ Hits               2036     2051      +15     
- Misses            16833    17337     +504
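The headline reports a coverage drop of 0.21%, but the diff table shows -0.22%. Both figures fit the same hit and line counts under one reading: the headline uses the exact difference, and the table cuts each percentage to two decimals rather than rounding it. That reading is an inference from the numbers, not documented Codecov behavior. A short Python check:

```python
import math

# Line counts from the "Coverage Diff" block above.
BASE = {"lines": 18869, "hits": 2036}  # branch-21.12
HEAD = {"lines": 19388, "hits": 2051}  # PR #9333


def coverage_pct(counts: dict) -> float:
    """Line coverage in percent: hits / lines * 100."""
    return 100.0 * counts["hits"] / counts["lines"]


def truncate2(value: float) -> float:
    """Drop digits past two decimal places without rounding (value >= 0)."""
    return math.floor(value * 100) / 100


base_pct, head_pct = coverage_pct(BASE), coverage_pct(HEAD)
# Exact difference, rounded for display: matches the 0.21% headline.
exact_drop = round(base_pct - head_pct, 2)
# Difference of the truncated percentages: matches the -0.22% table row.
table_drop = round(truncate2(base_pct) - truncate2(head_pct), 2)
# Misses are simply lines minus hits.
misses = (BASE["lines"] - BASE["hits"], HEAD["lines"] - HEAD["hits"])
print(truncate2(base_pct), truncate2(head_pct), exact_drop, table_drop, misses)
```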

Impacted Files	Coverage Δ
python/cudf/cudf/__init__.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/_lib/__init__.py	`0.00% <ø> (ø)`
python/cudf/cudf/_lib/strings/__init__.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/csv.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/orc.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/frame.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/index.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/parquet.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/series.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/reshape.py	`0.00% <0.00%> (ø)`
... and 24 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 794863c...47ce5d1.

vyasr

Getting closer. I have a few suggestions and there's one discussion that we may need to continue a bit offline.

python/cudf/cudf/_lib/column.pyx

python/cudf/cudf/core/column/column.py

python/cudf/cudf/core/index.py

python/cudf/cudf/core/multiindex.py

python/cudf/cudf/core/_base_index.py

python/cudf/cudf/core/multiindex.py

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

vyasr

A few minor last changes, then I think this is ready to go (pending SWIPAT).

python/cudf/cudf/core/index.py

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

vyasr

LGTM

quasiben · 2021-10-19T14:26:04Z

With @vyasr's review I think we should be good. Merging now.

quasiben · 2021-10-19T14:26:12Z

@gpucibot merge

galipremsagar · 2021-10-19T14:28:01Z

Thanks @vyasr for patiently reviewing a lengthy PR like this one.

galipremsagar added 7 commits September 21, 2021 08:01

add tests

826cd6c

multiindex union

50b8850

merge

61b56cc

add number of index apis

910e682

Merge remote-tracking branch 'upstream/branch-21.12' into 9223

d252aae

cleanup

35406f6

cleanup

160093d

github-actions bot added the "Python" label Sep 29, 2021

galipremsagar commented Sep 29, 2021

.pre-commit-config.yaml (outdated, resolved)

Merge remote-tracking branch 'upstream/branch-21.12' into 9223

9d5f7df

galipremsagar added 3 commits September 29, 2021 11:00

cover all tests for mulitIndex.union

60daeaf

add MultiIndex.intersections tests

a43de79

add Index.union tests

28c13ff

galipremsagar added 15 commits September 29, 2021 13:22

add index intersection tests

d4c1ebd

remove print

ea32e41

add union docstring

205d947

add intersection docs

e6f0ea5

add docstrings

a943842

Merge remote-tracking branch 'upstream/branch-21.12' into 9223

abecd07

Merge remote-tracking branch 'upstream/branch-21.12' into 9223

fcf2664

Merge remote-tracking branch 'upstream/branch-21.12' into 9223

c702396

Merge remote-tracking branch 'upstream/branch-21.12' into 9223

838a34c

Merge remote-tracking branch 'upstream/branch-21.12' into 9223

aaab3a5

add caching to distinct_count

3adab76

fix union

a56dbfa

reorganize

f7d9a8f

Merge remote-tracking branch 'upstream/branch-21.12' into 9223

5d57597

cleanup

d897e1d

galipremsagar added 4 commits October 11, 2021 11:39

fix res_name

ae25694

add comments

d595dc3

Merge remote-tracking branch 'upstream/branch-21.12' into 9223

0873d39

add todo

92a9a40

galipremsagar added the "3 - Ready for Review" label and removed the "2 - In Progress" label Oct 11, 2021

vyasr requested changes Oct 11, 2021

galipremsagar and others added 7 commits October 12, 2021 08:50

Apply suggestions from code review

49cabf9

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

Merge remote-tracking branch 'upstream/branch-21.12' into 9223

1a75299

style

5e1c7cc

remove paranthesis

0221386

add todo

897c25a

add more test coverage

21f5a97

refactor if/elif blocks

975566d

galipremsagar added the "DO NOT MERGE" label Oct 12, 2021

vyasr requested changes Oct 13, 2021

galipremsagar and others added 5 commits October 12, 2021 23:17

Update python/cudf/cudf/core/index.py

048ec0a

Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>

address reviews

b783de0

merge

1365f36

Merge remote-tracking branch 'upstream/branch-21.12' into 9223

46d54fc

add tests for is_* methods

47ce5d1

galipremsagar removed the "DO NOT MERGE" label Oct 18, 2021

vyasr approved these changes Oct 18, 2021

rapids-bot bot merged commit a19bd23 into rapidsai:branch-21.12 Oct 19, 2021

galipremsagar mentioned this pull request Oct 19, 2021

[BUG] RangeIndex shouldn't be materialized in cudf.concat #9200 (Closed)
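Issue #9200 asks that an axis=1 concat avoid materializing a RangeIndex. The core idea can be sketched in pure Python. The sketch uses the built-in `range` as a hypothetical stand-in for `cudf.RangeIndex`, and the helper name `combine_indexes` is invented. This is not cuDF's implementation. It also ignores the library's label-ordering rules and always returns sorted labels on the slow path.

```python
from typing import List, Sequence, Union

# Hypothetical stand-in: a lazy range, or an explicit list of labels.
Labels = Union[range, List[int]]


def combine_indexes(indexes: Sequence[range], join: str = "outer") -> Labels:
    """Align the row indexes of several frames for an axis=1 concat (sketch)."""
    if not indexes:
        raise ValueError("need at least one index")
    if join not in ("outer", "inner"):
        raise ValueError(f"unsupported join: {join!r}")
    first = indexes[0]
    # Fast path: identical ranges align trivially. Return the lazy range
    # object itself so no per-row labels are ever allocated.
    if all(idx == first for idx in indexes[1:]):
        return first
    # Slow path: materialize every range into an explicit set of labels.
    label_sets = [set(idx) for idx in indexes]
    if join == "outer":
        combined = set().union(*label_sets)
    else:
        combined = set.intersection(*label_sets)
    return sorted(combined)


print(combine_indexes([range(0, 5), range(0, 5)]))
print(combine_indexes([range(0, 3), range(2, 5)], join="inner"))
```

When every input shares the same range, the fast path returns that range unchanged and allocates no per-row labels. Only mismatched ranges fall back to building explicit label sets.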
