Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RBDT does not support xgboost inference #15197

Closed
1 task
vepadulano opened this issue Apr 10, 2024 · 1 comment
Closed
1 task

RBDT does not support xgboost inference #15197

vepadulano opened this issue Apr 10, 2024 · 1 comment

Comments

@vepadulano
Copy link
Member

Check duplicate issues.

  • Checked for duplicates

Description

RBDT was never testing proper handling of xgboost models, either in writing them to ROOT format or in their usage afterwards.

The functionality was never actually tested since the xgboost package was not installed on the CI nodes and the only available test was silently not running if the package was not found

# Disabled because RBDT doesn't support the imbalanced tree structure of
# XGBoost models.
# if(dataframe)
# find_python_module(xgboost QUIET)
# if (PY_XGBOOST_FOUND)
# ROOT_ADD_PYUNITTEST(rbdt_xgboost rbdt_xgboost.py)
# endif()
# endif()

The test had to be disabled right after the package was installed on some nodes and it immediately triggered failures as documented in this PR #15183

Since today, all platforms are now running (for the first time) RBDT tests with xgboost models. The Alma9 platform highlights a severe problem on the following three tests

 TEST FAILURES:
296:test-tcollbm
1314:tutorial-tmva-tmva101_Training-py
1315:tutorial-tmva-tmva102_Testing-py

By running inside a debugger, we can see that there is an issue at exit time (which does not depend on the version of xgboost)

(ROOT-CI) [root@663cf5af2db5 tmva]# gdb --args python tmva102_Testing.py 
[...]
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007f1251f78110 in __do_global_dtors_aux () from /lib64/libopenblaso.so.0
(gdb) 
(gdb) bt
#0  0x00007f1251f78110 in __do_global_dtors_aux () from /lib64/libopenblaso.so.0
#1  0x00007f1279ebef9e in _dl_fini () at dl-fini.c:142
#2  0x00007f12796574dd in __run_exit_handlers (status=0, listp=0x7f12797fa838 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, 
    run_dtors=run_dtors@entry=true) at exit.c:116
#3  0x00007f1279657620 in __GI_exit (status=<optimized out>) at exit.c:146
#4  0x00007f127963feb7 in __libc_start_call_main (main=main@entry=0x557932280160 <main>, argc=argc@entry=2, argv=argv@entry=0x7ffe94d9b838)
    at ../sysdeps/nptl/libc_start_call_main.h:74
#5  0x00007f127963ff60 in __libc_start_main_impl (main=0x557932280160 <main>, argc=2, argv=0x7ffe94d9b838, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7ffe94d9b828) at ../csu/libc-start.c:389
#6  0x0000557932280095 in _start ()

And the same stacktrace is produced for all three tests. This may or may not be linked to wlav/cppyy#227 and the fact that complex Python packages (in this case it's xgboost, but also pytorch is notorious for this behaviour) seem to not handle well C++ symbols.

Our current understanding of the situation with RBDT is the following:

  • The implementation that is currently in master never actually worked on multiple levels (largely untested, tutorials did not check for correctness of results)
  • Even if it worked, it lacked the ability to handle imbalanced trees which is pretty common when using xgboost models
  • The re-implementation of the feature available at [TMVA] Fix RBDT evaluation of XGBoost models by merging RBDT with FastForest #15173 vastly improves the functionality
  • There seems to be a major issue with complex Python packages and C++ which looks like we can't solve and will make at least one platform always red on our CI if we leave things as they are

Reproducer

At least on Alma9, the following is the smallest reproducer of the aforementioned segfault

import xgboost
import ROOT
import numpy as np
import json

np.random.seed(1234)


def create_dataset(num_events, num_features, num_outputs, dtype=np.float32):
    x = np.random.normal(0.0, 1.0, (num_events, num_features)).astype(dtype=dtype)
    if num_outputs == 1:
        y = np.random.normal(0.0, 1.0, (num_events)).astype(dtype=dtype)
    else:
        y = np.random.choice(
            a=range(num_outputs), size=(num_events), p=[1.0 / float(num_outputs)] * num_outputs
        ).astype(dtype=dtype)
    return x, y


def _test_XGBBinary(label):
    """
    Compare response of XGB classifier and TMVA tree inference system.
    """
    x, y = create_dataset(1000, 10, 2)
    xgb = xgboost.XGBClassifier(n_estimators=100, max_depth=3)
    xgb.fit(x, y)
    ROOT.TMVA.Experimental.SaveXGBoost(xgb, "myModel", "testXGBBinary{}.root".format(label), num_inputs=10)
    bdt = ROOT.TMVA.Experimental.RBDT("myModel", "testXGBBinary{}.root".format(label))

    y_xgb = xgb.predict_proba(x)[:, 1].squeeze()
    y_bdt = bdt.Compute(x).squeeze()
    np.testing.assert_array_almost_equal(y_xgb, y_bdt)

if __name__ == "__main__":
    _test_XGBBinary("default")

ROOT version

Any

Installation method

build from source

Operating system

Alma9 (others could be affected)

Additional context

No response

vepadulano added a commit to vepadulano/root that referenced this issue Apr 10, 2024
Representing the current situation at root-project#15197
vepadulano added a commit to vepadulano/root that referenced this issue Apr 10, 2024
Representing the current situation at root-project#15197
vepadulano added a commit to vepadulano/root that referenced this issue Apr 10, 2024
Representing the current situation at root-project#15197
vepadulano added a commit that referenced this issue Apr 10, 2024
Representing the current situation at #15197
vepadulano added a commit to vepadulano/root that referenced this issue Apr 10, 2024
Representing the current situation at root-project#15197
vepadulano added a commit that referenced this issue Apr 10, 2024
Representing the current situation at #15197
vepadulano added a commit to vepadulano/root that referenced this issue Apr 11, 2024
Representing the current situation at root-project#15197
vepadulano added a commit to vepadulano/root that referenced this issue Apr 11, 2024
Representing the current situation at root-project#15197
This was referenced Apr 11, 2024
vepadulano added a commit that referenced this issue Apr 11, 2024
Representing the current situation at #15197
vepadulano added a commit that referenced this issue Apr 11, 2024
Representing the current situation at #15197
@guitargeek
Copy link
Contributor

Closed by: #15173.

kristupaspranc pushed a commit to kristupaspranc/root that referenced this issue May 21, 2024
Representing the current situation at root-project#15197
silverweed pushed a commit to silverweed/root that referenced this issue Aug 19, 2024
Representing the current situation at root-project#15197
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants