

@shenkha shenkha commented Jul 14, 2025

What does this PR do?

This pull request adds ensemble tree models to the linear tree-based methods. Instead of relying on a single tree, it trains several trees with different random seeds and averages their predictions, which is intended to improve prediction accuracy and robustness over a single tree model.

Key Changes:

1. Ensemble of Trees

  • A new EnsembleTreeModel class in libmultilabel/linear/tree.py now manages multiple tree models.
  • The train_ensemble_tree function handles the training of n separate tree models, each with a different random seed for diversity.
  • The ensemble's final predictions are an average of the scores from each tree, providing a more stable and accurate result.
  • This functionality is exposed via a new CLI argument --tree_ensemble_models in main.py and integrated into linear_trainer.py.
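The seed-and-average idea described above can be sketched in a few lines of plain Python. This is only an illustration, not the code from libmultilabel/linear/tree.py: `train_toy_model`, `train_ensemble`, and `predict_ensemble` are hypothetical names, and the "model" is a random linear scorer standing in for a trained tree.

```python
import random
from typing import Callable, List, Sequence

# A "model" here is any function mapping a feature vector to per-label scores.
Scorer = Callable[[Sequence[float]], List[float]]


def train_toy_model(seed: int, num_labels: int, num_features: int) -> Scorer:
    """Hypothetical stand-in for training one tree model with a given seed."""
    rng = random.Random(seed)  # the seed is the only source of diversity
    weights = [
        [rng.uniform(-1.0, 1.0) for _ in range(num_features)]
        for _ in range(num_labels)
    ]

    def score(x: Sequence[float]) -> List[float]:
        # One decision value per label.
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return score


def train_ensemble(
    n_models: int, num_labels: int, num_features: int, base_seed: int = 0
) -> List[Scorer]:
    """Train n_models independent models, each with its own seed."""
    return [
        train_toy_model(base_seed + i, num_labels, num_features)
        for i in range(n_models)
    ]


def predict_ensemble(models: List[Scorer], x: Sequence[float]) -> List[float]:
    """Average the per-label scores across all models in the ensemble."""
    per_model = [model(x) for model in models]
    n = len(per_model)
    return [sum(label_scores) / n for label_scores in zip(*per_model)]


models = train_ensemble(n_models=3, num_labels=4, num_features=5)
x = [1.0, 0.0, 2.0, 0.0, 1.0]
scores = predict_ensemble(models, x)
```

Averaging keeps the output in the same shape as a single model's scores, so downstream ranking or thresholding code would not need to change in this sketch.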

Example usage:

python main.py --training_file data/eurlex_raw_texts_train.txt \
                --test_file data/eurlex_raw_texts_test.txt \
                --linear \
                --linear_technique tree \
                --tree_ensemble_models 3
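As a rough sketch of how a flag like this could be parsed and checked, assuming an argparse-based entry point: the parser below is a toy, not the actual main.py, and the default values and the `use_ensemble` helper are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Toy parser covering only the flags from the example command."""
    parser = argparse.ArgumentParser(description="Toy parser, not the real main.py")
    parser.add_argument("--linear", action="store_true")
    # Default technique is an assumption for this sketch.
    parser.add_argument("--linear_technique", default="1vsrest")
    # Assumed default of 1, meaning a single tree and no ensemble.
    parser.add_argument("--tree_ensemble_models", type=int, default=1)
    return parser


def use_ensemble(args: argparse.Namespace) -> bool:
    """Hypothetical check: ensemble only for tree models with more than one tree."""
    return (
        args.linear
        and args.linear_technique == "tree"
        and args.tree_ensemble_models > 1
    )


args = build_parser().parse_args(
    ["--linear", "--linear_technique", "tree", "--tree_ensemble_models", "3"]
)
```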

Test CLI & API (bash tests/autotest.sh)

Test APIs used by main.py.

  • Test Pass
    • (Copy and paste the last outputted line here.)
  • Not Applicable (i.e., the PR does not include API changes.)

Check API Document

If any new APIs are added, please check if the description of the APIs is added to API document.

  • API document is updated (linear, nn)
  • Not Applicable (i.e., the PR does not include API changes.)

Test quickstart & API (bash tests/docs/test_changed_document.sh)

If any APIs in quickstarts or tutorials are modified, please run this test to check if the current examples can run correctly after the modified APIs are released.

@shenkha shenkha requested review from a team and cjlin1 as code owners July 14, 2025 14:45
@Eleven1Liu Eleven1Liu self-requested a review July 17, 2025 02:13

@Eleven1Liu Eleven1Liu left a comment

For the formatting issues mentioned above, please run the black formatter.

@shenkha shenkha changed the title from "feat(linear): Add ensemble tree model and solver-aware scoring" to "feat(linear): Add ensemble tree model" Aug 1, 2025
@Eleven1Liu Eleven1Liu merged commit 213f612 into ntumlgroup:master Aug 13, 2025
1 check passed
@Eleven1Liu Eleven1Liu added the documentation Improvements or additions to documentation label Aug 16, 2025

Labels

documentation, model/linear


3 participants