Skip to content

Conversation

cbourjau
Copy link
Contributor

@cbourjau cbourjau commented Jun 12, 2024

The C-API exposes LGBM_BoosterRefit which receives the predicted leaf indices as a flat buffer that is laid out in the shape of nrow x ncol. The Boosting::RefitTree function, on the other hand, expects these indices as a nested vector object (std::vector<std::vector<int>>). Creating this nested object requires various additional allocations and amounts to an entire copy of the initial buffer doubling the memory requirements.

This PR changes the API of Boosting::RefitTree to take a pointer to a flat buffer of int32s, just like the C-API, thus avoiding the copy. This assumes that this part of the API is not regarded as stable. The C-API remains unchanged.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your interest in LightGBM. It'll be at least a few days until someone's able to review this, as we're currently focused on finishing a release (#6439 (comment)).

@jameslamb
Copy link
Collaborator

[trigger ci]

Every CI run on this PR will require a maintainer manually approving it, because you've never contributed here before. Sorry for the inconvenience, but GitHub introduced this as a security measure a few years ago and we've decided to leave it enabled. We do occasionally receive malicious pull requests trying to use our CI resources 🙃

@cbourjau
Copy link
Contributor Author

Thanks for the feedback! Some CI does appear to run, though, and some of it exhibited IO failures the first time around which I was trying to address 🤷 .

Copy link
Collaborator

@borchero borchero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looking forward to this performance improvement! I left a few comments, esp. regarding the changes to the application code

@cbourjau cbourjau requested a review from borchero June 17, 2024 15:33
Copy link
Collaborator

@borchero borchero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, I think we're getting there! 😁

@borchero
Copy link
Collaborator

Thanks @cbourjau! AppVeyor CI will be fixed by #6490, that only leaves the linting job 😁

Copy link
Collaborator

@borchero borchero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks a lot for your contribution @cbourjau 🚀

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

I'll dismiss my prior blocking review so it doesn't block merging. But I would like @shiyu1994 or @guolinke to also approve before we merge this. They might remember a reason that this copy was used in #1124.

@jameslamb jameslamb dismissed their stale review July 9, 2024 02:33

@borchero reviewed and all my concerns were addressed

ncol = Common::StringToArray<int>(result_reader.Lines()[0], '\t').size();
}
std::vector<int> pred_leaf;
pred_leaf.reserve(nrow * ncol);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we use resize instead of reserve?

Copy link
Collaborator

@shiyu1994 shiyu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. This should be very helpful.

@jameslamb jameslamb merged commit 1886bf5 into microsoft:master Jul 10, 2024
@cbourjau cbourjau deleted the avoid-copy-refit branch July 11, 2024 10:09
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 17, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants