Added an adaptive batching mechanism to handle large test sets based on estimated memory usage #262

Open
wants to merge 2 commits into main

Conversation

Krishnadubey1008 (Contributor)

This PR fixes #125

Description

Added an adaptive batching mechanism to handle large test sets, based on estimated memory usage. Expert users can override the default behavior by adjusting the memory_saving_mode parameter.

Changes made

  1. Added a method to estimate memory usage.
  2. Modified the predict_proba method to use adaptive batching (a rough sketch of the intended behavior follows below).
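For illustration, a minimal sketch of how such adaptive batching could work, assuming a memory budget and an estimate of the full-pass memory cost. The function and parameter names below are illustrative, not the PR's actual implementation; presumably the memory_saving_mode parameter decides whether this path is taken at all.

```python
import numpy as np


def predict_proba_in_batches(predict_fn, X, estimated_bytes, memory_budget_bytes):
    """Run predict_fn over X in batches sized to stay within a memory budget.

    predict_fn          -- callable returning class probabilities for a slice of X
    X                   -- 2D array of test samples
    estimated_bytes     -- estimated memory needed to predict on all of X at once
    memory_budget_bytes -- memory we are willing to spend per forward pass
    """
    n_samples = X.shape[0]
    if estimated_bytes <= memory_budget_bytes:
        # Everything fits: a single forward pass, no batching overhead.
        return predict_fn(X)

    # Scale the batch size so each slice is expected to fit within the budget.
    batch_size = max(1, int(n_samples * memory_budget_bytes / estimated_bytes))

    probas = [
        predict_fn(X[i : i + batch_size])
        for i in range(0, n_samples, batch_size)
    ]
    return np.vstack(probas)
```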

noahho requested a review from Copilot on April 4, 2025 at 15:49
Copilot AI left a comment

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment on lines +560 to +562
X[i:i + batch_size]
for output, config in self.executor_.iter_outputs(
X,

Copilot AI Apr 4, 2025

The slice expression 'X[i:i + batch_size]' is not assigned or used, which likely means the intended batch processing is incomplete. Consider assigning the slice to a variable (e.g. batch = X[i:i + batch_size]) and then processing it through the executor.

Suggested change:
-X[i:i + batch_size]
-for output, config in self.executor_.iter_outputs(
-    X,
+batch = X[i:i + batch_size]
+for output, config in self.executor_.iter_outputs(
+    batch,
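For context, the assembled batching loop would presumably look something like the sketch below. The iter_outputs call is taken from the snippet above; the outer loop, batch_size, and the way outputs are collected are illustrative guesses, not the PR's actual code.

```python
outputs = []
for i in range(0, X.shape[0], batch_size):
    # Assign the slice so it is actually used, per the suggestion above.
    batch = X[i:i + batch_size]
    for output, config in self.executor_.iter_outputs(batch):
        # Collect per-configuration outputs for this batch; downstream code
        # would aggregate them into probabilities for the full test set.
        outputs.append((output, config))
```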


noahho (Collaborator) commented Apr 4, 2025

Hi @Krishnadubey1008,

Thanks for tackling issue #125 and adding the adaptive batching mechanism.

To get this merged, could you please address the following points:

  1. Memory Estimation Function: Please move the memory estimation logic into a reusable function within our utils repository. This helps keep common utilities centralized. You can use this implementation as a reference: https://github.com/PriorLabs/tabpfn_common_utils/blob/524cee72cc6f33cf59fc943dc3e4b5428f3a79bc/expense_estimation.py#L9 (a rough sketch of such a helper follows this list).
  2. CI Checks: The automated tests and the Ruff linter are currently failing. Please investigate the errors shown in the CI check logs and apply the necessary fixes.
  3. Copilot Suggestion: Please review the comment/suggestion made by GitHub Copilot on this PR, as it may contain relevant points.
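A rough sketch of what such a reusable memory-estimation helper could look like; the actual implementation in tabpfn_common_utils may differ, and the formula, names, and constants below are illustrative assumptions.

```python
def estimate_memory_usage_bytes(
    n_train: int,
    n_test: int,
    n_features: int,
    bytes_per_float: int = 4,
    overhead_factor: float = 2.0,
) -> float:
    """Roughly estimate the memory needed for one forward pass.

    The dominant terms are assumed to be attention over the combined
    train + test sequence and the feature embeddings; overhead_factor
    covers activations and temporary buffers.
    """
    n_rows = n_train + n_test
    attention_bytes = n_rows * n_rows * bytes_per_float
    embedding_bytes = n_rows * n_features * bytes_per_float
    return overhead_factor * (attention_bytes + embedding_bytes)
```

A caller could then compare the returned estimate against the available memory to decide whether batching is needed and how large each batch should be.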
Let me know if you have any questions!

Successfully merging this pull request may close these issues: Batch predictions when test set is large.