@safaricd
Contributor

Issue

Previously, we had no visibility into which fit_mode values were actually being used, which made it difficult to decide where to focus when improving the fit modes.

Public API Changes

  • No Public API changes

How Has This Been Tested?

Unit tests and manual testing.

Checklist

  • The changes have been tested locally.
  • Documentation has been updated (if the public API or usage changes).
  • An entry has been added to CHANGELOG.md (if relevant for users).
  • The code follows the project's style guidelines.
  • I have considered the impact of these changes on the public API.

@safaricd safaricd requested a review from oscarkey November 25, 2025 17:38
@safaricd safaricd requested a review from a team as a code owner November 25, 2025 17:38
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds telemetry to track the usage of different fit_mode options in TabPFNClassifier and TabPFNRegressor. This is achieved by bumping the tabpfn-common-utils dependency and using the new set_init_params function.

The implementation is straightforward and correct. I have one suggestion to refactor the duplicated telemetry initialization logic into a helper function to improve maintainability.

Additionally, the documentation in TELEMETRY.md should be updated to reflect that fit_mode is now being collected as part of the anonymous usage data. This is important for transparency with users.

Comment on lines 480 to +483
Contributor

medium

This telemetry initialization logic, including the call to set_init_params, is also present in TabPFNRegressor.__init__. To improve maintainability and reduce code duplication, consider creating a new helper function in src/tabpfn/base.py that encapsulates this logic.

For example, you could create a function in base.py:

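from typing import Any

from tabpfn_common_utils.telemetry import set_init_params

# `initialize_telemetry` is assumed to already be defined or imported in base.py.
def initialize_telemetry_with_params(**params: Any) -> None:
    """Initialize telemetry and record additional anonymous parameters."""
    initialize_telemetry()
    # Only forward parameters when the caller actually supplied some.
    if params:
        set_init_params(params)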

Then you could replace these lines in both TabPFNClassifier and TabPFNRegressor with:

        initialize_telemetry_with_params(fit_mode=self.fit_mode)

This would centralize the telemetry setup and make it easier to add more parameters in the future.

initialize_telemetry()

# Only anonymously record `fit_mode` usage
set_init_params({"fit_mode": self.fit_mode})
Contributor

Should we do the same thing as for model_path and validate that it's a known fit mode, to avoid accidentally collecting PII?
We could define FitMode = Literal["low_memory", "fit_preprocessors", "fit_with_cache", "batched"] in inference.py, import it here and in the regressor interface, and then use typing.get_args() to check that the provided value is valid?
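
A minimal sketch of the validation idea above, using only the standard library. The four values are copied from this comment; the helper name `sanitize_fit_mode` and the choice to return `None` for unknown values are assumptions for illustration, not code from the PR or from tabpfn-common-utils.

```python
from typing import Any, Literal, Optional, get_args

# Alias as proposed in the comment above; where it would live (inference.py) is a suggestion.
FitMode = Literal["low_memory", "fit_preprocessors", "fit_with_cache", "batched"]

# get_args() on a Literal returns a tuple of its allowed values.
_KNOWN_FIT_MODES = frozenset(get_args(FitMode))


def sanitize_fit_mode(fit_mode: Any) -> Optional[str]:
    """Return fit_mode if it is one of the known values, otherwise None.

    Hypothetical helper: anything outside the known set (for example a
    user-supplied string that might contain a path or a name) is dropped
    rather than passed on to telemetry.
    """
    # The isinstance check also guards against unhashable inputs such as lists.
    if isinstance(fit_mode, str) and fit_mode in _KNOWN_FIT_MODES:
        return fit_mode
    return None
```

With a helper along these lines, the constructor could record only the sanitized value, or skip recording when the result is `None`. How unknown values should be handled (dropped, or mapped to a fixed placeholder) is a design choice left to the PR.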

3 participants