TabPFNRegressor preprocessing fails on bigger datasets fix #255


Open
wants to merge 8 commits into base: main

Conversation

Krishnadubey1008
Contributor

@Krishnadubey1008 Krishnadubey1008 commented Mar 26, 2025

This PR fixes #169.
Changes made: set n_quantiles=min(n_quantiles, 10_000) in TabPFN/src/tabpfn/model/preprocessing.py

@noahho noahho requested a review from Copilot March 26, 2025 14:37

@Copilot Copilot AI left a comment


Pull Request Overview

This PR fixes an issue in the TabPFNRegressor preprocessing module where the number of quantiles could become excessively large on bigger datasets. The change ensures that the quantiles for various QuantileTransformer instances are capped at 10,000.

  • Updated the "quantile_uni_coarse" and "quantile_norm_coarse" transformers to use n_quantiles as min(max(num_examples // 10, 2), 10_000).
  • Updated the "quantile_norm" transformer to use n_quantiles as min(max(num_examples // 5, 2), 10_000).
  • Updated the "quantile_uni_fine" and "quantile_norm_fine" transformers to use n_quantiles as min(num_examples, 10_000).
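The capped configurations described above can be sketched roughly as follows (a sketch only; the variable names are illustrative, and the actual definitions live in src/tabpfn/model/preprocessing.py):

```python
from sklearn.preprocessing import QuantileTransformer

num_examples = 50_000  # e.g. a dataset larger than the 10,000-quantile cap

# Coarse transformers: one quantile per ~10 examples, capped at 10,000.
quantile_uni_coarse = QuantileTransformer(
    output_distribution="uniform",
    n_quantiles=min(max(num_examples // 10, 2), 10_000),
)

# Normal-output transformer: one quantile per ~5 examples, same cap.
quantile_norm = QuantileTransformer(
    output_distribution="normal",
    n_quantiles=min(max(num_examples // 5, 2), 10_000),
)

# Fine transformers: one quantile per example, capped at 10,000.
quantile_uni_fine = QuantileTransformer(
    output_distribution="uniform",
    n_quantiles=min(num_examples, 10_000),
)
```

The inner max(..., 2) guards the tiny-dataset case (QuantileTransformer requires at least 2 quantiles), while the outer min(..., 10_000) is the cap this PR adds.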

@noahho
Collaborator

noahho commented Mar 26, 2025

Thanks so much for this change! Would you be able to add a test for this change, i.e. one that tests whether the preprocessing runs on datasets of > 10,000 samples? We can't run the inference step, unfortunately, as it would of course be way too slow. The only way to test inference on large datasets would be if we provided a tiny TabPFN checkpoint, a very small model that is randomly initialized, but that would be a project in itself.

@Krishnadubey1008
Contributor Author

@noahho I have added test_preprocessing.py; please suggest changes, if any.

@noahho noahho requested a review from Copilot March 26, 2025 15:10

@Copilot Copilot AI left a comment


Pull Request Overview

This PR fixes an issue with the TabPFNRegressor preprocessing on larger datasets by capping the number of quantiles to 10,000 in several QuantileTransformer configurations.

  • Updated quantile transformer settings in the preprocessing module to avoid excessive quantile calculations.
  • Added a test case in tests/test_preprocessing.py to verify functionality on large datasets.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
  • tests/test_preprocessing.py — Adds a new test to verify preprocessing functionality on large datasets.
  • src/tabpfn/model/preprocessing.py — Updates the n_quantiles parameter in various QuantileTransformer configurations by capping values to 10,000.
Comments suppressed due to low confidence (2)

tests/test_preprocessing.py:27

  • Consider adding assertions to verify the shape and properties of the transformed output to increase test coverage beyond a simple non-null check.
assert result is not None
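A test with the extra assertions Copilot suggests might look roughly like this (a sketch only; the function name, shapes, and transformer configuration are assumptions, not the actual contents of tests/test_preprocessing.py):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer


def test_quantile_transform_on_large_dataset():
    rng = np.random.default_rng(seed=0)  # Generator API, satisfies ruff's NPY002
    X = rng.standard_normal((20_000, 5))  # more samples than the 10,000 cap

    n_quantiles = min(X.shape[0], 10_000)
    result = QuantileTransformer(n_quantiles=n_quantiles).fit_transform(X)

    # Check shape and value range, not just non-nullness.
    assert result is not None
    assert result.shape == X.shape
    assert result.min() >= 0.0 and result.max() <= 1.0


test_quantile_transform_on_large_dataset()
```

The range check relies on the default uniform output distribution, which maps values into [0, 1].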

src/tabpfn/model/preprocessing.py:722

  • [nitpick] The 'quantile_uni_coarse' transformer now caps n_quantiles to 10,000, yet the 'quantile_uni' transformer remains uncapped. If this discrepancy is unintentional, consider applying the same cap or adding a clarifying comment.
n_quantiles=min(max(num_examples // 10, 2), 10_000),

@noahho
Collaborator

noahho commented Mar 26, 2025

Great, this looks really good. There seems to be a tiny ruff issue at this point. Do you know how to resolve it?
"ruff check . --fix" with ruff version 0.8.6

@noahho
Collaborator

noahho commented Mar 26, 2025

Oh, also something that Copilot just caught:
The 'quantile_uni_coarse' transformer now caps n_quantiles to 10,000, yet the 'quantile_uni' transformer remains uncapped.

@Krishnadubey1008
Contributor Author

I will fix it now.

@Krishnadubey1008
Contributor Author

Krishnadubey1008 commented Mar 26, 2025

@noahho I ran "ruff check . --fix", but the ruff linting test is still failing.

@noahho
Collaborator

noahho commented Mar 26, 2025

The two open ones don't seem to be automatically fixable:
src/tabpfn/regressor.py:723:89: E501 Line too long (89 > 88)
tests/test_preprocessing.py:12:9: NPY002 Replace legacy np.random.rand call with np.random.Generator

An LLM will know how to fix number 2, and by deleting a character in src/tabpfn/regressor.py:723:89 you fix number 1.
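For reference, the NPY002 fix replaces NumPy's legacy global-state random API with the Generator API; a minimal before/after sketch (the array shape and seed here are illustrative, not the values in the test file):

```python
import numpy as np

# Legacy call flagged by ruff's NPY002 rule:
#   X = np.random.rand(20_000, 5)

# Generator-based replacement:
rng = np.random.default_rng(seed=42)
X = rng.random((20_000, 5))  # uniform floats in [0, 1), same as np.random.rand
```

Unlike the legacy functions, a Generator carries its own state, so seeding it does not affect other code that uses NumPy's random module.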

@Krishnadubey1008
Contributor Author

@noahho Please review

@noahho
Collaborator

noahho commented Mar 27, 2025

Thanks a lot for continuing to work on this. It seems a few of the changes made for the linting weren't right (such as adding ""). I'll look into the PR and fix those things, if you'd like me to.

@Krishnadubey1008
Copy link
Contributor Author

Yes, sure.

Development

Successfully merging this pull request may close these issues.

TabPFNRegressor preprocessing fails on bigger datasets
2 participants