-
Notifications
You must be signed in to change notification settings - Fork 546
[Prep-refactor 5] Refactor SequentialEncoder #740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This change is part of the following stack: Change managed by git-spice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is a great refactoring that significantly improves the encoder pipeline's architecture. The introduction of TorchPreprocessingPipeline and TorchPreprocessingStep with a clear _fit/_transform pattern and dictionary-based state management makes the code more robust, readable, and extensible. The decoupling of projections into separate embedder modules is also a good design choice. The tests have been diligently updated to reflect these changes. I've found a couple of areas for improvement, including a potential regression.
src/tabpfn/architectures/encoders/steps/normalize_feature_groups_encoder_step.py
Show resolved
Hide resolved
src/tabpfn/architectures/encoders/steps/feature_group_padding_encoder_step.py
Outdated
Show resolved
Hide resolved
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the SequentialEncoder to improve code organization by decoupling projections from preprocessing steps and making the preprocessing pipeline more explicit through dictionary-based state passing instead of tensor arguments.
Changes:
- Renamed
SequentialEncodertoTorchPreprocessingPipelineandSeqEncSteptoTorchPreprocessingStepto clarify that these components handle non-learnable preprocessing only - Introduced new embedder modules (
LinearFeatureGroupEmbedder,MLPFeatureGroupEmbedder) to separate learnable projections from preprocessing - Updated all encoder steps to use
dict[str, torch.Tensor]state dictionaries instead of tensor tuples for clearer data flow - Removed the
seedparameter fromInputNormalizationEncoderStepand related code inupdate_encoder_params
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/tabpfn/architectures/encoders/pipeline_interfaces.py | Refactored to use TorchPreprocessingPipeline and TorchPreprocessingStep with dict-based state passing |
| src/tabpfn/architectures/encoders/embedders.py | New file with LinearFeatureGroupEmbedder and MLPFeatureGroupEmbedder for learnable projections |
| src/tabpfn/architectures/encoders/steps/*.py | Updated all encoder steps to use dict-based state dictionaries and standardized in_keys/out_keys handling |
| src/tabpfn/architectures/encoders/init.py | Updated exports to reflect renamed classes and new embedders |
| src/tabpfn/architectures/base/transformer.py | Updated to use TorchPreprocessingPipeline with list-based step initialization |
| src/tabpfn/architectures/base/init.py | Updated encoder factory functions to use new class names and list-based initialization |
| src/tabpfn/utils.py | Removed seed parameter from update_encoder_params |
| src/tabpfn/base.py | Updated call to update_encoder_params to remove seed argument |
| tests/test_model_move_backwards_compatibility.py | Updated to check for TorchPreprocessingPipeline instead of InputEncoder |
| tests/test_model/test_seperate_train_inference.py | Updated to use new pipeline initialization syntax |
| tests/test_encoders/test_projections.py | Updated tests to use TorchPreprocessingPipeline and removed test_interface (moved to test_encoders.py) |
| tests/test_encoders/test_encoders.py | Added test_interface and test_feature_group_padding_and_reshape_step |
| tests/test_encoders/test_embedders.py | New test file for embedder modules |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/tabpfn/architectures/encoders/steps/feature_group_padding_encoder_step.py
Outdated
Show resolved
Hide resolved
src/tabpfn/architectures/encoders/steps/variable_num_features_encoder_step.py
Outdated
Show resolved
Hide resolved
src/tabpfn/architectures/encoders/steps/remove_empty_features_encoder_step.py
Outdated
Show resolved
Hide resolved
src/tabpfn/architectures/encoders/steps/input_normalization_encoder_step.py
Outdated
Show resolved
Hide resolved
src/tabpfn/architectures/encoders/steps/nan_handling_encoder_step.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
LeoGrin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks a lot!! The new names are much clearer :)
src/tabpfn/architectures/encoders/steps/categorical_input_encoder_per_feature_encoder_step.py
Outdated
Show resolved
Hide resolved
src/tabpfn/architectures/encoders/steps/feature_group_padding_encoder_step.py
Outdated
Show resolved
Hide resolved
src/tabpfn/architectures/encoders/steps/normalize_feature_groups_encoder_step.py
Show resolved
Hide resolved
src/tabpfn/architectures/encoders/steps/variable_num_features_encoder_step.py
Outdated
Show resolved
Hide resolved
src/tabpfn/architectures/encoders/steps/variable_num_features_encoder_step.py
Outdated
Show resolved
Hide resolved
src/tabpfn/architectures/encoders/steps/feature_group_projections_encoder_step.py
Show resolved
Hide resolved
src/tabpfn/architectures/encoders/steps/nan_handling_encoder_step.py
Outdated
Show resolved
Hide resolved
|
Thanks @alanprior and @LeoGrin ! |
+1 on this. Looks much much clearer and better. I'm learning a lot :) |
No functional changes are done in this PR. It refactors the
SequentialEncoderfor further changes.In particular:
More work is needed to disentangle the preprocessing "state" from the model after this PR is merged.