Conversation
Code Coverage Summary: results for commit 7f6316d. ♻️ This comment has been updated with latest results.
Force-pushed from bf91608 to 87b8b82
GiovanniCanali left a comment:
Hi @dario-coscia, thank you for the nice PR.
I have left some comments that should be addressed before we can merge.
I also have a few doubts regarding the logic of this callback:
- Scope of normalization: normalization should be applicable only to `InputTargetCondition` conditions. Indeed, changing the scale in physics-related problems can be a major source of error and often leads to inconsistent results. In these cases, it's usually better if the user handles scaling explicitly for more transparent management of the physics.
- Graph support: `InputTargetCondition` conditions can also work with `Graph` objects. This scenario doesn't seem to be covered in the `DataNormalizationCallback` implementation or in the related tests. Could you please ensure that this is handled?
- Tensor-based scale and shift: I am not entirely sure I understand the case where `scale` or `shift` are tensors. Could you clarify the intended usage here?
Overall, the current approach feels a bit too complex for managing a relatively simple normalization. Personally, I think it might be cleaner and safer to let the user normalize the data before passing it to the PINA problem. What are your thoughts on this?
Hi, I think we should keep normalization inside the training cycle. I have a doubt regarding how the normalization strategy is applied: in particular, how can I perform a normalization of this type? Note that it should be applied to all three datasets, and when I define the callback I do not have access to the dataset splits.
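To make the concern above concrete, a standard normalization has to be fitted on the training split only and then reused unchanged on validation and test. A plain-Python sketch (not PINA code; the helper name is hypothetical):

```python
import statistics

def fit_normalizer(train_values):
    """Fit mean/std on the training split and return a reusable transform."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    # The same fitted transform must be applied to every split.
    return lambda values: [(v - mean) / std for v in values]

normalize = fit_normalizer([0.0, 2.0, 4.0])
```

Inside a callback, the difficulty is exactly the one raised here: the splits only exist once training starts, so there is no obvious place to run the fitting step beforehand.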
I share the same concern as @FilippoOlivo. Note that to apply the standard normalization, the mean and the standard deviation should be computed outside the PINA pipeline. This way, applying normalization externally would be more straightforward and natural than doing it through a callback.
I agree with you both. If we could access the data before training, normalizing would be super easy. Unfortunately we cannot, as datasets are created when training starts (do you agree?). I don't know what the best way to do normalization is, then.
I realize this somewhat defeats the purpose of the PR, but it might be simpler to let users handle normalization themselves. In most cases, this shouldn't be a difficult task, and trying to cover all possible scenarios and data structures here feels like overkill. What do you think @dario-coscia @FilippoOlivo?
I guess datasets are created during setup. In my opinion, we can either accept scale and shift parameters from the user, or compute them before training starts inside setup.
Which scale and shift? Mean and variance? Sometimes you want mean and MAD. I don't think it is super straightforward... Maybe it is better to let the user pass a scaling function which, given a tensor, returns scale and shift, or something similar.
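The "user passes a scaling function" idea could look like this sketch; the function names are hypothetical and not part of PINA. Each function takes the raw data and returns the `(scale, shift)` pair the callback would then apply:

```python
import statistics

def std_scaler(values):
    """Standard normalization parameters: shift = mean, scale = std."""
    shift = statistics.fmean(values)
    scale = statistics.pstdev(values)
    return scale, shift

def mad_scaler(values):
    """Robust alternative: shift = median, scale = median absolute deviation."""
    shift = statistics.median(values)
    scale = statistics.median(abs(v - shift) for v in values)
    return scale, shift
```

The callback would only consume the returned pair, leaving the choice of statistic (mean/std, median/MAD, ...) entirely to the user.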
Force-pushed from 93a5205 to b5b0680
Sure! At this stage, I suggest giving the user only the possibility to provide a function instead of a set of parameters (scale and shift).
@FilippoOlivo I put you as an assignee as well. I agree with letting the user directly pass a function. Are you on it? Let me know how much time it will take, since I would like to merge #636 as soon as possible.
Force-pushed from 122472c to 12111c6
pina/callback/normalizer.py (outdated)

        )
        return stage

    def setup(self, trainer, pl_module, stage):
I prefer `solver` instead of `pl_module`.
pina/callback/normalizer.py (outdated)

        }

    @staticmethod
    def _norm_fn(value, scale, shift):
Make it public and a property.

It is a bit difficult, since `norm_fn` must at least have access to the dataset you want to normalize.

From what I see, `_norm_fn` only needs `value`, `scale`, and `shift`. By making it public and a property, we structure the code so that in the future the user could define `norm_fn` themselves.
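The suggestion above could look roughly like this sketch (class and attribute names are hypothetical, not the actual PINA implementation): the default rule stays a plain function of `value`, `scale`, and `shift`, but exposing it publicly lets a user swap in their own.

```python
class DataNormalizer:
    """Sketch only; not the real callback."""

    @staticmethod
    def default_norm_fn(value, scale, shift):
        # The scale/shift transformation discussed in this thread.
        return (value - shift) / scale

    def __init__(self, norm_fn=None):
        # Public hook: users may override the normalization rule.
        self.norm_fn = norm_fn if norm_fn is not None else self.default_norm_fn
```

This keeps the signature minimal (no dataset access needed), which is the point made in the last comment above.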
pina/callback/normalizer.py (outdated)

        scaled_value = LabelTensor(scaled_value, value.labels)
        return scaled_value

    def _scale_data(self, dataset):
Rename it to `normalize_dataset`.
@FilippoOlivo I put some comments, but it looks very good!
* change name files normalizer data callback
Hi @FilippoOlivo! I updated
Hi @dario-coscia, I reduced the number of tests. Now it takes around 3s to run.
    target_2 = torch.rand(20, 1) * 5

    class LabelTensorProblem(AbstractProblem):
Isn't it easier to use `SupervisedProblem` and add a condition? The same applies to `TensorProblem`.
Hi @GiovanniCanali, thank you for the review. The callback normalizes only `InputTargetCondition` conditions; other types of conditions are automatically discarded.
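The discarding behaviour described here amounts to a simple filter over the problem's conditions. A hedged sketch with hypothetical names (the real callback checks against PINA's own condition classes):

```python
def select_normalizable(conditions, normalizable_types):
    """Keep only the conditions the callback can normalize; skip the rest."""
    return {
        name: cond
        for name, cond in conditions.items()
        if isinstance(cond, normalizable_types)
    }
```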
Sorry, I missed it.
Force-pushed from 7f49e59 to 787fa0a
Force-pushed from 787fa0a to d6bbd93
I think we can merge @dario-coscia @FilippoOlivo!
I added a test to check that the normalizer raises a `NotImplementedError` when graphs are present!
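That guard can be sketched as an early check when the callback sees the dataset (names hypothetical; the real code checks PINA's `PinaGraphDataset`):

```python
def ensure_supported(dataset, unsupported_types):
    """Raise early instead of silently mis-normalizing graph data."""
    if isinstance(dataset, unsupported_types):
        raise NotImplementedError(
            "Normalization is not implemented for graph datasets."
        )
    return dataset
```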
Great job all!
* add normalizer callback
* implement shift and scale parameters computation
* change name files normalizer data callback
* reduce tests
* fix documentation
* add NotImplementedError for PinaGraphDataset

Co-authored-by: FilippoOlivo <filippo@filippoolivo.com>
Co-authored-by: giovanni <giovanni.canali98@yahoo.it>
Description
This PR fixes #585.
Summary
This PR introduces a new Lightning `Callback`, `NormalizerDataCallback`, which normalizes dataset inputs or targets during training, validation, or testing. The transformation rescales each value by a scale and shift.

Usage
Users can provide either a single normalizer for all conditions, or condition-specific normalizers:
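The original usage snippet is not preserved in this export, so here is a sketch of the dispatch logic the sentence above implies; the function name and the mapping interface are assumptions, not the PR's actual API. A single callable applies to every condition, while a dict applies per condition:

```python
def resolve_normalizer(normalizers, condition_name):
    """Pick the normalizer for a condition.

    `normalizers` may be one callable shared by all conditions, or a
    {condition_name: callable} mapping (assumed interface).
    """
    if callable(normalizers):
        return normalizers
    return normalizers.get(condition_name)
```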
Checklist