Fix #481 #482


Merged
merged 11 commits into from
Jun 1, 2025

Conversation

elseml
Member

@elseml elseml commented May 20, 2025

Added an option to OfflineDataset to prevent shuffling at epoch end, yielding stable validation losses even when validation_set_size != batch_size (see #481). I am not sure whether we want explicit tests for this, since they would involve quite a bit of code?

@stefanradev93
Contributor

The change should be covered by the workflow tests.


codecov bot commented May 20, 2025

Codecov Report

Attention: Patch coverage is 50.00000% with 5 lines in your changes missing coverage. Please review.

Files with missing lines               Patch %   Lines
bayesflow/datasets/disk_dataset.py     0.00%     5 Missing ⚠️

Files with missing lines               Coverage            Δ
bayesflow/datasets/offline_dataset.py  83.33% <100.00%>    (+0.98%) ⬆️
bayesflow/datasets/disk_dataset.py     31.11% <0.00%>      (-2.23%) ⬇️

@vpratz
Collaborator

vpratz commented May 21, 2025

The changes look good, but per @stefanradev93 's comment here, I would also be in favor of introducing a shuffle parameter and using it here, as it would be good to have anyway and is more transparent to the user. @elseml Would you implement this change, or should I propose something?

@elseml
Member Author

elseml commented May 23, 2025

Sure! I liked that the previous solution reused the existing stage argument, but this way it is more explicit. I adapted the code to use shuffle_on_epoch_end instead, since shuffling is also applied on dataset creation, which is unrelated to the current issue (we could also disable both via a general shuffle argument, but I lean towards retaining initial shuffling for validation sets as well).

While this fixes validation set shuffling for the workflow, users working with a manual training pipeline now always need to remember to set shuffle_on_epoch_end=False when creating OfflineDataset validation sets to prevent shuffling during training. We could also disable shuffling by default altogether and require users to always specify the argument, but that would not be very elegant. Any ideas for a more principled solution that removes this requirement (also pinging @LarsKue)?

(The fixes to validation set shuffling should be complementary to #485)
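To make the behavior under discussion concrete, here is a minimal self-contained toy sketch (not BayesFlow's actual implementation; the class and its signature are illustrative assumptions) of gating epoch-end shuffling behind a constructor flag, as described above:

```python
import random

class ToyOfflineDataset:
    """Toy illustration (not BayesFlow's code) of an epoch-end shuffle flag."""

    def __init__(self, data, batch_size, shuffle_on_epoch_end=True):
        self.data = list(data)
        self.batch_size = batch_size
        self.shuffle_on_epoch_end = shuffle_on_epoch_end

    def __len__(self):
        # Number of batches, counting a possibly incomplete last batch.
        return (len(self.data) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, i):
        return self.data[i * self.batch_size:(i + 1) * self.batch_size]

    def on_epoch_end(self):
        # Keras calls this hook after each epoch; skipping the shuffle
        # keeps validation batch composition identical between epochs.
        if self.shuffle_on_epoch_end:
            random.shuffle(self.data)

val = ToyOfflineDataset(range(66), batch_size=32, shuffle_on_epoch_end=False)
before = [val[i] for i in range(len(val))]
val.on_epoch_end()
after = [val[i] for i in range(len(val))]
assert before == after  # batches are stable, so the validation loss is too
```

With the flag set to True (the training default in this sketch), on_epoch_end reorders the data, so a partial last batch contains different samples each epoch.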

@vpratz
Collaborator

vpratz commented May 23, 2025

Why do you lean towards retaining initial shuffling also for validation sets? My impression would be that we usually aggregate over the validation batches anyway, and computing them should not modify anything, so the order should not matter. Do you have cases in mind where the order of the validation set would matter? If not, I think distinguishing shuffle vs. shuffle_on_epoch_end is not necessary for now, and going with shuffle is sufficient. What do you think?

@stefanradev93
Contributor

stefanradev93 commented May 23, 2025

Shuffle may still change the results slightly if the batch size does not evenly divide the number of instances. In this case, the last batch is (unfortunately) skipped.

@vpratz
Collaborator

vpratz commented May 23, 2025

Are you sure? I think I have seen incomplete batches in the past. At least the datasets produce incomplete batches:

ds = bf.OfflineDataset(data={"data": np.random.normal(size=(66, 1))}, batch_size=32, adapter=bf.Adapter())

for i, d in enumerate(ds):
    print(i, d["data"].shape)

# Output:
# 0 (32, 1)
# 1 (32, 1)
# 2 (2, 1)

Is there a skipping mechanism further downstream?

@stefanradev93
Contributor

Yes, I believe the internal evaluate function has drop_last or something of the sort. @LarsKue

@elseml
Member Author

elseml commented May 23, 2025

I had the same impression, but I might be misled here. Curious to hear Lars' opinion on this.

@LarsKue
Contributor

LarsKue commented May 23, 2025

It appears to, but I have been unable to find it in the code. Could be something else going on, like an off-by-one error.

@elseml
Member Author

elseml commented May 27, 2025

Since we could not find evidence that initial shuffling is required for validation/test sets, I followed Valentin's suggestion to simplify the interface and control both initial shuffling and shuffling on epoch end with a single argument (using a shuffle_dataset argument, since there is already a class method shuffle).

I also added equivalent changes to the DiskDataset class and disabled validation set shuffling in the two (experimental) example notebooks that currently create OfflineDataset validation sets manually, so that we don't forget about it. From my side, everything should be ready for merge now.

Edit: I'd be interested to hear your opinions on whether we should make shuffle_dataset a required argument to avoid it being overlooked during validation set creation.

@vpratz
Collaborator

vpratz commented May 28, 2025

@elseml Could you please check if the track-losses branch already resolves the problem for your example? My intuition is that it should, but I'm not sure...

@vpratz
Collaborator

vpratz commented May 29, 2025

Edit: Never mind, it looks like I didn't adapt the notebook correctly to test it.

@vpratz
Collaborator

vpratz commented May 29, 2025

In addition, I would be in favor of renaming the shuffle_dataset parameter to shuffle, to be consistent with e.g. torch DataLoaders. We can store it as _shuffle to avoid a collision with the method of the same name.
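A minimal sketch of the renaming idea (hypothetical, not BayesFlow's actual code): the constructor flag is stored as `_shuffle` so the attribute does not shadow the existing `shuffle()` method.

```python
import random

class Dataset:
    """Toy sketch: a `shuffle` constructor flag coexisting with a
    `shuffle()` method by storing the flag as `_shuffle`."""

    def __init__(self, data, shuffle=True):
        self.data = list(data)
        self._shuffle = shuffle  # the flag; `self.shuffle` stays the method

    def shuffle(self):
        random.shuffle(self.data)

    def on_epoch_end(self):
        if self._shuffle:
            self.shuffle()

fixed = Dataset([1, 2, 3], shuffle=False)
fixed.on_epoch_end()
assert fixed.data == [1, 2, 3]  # order preserved when shuffling is disabled
assert callable(fixed.shuffle)  # the method is still reachable
```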

@vpratz
Collaborator

vpratz commented May 29, 2025

@elseml I found the underlying reason for the order-dependence (taking a mean of means over batches of different sizes, without proper weighting). With this fixed in #485, shuffling in the validation dataset is no longer a problem, and no modifications to the notebooks are necessary.
What do you think, do we want to introduce a shuffle parameter anyway, in case users have special use cases that require it? Or do we wait until someone actually asks for it?
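The unweighted mean-of-means bug is easy to reproduce with plain Python (the numbers below are illustrative, mirroring the 66-sample / batch-size-32 example earlier in the thread):

```python
# 66 per-sample losses split into batches of 32 -> the last batch has only 2 samples.
losses = [float(i) for i in range(66)]
batches = [losses[i:i + 32] for i in range(0, len(losses), 32)]

# Unweighted mean of batch means: the 2-sample batch counts as much as a full
# batch, so the result depends on which samples land in it (i.e. on shuffling).
mean_of_means = sum(sum(b) / len(b) for b in batches) / len(batches)

# Weighting each batch by its size recovers the true mean, independent of order.
weighted_mean = sum(sum(b) for b in batches) / sum(len(b) for b in batches)

print(mean_of_means)  # 42.5 -- biased toward the small last batch
print(weighted_mean)  # 32.5 -- the true mean of all 66 losses
```

Once the aggregation is size-weighted, reshuffling only changes which samples fall into the partial batch, not the aggregate, so the validation loss is stable regardless of shuffling.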

@vpratz vpratz merged commit 677bacb into bayesflow-org:dev Jun 1, 2025
8 of 9 checks passed