Fix #481 #482


Merged
merged 11 commits into from
Jun 1, 2025

Conversation

elseml
Member

@elseml elseml commented May 20, 2025

Added an option to OfflineDataset to prevent shuffling at epoch end, yielding stable validation losses even when validation_set_size != batch_size (see #481). I am not sure whether we want explicit tests for this, since they would involve quite a bit of code?

@stefanradev93
Contributor

The change should be covered by the workflow tests.


codecov bot commented May 20, 2025

Codecov Report

Attention: Patch coverage is 50.00000% with 5 lines in your changes missing coverage. Please review.

Files with missing lines               Patch %   Lines
bayesflow/datasets/disk_dataset.py     0.00%     5 Missing ⚠️

Files with missing lines               Coverage            Δ
bayesflow/datasets/offline_dataset.py  83.33% <100.00%>    (+0.98%) ⬆️
bayesflow/datasets/disk_dataset.py     31.11% <0.00%>      (-2.23%) ⬇️

@vpratz
Collaborator

vpratz commented May 21, 2025

The changes look good, but per @stefanradev93 's comment here, I would also be in favor of introducing a shuffle parameter and using it here, as it would be good to have anyway and is more transparent to the user. @elseml Would you implement this change, or should I propose something?

@elseml
Member Author

elseml commented May 23, 2025

Sure! I liked that the previous solution reused the existing stage argument, but this way it is more explicit. I adapted the code to use shuffle_on_epoch_end instead, since shuffling is also applied on dataset creation, which is unrelated to the current issue (we could also disable both via a general shuffle argument, but I lean towards retaining initial shuffling for validation sets as well).

While this fixes validation set shuffling for the workflow, users working with a manual training pipeline now always need to remember to set shuffle_on_epoch_end=False when creating OfflineDataset validation sets to prevent shuffling during training. We could also disable shuffling by default altogether and require users to always specify the argument, but that would not be very elegant. Any ideas for a more principled solution that removes this requirement (also pinging @LarsKue)?

(The fixes to validation set shuffling should be complementary to #485)
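To make the behavior under discussion concrete, here is a minimal self-contained toy sketch (not BayesFlow's actual implementation; the class and its signature are illustrative assumptions) of gating epoch-end shuffling behind a constructor flag, as described above:

```python
import random

class ToyOfflineDataset:
    """Toy illustration (not BayesFlow's code) of an epoch-end shuffle flag."""

    def __init__(self, data, batch_size, shuffle_on_epoch_end=True):
        self.data = list(data)
        self.batch_size = batch_size
        self.shuffle_on_epoch_end = shuffle_on_epoch_end

    def __len__(self):
        # Number of batches, counting a possibly incomplete last batch.
        return (len(self.data) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, i):
        return self.data[i * self.batch_size:(i + 1) * self.batch_size]

    def on_epoch_end(self):
        # Keras calls this hook after each epoch; skipping the shuffle
        # keeps validation batch composition identical between epochs.
        if self.shuffle_on_epoch_end:
            random.shuffle(self.data)

val = ToyOfflineDataset(range(66), batch_size=32, shuffle_on_epoch_end=False)
before = [val[i] for i in range(len(val))]
val.on_epoch_end()
after = [val[i] for i in range(len(val))]
assert before == after  # batches are stable, so the validation loss is too
```

With the flag set to True (the training default in this sketch), on_epoch_end reorders the data, so a partial last batch contains different samples each epoch.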

@vpratz
Collaborator

vpratz commented May 23, 2025

Why do you lean towards retaining initial shuffling also for validation sets? My impression would be that we usually aggregate over the validation batches anyway, and computing them should not modify anything, so the order should not matter. Do you have cases in mind where the order of the validation set would matter? If not, I think distinguishing shuffle vs. shuffle_on_epoch_end is not necessary for now, and going with shuffle is sufficient. What do you think?

@stefanradev93
Contributor

stefanradev93 commented May 23, 2025

Shuffle may still change the results slightly if the batch size does not evenly divide the number of instances. In this case, the last batch is (unfortunately) skipped.

@vpratz
Collaborator

vpratz commented May 23, 2025

Are you sure? I think I have seen incomplete batches in the past. At least the datasets produce incomplete batches:

ds = bf.OfflineDataset(data={"data": np.random.normal(size=(66, 1))}, batch_size=32, adapter=bf.Adapter())

for i, d in enumerate(ds):
    print(i, d["data"].shape)

# Output:
# 0 (32, 1)
# 1 (32, 1)
# 2 (2, 1)

Is there a skipping mechanism further downstream?

@stefanradev93
Contributor

Yes, I believe the internal evaluate function has drop_last or something of the sort. @LarsKue

@elseml
Member Author

elseml commented May 23, 2025

I had the same impression, but I might be misled here. Curious to hear Lars' opinion on this.

@LarsKue
Contributor

LarsKue commented May 23, 2025

It appears to, but I have been unable to find it in the code. Could be something else going on, like an off-by-one error.

@elseml
Member Author

elseml commented May 27, 2025

Since we could not find evidence that initial shuffling is required for validation/test sets, I followed Valentin's suggestion to simplify the interface and control both initial shuffling and shuffling on epoch end with a single argument (using a shuffle_dataset argument, since there is already a class method shuffle).

I also added equivalent changes to the DiskDataset class and disabled validation set shuffling in the two (experimental) example notebooks that currently create OfflineDataset validation sets manually, so that we don't forget about it. From my side, everything should be ready for merge now.

Edit: I'd be interested to hear your opinions on whether we should make shuffle_dataset a required argument to avoid it being overlooked during validation set creation.

@vpratz
Collaborator

vpratz commented May 28, 2025

@elseml Could you please check if the track-losses branch already resolves the problem for your example? My intuition is that it should, but I'm not sure...

@vpratz
Collaborator

vpratz commented May 29, 2025

Edit: Never mind, it looks like I didn't adapt the notebook correctly to test it.

@vpratz
Collaborator

vpratz commented May 29, 2025

In addition, I would be in favor of renaming the shuffle_dataset parameter to shuffle, to be consistent with e.g. torch DataLoaders. We can store it as _shuffle to avoid a collision with the method of the same name.
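A minimal sketch of the renaming idea (hypothetical, not BayesFlow's actual code): the constructor flag is stored as `_shuffle` so the attribute does not shadow the existing `shuffle()` method.

```python
import random

class Dataset:
    """Toy sketch: a `shuffle` constructor flag coexisting with a
    `shuffle()` method by storing the flag as `_shuffle`."""

    def __init__(self, data, shuffle=True):
        self.data = list(data)
        self._shuffle = shuffle  # the flag; `self.shuffle` stays the method

    def shuffle(self):
        random.shuffle(self.data)

    def on_epoch_end(self):
        if self._shuffle:
            self.shuffle()

fixed = Dataset([1, 2, 3], shuffle=False)
fixed.on_epoch_end()
assert fixed.data == [1, 2, 3]  # order preserved when shuffling is disabled
assert callable(fixed.shuffle)  # the method is still reachable
```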

@vpratz
Collaborator

vpratz commented May 29, 2025

@elseml I found the underlying reason for the order-dependence (taking a mean of means over batches of different sizes, without proper weighting). With this fixed in #485, shuffling in the validation dataset is no longer a problem, and no modifications to the notebooks are necessary.
What do you think, do we want to introduce a shuffle parameter anyway, in case users have special use cases that require it? Or do we wait until someone actually asks for it?
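The unweighted mean-of-means bug is easy to reproduce with plain Python (the numbers below are illustrative, mirroring the 66-sample / batch-size-32 example earlier in the thread):

```python
# 66 per-sample losses split into batches of 32 -> the last batch has only 2 samples.
losses = [float(i) for i in range(66)]
batches = [losses[i:i + 32] for i in range(0, len(losses), 32)]

# Unweighted mean of batch means: the 2-sample batch counts as much as a full
# batch, so the result depends on which samples land in it (i.e. on shuffling).
mean_of_means = sum(sum(b) / len(b) for b in batches) / len(batches)

# Weighting each batch by its size recovers the true mean, independent of order.
weighted_mean = sum(sum(b) for b in batches) / sum(len(b) for b in batches)

print(mean_of_means)  # 42.5 -- biased toward the small last batch
print(weighted_mean)  # 32.5 -- the true mean of all 66 losses
```

Once the aggregation is size-weighted, reshuffling only changes which samples fall into the partial batch, not the aggregate, so the validation loss is stable regardless of shuffling.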

@vpratz vpratz merged commit 677bacb into bayesflow-org:dev Jun 1, 2025
8 of 9 checks passed