
Conversation

@erikvansebille
Member

Since the repeatdt option in ParticleSet creation can lead to very poor performance with zarr output (see e.g. the discussions in #1316 and #1340), we may want to think about completely removing this option from Parcels, starting in the next version (v2.5.0?).

Note that repeatdt is not strictly needed. Easy workarounds are:

  1. (preferred solution as long as the total number of particles is not too large) Create the full ParticleSet at the start of the run, using np.tile(). I added a code snippet in the 'tutorial_delaystart' to explain this (see also the first sketch after this list):
     lons_full = np.tile(np.reshape(lons, (lons.shape[0], lons.shape[1], 1)), (1, 1, len(times)))
     lats_full = np.tile(np.reshape(lats, (lats.shape[0], lats.shape[1], 1)), (1, 1, len(times)))
     times_full = np.tile(times, (lons.shape[0], lons.shape[1], 1))
  2. (alternative solution in case the total number of particles is extremely large, e.g. when particles are deleted after a short while) Run particleset.execute() within a for-loop and manually add new particles using particleset.add() (see the second sketch after this list).
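A first sketch of workaround 1, runnable with NumPy alone. The release grid, the number of release times, and the sizes below are made up for illustration and are not from the tutorial:

```python
import numpy as np

# Hypothetical release setup: a 2-D grid of release positions and the
# times (in seconds) at which each repeated release would have happened.
lons, lats = np.meshgrid(np.linspace(0.0, 1.0, 5), np.linspace(50.0, 51.0, 4))
times = np.arange(0, 5 * 86400, 86400)  # one release per day for five days

# Tile the 2-D release grid along a third (release-time) axis, so that
# every release time gets its own full copy of the grid.
lons_full = np.tile(np.reshape(lons, (lons.shape[0], lons.shape[1], 1)), (1, 1, len(times)))
lats_full = np.tile(np.reshape(lats, (lats.shape[0], lats.shape[1], 1)), (1, 1, len(times)))
times_full = np.tile(times, (lons.shape[0], lons.shape[1], 1))

print(lons_full.shape, lats_full.shape, times_full.shape)  # all (4, 5, 5)

# These arrays can then be passed straight to the ParticleSet constructor,
# e.g. ParticleSet(fieldset=fieldset, pclass=JITParticle,
#                  lon=lons_full, lat=lats_full, time=times_full)
```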
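And a sketch of workaround 2, using only the calls the item above already names (ParticleSet, execute(), add()). The toy fieldset, the release positions, the release interval, and the kernel settings are all assumptions, and the time bookkeeping of newly added particles may need extra care in a real run:

```python
import numpy as np
from datetime import timedelta
from parcels import AdvectionRK4, FieldSet, JITParticle, ParticleSet

# A toy uniform-flow fieldset so the sketch is self-contained;
# replace with your own hydrodynamic data.
dims = {"lon": np.linspace(0.0, 1.0, 10), "lat": np.linspace(50.0, 51.0, 10)}
uv = {"U": np.ones((10, 10)), "V": np.zeros((10, 10))}
fieldset = FieldSet.from_data(uv, dims, mesh="spherical")

release_lons = np.linspace(0.1, 0.9, 5)  # hypothetical release positions
release_lats = np.full(5, 50.5)
release_dt = timedelta(days=1)           # hypothetical release interval

pset = ParticleSet(fieldset=fieldset, pclass=JITParticle,
                   lon=release_lons, lat=release_lats)
output_file = pset.ParticleFile(name="out.zarr", outputdt=timedelta(hours=6))

for _ in range(10):  # ten manual releases instead of repeatdt
    # advance all particles currently in the set by one release interval ...
    pset.execute(AdvectionRK4, runtime=release_dt,
                 dt=timedelta(minutes=5), output_file=output_file)
    # ... then manually add the next batch of particles
    pset.add(ParticleSet(fieldset=fieldset, pclass=JITParticle,
                         lon=release_lons, lat=release_lats))
```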

@JamiePringle
Collaborator

I should have commented on this sooner, but I have been at a meeting. The real problem in issues #1316 and #1340 is that the zarr "obs" chunking is set by the number of particles released in the first time step in which particles are released. If the number of particles in each release is large, there is no efficiency problem.

But in cases like my student's, in which the releases are governed by the actual deployment of surface drifters in the global surface drifter project, the number of drifters in the first release is very small. He does not use repeatdt, but still has this problem. The issues in #1316 and #1340 are very similar.

I do not have this problem despite heavy use of repeatdt, because I launch particles at all locations at the first release time, and then at many subsequent times. While your second workaround would work for me, it is a bit clumsy.

So removing repeatdt would not really solve the problem, either in these issues or in general. The real solution is to decouple the chunk size from the initial number of particles released.

I may be wrong -- if so what am I missing?

Jamie
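A minimal zarr-only sketch of the chunking behavior described above (zarr v2 API; the store name and sizes are made up): the chunk shape is frozen when the store is created, so a small first release locks in tiny chunks for the rest of the run.

```python
import numpy as np
import zarr

# Mimic a (trajectory, obs) output store whose chunk shape was fixed by
# a small first release of 10 particles.
n_first_release = 10  # hypothetical small first release
z = zarr.open("toy_output.zarr", mode="w",
              shape=(n_first_release, 0), chunks=(n_first_release, 10),
              dtype="f8")

# Every later write now goes through (10, 10)-element chunks, however
# many particles and observations the run eventually accumulates.
for _ in range(100):
    z.append(np.random.rand(n_first_release, 1), axis=1)
print(z.chunks)  # (10, 10), frozen at creation time
```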

@JamiePringle
Collaborator

P.S. I am sorry to comment and run, but starting Monday I am out of touch for two weeks. I know I have promised you a patch, but I have been too busy with my global Parcels runs and other things; I will get to it by the end of the summer.

@erikvansebille
Member Author

Thanks for responding, @JamiePringle, these are very useful comments. The problem is that we can't set the obs chunking larger than the initial ParticleSet, because that breaks the current MPI implementation, and potentially even non-MPI runs. I'll check this in the coming weeks.

I agree that removing repeatdt altogether is perhaps a bit drastic, so it is perhaps better to first see if other solutions are workable. A less intrusive solution could be to simply issue a warning when combining repeatdt with 'small' ParticleSets (whatever that means exactly), so that users are at least aware of the poor performance...
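A minimal sketch of what such a warning could look like (the threshold, function name, and placement are made up for illustration, not a proposed Parcels API):

```python
import warnings

# Hypothetical check during ParticleSet creation: warn when repeatdt is
# combined with a small initial set, since that freezes small zarr chunks.
SMALL_PSET_THRESHOLD = 1000  # made-up value for illustration


def _warn_small_repeatdt(npart, repeatdt):
    if repeatdt is not None and npart < SMALL_PSET_THRESHOLD:
        warnings.warn(
            f"repeatdt with only {npart} particles fixes small zarr chunks "
            "and can make output very slow; consider creating the full "
            "ParticleSet up front (see the delayed-start tutorial).",
            RuntimeWarning,
        )
```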

In the meantime, I convert this PR to draft, so that at least we know it won't be implemented soon.

@erikvansebille
Member Author

Closing this PR, as #1430 is a better implementation/fix.
