
[OUTDATED; DO NOT MERGE] Vectorial summing of particle displacement at end of kernel loop #1388

Conversation

@erikvansebille (Member) commented on Jun 23, 2023

(Note: this PR is now superseded by #1402, which was reset to caf7418 (before the merging of #1399), because Parquet and sqlite writing proved in practice to have far worse performance than zarr.)

Until now, the way that Kernel-concatenation worked in Parcels meant that the execution of multiple Kernels was generally not commutative: for example, first advecting a particle and then letting it sink would lead to a different result than first sinking and then advecting. That is because particle positions were updated within each kernel, instead of through the vectorial summing of displacements that would be more appropriate (and would ensure commutativity).

More problematically, while the particle positions would be updated within a kernel, the time was generally only updated at the end. This meant that Field-sampling could give unexpected results (e.g. when sampling was done at the new location but at the old time).
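To make the non-commutativity concrete, here is a minimal, self-contained sketch (plain Python with a made-up depth-dependent velocity; not the Parcels API) of how in-place position updates make the kernel order matter:

# Hypothetical eastward velocity that decreases with depth (made-up numbers)
def U(depth):
    return 1.0 - 0.1 * depth

dt, w_sink = 1.0, 2.0  # timestep and sinking velocity (assumed values)

# Advect first, then sink: U is sampled at the original depth of 0
lon, depth = 0.0, 0.0
lon += U(depth) * dt      # dlon = 1.0
depth += w_sink * dt
print(lon, depth)         # -> 1.0 2.0

# Sink first, then advect: U is now sampled at the new depth of 2
lon, depth = 0.0, 0.0
depth += w_sink * dt
lon += U(depth) * dt      # dlon = 0.8
print(lon, depth)         # -> 0.8 2.0

With vectorial summing of displacements, both orders sample U at the original (time, location) and therefore yield the same result.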

To solve these issues, in this PR I propose that individual Kernels compute displacements (particle_dlon, particle_dlat and particle_ddepth), and that these displacements are added to the particle locations only once all kernels have been executed for a timestep dt.

This means that the workflow for each particle p and each time t becomes:

  1. Set particle_dlon = particle_dlat = particle_ddepth = 0
  2. Execute all Kernels, including sampling Kernels, and let the Kernels update particle_dlon, particle_dlat and particle_ddepth when required
  3. Write the location of the particle p at time t (if needed)
  4. Update time to t += dt and particle locations to particle.lon += particle_dlon, etc.

Note that this workflow means that particles will be written at t=0 (and there is no need anymore for a separate initial-values sampling step!), but not at t=runtime (the last timestep), unless we also change the particle loop to execute one extra time at the end (tbd).
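For illustration, here is a minimal sketch of steps 1-4 in plain Python (execute_timestep and write_particle are placeholder names, not the actual Parcels internals):

def write_particle(particle):
    # Placeholder for the output-writing step (see the particle.write() discussion below)
    pass

def execute_timestep(particle, kernels, fieldset, dt):
    # 1. Reset the accumulated displacements for this timestep
    particle.dlon = particle.dlat = particle.ddepth = 0.0

    # 2. Execute all Kernels (including sampling Kernels); each adds to the
    #    displacements while sampling fields at the unchanged (time, location)
    for kernel in kernels:
        kernel(particle, fieldset, particle.time)

    # 3. Write the particle at the current time t (if needed)
    write_particle(particle)

    # 4. Only now advance the time and apply the summed displacements
    particle.time += dt
    particle.lon += particle.dlon
    particle.lat += particle.dlat
    particle.depth += particle.ddepth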

Issues/questions that need to be explored or implemented:

  • a. Move the writing step to after the Kernel execution but before the location and time update
  • b. Investigate whether it's wise to perform the loop one extra time by default, to also write output at t=runtime (see comment above)
  • c. Check if the particles_backups are still needed now that particles are not moved within a Kernel
  • d. Check whether RK45 still works
  • e. Explore how to deal with Error Kernels
  • f. Decide if the SummedFields class is still necessary
  • g. Update this new workflow in documentation and tutorials

Since this is a major overhaul that can change results, I suggest making this PR (once merged) part of v2.5.0.

Commit: And fixing flake8 error because of merge with plotting-removal PR
@erikvansebille (Member, Author) commented

I've done some exploration of how best to update the kernels. See this gist, which explores the idea with a very minimal example ('nanoparcels').

The bottom line is that the cleanest working implementation (cell 6 in the gist) is something like the following (only for dlon; similar for the other two directions):

def concatenated_kernels(particle, fieldset, time):
    # set dlon to zero at beginning of loop
    dlon = 0

    # sample fields at (time, location), and add to dlon when particle moves
    particle.p = fieldset.P(particle.time, particle.lon)
    dlon += fieldset.U(particle.time, particle.lon) * particle.dt
    
    # write the particle if needed (depending on outputdt)
    particle.write()

    # update the particle time and particle location
    particle.time += particle.dt
    particle.lon += dlon

So this requires a particle.write() function that can be called from within a Kernel, something that may be feasible using the Parquet library. This is now being explored in #1399
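As an aside, one way such writing could be backed by Parquet (a hypothetical sketch using pyarrow; not necessarily the approach taken in #1399) is to buffer one row per write() call and flush them to disk in a single batch:

import pyarrow as pa
import pyarrow.parquet as pq

_buffer = []  # accumulates one dict per particle.write() call

def write_particle_row(particle):
    # Hypothetical stand-in for particle.write(): record the current state
    _buffer.append({"time": particle.time, "lon": particle.lon, "p": particle.p})

def flush_output(path="particles.parquet"):
    # Write all buffered rows to a Parquet file in one go
    pq.write_table(pa.Table.from_pylist(_buffer), path)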

@michaeldenes (Member) commented

I've been playing around with this a bit today. The obvious major change is that custom kernels will need to compute displacements rather than positions. That's easy enough (just subtract the current position, particle.lon or particle.lat).
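For example, a hypothetical custom kernel converted from setting positions to computing displacements (PushEast and its velocity are made up for illustration; particle_dlon is the kernel-local displacement variable proposed in this PR):

# Before: the kernel updates the position in place
def PushEastOld(particle, fieldset, time):
    particle.lon = particle.lon + 0.1 * particle.dt

# After: the same kernel contributes a displacement instead; the position
# is only updated once all kernels have been executed
def PushEastNew(particle, fieldset, time):
    particle_dlon += 0.1 * particle.dt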

One minor issue is that an unbeaching kernel is still not commutative. You must first perform the advection and custom kernels, and only then determine the updated position of the particle (e.g. particle.lon + particle_dlon) and apply the unbeaching kernel.

For custom kernels that set custom variables (e.g. a variable to track whether a particle goes below 500 m), I think this issue would be even more prominent. Should you evaluate the 'if statement' before any movement, and set the custom variable at the current timestep (based on the previous timestep's position), or evaluate it after all displacements have been applied and then set it at the current timestep? Any thoughts?

Commit: As it was not advertised/used, and with the new lazy loading schemes it is no longer needed. It does complicate the code significantly
@erikvansebille erikvansebille marked this pull request as draft July 20, 2023 09:45
@erikvansebille (Member, Author) commented

Thanks @michaeldenes, for your feedback on this Draft PR. You're right that actions like unbeaching will still not be commutative; kernels where the action depends on position will generally have that problem. The point is that with this PR some kernels can commute. It also means we might be able to get rid of the SummedFields class (which was essentially an implementation of vectorial summing for Advection only).

I think/hope that actions like unbeaching could be treated in Error/RecoveryKernels, which need a major upgrade anyway. We still use the RecoveryKernel code from v0.9 essentially unchanged, and it is very slow. That's point e in the list at the top of this page. But I'm first focusing on point a, since that's a sine qua non...

erikvansebille and others added 23 commits on July 20, 2023; the commit messages include:

  • So they are available in the kernel
  • Keeping field_outputdt for writing Fields (only in Scipy mode)
  • Moving stmt to main loop; proper closing of files, and support for outputdt in JIT
  • Via an extra table metadata with one row
  • Only works in scipy mode for now; probably because memory is not shared between C and python
  • This significantly speeds up writing in sql
@erikvansebille erikvansebille changed the title Vectorial summing of particle displacement at end of kernel loop [OUTDATED; DO NOT MERGE] Vectorial summing of particle displacement at end of kernel loop Jul 27, 2023
@erikvansebille (Member, Author) commented

Closing this PR, as #1402 is now the working implementation of this idea.
