@thowell (Collaborator) commented Jan 3, 2026

improve memory utilization for ccd by implementing arena memory.

  • memory for ccd (the epa and multicontact arrays) is preallocated in a single wp.array that is added to Data as arena
  • within kernels, arena memory backs the individual epa and multicontact arrays
  • naccdmax is added to Data and determines the arena size. this value is the maximum number of contacts expected for any single ccd collision pair type. since a separate kernel is launched for each ccd collision pair type, it is only necessary to allocate enough memory for the pair type with the most contacts. (for scenes with many different ccd collision pair types, the memory savings can probably be significant)
  • only a subset of the epa arrays is needed by multicontact. as a result, epa arrays that multicontact does not use can be overwritten during multicontact, potentially reducing memory utilization further.
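
a rough sketch of the sizing argument in the last two bullets. all names and numbers below are hypothetical and are not taken from the pr; the point is only that an arena reused across kernel launches needs to fit the largest launch, and that letting multicontact scratch overwrite the epa-only region shrinks each slot further.

```python
# Hypothetical maximum CCD contacts per collision pair type (illustrative only).
max_contacts_by_pair = {
    ("box", "box"): 40,
    ("box", "mesh"): 25,
    ("mesh", "mesh"): 10,
}

# Hypothetical per-contact scratch sizes, in array elements (illustrative only).
shared_inputs = 100  # epa arrays that multicontact reads
epa_only = 300       # epa arrays that multicontact does not need
multicontact = 120   # multicontact-only arrays

# Naive alternative: a dedicated buffer per pair type, no reuse between phases.
separate = sum(max_contacts_by_pair.values()) * (shared_inputs + epa_only + multicontact)

# One arena reused across launches: size it for the largest pair type only.
naccdmax = max(max_contacts_by_pair.values())
arena_no_overlay = naccdmax * (shared_inputs + epa_only + multicontact)

# Additionally let multicontact scratch overwrite the epa-only region.
arena_overlay = naccdmax * (shared_inputs + max(epa_only, multicontact))

print(separate, arena_no_overlay, arena_overlay)
```

with these made-up numbers the element count drops at each step; the actual savings depend on the scene and on the real array sizes.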

aloha pot

this pr

mjwarp-testspeed benchmark/aloha_pot/scene.xml --nconmax=24 --njmax=128 --memory
Total JIT time: 0.70 s
Total simulation time: 4.13 s
Total steps per second: 1,983,104
Total realtime factor: 3,966.21 x
Total time per step: 504.26 ns
Total converged worlds: 8192 / 8192

Total memory: 2002.56 MB / 48640.12 MB (4.12%)
Model memory (0.27%):
 (no field >= 1% of utilized memory)
Data memory (99.73%):
 geom_xmat: 57.38 MB (2.87%)
 efc.J: 96.00 MB (4.79%)
 arena: 1569.75 MB (78.39%)

this pr with --nccdmax=12

mjwarp-testspeed benchmark/aloha_pot/scene.xml --nconmax=24 --njmax=128 --memory --nccdmax=12
Total JIT time: 0.70 s
Total simulation time: 4.13 s
Total steps per second: 1,982,124
Total realtime factor: 3,964.25 x
Total time per step: 504.51 ns
Total converged worlds: 8192 / 8192

Total memory: 1217.69 MB / 48640.12 MB (2.50%)
Model memory (0.44%):
 (no field >= 1% of utilized memory)
Data memory (99.56%):
 geom_xpos: 19.12 MB (1.57%)
 geom_xmat: 57.38 MB (4.71%)
 qM: 18.00 MB (1.48%)
 qLD: 16.53 MB (1.36%)
 efc.J: 96.00 MB (7.88%)
 arena: 784.88 MB (64.46%)

the reduction in ccd (arena) memory is ~50% (1569.75 MB to 784.88 MB)
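
as a sanity check on the ~50% figure, the arithmetic below recomputes it from the two reported arena sizes. it also estimates bytes per arena slot under assumptions that the output does not state: that the slot count is nworld times the per-world limit (nworld = 8192, read off the converged worlds line), that the default ccd limit equals nconmax = 24, and that MB means 2**20 bytes.

```python
nworld = 8192               # from "Total converged worlds: 8192 / 8192"
arena_default_mb = 1569.75  # reported arena size, default settings
arena_nccd12_mb = 784.88    # reported arena size with --nccdmax=12

# Fractional reduction in arena memory between the two runs.
reduction = 1.0 - arena_nccd12_mb / arena_default_mb

# Assumption: slots = nworld * per-world limit, and 1 MB = 2**20 bytes.
bytes_per_slot_default = arena_default_mb * 2**20 / (nworld * 24)
bytes_per_slot_nccd12 = arena_nccd12_mb * 2**20 / (nworld * 12)

print(f"reduction: {reduction:.1%}")
print(round(bytes_per_slot_default), round(bytes_per_slot_nccd12))
```

under these assumptions both runs work out to about the same number of bytes per slot, which is consistent with the arena scaling linearly in the ccd contact limit.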

main (e3bd7c6)

note: ccd memory utilization is not currently reported on main branch, so the total below excludes ccd memory and is not directly comparable with the totals above

mjwarp-testspeed benchmark/aloha_pot/scene.xml --nconmax=24 --njmax=128 --memory
Total JIT time: 0.67 s
Total simulation time: 4.16 s
Total steps per second: 1,968,630
Total realtime factor: 3,937.26 x
Total time per step: 507.97 ns
Total converged worlds: 8192 / 8192

Total memory: 432.81 MB / 48640.12 MB (0.89%)
Model memory (1.24%):
 (no field >= 1% of utilized memory)
Data memory (98.76%):
 xfrc_applied: 4.88 MB (1.13%)
 xmat: 7.31 MB (1.69%)
 ximat: 7.31 MB (1.69%)
 geom_xpos: 19.12 MB (4.42%)
 geom_xmat: 57.38 MB (13.26%)
 site_xmat: 4.78 MB (1.10%)
 cinert: 8.12 MB (1.88%)
 actuator_moment: 10.06 MB (2.32%)
 crb: 8.12 MB (1.88%)
 qM: 18.00 MB (4.16%)
 qLD: 16.53 MB (3.82%)
 cvel: 4.88 MB (1.13%)
 cacc: 4.88 MB (1.13%)
 cfrc_int: 4.88 MB (1.13%)
 cfrc_ext: 4.88 MB (1.13%)
 contact.frame: 6.75 MB (1.56%)
 contact.efc_address: 4.50 MB (1.04%)
 efc.J: 96.00 MB (22.18%)
 efc.quad: 12.00 MB (2.77%)
 subtree_bodyvel: 4.88 MB (1.13%)

@Kenny-Vilella (Collaborator) left a comment

Very good work!

I just have a few minor comments.

print(
f"Data\n nworld: {d.nworld} naconmax: {d.naconmax} njmax: {d.njmax}" + f" naccdmax: {d.naccdmax}\n"
if d.naccdmax != d.naconmax
else "\n"

Is there a reason to not print anything if naccdmax != naconmax?
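
For reference, Python's conditional expression binds more loosely than `+`, so in the snippet as quoted the `if`/`else` chooses between the whole concatenated string and `"\n"`. The sketch below reproduces that behaviour and shows a parenthesized variant that always prints the Data line. The function names are hypothetical, and the quoted hunk is truncated, so the real code may differ.

```python
def summary(nworld, naconmax, njmax, naccdmax):
    """Mirror the quoted expression: the conditional applies to the whole sum."""
    return (
        f"Data\n nworld: {nworld} naconmax: {naconmax} njmax: {njmax}" + f" naccdmax: {naccdmax}\n"
        if naccdmax != naconmax
        else "\n"
    )


def summary_parenthesized(nworld, naconmax, njmax, naccdmax):
    """Apply the conditional to the naccdmax suffix only."""
    return f"Data\n nworld: {nworld} naconmax: {naconmax} njmax: {njmax}" + (
        f" naccdmax: {naccdmax}\n" if naccdmax != naconmax else "\n"
    )


# When the two limits are equal, the first form drops the whole Data line.
print(repr(summary(2, 10, 5, 10)))
print(repr(summary_parenthesized(2, 10, 5, 10)))
```

When the two limits differ, both forms produce the same string.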

ncollision: collision count from broadphase (1,)
naccdmax: Maximum number of CCD contacts
nccd: geom-geom pair counter for arena slots (len(GeomType)*(len(GeomType)+1)/2,)
arena: Arena memory for CCD (narena,)

[nitpick] Not sure if it is clear what narena is.

ncollision: array(1, int)

# warp only: preallocated arena for convex collision scratch memory
naccdmax: int # max number of CCD contacts

[nitpick] I would remove these comments, since this is the only place in the file with such comments.

njmax: Number of constraints to allocate per world. Constraint arrays are
batched by world: no world may have more than njmax constraints.
naconmax: Number of contacts to allocate for all worlds. Overrides nconmax.
naccdmax: Maximum number of CCD contacts. Defaults to naconmax.

Should we state clearly that the naccdmax value takes priority over the nccdmax value?
The same applies to nconmax/naconmax, actually.
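
To make the priority order concrete, here is a minimal sketch of how the four limits could be resolved. `resolve_limits` is a hypothetical helper, not a function from the PR, and it assumes that the per-world values (nconmax, nccdmax) are multiplied by nworld to obtain the all-world values, which the docstring above does not spell out.

```python
def resolve_limits(nworld, nconmax=None, naconmax=None, nccdmax=None, naccdmax=None):
    """Hypothetical helper: resolve contact limits with explicit priority."""
    if naconmax is None:
        if nconmax is None:
            raise ValueError("either nconmax or naconmax must be given")
        # Assumption: the per-world limit scales by the number of worlds.
        naconmax = nworld * nconmax
    if naccdmax is None:
        # Assumption: nccdmax is per world, like nconmax.
        # With neither CCD limit given, fall back to naconmax.
        naccdmax = nworld * nccdmax if nccdmax is not None else naconmax
    return naconmax, naccdmax


# An explicit naccdmax wins over nccdmax; naconmax wins over nconmax.
print(resolve_limits(8192, nconmax=24, nccdmax=12))
print(resolve_limits(8192, nconmax=24, nccdmax=12, naccdmax=1000))
```

Documenting the rule in this "all-world value overrides per-world value" form would cover both pairs of parameters at once.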

epa_vert1, epa_vert2, epa_vert_index1, epa_vert_index2, epa_face.
"""
MJ_MAX_EPAFACES = 5
MJ_MAX_EPAHORIZON = 12

These two values should probably be imported from types

return naccdmax * epa_total_per_collision

# multiccd arrays
# polygon, clipped: 2 * nmaxpolygon vec3s each

Is there a reason to use "vec3s" with an s?

epa_map, epa_horizon). The multicontact inputs are placed first:
epa_vert1, epa_vert2, epa_vert_index1, epa_vert_index2, epa_face.
"""
epa_vert_dim = 5 + epa_iterations

[nitpick] It is a bit strange to sometimes use epa_iterations and sometimes ccd_iterations.
I assume that these two terms will always be equal. If that is not the case, I would double check that they are used appropriately throughout the code.

# epa_horizon: index pair (i j) of edges on horizon
epa_horizon = wp.empty(shape=(d.naconmax, 2 * MJ_MAX_EPAHORIZON), dtype=int)
# reset ccd arena counter
d.nccd.zero_()

Is this actually needed?
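
One possible reason for the reset, sketched under an assumption the diff context does not confirm: if arena slots are claimed by incrementing the nccd counter, then a counter that is not zeroed between steps keeps growing and eventually hands out slots past the end of the arena. `SlotCounter` and `step` below are hypothetical stand-ins, not code from the PR. If the counter is already zeroed elsewhere, the extra call would indeed be redundant.

```python
class SlotCounter:
    """Hypothetical stand-in for a per-step arena slot counter."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.count = 0

    def claim(self):
        """Return the next free slot, or -1 if the arena is full."""
        slot = self.count
        self.count += 1
        return slot if slot < self.capacity else -1

    def reset(self):
        """Make every slot available again."""
        self.count = 0


def step(counter, ncollisions, reset):
    """Claim one slot per collision, optionally resetting the counter first."""
    if reset:
        counter.reset()
    return [counter.claim() for _ in range(ncollisions)]


counter = SlotCounter(capacity=4)
first = step(counter, 3, reset=True)
second_without_reset = step(counter, 3, reset=False)
third_with_reset = step(counter, 3, reset=True)
print(first, second_without_reset, third_with_reset)
```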

epa_horizon = epa_horizon_in[tid]
# construct epa arrays from arena
# multicontact inputs first (epa_vert1, epa_vert2, epa_vert_index1, epa_vert_index2, epa_face)
base_offset = arenaid * wp.static(per_collision_size)

This is interesting.
We are now allocating per collision within a single array, whereas the previous approach allocated one array per field.

I wonder if we see any perf difference with the different memory layout.
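
To make the two layouts concrete, here is a small sketch of the index arithmetic for each. The field names match the diff, but the sizes are made up and the helper functions are hypothetical; elements are treated as scalars for simplicity.

```python
# Hypothetical field sizes per collision, in elements (illustrative only).
FIELDS = {"epa_vert1": 6, "epa_vert2": 6, "epa_face": 4}
PER_COLLISION = sum(FIELDS.values())


def per_collision_index(field, arenaid, i, naccdmax):
    """Per-collision layout: all fields of one collision are contiguous.

    naccdmax is unused here; it is kept so both functions share a signature.
    """
    offset = 0
    for name, dim in FIELDS.items():
        if name == field:
            return arenaid * PER_COLLISION + offset + i
        offset += dim
    raise KeyError(field)


def per_array_index(field, arenaid, i, naccdmax):
    """Per-array layout: each field is one contiguous (naccdmax, dim) block."""
    base = 0
    for name, dim in FIELDS.items():
        if name == field:
            return base + arenaid * dim + i
        base += naccdmax * dim
    raise KeyError(field)


def all_indices(index_fn, naccdmax):
    """Sorted flat indices touched by every (collision, field, element)."""
    return sorted(
        index_fn(field, arenaid, i, naccdmax)
        for arenaid in range(naccdmax)
        for field, dim in FIELDS.items()
        for i in range(dim)
    )


print(per_collision_index("epa_vert2", 1, 0, 3), per_array_index("epa_vert2", 1, 0, 3))
```

Both functions cover the same index range exactly once, so the choice only affects which elements end up adjacent in memory. Whether that changes performance would need to be measured.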

# epa arrays used by multicontact

# epa_vert1: vertices in EPA polytope in geom 1 space
layout["epa_vert1"] = (offset, epa_vert_dim)

[nitpick] I am not totally convinced that we should keep the dim here; it makes the code below a bit inconsistent.
But this is quite subjective, so please feel free to do what you think is best.
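
For context on the trade-off, here is a minimal sketch of a layout table that stores `(offset, dim)` per field and uses it to slice a flat buffer. `build_layout` and `view` are hypothetical helpers, only two fields are shown, elements are scalars rather than vec3, and a plain `memoryview` stands in for the Warp arena.

```python
from array import array


def build_layout(epa_iterations):
    """Hypothetical layout builder: name -> (offset, dim), in elements."""
    epa_vert_dim = 5 + epa_iterations  # as in the quoted diff
    layout = {}
    offset = 0
    # Only two fields are shown; the real layout has more entries.
    for name, dim in (("epa_vert1", epa_vert_dim), ("epa_vert2", epa_vert_dim)):
        layout[name] = (offset, dim)
        offset += dim
    return layout, offset


def view(arena, layout, per_collision_size, arenaid, name):
    """Zero-copy slice of the arena for one field of one collision."""
    offset, dim = layout[name]
    start = arenaid * per_collision_size + offset
    return arena[start : start + dim]


layout, per_collision_size = build_layout(epa_iterations=3)
arena = memoryview(array("f", [0.0] * (2 * per_collision_size)))

# Writing through the view modifies the shared arena.
view(arena, layout, per_collision_size, 1, "epa_vert2")[0] = 1.5
print(layout, per_collision_size)
```

Keeping the dim next to the offset lets the slice length come from the table itself; storing only offsets would keep entries uniform but requires the dims to be available wherever the slices are made.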

Development

Successfully merging this pull request may close these issues.

Optimize GJK device memory usage
