
Native Image GC Improvements #2386

Open
@christianwimmer


GraalVM Native Image CE currently provides a simple (non-parallel, non-concurrent) stop&copy GC. There are various areas in which this GC can be improved, and this issue collects ideas for them. Actual work should be done under separate issues that are linked from this issue, so that everyone who wants to work on GC performance gets an overview of who is working on what.

TLAB implementation and sizing algorithm

The heap is divided into chunks, and currently a whole chunk is used as a thread's TLAB. A reasonably large chunk size (currently 1 MByte) is desirable, but it is often too big for a TLAB. Especially when many threads are started and some of them allocate far less than others, GCs are started too often: every thread with a low allocation rate still occupies a whole, mostly empty chunk, so the young generation runs out of chunks long before it is full of objects. This can also lead to pathological cases: when ChunkSize * NumberOfThreads > YoungGenerationSize, there are not enough chunks for all threads, and the system performs GCs continuously.
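The arithmetic behind both problems can be sketched as follows. This is a back-of-the-envelope model, not Native Image code: it assumes that a GC is triggered as soon as no free chunk is left in the young generation, and the class and method names and the example sizes are hypothetical (only the 1 MByte chunk size comes from the text above).

```java
/**
 * Hypothetical model of whole-chunk TLABs.
 * Assumption: a young-generation GC starts as soon as no free chunk is left.
 */
final class TlabChunkMath {
    private TlabChunkMath() {}

    /** Number of whole chunks that fit into the young generation. */
    static long chunksInYoung(long youngGenerationSize, long chunkSize) {
        return youngGenerationSize / chunkSize;
    }

    /** The pathological condition ChunkSize * NumberOfThreads > YoungGenerationSize. */
    static boolean notEnoughChunks(long chunkSize, long numberOfThreads, long youngGenerationSize) {
        return Math.multiplyExact(chunkSize, numberOfThreads) > youngGenerationSize;
    }

    /**
     * Fraction of the young generation that actually holds objects when the GC starts,
     * if idleThreads threads each wrote only idleBytesPerThread into their chunk and
     * busy threads filled all remaining chunks completely.
     */
    static double usedFractionAtGc(long youngGenerationSize, long chunkSize,
                                   long idleThreads, long idleBytesPerThread) {
        long chunks = chunksInYoung(youngGenerationSize, chunkSize);
        if (idleThreads > chunks) {
            throw new IllegalArgumentException("more idle threads than chunks");
        }
        long used = (chunks - idleThreads) * chunkSize + idleThreads * idleBytesPerThread;
        return (double) used / (chunks * chunkSize);
    }
}
```

For example, with a hypothetical 64 MByte young generation, 1 MByte chunks, and 48 threads that each allocated only 16 KByte, `usedFractionAtGc` yields roughly 0.26: the GC starts when only about a quarter of the young generation contains objects. With 300 threads and a 256 MByte young generation, `notEnoughChunks` is true and every allocation slow path would trigger another GC.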

To improve this, the TLAB implementation should be decoupled from chunk management so that many TLABs can share one chunk. The TLAB size can then be adjusted per thread based on its allocation rate: threads that allocate a lot still get a whole chunk, while threads that barely allocate get a small TLAB.
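A minimal sketch of both ideas, under stated assumptions: a chunk hands out several TLABs with a bump pointer, and each thread's TLAB size is derived from an exponential moving average of the bytes it allocated per GC cycle (similar in spirit to how HotSpot resizes TLABs, but not taken from any existing implementation). `SharedChunk`, `TlabSizer`, the minimum size, the sample weight, and the refill target are all hypothetical.

```java
/**
 * Hypothetical chunk that hands out several TLABs with a bump pointer.
 * Single-threaded sketch: a real implementation would advance `top` with a
 * compare-and-swap, because several threads may carve TLABs from one chunk.
 */
final class SharedChunk {
    private final long end;
    private long top;

    SharedChunk(long start, long size) {
        this.top = start;
        this.end = start + size;
    }

    /** Returns the start address of a new TLAB of `size` bytes, or -1 if it does not fit. */
    long carveTlab(long size) {
        if (end - top < size) {
            return -1;
        }
        long tlabStart = top;
        top += size;
        return tlabStart;
    }

    long remaining() {
        return end - top;
    }
}

/** Hypothetical per-thread TLAB sizing from an exponential moving average of allocation. */
final class TlabSizer {
    static final long CHUNK_SIZE = 1L << 20;    // 1 MByte, as in the issue text
    static final long MIN_TLAB_SIZE = 4L << 10; // assumption
    static final double SAMPLE_WEIGHT = 0.35;   // weight of the newest sample, assumption
    static final int TARGET_REFILLS = 50;       // desired TLAB refills per GC cycle, assumption

    private double averageBytesPerCycle;

    /** Records how many bytes the thread allocated between the last two GCs. */
    void recordCycle(long bytesAllocated) {
        averageBytesPerCycle = SAMPLE_WEIGHT * bytesAllocated
                + (1 - SAMPLE_WEIGHT) * averageBytesPerCycle;
    }

    /** Busy threads are capped at a whole chunk; nearly idle threads get MIN_TLAB_SIZE. */
    long desiredTlabSize() {
        long size = (long) (averageBytesPerCycle / TARGET_REFILLS);
        size = Math.max(MIN_TLAB_SIZE, Math.min(CHUNK_SIZE, size));
        return size & ~7L; // keep the TLAB size 8-byte aligned
    }
}
```

With these assumed parameters, a thread that allocated 1 GByte in the last cycle is capped at a full 1 MByte chunk, while a thread that allocated 10 KByte gets a 4 KByte TLAB, so many quiet threads can share a single chunk.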

Performance improvements in the GC/heap implementation itself

  • Cluster related image heap objects to improve cache locality, reduce footprint, and reduce the number of objects scanned from the remembered set
  • Optimize object pinning: do not keep other objects in the chunk alive and avoid creating additional objects
  • Optimize concurrency in low-level memory management (CommittedMemoryProvider)
  • Use a single survivor space instead of one space per object age to avoid internal fragmentation
  • Use prefetch instructions before copying an object to hide memory latency
  • Store more data in the chunk header to avoid having to query the space
  • Implement exact write barriers (the current write barrier marks the whole object as dirty, so for a large object array the GC has to scan every element even if only one was written)
  • Reduce the size of the write barrier by redesigning it (it would be significantly smaller if objects in aligned and unaligned chunks could be treated the same way)
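The cost difference between the current object-level barrier and an exact barrier can be illustrated with a toy card table. This is a sketch under assumptions and does not reflect how Native Image actually lays out its remembered set: the class name, the 512-byte card size, the 16-byte array header, and the flat heap addressing are hypothetical.

```java
/** Hypothetical flat card table over a simulated heap, for comparing barrier precision. */
final class CardTableDemo {
    static final int CARD_SHIFT = 9;               // 512-byte cards, assumption
    static final long CARD_SIZE = 1L << CARD_SHIFT;

    private final boolean[] dirty;

    CardTableDemo(long heapBytes) {
        dirty = new boolean[(int) ((heapBytes + CARD_SIZE - 1) >>> CARD_SHIFT)];
    }

    /** Object-level barrier: dirties the card of the object start, whatever field was written. */
    void impreciseBarrier(long objStart, long fieldAddr) {
        dirty[(int) (objStart >>> CARD_SHIFT)] = true;
    }

    /** Exact barrier: dirties only the card that contains the written slot. */
    void preciseBarrier(long objStart, long fieldAddr) {
        dirty[(int) (fieldAddr >>> CARD_SHIFT)] = true;
    }

    /** With object-level marking, the GC cannot tell which slot changed, so it scans the whole object. */
    long bytesToScanImprecise(long objStart, long objBytes) {
        return dirty[(int) (objStart >>> CARD_SHIFT)] ? objBytes : 0;
    }

    /** With exact marking, the GC scans only the parts of the object covered by dirty cards. */
    long bytesToScanPrecise(long objStart, long objBytes) {
        long objEnd = objStart + objBytes;
        long scanned = 0;
        for (long card = objStart >>> CARD_SHIFT; card <= (objEnd - 1) >>> CARD_SHIFT; card++) {
            if (dirty[(int) card]) {
                long cardStart = card << CARD_SHIFT;
                scanned += Math.min(cardStart + CARD_SIZE, objEnd) - Math.max(cardStart, objStart);
            }
        }
        return scanned;
    }
}
```

For a hypothetical 8 MByte reference array in which a single element is written, the object-level barrier makes the GC rescan all 8 MByte, while the exact barrier limits the work to one 512-byte card. The trade-off is that an exact barrier must compute the slot address, which is one reason the barrier's size matters.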

Implement a mark&compact GC for the old generation

The stop&copy GC has a high memory overhead during GC. In the worst case, when the whole heap is reachable, twice as much memory is needed, because a full GC copies all objects. If the OS cannot provide more memory during a GC, the VM exits because the heap is in an inconsistent state.

For the old generation, a mark&compact algorithm avoids this overhead because objects are compacted in place instead of being copied to a separate space.
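Why compaction needs no second space can be shown with a LISP2-style sliding compactor, one classic mark&compact variant (chosen here for illustration; the issue does not prescribe a specific algorithm). The sketch simulates the heap with Java objects that carry numeric addresses; `MarkCompactSketch`, `Obj`, and the use of address 0 as null are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical LISP2-style mark&compact over a simulated heap (address 0 means null). */
final class MarkCompactSketch {
    static final class Obj {
        long addr;          // current address in the simulated heap
        final long size;    // object size in bytes
        final long[] fields; // addresses of referenced objects, 0 = null
        boolean marked;
        long forward;       // new address computed in phase 2

        Obj(long addr, long size, long... fields) {
            this.addr = addr;
            this.size = size;
            this.fields = fields;
        }
    }

    /** Compacts `heap` (mutable, sorted by address) toward heapStart; returns the new top. */
    static long collect(List<Obj> heap, long[] roots, long heapStart) {
        Map<Long, Obj> byAddr = new HashMap<>();
        for (Obj o : heap) byAddr.put(o.addr, o);

        // Phase 1: mark everything reachable from the roots.
        ArrayDeque<Obj> stack = new ArrayDeque<>();
        for (long r : roots) if (r != 0) stack.push(byAddr.get(r));
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (o.marked) continue;
            o.marked = true;
            for (long f : o.fields) if (f != 0) stack.push(byAddr.get(f));
        }

        // Phase 2: assign forwarding addresses by sliding live objects toward heapStart.
        long free = heapStart;
        for (Obj o : heap) {
            if (o.marked) {
                o.forward = free;
                free += o.size;
            }
        }

        // Phase 3: redirect roots and fields (still using old addresses for lookup).
        for (int i = 0; i < roots.length; i++) {
            if (roots[i] != 0) roots[i] = byAddr.get(roots[i]).forward;
        }
        for (Obj o : heap) {
            if (!o.marked) continue;
            for (int i = 0; i < o.fields.length; i++) {
                if (o.fields[i] != 0) o.fields[i] = byAddr.get(o.fields[i]).forward;
            }
        }

        // Phase 4: move in address order. forward <= addr, so a move never overwrites
        // a live object that has not been moved yet; no to-space is required.
        heap.removeIf(o -> !o.marked);
        for (Obj o : heap) {
            o.addr = o.forward;
            o.marked = false;
        }
        return free;
    }
}
```

The price for saving memory is extra passes over the heap (mark, compute forwarding addresses, update references, move) instead of the single copying pass of stop&copy.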

Error handling

  • Throw an out-of-memory error if too much time is spent in the GC.
  • Handle out-of-memory conditions during VM operations more gracefully.
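HotSpot's Parallel GC has a comparable check (`-XX:+UseGCOverheadLimit` together with `-XX:GCTimeLimit` and `-XX:GCHeapFreeLimit`, which default to 98% and 2%). A minimal sketch of such a policy for Native Image might look as follows; the class, its method, and the "consecutive GCs" threshold are hypothetical.

```java
/**
 * Hypothetical GC overhead limit, loosely modeled on HotSpot's GCTimeLimit/GCHeapFreeLimit.
 * All thresholds are passed in; none of them are existing Native Image options.
 */
final class GcOverheadLimit {
    private final double maxGcTimeFraction; // e.g. 0.98
    private final double minFreeFraction;   // e.g. 0.02
    private final int consecutiveLimit;     // number of bad GCs in a row before giving up
    private int consecutiveBad;

    GcOverheadLimit(double maxGcTimeFraction, double minFreeFraction, int consecutiveLimit) {
        this.maxGcTimeFraction = maxGcTimeFraction;
        this.minFreeFraction = minFreeFraction;
        this.consecutiveLimit = consecutiveLimit;
    }

    /**
     * Called after a full GC with the GC time, the mutator time since the previous GC,
     * and the free and total heap bytes after collection.
     */
    void afterGc(long gcNanos, long mutatorNanos, long freeBytes, long heapBytes) {
        double gcFraction = (double) gcNanos / (gcNanos + mutatorNanos);
        double freeFraction = (double) freeBytes / heapBytes;
        if (gcFraction > maxGcTimeFraction && freeFraction < minFreeFraction) {
            if (++consecutiveBad >= consecutiveLimit) {
                throw new OutOfMemoryError("GC overhead limit exceeded");
            }
        } else {
            consecutiveBad = 0; // progress was made, start counting again
        }
    }
}
```

In a real VM the error would be raised on the allocating thread after the GC has finished, not inside the GC itself, so that the heap is consistent when the exception propagates.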
