Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Garbage Collection #206

Open
4 tasks
Wodann opened this issue May 26, 2020 · 21 comments
Open
4 tasks

Garbage Collection #206

Wodann opened this issue May 26, 2020 · 21 comments
Labels
tracking Tracking issue for an epic

Comments

@Wodann
Copy link
Collaborator

Wodann commented May 26, 2020

Currently, the Mun runtime defaults to a mark & sweep garbage collector (henceforth GC). This a relatively simple implementation of a GC, that serves as a good proof of concept. However, going foreword we need to mature our GC story.

Goals

  1. Benchmarking - The first step should be to create a benchmark suite that is used to avoid regression and guides high-level optimisations.
    image
    [discussion A] What metrics/use cases are important to benchmark?
  2. Constantly low overhead The Mun GC will have to run at interactive frame rate (30-60 Hz). As such it only has a minimal time budget to spend on GC - every frame.
  3. Rewrite the stack As Mun hot reloads state, data structures have to be mapped in memory. The current Mun GC is limited to rewriting all GC-allocated memory, but in the future we also want it to rewrite stack memory.
  4. Concurrency In the future, we want to implement some form of concurrency within the Mun runtime. As such, GC design discussions should sync up with Mun concurrency design discussions.
@Wodann Wodann added the tracking Tracking issue for an epic label May 26, 2020
@Wodann
Copy link
Collaborator Author

Wodann commented May 26, 2020

Minutes from a Discord discussion on GC:

  • Concurrent, tracing GC
    • cgc is a really simple implementation of a concurrent, copying GC that generalises over types.
      • Mun cannot directly adopt cgc as it operates on byte-memory, similar to the alloc API.
    • Riptide is a relatively simple, concurrent, non-copying GC
      • Even though the current Mun GC is non-copying, Mun will likely have to support some form of barriers (or state points) in the future, so we can also investigate copying GCs.
    • JVM GC is another option
      • JVM's implementation of safepoints cleverly uses function calls (an optimisation over a memory read)
  • Shenandoah is a low-pause GC that uses barriers (video, wiki, paper)
  • LLVM GC might be an interesting option for Mun
  • Stack scanning
    • Conceptually, we can use LLVM to do stack scanning
    • scala-native uses a conservative scanning approach:
      scan = stack_top;
      while scan < stack_end {
        if heap.contains_ptr(scan) {...}
        scan = scan.offset(mem::size_of::<*const c_void>());
      }
    • You can combine a copying GC with conservative root scanning, if you:
      1. allocate page-aligned blocks of memory;
      2. check whether a heap contains a pointer as follows:
      fn has_pointer(&self, ptr: *const u8) -> bool {
        let block_ptr = ptr & BLOCK_MASK;
        for block in self.blocks.iter() {
          if block.header == block_ptr {
             return true;
          }    
        }
        false
      }
      1. tag integers and floats - to prevent numbers from being mistaken for pointers - or allocate them on the heap
    • A caveat of programming in a (tracing) GC'd language can get awkward as you try to avoid generating garbage
  • Reference-counted GC
  • Mark-region GC
    • Immix reference counting approaches performance of the best tracing GCs
      • The immix-rust implementation might be a good starting point for experimentation. There has been some prior discussion on Reddit
  • Static memory management
    • Lifetime system (like Rust)
    • micro-mitten uses static data-flow analysis to approximate heap liveness.
    • A potential caveat that was named is Mun needing a type memory map of all heap allocations, but it's completely possible to still maintain such a memory map when not using a GC.

@baszalmstra
Copy link
Collaborator

Note that immix-rust is not a reference-counting GC

@atennapel
Copy link

atennapel commented May 26, 2020

Also see Counting Immutable Beans for an implementation of reference counting in the Lean theorem prover. They do some clever things like reusing data structures if the reference count is 1. Though it's really focused on purely functional languages.

@pitaj
Copy link

pitaj commented May 26, 2020

I'd just like to point out the option of a colored-pointer GC like ZGC. While ZGC only works on 64-bit architectures (because it needs the extra bits for colored pointers), it has very short pause times that do not scale directly with heap size.

@yorickpeterse
Copy link

yorickpeterse commented May 26, 2020

If I remember correctly, the immix-rust repository does not implement everything of Immix. Specifically, it does not support compacting the heap if my memory serves me right. For Inko I ended up implementing Immix, including support for compacting the heap. If it's of any use, you can find the code for it here:

I also wrote some documentation on this that may help understand the code a bit better: https://inko-lang.org/manual/virtual-machine/memory-management/

I'm happy to answer any questions about Immix as well 😃

@moon-chilled
Copy link

moon-chilled commented May 26, 2020

@pitaj you can use coloured-pointer gcs on 32-bit platforms if you use the low-order bits for colouring. For example, zgc currently uses only 4 bits for colouring, so if you align every allocation to 16 bytes, the low 4 bits are irrelevant and can be used by gc.

@playXE
Copy link

playXE commented May 27, 2020

I'll try to make GC API more generic in: Issue 207 and soon create PR with initial changes to API.

@steveblackburn
Copy link

Lots of interesting thoughts above.

We're doing a full re-write of MMTk in Rust. Although our implementation is not yet public (I hope to have it for you soon), you can take a browse at the Java code base on which the Rust re-write is based. They key difference is that the re-write targets multiple VMs and multiple languages.

@AlexCouch
Copy link

I haven't seen Conservative GC be mentioned yet which is what Crystal uses for GC so I'm gonna leave a link here to both:
Conservative GC
Crystal-lang

@fleabitdev
Copy link

Hi folks,

I was pointed to this issue by @erlend-sh, after announcing GameLisp yesterday.

You might be interested by GameLisp's garbage collector. It's not an exact fit for Mun (because Mun seems to be a much more general-purpose language than GameLisp), but you might be able to borrow some ideas. There's a guide-level explanation here, some basic performance figures here, a detailed technical overview here, and the source code itself is here.

Technical summary:

  • The GC is invoked explicitly once per frame, rather than waiting for a memory allocation.
  • It's generational, which immediately trivializes 90%+ of the garbage generated by a typical game.
  • It's incremental, scaling the amount of work done to the amount of memory allocated since the GC was last invoked. The algorithm contains a few clever tricks to prevent latency spikes, at the cost of increased total memory usage and a more expensive write-barrier.
  • The GC can optionally be implemented without using any unsafe code, by abusing std::rc::Rc.

One of the most surprising things I learned while implementing the GC is that, for games, the raw throughput of the garbage collector just doesn't matter very much. GameLisp's algorithm has a lot of room for optimization (especially if I were willing to make heavier use of unsafe code), but even now, the ratio of script-execution time to GC time is something like 20:1. Because most games are unlikely to spend more than a few milliseconds each frame executing scripts, the cost of the GC is negligible.

Let me know if you have any questions!

@erlend-sh
Copy link

erlend-sh commented May 24, 2022

Might Dart’s GC be a useful reference for Mun? The two languages seem to have pretty similar design goals. Dart uses a generational GC, like GameLisp. @jmeggitt seems to have been tinkering in this space recently.

https://medium.com/flutter/flutter-dont-fear-the-garbage-collector-d69b3ff1ca30

@Techcable
Copy link

Consider looking into the Memory Pool System.

It implements a generational GC with bump-pointer allocation.

Language integration is much easier than with MMTK (having tried both).

Although the library is not very widely known, it is backed by a professional contracting-firm. It is used in production in multiple language implementations (main one is OpenDylan).

This project might be worth taking a look at. In addition to bump-pointer allocation and generational collection, it also supports concurrent collection! It is likely to be faster than whatever you can write yourself (unless you put like a ton of effort in it).

It is not quite as fast level of MMTK/Java/Go, but can be surprisingly competitive 😄

If you do decide to write a collector yourself, "The Garbage Collection Handbook" is very useful (and exceptionally well written).

Downsides on MPS:
Write barriers are done using OS-level page protection. While this makes compilation easier, this means that some of your callback functions might be called from inside signal handlers.

Also collection is (mostly) concurrent, so your trace functions might be called from other threads (and collections can happen any time, not just at allocations).

Because of these two things, Integration is difficult to do unless its in C. The library uses a lot of macro magic. Signal handlers can be tricky to get right in higher-level languages (even Rust) while the macros take care of it in C.

Inline allocation is an exception, and is easy to do even in hand-written/JIT compiled assembly.

@erlend-sh
Copy link

erlend-sh commented May 25, 2022

Language integration is much easier than with MMTK (having tried both).

Were you trying with the Rust version of MMTk? It looks like the rewrite @steveblackburn was talking about has since gone public, and is being actively developed by @qinsoon & co.
https://github.com/mmtk/mmtk-core

@erlend-sh
Copy link

Luau was released after this discussion started. Since it’s specifically made for games (official language for Roblox modding) it surely shares a lot of design goals with Mun.

https://luau-lang.org/performance#improved-garbage-collector-pacing

Improved garbage collector pacing

Luau uses an incremental garbage collector which does a little bit of work every so often, and at no point does it stop the world to traverse the entire heap. The runtime will make sure that the collector runs interspersed with the program execution as the program allocates additional memory, which is known as “garbage collection assists”, and can also run in response to explicit garbage collection invocation via lua_gc. In interactive environments such as video game engines it’s possible, and even desirable, to request garbage collection every frame to make sure assists are minimized, since that allows scheduling the garbage collection to run concurrently with other engine processing that doesn’t involve script execution.

Inspired by excellent work by Austin Clements on Go’s garbage collector pacer, we’ve implemented a pacing algorithm that uses a proportional–integral–derivative controller to estimate internal garbage collector tunables to reach a target heap size, defined as a percentage of the live heap data (which is more intuitive and actionable than Lua 5.x “GC pause” setting). Luau runtime also estimates the allocation rate making it easy (given uniform allocation rates) to adjust the per-frame garbage collection requests to do most of the required GC work outside of script execution.

@baszalmstra
Copy link
Collaborator

Very interesting read!

@steveblackburn
Copy link

@Techcable I'm keen to hear of your MMTk experience. What language were you porting? Any thoughts on what you found difficult?

We have a lively community with a lot of ports underway right now, and some pretty interesting work with new algorithms (outperforming G1 and Shenandoah on OpenJDK now).

@Techcable
Copy link

@Techcable I'm keen to hear of your MMTk experience. What language were you porting? Any thoughts on what you found difficult?

I tried integrating with a Python implementation several months ago. Mostly gave up because writing one by hand seemed to be simpler (definitely not faster).

It looks like the implementation and documentation has improved significantly since than. I will have to try again :)

@erlend-sh
Copy link

erlend-sh commented Jul 6, 2022

Some folks here might be interested in participating in this: https://www.reddit.com/r/rust/comments/vpq6yn/user_study_interest_in_a_rustlike/

by @typesanitizer (twitter.com/typesanitizer)

@typesanitizer
Copy link

@erlend-sh I called off the study because I didn't get much initial interest (only a handful of survey responses) to be able to perform a meaningful study. 😐

@erlend-sh
Copy link

Ah, too bad. In any case, I think Mun is highly relevant to what you're thinking about :)

(Admins, feel free to delete these last comments.)

@erlend-sh
Copy link

https://github.com/kyren/gc-arena by @kyren might be a good fit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tracking Tracking issue for an epic
Projects
None yet
Development

No branches or pull requests