Skip to content

impl Copy for GcVec #30

Closed
Closed
@Techcable

Description

@Techcable

This basically boils down to iterator invalidation. If we have two references to a vector,
a GcVec::push on one may mutate the vector's length while the user coerced the other reference to a slice (or iterator).

To put it another way, it's impossible for a garbage collected vector/list to implement all of the following properties:

  1. Coerce the vector into a slice that directly points to the underlying memory
  2. Allow multiple references to the vector (implement Copy)
  3. The ability to mutate the length and potentially re-allocate
  4. Keep capacity immutable, and have the area [len, capacity) uninitialized (as opposed to initialized with null)

This is a problem common to all garbage-collected languages, not just zerogc (as I describe below).

What Java/Python do

Most languages sidestep the problem by forbidding option 1 and refusing to give direct pointer-access to garbage-collected arrays.
Usually, forbidding option 1 is not a problem, since arrays in Java and lists in Python are builtin into the language and replace most of the users of pointers.

There are some exceptions though. Java's JNI allows raw access to an arrays's memory with GetPrimitiveArrayCritical. However, there is no need to worry about mutating length because java arrays have a fixed size (although it does temporarily prevent garbage collection). Python's native CAPI exposes access to a list's raw memory with PyListObject->ob_items, although native code must be careful of any user code changing .

This problem can also come up in unexpected places. For example, the C implementation of Python's list.sort coerces the list into a raw pointer, then sorts the memory directly using the pointers. The original code continued to use the original pointer even if a user comparison, leading to bugs. The current implementation temporarily sets the length to zero and carefully guards against any mutations.

What to do

If we took the Java/Python approach of refusing to give out raw pointers, it would also prevent access as a slice.

In idiomatic rust, this is unrealistic, since slices are so pervasive.
As a temporary solution, I ended up forbidding option 2.

However, doesn't work well if you're trying to use GcVec as the underlying implementation of Java arrays or Python lists. These languages allow users to create multiple references to an array or a list. In our current system, this would require double-indirection of Gc<GcVec<T>>.

SIDENOTE: Right now GcVec

Instead, I propose a new system of four types and two traits (which will incidentally solve the write-barrier problem):

  • GcRawVec - Allows multiple references, but has unsafe slice or pointer access
    • This will even allow mutable slices, as long as the user correctly triggers write barriers.
  • GcVec - Effectively a RefCell<GcRawVec<T>>. Includes safe slice access.
  • GcVecBuffer - A GcRawVec that is !Copy, but gains safe slice access.
  • trait GcVecAccess - Abstracts over different forms of GcVec. Calls to push require a GcContext. Requires the user to pass a context
  • trait GcArrayAccess - Allows read/write access to arrays, without mutating their length.
    • Implemented for GcArray in addition to GcVec

TODO: Better naming for GcVec and friends?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions