Description
This basically boils down to iterator invalidation. If we have two references to a vector,
a GcVec::push
on one may mutate the vector's length while the user coerced the other reference to a slice (or iterator).
To put it another way, it's impossible for a garbage collected vector/list to implement all of the following properties:
- Coerce the vector into a slice that directly points to the underlying memory
- Allow multiple references to the vector (implement
Copy
) - The ability to mutate the length and potentially re-allocate
- Keep capacity immutable, and have the area
[len, capacity)
uninitialized (as opposed to initialized withnull
)
This is a problem common to all garbage-collected languages, not just zerogc (as I describe below).
What Java/Python do
Most languages sidestep the problem by forbidding option 1 and refusing to give direct pointer-access to garbage-collected arrays.
Usually, forbidding option 1 is not a problem, since arrays in Java and lists in Python are builtin into the language and replace most of the users of pointers.
There are some exceptions though. Java's JNI allows raw access to an arrays's memory with GetPrimitiveArrayCritical
. However, there is no need to worry about mutating length because java arrays have a fixed size (although it does temporarily prevent garbage collection). Python's native CAPI exposes access to a list's raw memory with PyListObject->ob_items
, although native code must be careful of any user code changing .
This problem can also come up in unexpected places. For example, the C implementation of Python's list.sort
coerces the list into a raw pointer, then sorts the memory directly using the pointers. The original code continued to use the original pointer even if a user comparison, leading to bugs. The current implementation temporarily sets the length to zero and carefully guards against any mutations.
What to do
If we took the Java/Python approach of refusing to give out raw pointers, it would also prevent access as a slice.
In idiomatic rust, this is unrealistic, since slices are so pervasive.
As a temporary solution, I ended up forbidding option 2.
However, doesn't work well if you're trying to use GcVec
as the underlying implementation of Java arrays or Python lists. These languages allow users to create multiple references to an array or a list. In our current system, this would require double-indirection of Gc<GcVec<T>>
.
SIDENOTE: Right now GcVec
Instead, I propose a new system of four types and two traits (which will incidentally solve the write-barrier problem):
GcRawVec
- Allows multiple references, but has unsafe slice or pointer access- This will even allow mutable slices, as long as the user correctly triggers write barriers.
GcVec
- Effectively aRefCell<GcRawVec<T>>
. Includes safe slice access.GcVecBuffer
- AGcRawVec
that is!Copy
, but gains safe slice access.trait GcVecAccess
- Abstracts over different forms ofGcVec
. Calls topush
require aGcContext.
Requires the user to pass a contexttrait GcArrayAccess
- Allows read/write access to arrays, without mutating their length.- Implemented for
GcArray
in addition toGcVec
- Implemented for
TODO: Better naming for GcVec
and friends?