Audit stdlib for mutable state #10960

Closed
@abbysmal

This issue was moved from ocaml-multicore/ocaml-multicore#757

This issue tracks the status of auditing stdlib for mutable state. OCaml 5.00 stdlib will have the following guarantees:

  1. Memory safety -- no crashes if stdlib modules are concurrently used by multiple domains
  2. Modules with mutable interfaces such as Stack and Queue are not made thread-safe
  3. Modules with top-level mutable state are made safe(r), or their unsafety is documented

There are two categories by which we classify the stdlib modules.

Top-level state

The module has some top-level mutable state that may cause surprising behaviours. For example, the Arg module has a top-level mutable ref called current that represents the cursor into the Sys.argv arguments being parsed. If two domains use the Arg module concurrently, they may see arguments being skipped. These cases need to be either:

  1. fixed to be safe for concurrent access (like Filename.temp_file, Format module for predefined buffers) or
  2. their behaviour documented and left alone (such as the Arg module; it is reasonable to expect only the main domain to parse command-line arguments).
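
One way to make top-level state safe for concurrent access is to give each domain its own copy. The sketch below uses Domain.DLS from OCaml 5 for that. The names local_cursor and advance_local are hypothetical, and this is not necessarily how Filename or Format were fixed in the stdlib.

```ocaml
(* Shared version, shown for contrast and not used below: every domain
   reads and writes the same cell, as with a top-level ref like Arg.current. *)
let shared_cursor : int ref = ref 0

(* Domain-local version (hypothetical): each domain lazily gets its own
   cell, initialised to 0 on first access from that domain. *)
let local_cursor : int ref Domain.DLS.key =
  Domain.DLS.new_key (fun () -> ref 0)

(* Advance the calling domain's private cursor [n] times and return it. *)
let advance_local n =
  let c = Domain.DLS.get local_cursor in
  for _i = 1 to n do incr c done;
  !c

let () =
  let d1 = Domain.spawn (fun () -> advance_local 1000) in
  let d2 = Domain.spawn (fun () -> advance_local 1000) in
  let r1 = Domain.join d1 in
  let r2 = Domain.join d2 in
  (* Each domain observed only its own increments. *)
  assert (r1 = 1000 && r2 = 1000);
  (* The main domain's cursor was never touched by the other domains. *)
  assert (!(Domain.DLS.get local_cursor) = 0)
```

Domain-local state changes the semantics: a value set in one domain is not visible in another. That suits per-domain scratch state such as buffers, but not settings that are meant to be global.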

Mutable interface

The module may create mutable state and return it. For example, Queue, Stack, Hashtbl, etc. These modules will be left as sequential only and not thread-safe. Multiple concurrent invocations may lead to non-linearizable behaviours. We leave it to the user to put a mutex around the use of the mutable structure (or use thread-safe libraries such as domainslib).
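
A minimal sketch of the "put a mutex around it" approach for a Queue shared between domains, assuming OCaml 5 where Mutex and Domain are part of the stdlib. The locked_queue type and its helper functions are hypothetical names introduced for illustration. domainslib is not used here.

```ocaml
(* A hypothetical wrapper pairing a Queue with the Mutex that guards it. *)
type 'a locked_queue = { q : 'a Queue.t; m : Mutex.t }

let create () = { q = Queue.create (); m = Mutex.create () }

(* Run [f] on the queue while holding the lock; the lock is released
   even if [f] raises. *)
let with_lock t f =
  Mutex.lock t.m;
  Fun.protect ~finally:(fun () -> Mutex.unlock t.m) (fun () -> f t.q)

let push t x = with_lock t (fun q -> Queue.add x q)
let length t = with_lock t Queue.length

let () =
  let t = create () in
  let producer () = for i = 1 to 10_000 do push t i done in
  let d1 = Domain.spawn producer in
  let d2 = Domain.spawn producer in
  Domain.join d1;
  Domain.join d2;
  (* With the lock held around every access, no push is lost. *)
  assert (length t = 20_000)
```

Every access to the underlying queue must go through with_lock. A single unprotected Queue.add from another domain brings back the non-linearizable behaviour described above.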

Not all mutable interfaces are unsafe. For example, concurrent Array get and set operations are fine. We still mark Array as having a mutable interface, because we also use that label for cases where the observed concurrent behaviour cannot be explained by assuming that each API call to the module executes atomically (linearizability). Although an individual get or set of an Array field is safe, iteration functions that modify the same array concurrently may leave it in a state that no linearizable execution can explain.
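
The distinction can be illustrated with two domains that each call Array.fill on the same array. This is a sketch with a hypothetical helper race_fill. The only property it asserts is memory safety: every cell ends up holding a value that one of the domains wrote. It deliberately does not assert that the array is uniform, since the two fills may interleave.

```ocaml
(* Hypothetical helper: two domains overwrite the whole array in
   parallel, one with 0 and one with 1. *)
let race_fill n =
  let a = Array.make n (-1) in
  let d0 = Domain.spawn (fun () -> Array.fill a 0 n 0) in
  let d1 = Domain.spawn (fun () -> Array.fill a 0 n 1) in
  Domain.join d0;
  Domain.join d1;
  a

let () =
  let n = 100_000 in
  let a = race_fill n in
  (* Memory safety: no crash, and each cell is 0 or 1. *)
  assert (Array.for_all (fun x -> x = 0 || x = 1) a);
  (* Not guaranteed: that all cells are equal. If each Array.fill were
     atomic, the array would be all 0s or all 1s; a mixture is a state
     that linearizability cannot explain. *)
  let zeros =
    Array.fold_left (fun acc x -> if x = 0 then acc + 1 else acc) 0 a
  in
  Printf.printf "zeros: %d, ones: %d\n" zeros (n - zeros)
```

The printed counts vary from run to run and may well show a uniform array on a given run; the point is only that a mixture is permitted.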

Stdlib modules

The "Needs work" column tracks whether code changes need to be made for the OCaml 5.00 MVP.

The Needs work column is N if the work has already been done. For example, the Format module has top-level mutable state, which has already been made domain-safe in the Multicore OCaml 5.00 branch. Another example is Printexc, which has been made thread-safe in OCaml trunk in a way that is forward-compatible with multicore.

Needs work does not cover documentation: a module may be marked N and still need its documentation updated.

| Module | Top-level state | Mutable interface | Needs work | Notes |
|---|---|---|---|---|
| arg.ml | Y | Y | ?? | current |
| array.ml | N | Y | N | |
| arrayLabels.ml | N | N | N | Only refers to Array |
| atomic.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| bigarray.ml | N | Y | N | |
| bool.ml | N | N | N | |
| buffer.ml | N | Y | Y | Segfault on parallel access (see #11279) |
| bytes.ml | N | Y | N | Document unsynchronized mixed-size accesses |
| bytesLabels.ml | N | N | N | Only refers to Bytes |
| callback.ml | N | N | N | |
| camlinternalAtomic.ml | N | N | N | |
| camlinternalFormat.ml | N | Y | N | See type buffer |
| camlinternalFormatBasics.ml | N | N | N | |
| camlinternalLazy.ml | N | Y | Y | Lazy must be handled specially for memory-safety. Unify RacyLazy and Undefined exceptions? |
| camlinternalMod.ml | N | Y | N | See uses of Obj.set_field, Lazy.force |
| camlinternalOO.ml | | | | |
| char.ml | N | N | N | |
| complex.ml | N | N | N | |
| condition.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| digest.ml | N | N | N | |
| domain.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| effectHandlers.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| either.ml | N | N | N | |
| ephemeron.ml | N | N | Y | New ephemerons are immutable. Implement Bucket module as in OCaml trunk. |
| filename.ml | Y | Y | Y | current_temp_dir_name and its setter and getter functions |
| float.ml | N | N | N | |
| format.ml | Y | Y | N | OCaml 5.00 makes pre-defined formatters safe |
| fun.ml | N | N | N | |
| gc.ml | N | Y | ?? | alarm may need attention. caml_gc_get/set may need attention |
| hashtbl.ml | Y | Y | ?? | Uses Random state, which has been made domain-local. What about the non-atomic top-level ref randomized? |
| in_channel.ml | N | N | Y | Concurrent calls to input_all will likely not return contiguous chunks (already an issue on 4.X) |
| int.ml | N | N | N | |
| int32.ml | N | N | N | |
| int64.ml | N | N | N | |
| lazy.ml | N | N | N | The complexity is handled in camlinternalLazy.ml |
| lexing.ml | N | Y | N | |
| list.ml | N | N | N | |
| listLabels.ml | N | N | N | Just refers to the List module |
| map.ml | N | N | N | |
| marshal.ml | N | N | N | Clarify documentation about marshaling a concurrently modified object. No crashes, because the OCaml memory model ensures the absence of out-of-thin-air values. |
| moreLabels.ml | N | N | N | |
| mutex.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| nativeint.ml | N | N | N | |
| obj.ml | N | Y | N | |
| oo.ml | | | | |
| option.ml | N | N | N | |
| out_channel.ml | | | | |
| parsing.ml | Y | Y | ?? | current_lookahead_fun and a few effectful functions |
| pervasives.ml | | | | Is now Stdlib |
| printexc.ml | Y | Y | Y | Top-level state has been made atomic. See raw_backtrace_entries, which returns an array (which could be modified concurrently). set_uncaught_exception needs to be checked (uses a global reference) |
| printf.ml | | | | |
| queue.ml | N | Y | N | |
| random.ml | Y | Y | N | Splittable PRNG work tracked in #10742 |
| result.ml | N | N | N | |
| scanf.ml | ? | N | N | Scanf is thread-unsafe because of its internal buffer usage; scanners may also share a channel, which complicates the situation further. Should probably be documented as thread-unsafe. |
| semaphore.ml | N | N | N | Newly added in OCaml 5.00 (safe by construction) |
| seq.ml | N | N | N | |
| set.ml | N | N | N | |
| stack.ml | N | Y | ?? | Already not thread-safe? |
| stdLabels.ml | N | N | N | |
| std_exit.ml | N | N | N | |
| stdlib.ml | ? | ? | Y | caml_shutdown as called by Stdlib.exit may access a non-atomic startup_count variable. This may cause issues under Multicore. Should be atomic? |
| string.ml | N | N | N | Remove deprecated functions in 5.00? |
| stringLabels.ml | N | N | N | |
| sys.ml | Y | Y | Y | caml_ml_enable_runtime_warnings is a global boolean. To be documented or made atomic? |
| uchar.ml | N | N | N | |
| unit.ml | N | N | N | |
| weak.ml | N | Y | N | |
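
Several notes above ask whether a global flag or reference should be "made atomic". As a rough OCaml-level illustration of what that means (the runtime variables mentioned in the table live in C, so this is only an analogy), the sketch below replaces a top-level ref with an Atomic.t. The names warnings_enabled, enable_warnings, warnings_are_enabled and hits are hypothetical.

```ocaml
(* Hypothetical global setting, standing in for a flag such as a
   "runtime warnings enabled" boolean. *)
let warnings_enabled : bool Atomic.t = Atomic.make false

let enable_warnings b = Atomic.set warnings_enabled b
let warnings_are_enabled () = Atomic.get warnings_enabled

(* Hypothetical shared counter updated from several domains. *)
let hits : int Atomic.t = Atomic.make 0

let () =
  let worker () = for _i = 1 to 10_000 do Atomic.incr hits done in
  let domains = List.init 4 (fun _ -> Domain.spawn worker) in
  List.iter Domain.join domains;
  (* Atomic.incr is a single read-modify-write, so no update is lost. *)
  assert (Atomic.get hits = 40_000);
  enable_warnings true;
  assert (warnings_are_enabled ())
```

With a plain int ref and incr, the same loop would remain memory-safe but could lose increments, so the final count would not be guaranteed.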

otherlibs

| Module | Top-level state | Mutable interface | Needs work | Notes |
|---|---|---|---|---|
| win32unix/unix.ml | | | | |
| bigarray/bigarray.ml | | | | |
| unix/unix.ml | | | | |
| unix/unixLabels.ml | | | | |
| str/str.ml | N | N | N | Has been made thread-safe with #10670 |
| systhreads/thread.ml | Y | N | Y | set_uncaught_exception modifies a global reference |
| systhreads/event.ml | N | N | N | |
| dynlink/ | Y | Y | Y | Dynlink should have a mutex inside it to ensure it doesn't crash, especially in bytecode (excerpt from WG 4) |
