
mlx backend's seeds are not thread safe #6734

@blightbow

Description

LocalAI version:
3.6.0

Environment, CPU architecture, OS, and Version:
N/A

Describe the bug
Concurrent async requests share the global PRNG seed in memory, so one request can reseed or consume it while another request is still generating.

To Reproduce

  • submit a completion request via API
  • cancel the request in the client before it completes
  • send another request immediately
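To illustrate the failure mode, here is a minimal stand-in that uses Python's stdlib `random` module in place of `mlx.random`, since the problem is the shared global PRNG state rather than anything MLX-specific. `fake_generate` is a hypothetical helper, not part of mlx_lm or LocalAI, and the `await asyncio.sleep(0)` assumes the server yields between tokens the way a streaming handler would.

```python
import asyncio
import random

# Hypothetical stand-in for a sampling loop that, like mlx_lm.generate, reads
# from a process-global PRNG instead of an explicit per-request key.
async def fake_generate(seed: int, n_tokens: int = 5) -> list[int]:
    random.seed(seed)  # reseeds global state shared by every request
    tokens = []
    for _ in range(n_tokens):
        await asyncio.sleep(0)  # yield to the event loop between tokens
        tokens.append(random.randrange(1000))
    return tokens

async def main() -> tuple[list[int], list[int], list[int]]:
    solo = await fake_generate(seed=42)
    # Two overlapping requests with the same seed consume each other's draws.
    a, b = await asyncio.gather(fake_generate(seed=42), fake_generate(seed=42))
    return solo, a, b

solo, a, b = asyncio.run(main())
print("solo:", solo)
print("a:   ", a)
print("b:   ", b)
```

With two overlapping requests the draws interleave: `a` gets the 1st, 3rd, 5th, ... values of the seed-42 sequence and `b` gets the 2nd, 4th, ..., so neither reproduces the solo run even though both asked for seed 42.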

Expected behavior
The output of each thread should be independent of the other threads, and seed-related operations should not affect other requests in flight.

Additional context
This is largely inherited from upstream mlx_lm. mlx_lm.generate sets a global stream that it uses internally:
https://github.com/ml-explore/mlx-lm/blob/367d6d76860499767f62b0bc34408b51c9ed916b/mlx_lm/generate.py#L215-L216

While mlx.random.key(seed) does provide a way to derive a reusable PRNG key from a seed, mlx_lm.generate provides no way to pass seed= or key=, let alone stream=. It takes a bit of code squinting to follow this because generate() and stream_generate() are a nesting doll of kwargs, but once we follow the call stack all the way down to generate_step(), we can confirm that no such parameters are accepted.
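For contrast, this is roughly what a per-request key would buy us if generate_step() accepted one. Again this is a stdlib sketch, not MLX code: `random.Random(seed)` plays the role that a key from mlx.random.key(seed) would play, and `isolated_generate` is hypothetical.

```python
import asyncio
import random

# Hypothetical fix pattern: each request owns a private generator built from its
# seed, analogous to threading an explicit key from mlx.random.key(seed) through
# sampling. Nothing here touches the module-level (global) PRNG.
async def isolated_generate(seed: int, n_tokens: int = 5) -> list[int]:
    rng = random.Random(seed)  # private state that other requests cannot reseed
    tokens = []
    for _ in range(n_tokens):
        await asyncio.sleep(0)  # interleaving with other requests is now harmless
        tokens.append(rng.randrange(1000))
    return tokens

async def main() -> tuple[list[int], list[int], list[int]]:
    solo = await isolated_generate(seed=42)
    a, b = await asyncio.gather(isolated_generate(seed=42), isolated_generate(seed=7))
    return solo, a, b

solo, a, b = asyncio.run(main())
print("seed 42 reproducible under concurrency:", a == solo)  # prints True
```

The seed-42 request matches its solo run regardless of what the seed-7 request does, which is the behavior described under "Expected behavior".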

This is a long way of saying that the only way to control the seeds used by mlx_lm.generate is through the global PRNG, which is not thread safe across async requests. The reference server implementation in mlx_lm.server takes the same approach, interacting with the global PRNG just as we do, but it can get away with it because its API is not asynchronous and blocks for the duration of each call.
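Until upstream accepts a key, one possible stopgap (an assumption on my part, not something LocalAI or mlx_lm does today) is to mimic what mlx_lm.server gets by blocking: hold a lock across the whole seed-then-generate sequence. `locked_generate` is hypothetical, the stdlib `random` module again stands in for the global MLX PRNG, and if requests ran on worker threads rather than event-loop tasks, a `threading.Lock` would play the same role.

```python
import asyncio
import random

# Hypothetical stopgap: keep the global PRNG, but hold one lock for the entire
# seed-then-generate sequence so no other locked request can reseed or consume
# global state mid-generation. Requests are serialized as a result.
async def locked_generate(lock: asyncio.Lock, seed: int, n_tokens: int = 5) -> list[int]:
    async with lock:  # released even if the request is cancelled mid-generation
        random.seed(seed)
        tokens = []
        for _ in range(n_tokens):
            await asyncio.sleep(0)  # other tasks may run, but cannot enter the lock
            tokens.append(random.randrange(1000))
        return tokens

async def main() -> tuple[list[int], list[int], list[int]]:
    lock = asyncio.Lock()  # created inside the running event loop
    solo = await locked_generate(lock, seed=42)
    a, b = await asyncio.gather(locked_generate(lock, seed=42), locked_generate(lock, seed=42))
    return solo, a, b

solo, a, b = asyncio.run(main())
print("both match solo:", a == solo and b == solo)  # prints True
```

The trade-off is the one the upstream server already accepts: requests that touch the seed run one at a time, and any code path that draws from the global PRNG outside the lock is still unprotected.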

Looking into this on my own time, but logging the bug to document the research so far.
