mlx backend's seeds are not thread safe

**LocalAI version:**
3.6.0

**Environment, CPU architecture, OS, and Version:**
N/A

**Describe the bug**
Concurrent async requests share the backend's global PRNG seed in memory, so one request can reseed or advance the generator while another is still sampling from it.
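Below is a minimal illustration of the failure mode. It uses Python's stdlib `random` module as a stand-in for the global MLX PRNG, so it is an analogy and not the backend code. Each hypothetical request reseeds the shared generator and then yields to the event loop between draws, the way an async request handler may between tokens. `sample_with_global_seed` is a hypothetical name.

```python
import asyncio
import random


async def sample_with_global_seed(seed: int, n: int) -> list[float]:
    """Hypothetical request handler that seeds the process-wide PRNG."""
    random.seed(seed)  # mutates global state shared by every task in the process
    out = []
    for _ in range(n):
        out.append(random.random())
        await asyncio.sleep(0)  # yield point: another request may reseed here
    return out


async def main() -> tuple[list[float], list[float], list[float]]:
    alone = await sample_with_global_seed(1, 3)  # seed 1, no concurrency
    a, b = await asyncio.gather(
        sample_with_global_seed(1, 3),  # same seed as `alone`...
        sample_with_global_seed(2, 3),  # ...but interleaved with a second request
    )
    return alone, a, b


alone, a, b = asyncio.run(main())
print(a == alone)  # False: request A's later draws came from request B's seed
```

Request A gets the same first value as the isolated run. After request B reseeds, every later draw A makes comes from B's stream.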

**To Reproduce**
- submit a completion request via API
- cancel the request in the client before it completes
- send another request immediately

**Expected behavior**
The output of each thread should be independent of the other threads, and seed-related operations should not affect other requests in flight.
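One way to get that independence is to give each request its own generator object instead of reseeding a shared one. The sketch below uses Python's stdlib `random` module as a stand-in for the MLX PRNG. In MLX terms this roughly corresponds to deriving a key with `mlx.random.key(seed)` and passing it explicitly via the `key=` argument of the sampling calls, which is the plumbing `mlx_lm.generate` does not currently expose. `sample_with_own_rng` is a hypothetical name.

```python
import asyncio
import random


async def sample_with_own_rng(seed: int, n: int) -> list[float]:
    """Hypothetical request handler that owns its PRNG state."""
    rng = random.Random(seed)  # per-request state; nothing global is touched
    out = []
    for _ in range(n):
        out.append(rng.random())
        await asyncio.sleep(0)  # other requests may run here without side effects
    return out


async def main() -> tuple[list[float], list[float]]:
    alone = await sample_with_own_rng(1, 3)
    a, _ = await asyncio.gather(sample_with_own_rng(1, 3), sample_with_own_rng(2, 3))
    return alone, a


alone, a = asyncio.run(main())
print(a == alone)  # True: interleaving no longer changes request A's output
```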

**Additional context**
This is largely inherited from upstream mlx_lm. `mlx_lm.generate` sets a global stream that it uses internally:
https://github.com/ml-explore/mlx-lm/blob/367d6d76860499767f62b0bc34408b51c9ed916b/mlx_lm/generate.py#L215-L216



While [mlx.random.key(seed)](https://ml-explore.github.io/mlx/build/html/python/random.html) provides a way to derive a PRNG key from a seed and reuse it, `mlx_lm.generate` provides no way to pass `seed=` or `key=`, let alone `stream=`. It takes a bit of code squinting to follow this because [generate()](https://github.com/ml-explore/mlx-lm/blob/367d6d76860499767f62b0bc34408b51c9ed916b/mlx_lm/generate.py#L747-L753) and [stream_generate()](https://github.com/ml-explore/mlx-lm/blob/367d6d76860499767f62b0bc34408b51c9ed916b/mlx_lm/generate.py#L648-L659) are a nesting doll of kwargs, but once we follow the call stack all the way down to [generate_step()](https://github.com/ml-explore/mlx-lm/blob/367d6d76860499767f62b0bc34408b51c9ed916b/mlx_lm/generate.py#L312-L339), we can confirm that no such parameters are accepted.
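The signature check described above can also be done mechanically with `inspect.signature`. The sketch below lists which of a set of parameter names a callable declares explicitly. It reports a `**kwargs` catch-all separately, because a forwarding wrapper accepts any name syntactically and only the innermost function decides what is honoured. `explicit_params` is a hypothetical helper. The two local stubs only mimic the kwargs-forwarding pattern and are not the real mlx_lm signatures.

```python
import inspect
from typing import Callable

# Parameter kinds that can be passed by name (excludes *args and **kwargs).
_NAMED_KINDS = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)


def explicit_params(func: Callable, names: set[str]) -> tuple[set[str], bool]:
    """Return (which of `names` func declares explicitly, whether it has **kwargs)."""
    params = inspect.signature(func).parameters.values()
    explicit = {p.name for p in params if p.name in names and p.kind in _NAMED_KINDS}
    has_var_kw = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params)
    return explicit, has_var_kw


# Local stubs that only mimic the kwargs-forwarding pattern; NOT mlx_lm's signatures.
def inner_step(prompt, model, *, max_tokens=256, sampler=None):
    return []


def outer_generate(model, tokenizer, prompt, **kwargs):
    return inner_step(prompt, model, **kwargs)


WANTED = {"seed", "key", "stream"}
print(explicit_params(outer_generate, WANTED))  # (set(), True): catch-all, keep following the call
print(explicit_params(inner_step, WANTED))      # (set(), False): none of them accepted
```

To run the same check against an installed `mlx_lm`, one could use `from mlx_lm.generate import generate_step` and call `explicit_params(generate_step, WANTED)`. This assumes the installed version still exposes it at that path. The reading above suggests none of the three names will appear in the explicit set, though that is worth re-checking against whichever version is installed.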

This is a long way of saying that the only way to interface with the seeds used by `mlx_lm.generate` is to interact with the global PRNG, which is not thread safe across async requests. The reference server implementation in `mlx_lm.server` [does not disagree](https://github.com/ml-explore/mlx-lm/blob/367d6d76860499767f62b0bc34408b51c9ed916b/mlx_lm/server.py#L343-L346); it interacts with the global PRNG [the same way we do](https://github.com/mudler/LocalAI/blob/ed4ac0b61eac38387170884aab42aee14c0303e0/backend/python/mlx/backend.py#L296-L299), but can get away with it because its API is not asynchronous and blocks for the duration of the call.
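Until per-request keys can be threaded through, one stopgap consistent with how `mlx_lm.server` gets away with it is to make the seed-and-generate section exclusive. A single `asyncio.Lock` is held from the moment the global seed is set until generation finishes, so no other request can reseed in between. This sketch uses Python's stdlib `random` module as a stand-in for the MLX PRNG, and `locked_request` is a hypothetical name. The cost is that requests are serialized, trading throughput for reproducibility.

```python
import asyncio
import random


async def locked_request(lock: asyncio.Lock, seed: int, n: int) -> list[float]:
    """Hypothetical handler: seed + generate run as one critical section."""
    async with lock:  # released on normal exit and on cancellation
        random.seed(seed)  # still global state, but no other request can touch it now
        out = []
        for _ in range(n):
            out.append(random.random())
            await asyncio.sleep(0)  # others may run, but they block on the lock
        return out


async def main() -> tuple[list[float], list[float], list[float]]:
    lock = asyncio.Lock()  # created inside the running loop
    alone = await locked_request(lock, 1, 3)
    a, b = await asyncio.gather(locked_request(lock, 1, 3), locked_request(lock, 2, 3))
    return alone, a, b


alone, a, b = asyncio.run(main())
print(a == alone)  # True: request B waited until A finished
```

Because `async with` releases the lock when a task is cancelled, a cancelled request does not leave the lock held. That matters for the cancel-then-resend reproduction above. An `asyncio.Lock` only serializes coroutines on one event loop; if generation runs in worker threads, a `threading.Lock` would be needed instead.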

Looking into this on my own time, but logging the bug to document the research so far.

mlx backend's seeds are not thread safe #6734