Description
TensorFlow.js version
tfjs-node 2.0.1
Browser version
Node.js 12.16.3
Describe the problem or feature request
When tfjs-node is required in a new environment (such as a worker when using worker threads) and has already been required in another environment before (such as the main thread), the new TF backend instantiated in the new environment overwrites any previously instantiated one.
This means that if you:
- Load a SavedModel in one environment (such as the main thread, or a worker thread), then
- Load another SavedModel in a different environment (another worker thread),
the SavedModel loaded in the first step will "disappear" ("Tensor not referenced" errors when trying to run it), even in the environment it was created in. However, the environment from the first step will be able to access and run the SavedModel loaded in the second step.
It seems to me that the expected behavior should be one of the following:
- (a) The TF backend is shared between all the different environments tfjs-node is required in. It means that if a SavedModel is loaded in env 2, it will also be available in env 1, but without overwriting any SavedModel loaded in env 1. In the same way, models loaded in env 1 will be available in env 2.
- (b) A separate, isolated TF backend for each environment, so that environments are isolated from each other when handling SavedModels and each one has to load the models it wants to use.
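To make the difference between (a) and (b) concrete, here is a toy model in plain JavaScript. It does not use tfjs-node at all: `ToyBackend`, `sharedBackends`, `isolatedBackends` and `scenario` are hypothetical names made up for illustration, and a plain object stands in for the native TF backend. It only shows which environment would be able to run which model under each option.

```javascript
// Toy model only: a plain object stands in for the native TF backend.
class ToyBackend {
  constructor() {
    this.models = new Set();
  }

  // Registers a model and returns an opaque handle for it.
  load(path) {
    const handle = { path };
    this.models.add(handle);
    return handle;
  }

  // True if this backend still knows the model behind the handle.
  canRun(handle) {
    return this.models.has(handle);
  }
}

// (a) Every environment gets a reference to the same backend.
function sharedBackends(count) {
  const backend = new ToyBackend();
  return Array.from({ length: count }, () => backend);
}

// (b) Every environment gets its own backend.
function isolatedBackends(count) {
  return Array.from({ length: count }, () => new ToyBackend());
}

// Loads model A in env 1 and model B in env 2, then reports who can run what.
function scenario(makeBackends) {
  const [env1, env2] = makeBackends(2);
  const modelA = env1.load('modelA');
  const modelB = env2.load('modelB');
  return {
    env1RunsA: env1.canRun(modelA),
    env1RunsB: env1.canRun(modelB),
    env2RunsA: env2.canRun(modelA),
    env2RunsB: env2.canRun(modelB),
  };
}

const sharedResult = scenario(sharedBackends);
const isolatedResult = scenario(isolatedBackends);
console.log('(a) shared:', sharedResult);
console.log('(b) isolated:', isolatedResult);
```

With (a) all four flags are true; with (b) each environment can only run the model it loaded itself. The behavior reported in this issue matches neither: env 1 loses model A but can run model B.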
Code to reproduce the bug / link to feature request
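A minimal reproduction sketch of the steps described above, under several assumptions: the two SavedModel directories are passed through the hypothetical environment variables `TFJS_REPRO_MODEL_A` and `TFJS_REPRO_MODEL_B`, and `INPUT_SHAPE` is a placeholder that must be replaced with a shape (and dtype) matching the first model's serving signature. The script does nothing unless both variables are set.

```javascript
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

// Assumption: replace with an input shape that matches model A's signature.
const INPUT_SHAPE = [1, 4];

// Worker side: requiring tfjs-node here instantiates a new TF backend.
async function loadInWorker() {
  const tf = require('@tensorflow/tfjs-node');
  await tf.node.loadSavedModel(workerData.modelPath);
  // Keep the worker alive until the main thread terminates it.
  parentPort.on('message', () => {});
  parentPort.postMessage('loaded');
}

// Main thread side: load model A, let a worker load model B, run model A again.
async function reproduce(pathA, pathB) {
  const tf = require('@tensorflow/tfjs-node');
  const modelA = await tf.node.loadSavedModel(pathA);
  modelA.predict(tf.zeros(INPUT_SHAPE)); // first step: works as expected

  const worker = new Worker(__filename, { workerData: { modelPath: pathB } });
  await new Promise((resolve, reject) => {
    worker.once('message', resolve);
    worker.once('error', reject);
  });

  try {
    // Second run of model A, after the worker has loaded model B.
    modelA.predict(tf.zeros(INPUT_SHAPE));
    console.log('Model A is still usable.');
  } catch (err) {
    console.log('Model A failed after the worker loaded model B:', err.message);
  } finally {
    await worker.terminate();
  }
}

const pathA = process.env.TFJS_REPRO_MODEL_A;
const pathB = process.env.TFJS_REPRO_MODEL_B;
if (!isMainThread) {
  loadInWorker();
} else if (pathA && pathB) {
  reproduce(pathA, pathB).catch((err) => console.error(err));
}
```

The sketch only exercises the case where the first model lives in the main thread; the same pattern with two workers should cover the worker-to-worker case.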
I coded fixes corresponding to the two different behaviors:
- (a) https://github.com/tensorflow/tfjs/compare/master...Pierrci:backend-shared?expand=1 - Very straightforward
- (b) https://github.com/tensorflow/tfjs/compare/master...Pierrci:backend-per-agent?expand=1 - To isolate the TF backend for each environment, I used the experimental Environment Life Cycle APIs. Besides being experimental, they are only available starting with Node.js v12.8.0. It seems other methods could work too (more widely available but more cumbersome; I didn't try them).
I've successfully tested both fixes with my use case, which relies on multiple worker threads interacting with SavedModels. I'm willing to work on a PR for (a) or (b), which can of course be amended based on your feedback (particularly the solution for (b)).