TypeError: Cannot read property 'backend' of undefined - tensor is already disposed when moveData is called. #4237

Closed
sbrl opened this issue Nov 15, 2020 · 19 comments

Comments

@sbrl

sbrl commented Nov 15, 2020

I'm trying to set up an LSTM, but TensorFlow.js doesn't seem to like it very much. It keeps crashing with the following error:

Epoch 1 / 50
eta=0.0 --------------------------------------------------------- categoricalCrossentropy=1.04e-5 loss=1.04e-5
/home/sbrl/Documents/code/javascript/node/byte-lstm/node_modules/@tensorflow/tfjs-core/dist/tf-core.node.js:3280
        var srcBackend = info.backend;
                              ^

TypeError: Cannot read property 'backend' of undefined
    at Engine.moveData (/home/sbrl/Documents/code/javascript/node/byte-lstm/node_modules/@tensorflow/tfjs-core/dist/tf-core.node.js:3280:31)
    at DataStorage.get (/home/sbrl/Documents/code/javascript/node/byte-lstm/node_modules/@tensorflow/tfjs-core/dist/tf-core.node.js:115:28)
    at NodeJSKernelBackend.getInputTensorIds (/home/sbrl/Documents/code/javascript/node/byte-lstm/node_modules/@tensorflow/tfjs-node/dist/nodejs_kernel_backend.js:153:43)
    at NodeJSKernelBackend.executeSingleOutput (/home/sbrl/Documents/code/javascript/node/byte-lstm/node_modules/@tensorflow/tfjs-node/dist/nodejs_kernel_backend.js:200:73)
    at NodeJSKernelBackend.reshape (/home/sbrl/Documents/code/javascript/node/byte-lstm/node_modules/@tensorflow/tfjs-node/dist/nodejs_kernel_backend.js:1055:21)
    at forward (/home/sbrl/Documents/code/javascript/node/byte-lstm/node_modules/@tensorflow/tfjs-core/dist/tf-core.node.js:7430:24)
    at /home/sbrl/Documents/code/javascript/node/byte-lstm/node_modules/@tensorflow/tfjs-core/dist/tf-core.node.js:3480:55
    at /home/sbrl/Documents/code/javascript/node/byte-lstm/node_modules/@tensorflow/tfjs-core/dist/tf-core.node.js:3319:22
    at Engine.scopedRun (/home/sbrl/Documents/code/javascript/node/byte-lstm/node_modules/@tensorflow/tfjs-core/dist/tf-core.node.js:3329:23)
    at Engine.tidy (/home/sbrl/Documents/code/javascript/node/byte-lstm/node_modules/@tensorflow/tfjs-core/dist/tf-core.node.js:3318:21)

Is there any reason for this? I've double-checked all my tensors going in, and they all contain valid data.

System details:

  • Intel(R) Core(TM) i7-10875H CPU @ 2.30GHz
  • Ubuntu 20.10
  • Nvidia GeForce RTX 2060 Driver Version: 455.38, CUDA Version: 10.0 (11.1 installed globally, 10.0 installed locally, error occurs regardless of whether I use the GPU or not)
  • Packages and versions: @tensorflow/tfjs-node = 2.7.0, @tensorflow/tfjs-node-gpu = 2.7.0

Model summary:

_________________________________________________________________
Layer (type)                 Output shape              Param #   
=================================================================
lstm_LSTM1 (LSTM)            [64,32,1]                 12        
_________________________________________________________________
lstm_LSTM2 (LSTM)            [64,32,32]                4352      
_________________________________________________________________
lstm_LSTM3 (LSTM)            [64,32,1]                 136       
=================================================================
Total params: 4500
Trainable params: 4500
Non-trainable params: 0
_________________________________________________________________

Options objects for the 3 layers:

{
  stateful: true,
  units: 1,
  returnSequences: true,
  inputShape: [ 32, 1 ],
  batchSize: 64
}
{ stateful: true, units: 32, returnSequences: true }
{ stateful: true, units: 1, returnSequences: true }

The dataset is created by passing Uint8Array instances to tf.tensor, which are themselves views of a main ArrayBuffer.

Training code:

await this.model.fitDataset(dataset, {
	epochs: 50,
	verbose: 1,
	yieldEvery: "batch",
	shuffle: false,
	callbacks: {
		onEpochEnd: async (epoch, metrics) => {
			// .....
		}
	}
});

...full code available upon request.

@rthadur
Contributor

rthadur commented Nov 16, 2020

@sbrl, please provide reproduction code in CodePen. Thank you.

@rthadur rthadur added the type:support user support questions label Nov 16, 2020
@rthadur rthadur self-assigned this Nov 16, 2020
@sbrl
Author

sbrl commented Nov 16, 2020

@rthadur Thanks for the reply. Unfortunately, reproducing it in CodePen will be extremely challenging because this is a bug in the Node.js version of TensorFlow.js, NOT the browser version. I can certainly provide an isolated Node.js test case instead, but I seriously doubt that a browser test case is going to help anyone debug this issue at all.

@tafsiri
Contributor

tafsiri commented Nov 16, 2020

An isolated node.js test case would be great! Thanks

@tafsiri tafsiri assigned tafsiri and unassigned rthadur Nov 16, 2020
@tafsiri tafsiri added type:bug Something isn't working and removed type:support user support questions labels Nov 16, 2020
@sbrl
Author

sbrl commented Nov 16, 2020

Sure thing, @tafsiri!

Ok, so it's taken me a while (I couldn't reproduce it for a moment), but here's a link to an isolated test case: https://ybin.me/p/a1270fe449cdf849#ytc4MN1kBD9FJvUHsV3hDr7fYflYxY9ny1+suX75FZg=

Instructions:

  1. Download the file, and save it to isolated-test.mjs.
  2. Run npm install @tensorflow/tfjs-node
  3. Run node ./isolated-test.mjs to execute the code.

A recent version of Node.js is required (I'm using v15.1.0), because I'm using the ES6 module syntax.

@sbrl sbrl closed this as completed Nov 17, 2020
@sbrl sbrl reopened this Nov 17, 2020
@sbrl
Author

sbrl commented Nov 17, 2020

Oops, I did NOT mean to close this! Reopened. Hopefully that doesn't affect anything @tafsiri @rthadur?

@tensorflow tensorflow deleted a comment from google-ml-butler bot Nov 17, 2020
@tafsiri
Contributor

tafsiri commented Nov 17, 2020

I think I've traced this down to an issue in tf.data.generator and how you are using it in this instance. The core issue is that you are passing the same tensor references as part of multiple input/label tuples. Internally at some point tf will dispose of the tensor once it has been consumed from the generator. Here is a workaround:

// in process_data()
yield {
	xs: last.clone(),
	ys: next.clone()
};

Adding .clone() to the tensors that you yield from the generator ensures each one gets a new id and object reference, which prevents a tensor from being disposed while a later tuple still needs it.
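
To make the failure mode concrete, here is a minimal sketch in plain JavaScript. It does not use TensorFlow.js at all: `ToyTensor`, `consume`, and both generators are hypothetical stand-ins, and the assumption (taken from the explanation above) is that the consumer disposes each tensor once it has used it.

```javascript
// Toy stand-in for a tensor: NOT the TensorFlow.js class, just enough
// behaviour (dispose/clone/read) to illustrate the shared-reference problem.
class ToyTensor {
	constructor(values) {
		this.values = values;
		this.isDisposed = false;
	}
	clone() {
		return new ToyTensor([...this.values]);
	}
	dispose() {
		this.isDisposed = true;
		this.values = undefined;
	}
	read() {
		if (this.isDisposed) throw new Error("tensor is already disposed");
		return this.values;
	}
}

// Yields (last, next) pairs, so every item except the first and the final one
// appears twice: once as `ys`, then again as `xs` of the following pair.
function* pairsShared(items) {
	for (let i = 1; i < items.length; i++) {
		yield { xs: items[i - 1], ys: items[i] };
	}
}

// Same pairs, but each yielded tensor is an independent copy.
function* pairsCloned(items) {
	for (let i = 1; i < items.length; i++) {
		yield { xs: items[i - 1].clone(), ys: items[i].clone() };
	}
}

// Hypothetical consumer that, like the training loop described above,
// disposes every tensor as soon as it has been used.
function consume(pairs) {
	let count = 0;
	for (const { xs, ys } of pairs) {
		xs.read();
		ys.read();
		xs.dispose();
		ys.dispose();
		count++;
	}
	return count;
}

const makeItems = () => [[1], [2], [3]].map((v) => new ToyTensor(v));

let sharedError = null;
try {
	consume(pairsShared(makeItems()));
} catch (error) {
	sharedError = error.message;
}
const clonedCount = consume(pairsCloned(makeItems()));
console.log({ sharedError, clonedCount });
```

In the cloned variant the consumer never disposes the originals, so in real code the generator would presumably still need to dispose them itself once it has finished with them.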

@sbrl
Author

sbrl commented Nov 18, 2020

Ah, thanks so much! That seems to have fixed it :D :D :D

Is there a way to patch Tensorflow.js to generate a more useful error message in circumstances like these (e.g. if a Tensor doesn't exist, then throwing a nice error to tell the developer about this), or should I just close this issue?
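
As a rough illustration of the kind of check being asked for, the sketch below validates yielded tensors before they are handed on. `assertNotDisposed` and `checkedExamples` are hypothetical helpers, not part of TensorFlow.js; the only library feature assumed is the `isDisposed` property that `tf.Tensor` exposes. Plain objects stand in for tensors so the snippet runs without the library.

```javascript
// Hypothetical helper: throws a descriptive error instead of letting a
// disposed tensor reach the backend and fail with an opaque TypeError.
function assertNotDisposed(tensor, label) {
	if (tensor == null) {
		throw new Error(`${label} is ${tensor}; expected a tensor`);
	}
	if (tensor.isDisposed) {
		throw new Error(
			`${label} has already been disposed. ` +
			"If the same tensor is yielded more than once, yield a clone instead."
		);
	}
	return tensor;
}

// Hypothetical wrapper around a source of { xs, ys } examples that
// checks every example just before it is passed on.
function* checkedExamples(examples) {
	let index = 0;
	for (const { xs, ys } of examples) {
		assertNotDisposed(xs, `xs of example ${index}`);
		assertNotDisposed(ys, `ys of example ${index}`);
		yield { xs, ys };
		index++;
	}
}

// Stand-in objects; with TensorFlow.js these would be real tensors.
const live = { isDisposed: false };
const dead = { isDisposed: true };

let guardMessage = null;
try {
	for (const example of checkedExamples([{ xs: live, ys: live }, { xs: dead, ys: live }])) {
		// A real consumer would train on `example` here.
	}
} catch (error) {
	guardMessage = error.message;
}
console.log(guardMessage);
```

A check like this only catches tensors that are already disposed at the moment they are yielded; it would not catch one that is disposed later while another tuple still references it.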

@tafsiri
Contributor

tafsiri commented Nov 18, 2020

You can leave it open for now. We can discuss internally what the best solution might be (there are a number of different places we may want to tackle this). I'm going to update the title for when we get a chance to look more closely at this.

@tafsiri tafsiri changed the title from "Crash: TypeError: Cannot read property 'backend' of undefined" to "TypeError: Cannot read property 'backend' of undefined - tensor is already disposed when moveData is called." Nov 18, 2020
@joshuaellis

> I think I've traced this down to an issue in tf.data.generator and how you are using it in this instance. The core issue is that you are passing the same tensor references as part of multiple input/label tuples. Internally at some point tf will dispose of the tensor once it has been consumed from the generator. Here is a workaround:
>
> // in process_data()
> yield {
> 	xs: last.clone(),
> 	ys: next.clone()
> };
>
> Adding .clone() to the tensors that you yield from the generator ensures each one gets a new id and object reference, which prevents a tensor from being disposed while a later tuple still needs it.

This is really useful to know if you're using React & Refs.

@sbrl
Author

sbrl commented Nov 26, 2020

Thanks! I look forward to further news about this.

@grmatthews

If this helps as further input to this bug, I got the same error, but it went away after I removed the tf.tidy(() => { }) wrapper around the code where I build the model and call .compile and .fit.

TensorFlow.js was also using the WebGL backend.

I'll continue to avoid tf.tidy for now, but I'm not sure whether this will cause memory leaks, since (as I understand it) TensorFlow.js manages tensors on my GPU. Otherwise, I guess I could fall back to the CPU.
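
One possible explanation, offered as a guess rather than a diagnosis: tf.tidy is documented to dispose every tensor created inside its callback except those the callback returns, and the callback must not return a promise. Since .fit is asynchronous, training can still be running after the callback has returned. The sketch below models that with a toy scope tracker in plain JavaScript; `toyTidy`, `makeToyTensor`, and `pendingWork` are hypothetical stand-ins, not TensorFlow.js APIs.

```javascript
// Toy scope tracker: NOT tf.tidy itself, only a simplified model of its
// documented behaviour (dispose everything created in the callback except
// the single value it returns).
const scopeStack = [];

function makeToyTensor(value) {
	const tensor = { value, isDisposed: false };
	if (scopeStack.length > 0) scopeStack[scopeStack.length - 1].push(tensor);
	return tensor;
}

function toyTidy(fn) {
	const created = [];
	scopeStack.push(created);
	try {
		const result = fn();
		for (const tensor of created) {
			if (tensor !== result) tensor.isDisposed = true;
		}
		return result;
	} finally {
		scopeStack.pop();
	}
}

// Work that finishes later (as training does) is modelled as a queued callback.
const pendingWork = [];

const kept = toyTidy(() => {
	const inputs = makeToyTensor([1, 2, 3]);
	const output = makeToyTensor([4, 5, 6]);
	// The deferred work still holds a reference to `inputs`...
	pendingWork.push(() => inputs.isDisposed);
	return output;
});

// ...but by the time it runs, the scope has ended and `inputs` is gone.
const inputsGoneWhenWorkRuns = pendingWork.map((work) => work())[0];
console.log({ keptDisposed: kept.isDisposed, inputsGoneWhenWorkRuns });
```

If that is what happened here, an alternative to dropping tf.tidy entirely might be to keep it around synchronous tensor maths only, and to dispose training inputs explicitly once the awaited .fit call has finished.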

@gaikwadrahul8
Contributor

gaikwadrahul8 commented Apr 4, 2023

Hi, @sbrl

Apologies for the delayed response. Have you tried the latest version of @tensorflow/tfjs with the recommended version of Node.js from the official site? If not, could you please try it and let us know whether the issue is resolved?

@grmatthews, it seems we have updated our official documentation for tf.tidy(), so could you please try the latest version of @tensorflow/tfjs and check whether your issue is resolved?

If the issue still persists, please let us know. Thank you!

@google-ml-butler

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you.

@google-ml-butler

Closing as stale. Please @mention us if this needs more attention.


@gaikwadrahul8 gaikwadrahul8 self-assigned this Apr 24, 2023
@OrysB

OrysB commented Aug 19, 2023

Hey @gaikwadrahul8, is there any news on this?

The same issue seems to occur in tfjs-node-gpu, but unfortunately I can't open the link to the test case @sbrl provided, so @joshuaellis's answer is a little confusing to me. I have created a Stack Overflow question with a code example, since there are other changes that might be the source of my problem. Any suggestions would be very helpful.

@jmullings

jmullings commented Sep 2, 2023

This backend issue took me hours to figure out, as it only happened in my backend applications:

async *dataGenerator() {
    for (const batch of this.trainingData as any) {
        let xs: tf.Tensor2D, ys: tf.Tensor2D;
        try {
            const encodedData = await this.encodeData(batch.xs);
            xs = tf.tensor2d(encodedData as any, [batch.xs.length, this.EMBEDDING_SIZE]);
            ys = tf.tensor2d(batch.ys);
            console.count("dataGenerator");
        } finally {
            // Disposing the tensors here, before they are yielded, caused the issue:
            // if (xs) {
            //     xs.dispose();
            // }
            // if (ys) {
            //     ys.dispose();
            // }
        }
        yield { xs, ys };
    }
}
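
The reason the early dispose breaks things can be seen in plain JavaScript: a `finally` block placed before `yield` runs before the value is handed to the consumer. The sketch below records the order of events using a toy object in place of a tensor; `brokenGenerator`, `fixedGenerator`, and the `events` log are illustrative names only. It uses synchronous generators for brevity, but the same ordering applies to async generators.

```javascript
const events = [];

// Mirrors the structure above: cleanup in `finally`, then `yield`.
function* brokenGenerator() {
	let item;
	try {
		item = { isDisposed: false };
	} finally {
		item.isDisposed = true; // runs BEFORE the yield below
		events.push("disposed");
	}
	yield item;
}

// Leaves ownership with the consumer: nothing is disposed before the yield.
function* fixedGenerator() {
	const item = { isDisposed: false };
	yield item;
}

const received = brokenGenerator().next().value;
events.push(`consumer received item, isDisposed=${received.isDisposed}`);

const fixedItem = fixedGenerator().next().value;

console.log(events);
console.log(fixedItem.isDisposed);
```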

@lukemovement

Has there been any progress on this?

@OrysB

OrysB commented Oct 19, 2023

@jmullings thanks for your reply. I assume you are using the async generator with the tf.data.generator function to create a dataset? Unfortunately, this does not work in TypeScript:

Type 'AsyncGenerator<Example, void, unknown>' provides no match for the signature '(): Iterator<Example, any, undefined> | Promise<Iterator<Example, any, undefined>>'.

It boiled down to this problem:

Type 'Promise<IteratorResult<Example, void>>' is not assignable to type 'IteratorResult<Example, void>'

So I ended up fetching all the examples before creating a dataset from them. This should not be necessary, since I am using a batch size of 1 and only one example is processed at a time. Or is there something I am getting wrong? ...
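
The mismatch behind the two type errors can be observed at runtime without TensorFlow.js: calling `next()` on a synchronous generator returns an iterator result directly, whereas an async generator returns a promise for one. The sketch also shows the eager-collection workaround described above; `syncExamples`, `asyncExamples`, and `collectAll` are hypothetical names, and the final step of building a dataset from the collected array is left out because it depends on the library.

```javascript
function* syncExamples() {
	yield { xs: [1], ys: [0] };
}

async function* asyncExamples() {
	yield { xs: [1], ys: [0] };
	yield { xs: [2], ys: [1] };
}

// A sync generator's next() returns { value, done } directly...
const syncResult = syncExamples().next();
// ...while an async generator's next() returns a Promise of that object.
const asyncResult = asyncExamples().next();

console.log(syncResult instanceof Promise);  // false
console.log(asyncResult instanceof Promise); // true

// Eager workaround: drain the async generator into an array first.
async function collectAll(asyncIterable) {
	const collected = [];
	for await (const example of asyncIterable) {
		collected.push(example);
	}
	return collected;
}

collectAll(asyncExamples()).then((examples) => {
	console.log(examples.length); // 2
});
```

The trade-off of collecting eagerly is that every example is held in memory at once, which is exactly what a streaming generator is meant to avoid.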
