
Conversation

Member

@alanvww alanvww commented May 1, 2025

Hello! This PR adds depth estimation functionality from TensorFlow.js to ml5.

I primarily referred to this example for its performance and results.

Testing sketches:
depthEstimation-video
depthEstimation-single-image


  • Set grayscale colormap as default
  • Remove bodySegmentation?
  • Backend option to use transformers.js(!)

Also changed the mention of this in the examples.
Removed console logs; the comments are clear enough without them. Also renamed the examples' <title> tags to match the ml5.js format.
@nasif-co
Contributor

nasif-co commented Jul 2, 2025

Wanted to add a to-do list of tasks that I'll try to work on for this PR; please let me know if there are suggestions!

  • Reorganize depthmap images in result object
  • Reuse the initial segmentation result in the masking section of processDepthMap()
  • Add dilation filter to the masking section and dilation parameters to the options object
  • Write simple "hello world" examples
  • Diagnose size mismatch issue between source video and depthmap when video is resized.
  • Clean up console.logs in the library file
  • Align code with our p5 2.0 compatibility decisions from p5.js 2.0 Compatibility #244

@alanvww alanvww marked this pull request as ready for review July 11, 2025 05:31
nasif-co added 3 commits July 13, 2025 21:49
Removed the depth estimation tensor from the result object so we could handle disposing of it internally. Also tested ml5.tf.memory() on the current code and found a memory leak, which ended up being due to some segmentation tensors not being disposed. I replaced the disposal code being used here with the one used in the official tensorflow examples, which fixed the leak.
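A minimal sketch of the kind of leak check described above, assuming ml5 exposes TensorFlow.js as `ml5.tf` (as the `ml5.tf.memory()` call suggests); the helper name is hypothetical:

```js
// Log the live tensor count each frame and warn if it keeps climbing,
// which indicates tensors are being created but never disposed.
let lastCount;

function checkTensors() {
  const count = ml5.tf.memory().numTensors;
  if (lastCount !== undefined && count > lastCount) {
    console.warn(`Tensor count grew by ${count - lastCount}; possible leak`);
  }
  lastCount = count;
}
```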
Added the dilation algorithm to the library. The level of dilation is controlled by the config option 'dilationFactor', which takes values between 0 and 10, corresponding to the number of pixels to grow the background into the silhouette. Larger dilation factors affect fps because they need longer loops to look for bounds.

Also made the mask available as a p5.Image in the result, under the name 'mask'. This mask is compatible with the p5 mask() function, so it is easy to use it to cut out the profile from the background.

Lastly, also optimized the helper function that turns imageData into a p5.Image by replacing set() with a direct copy of the imageData.data array into the pixels array.
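A sketch of what that optimized helper might look like (the function name and p5-instance handling are assumptions, not the PR's actual code):

```js
// Instead of a per-pixel img.set(x, y, color), copy the whole RGBA
// byte array in one call; img.pixels is a typed-array view after
// loadPixels(), so TypedArray.set() does a bulk copy.
function imageDataToP5Image(imageData, pInst) {
  const img = pInst.createImage(imageData.width, imageData.height);
  img.loadPixels();
  img.pixels.set(imageData.data);
  img.updatePixels();
  return img;
}
```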
The first two examples are aimed at being starting points for using the model. One is simply webcam depth estimation without any interface; the other is the same but uses the mask to clear out the background.

Also made applying the segmentation mask the default for the model, since it performs much better with it.
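A hedged sketch of using the resulting mask. Only 'dilationFactor' and the 'mask' p5.Image are confirmed by this thread; the ml5.depthEstimation() call shape and callback are assumptions modeled on other ml5 models:

```js
let video, latest;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.hide();
  // Call shape assumed; 'dilationFactor' is the option named above.
  ml5.depthEstimation(video, { dilationFactor: 4 }, gotDepth);
}

function gotDepth(result) {
  latest = result;
}

function draw() {
  background(30);
  if (latest) {
    const frame = video.get(); // current webcam frame as a p5.Image
    frame.mask(latest.mask);   // cut out the silhouette with the model's mask
    image(frame, 0, 0, width, height);
  }
}
```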
@shiffman
Member

Hi @nasif-co, I took a look at your latest updates and reviewed the examples, amazing work! A few quick questions / thoughts.

Diagnose size mismatch issue between source video and depthmap when video is resized.

Is this an issue only if you resize the video during a sketch, or if you just call video.size(w, h) once in setup() does it break the depth map?

The new examples are fantastic!

  • Am I right that darker pixels are closer to the camera and brighter are further? This is the opposite of my expectation since I think the transformers.js models work the other way. Nothing to change here, just noting it as something to mention in documentation!
  • The "hello world" examples are perfect, exactly as I imagined! Now after seeing them I'm wondering if we might consider including a single example that incorporates something with 3D or uses the pixel data in some way? Perhaps a nested loop through every N pixels and draw a box() for each value with a z-position mapped to the pixel value?
  • I think the example that uses the mask() might be more effective if there is an image or maybe something simple drawn behind the silhouette, it's not so clear what is happening with only clear()!

@nasif-co
Contributor

Is this an issue only if you resize the video during a sketch, or if you just call video.size(w, h) once in setup() does it break the depth map?

It happens by calling video.size(w, h) once in setup(), you can see a reproduction of the issue in this sketch.
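A minimal sketch of the setup that triggers it (an illustration of the described behavior, not the linked reproduction itself):

```js
let video;

function setup() {
  createCanvas(320, 240);
  video = createCapture(VIDEO);
  video.size(320, 240); // sets the display size; the intrinsic stream may stay e.g. 640x480
  // Starting depth estimation on `video` now yields a depth map sized to the
  // intrinsic dimensions, so it no longer lines up with the displayed video.
}
```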

  • Am I right that darker pixels are closer to the camera and brighter are further? This is the opposite of my expectation since I think the transformers.js models work the other way. Nothing to change here, just noting it as something to mention in documentation!

Yes, you are right; it confused me a bit at first. Now that I'm looking into it, changing it to be the other way around seems to be a small change that could be useful in keeping consistency with transformers.js, looking ahead at integrating it. I'll commit that small change.

  • The "hello world" examples are perfect, exactly as I imagined! Now after seeing them I'm wondering if we might consider including a single example that incorporates something with 3D or uses the pixel data in some way? Perhaps a nested loop through every N pixels and draw a box() for each value with a z-position mapped to the pixel value?

Yes! I was looking at including some of those next. I had this sketch I made a few months ago for class using transformers.js, which sounds like what you're describing. I'll port that one to ml5. Do you think we should just do the one? I was thinking of also adding one that builds a 3D mesh using the depth map, but I don't know if that strays more into tutorial territory than example territory.
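A rough sketch of the box-grid idea, assuming a WEBGL canvas, a stored result in `latest`, and the getDepthAt() accessor mentioned later in this thread (its signature and 0..1 range are assumptions):

```js
const N = 12; // sample every N pixels

function draw() {
  background(0);
  if (!latest) return;
  for (let y = 0; y < height; y += N) {
    for (let x = 0; x < width; x += N) {
      const d = latest.getDepthAt(x, y); // assumed 0 (far) .. 1 (near)
      push();
      translate(x - width / 2, y - height / 2, map(d, 0, 1, -200, 200));
      box(N * 0.8);
      pop();
    }
  }
}
```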

I was also planning on adding an example that showcases how to "detect" distance, so that different interactions can occur depending on how close or far a subject is: something like a chain of if/else statements, each with a different interaction. I suspect this would be a common use case of the depth estimation model.
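Something like this sketch of the if/else chain, again assuming a hypothetical getDepthAt() accessor returning a normalized 0..1 value (higher = closer):

```js
function react(result) {
  const d = result.getDepthAt(width / 2, height / 2); // depth at frame center
  if (d > 0.8) {
    // very close: trigger the close-range interaction
  } else if (d > 0.5) {
    // mid-range: show instructions
  } else {
    // far away: attract mode
  }
}
```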

  • I think the example that uses the mask() might be more effective if there is an image or maybe something simple drawn behind the silhouette, it's not so clear what is happening with only clear()!

I agree, I'll add something simple in the back, maybe a background color shifting in hue or just an image.

Thanks for the detailed review! :)

nasif-co added 2 commits July 19, 2025 00:45
To match transformers.js: lighter pixels are closer to the camera, darker are farther from it.
To help visualize what using the mask together with the depthMap does.
@nasif-co
Contributor

Interestingly, the bug also affects the body segmentation module when using SelfieSegmentation (see a sketch of it), but not when using BodyPix, which is strange since both use the same function to do the detection.

On the other hand, it makes some sense, since the ARPortraitDepth model we are using here also uses SelfieSegmentation, so that may be the root of the issue. I feel it has something to do with the source video's intrinsic dimensions as opposed to its display dimensions, and with how the video.size() method only sets the display size. The only way I have found to change the intrinsic dimensions of the webcam video element is to request them with getUserMedia when creating the capture, which is out of the question.

Going forward, I think the best solution is rendering the video pixels to a separate canvas/p5.Graphics in ml5 and passing that as the detectMedia. Would love to hear some thoughts on this!
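A sketch of the proposed workaround (an illustration of the idea, not implemented code): draw the video into an offscreen p5.Graphics at the display size, so the media handed to the model has matching intrinsic and display dimensions.

```js
let video, buffer;

function setup() {
  createCanvas(320, 240);
  video = createCapture(VIDEO);
  video.size(320, 240);
  buffer = createGraphics(320, 240);
}

function draw() {
  // Rendering into the buffer normalizes the dimensions the model sees.
  buffer.image(video, 0, 0, buffer.width, buffer.height);
  // Pass `buffer` (or its underlying canvas element) to the model as detectMedia.
}
```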

nasif-co added 7 commits July 20, 2025 15:38
The bug was due to estimation being done on the source element's intrinsic dimensions rather than the display dimensions set by the user, leading to unexpected output. We needed to resize the media given by the user before passing it to the models.

After some discussions on the Discord, I opted to resize the input media through TensorFlow.js's own methods. I think this might be more performant than resizing the image in a canvas, but I didn't test them side by side to corroborate.
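What resizing through TensorFlow.js might look like (an assumption of the approach, not the PR's exact code; `videoElement`, `displayHeight`, and `displayWidth` are placeholder names):

```js
// Read the current frame into a tensor and resize it on the tf backend
// rather than through a canvas; tidy() disposes the intermediate tensor.
const resized = ml5.tf.tidy(() =>
  ml5.tf.image.resizeBilinear(
    ml5.tf.browser.fromPixels(videoElement), // HTMLVideoElement or canvas
    [displayHeight, displayWidth]            // the user's display dimensions
  )
);
// ...run the estimation on `resized`, then call resized.dispose().
```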
Simplified existing examples and aligned them with the changes in config defaults.
Converted console logs to comments.
Realized the mask and dilation were not being applied to the data array, and therefore not to the getDepthAt() method. Fixed it for consistency.
Make the code a little simpler.
Since we already have a webcam video example, it felt redundant for the depthEstimation-video example to also use the webcam, so I modified it to instead showcase how to run depthEstimation on a video file.
nasif-co added 3 commits July 31, 2025 22:20
The depth estimation result now includes the exact frame of the input that was used to generate the returned estimation. This is useful for aligning the image with the estimation, especially if the model is running at a lower fps than the source video (which is most often the case).
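A hypothetical use of that returned frame (the property names below are assumptions): draw the frame the estimation was computed from, rather than the live video, so image and depth map stay aligned at low model fps.

```js
function gotDepth(result) {
  image(result.frame, 0, 0);                     // exact input frame (name assumed)
  image(result.depthMap, result.frame.width, 0); // its matching depth map, side by side
}
```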
Shows how to use the depth estimation result together with p5.js 3D geometry tools to build a live mesh of the webcam video.
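One way such a mesh could be built with plain p5 geometry calls (a sketch under assumptions, not the example's actual code; assumes a WEBGL canvas and the hypothetical getDepthAt() accessor from earlier in this thread):

```js
// Build one triangle strip per row, displacing z by the depth value.
function drawMesh(result, step = 8) {
  push();
  translate(-width / 2, -height / 2); // center the mesh in WEBGL coordinates
  for (let y = 0; y < height - step; y += step) {
    beginShape(TRIANGLE_STRIP);
    for (let x = 0; x < width; x += step) {
      vertex(x, y, result.getDepthAt(x, y) * 100);
      vertex(x, y + step, result.getDepthAt(x, y + step) * 100);
    }
    endShape();
  }
  pop();
}
```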
@nasif-co nasif-co force-pushed the alan.tensorflow-depth-estimation branch from 48186f8 to 04981a1 Compare August 1, 2025 03:36
Replace the code fixing the size mismatch bug by using the new function designed for that: resizeImageAsTensor.
@nasif-co nasif-co force-pushed the alan.tensorflow-depth-estimation branch from 04981a1 to 085439b Compare August 1, 2025 03:40
@nasif-co
Contributor

nasif-co commented Aug 1, 2025

Updated the examples to add the p5 2.0 version. Interestingly, the point cloud example had a performance drop, and the mesh example had a great performance boost. Looking into processing/p5.js#6438, it seems related to the mesh example making use of p5.Geometry while the point cloud just uses 3D primitives. It may be a good idea to modify the point cloud example in the future to use p5.Geometry instead.

Member

@shiffman shiffman left a comment

Incredible work, thank you to @alanvww for getting this started and @nasif-co for completing it! This feature likely won't be released until early September, so we have time to do additional testing for bugs as well as tweak or alter any of the examples if other contributors have comments. But I'd like to merge this today to mark the end of the summer research period! Happy August! 💜

@shiffman shiffman merged commit 6824b4c into ml5js:main Aug 1, 2025
@alanvww alanvww deleted the alan.tensorflow-depth-estimation branch October 8, 2025 14:37