
AI Masks #12295

Open
mamvdwel opened this issue Aug 9, 2022 · 26 comments
Labels
feature: new (new features to add)

Comments

@mamvdwel

mamvdwel commented Aug 9, 2022

Is your feature request related to a problem? Please describe.
Yes: the problem of creating a good mask for complex areas, i.e. when it is very difficult or labour-intensive to create one.

Describe the solution you'd like
Typing (or choosing from presets) the area you would like to select (e.g. 'sky' or 'trees').
An AI should then create a parametric, a drawn, or a parametric + drawn mask in the module at hand. This mask can then, if necessary, be fine-tuned by the user.

Alternatives
None

Additional context
This would speed up the creation of masks tremendously

@MStraeten
Collaborator

MStraeten commented Aug 9, 2022

After the AI is implemented by someone (that's the less complex part of the job), who should do the training?
You're aware that the training must be done for each module in the pipe, since the input values differ depending on the preceding modules.

@mamvdwel
Author

mamvdwel commented Aug 9, 2022 via email

@spaceChRiS

I have been thinking a lot about such a feature recently, since I very successfully used rembg to save myself days of work, with better results than I was ever able to achieve with regular masks (examples at the end).

I could imagine such a tool being implemented as a module which is essentially a no-op for the pipeline but provides a raster mask for later modules. It would come early in the pipeline, maybe just before exposure, and bring its own stripped-down, fixed image pipeline to prepare the image for the AI: essentially exposure, plus a generic base curve, to ensure the image data is in a condition similar to the training data of the AI.

As a first step, for the AI itself, the rembg approach seems very handy and, importantly, it is comparatively fast: even for my 30 MP, 48 bpp TIFF images it typically runs in less than 1 s, though I have no exact timing. Most of that may be reading and writing the TIFF files anyway, which would not be required in the darktable case, as rembg would be used as a library and the data is already in memory.

At least it would be a starting point, and improved networks and more controls could be added later.

Examples:

The task was to take headshots of my son's soccer team in front of a neutral background. The light was not ideal: harsh afternoon summer sun, with no chance to overpower it with the flashes I own. But it was the only possible date, just before training, to avoid red faces.

This is the best I was able to achieve in darktable two years ago; the GIMP tools for semi-automatic foreground extraction were worse, but even here you can see the yellow cast on the background. This took me at least an hour per photograph, plus the usual editing, and I did not even have the option of background replacement; I had to accept the white.
[screenshot: manual masking result]

With rembg, this year, it took one hour for all 20+ players. I know it is blown out, as the lighting conditions were even worse this time, but this is about the background removal, which is IMO much better. It was a dark background this time, btw.
[screenshot: rembg result]

It is not perfect, but given that it took <1 s with no parameters at all, and that when I tried again by myself this year I was not able to complete one image in several hours (which eventually led to my choice to try rembg), having such a raster mask in darktable would be incredibly useful.

Btw, it also worked very well for dark hair in front of the dark background, and also for dark skin colour, and the background was by no means flat and even. It only failed to recognize holes in one case, where a little hole between arm and body was not detected properly; in several similar cases it recognized the same hole. With a combined painted mask this would have been very easy to fix, but I needed only the head and shoulders from this image.

@mamvdwel
Author

mamvdwel commented Aug 10, 2022 via email

@jenshannoschwalm
Collaborator

I was experimenting with non-AI approaches like segmentation algorithms; they can do some sort of content detection and could be used as a preparation step for a drawn mask ...

@spaceChRiS

Or as preparation step for the AI solution … 😉

Seriously, I am really impressed by what this approach (AI, in particular rembg) can achieve. Of course it fails in some cases as well, but the net time saving you can have with cumbersome tasks is incredible. In particular, the results are sometimes excellent in scenarios where it is hard to believe that a “classical” algorithm would work at all. An example I had is one of the soccer team's members, who has very dark skin and almost black hair, in front of a very dark background, but the AI was able to generate a perfect mask. A classical algorithm based on local features (differences in tone, contrast, color etc.) may have failed miserably.

@Jiyone
Contributor

Jiyone commented Aug 12, 2022

> the problem of creating a good mask for complex areas

There are already parametric masks and mask options to manage complex areas 🤔

@github-actions

This issue did not get any activity in the past 60 days and will be closed in 365 days if no update occurs. Please check if the master branch has fixed it and report again or close the issue.

@ga-it

ga-it commented Jun 20, 2023

I think the beta launch of Photoshop generative fill has amplified the opportunity for using AI in darktable (even if via user API credentials to a cloud service):

https://www.adobe.com/za/products/photoshop/generative-fill.html

While masking can show skill, it should be secondary to photographic and artistic effort.


@mfg92

mfg92 commented Oct 28, 2023

I'd like to point to Segment Anything; it's open source and very impressive (you can try it with your own images on their website).

I imagine a workflow like this:

  1. You do some basic editing, such that the brightness and colours are roughly correct.
  2. You press a button labelled "AI Masking".
    1. The image is rendered at medium resolution.
    2. This image is sent to Segment Anything.
    3. The user can now interact with the image, as in the Segment Anything demo, to do their masking.
    4. The user clicks on "Finish AI Masking".
    5. A drawn mask is generated from the Segment Anything data (COCO format) and sent back to darktable.
  3. The user can further refine the drawn mask in darktable.

Such a solution should not be completely out of reach to implement and would drastically improve my workflow.
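Step 5 of the workflow above (turning the model's binary mask into something a drawn mask could start from) could look roughly like this pure-Python illustration. `mask_boundary` is a hypothetical helper, not existing darktable or Segment Anything code:

```python
def mask_boundary(mask):
    """Return the (x, y) pixels of `mask` (rows of 0/1 values) that lie on
    its boundary, i.e. set pixels with at least one unset 4-neighbour or
    sitting at the image edge. A drawn-mask path could be fitted to these."""
    h, w = len(mask), len(mask[0])
    boundary = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            neighbours = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
            if any(nx < 0 or nx >= w or ny < 0 or ny >= h or not mask[ny][nx]
                   for nx, ny in neighbours):
                boundary.append((x, y))
    return boundary

# Toy 5x5 mask with a filled 3x3 square: every set pixel except the
# centre (2, 2) lies on the boundary.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_boundary(mask))  # eight pixels; (2, 2) is not among them
```

In a real implementation the mask would be far larger and one would fit a simplified polygon or spline to the boundary rather than keep every pixel, but the handoff shape is the same.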

@jenshannoschwalm jenshannoschwalm added feature: new new features to add and removed no-issue-activity labels Oct 28, 2023
@jenshannoschwalm
Collaborator

I did give it a try. Yes, impressive!

Although the "send to somewhere and get back a result" workflow doesn't seem good to me: we would depend on that service being provided forever. We would prefer to use a git submodule and run locally, I think.


@mfg92

mfg92 commented Aug 31, 2024

I'm still interested in this :)

@victoryforce
Collaborator

> I'm still interested in this :)

You are not alone. Me too :)

@righthandabacus

I am interested in this feature as well. However, instead of hooking up to a particular model, how about opening up an interface for customized scripts? For example, when I need a mask and intend to run Segment Anything, all I need is (1) the original image and (2) a sample coordinate that the user selects. Then call an external program (say, a command line that the user specifies) and get back a binary mask. I think that would make the "AI Mask" feature more flexible and also make the implementation much easier.
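A minimal sketch of that external-program contract, assuming (purely for illustration) that the tool receives the image path and clicked coordinate as arguments and answers with a small JSON raster mask on stdout; `run_mask_tool` and the JSON field names are invented for this example, not an existing darktable API:

```python
import json
import subprocess
import sys

def run_mask_tool(command, image_path, x, y):
    """Invoke a user-configured external command with the image path and the
    clicked coordinate appended as arguments, and parse a JSON mask of the
    form {"width": ..., "height": ..., "mask": [[0/1, ...], ...]} from stdout."""
    result = subprocess.run(
        command + [image_path, str(x), str(y)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

# Stand-in "model" for demonstration: a one-liner that ignores its arguments
# and prints a fixed 2x2 mask. A real setup would point at e.g. a SAM wrapper.
dummy_tool = [sys.executable, "-c",
              "import json; print(json.dumps("
              "{'width': 2, 'height': 2, 'mask': [[1, 0], [0, 1]]}))"]
reply = run_mask_tool(dummy_tool, "photo.tif", 120, 80)
print(reply["mask"])  # [[1, 0], [0, 1]]
```

The point of this design is that darktable only ever parses one small, fixed format, while the user is free to wire up any model or script behind the command.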

@MStraeten
Collaborator

It's obvious that AI-generated masks are quite useful, but it doesn't need more users to support the request; it needs someone to bite the bullet and contribute code …


@fabiangebert

Is there any infrastructure for running models like SAM2 in darktable yet? Or would you prefer this to be a Lua script, with the user having to figure out themselves how to get Python up and running, etc.?

@wpferguson
Member

> Is there any infrastructure for running models like SAM2 in darktable yet?

Not that I'm aware of.

> Or would you prefer this to be a Lua script

Probably the best option. That way we can work out the bugs and mechanics without being tied to a release schedule. Once we understand it, we would have a better idea of how to incorporate it into darktable.

> the user would have to figure out themselves how to get Python up and running etc.?

If running something in Python is a requirement, then they will probably have to figure it out anyway. I don't see embedding Python in darktable as something that will happen.


This will probably require an API change to get the mask imported and available for use.

I can help/do the API change and help with the script if you need it.

@righthandabacus

righthandabacus commented Jan 2, 2025

> the user would have to figure out themselves how to get Python up and running etc.?
> If running something in Python is a requirement, then they will probably have to figure it out anyway. I don't see embedding Python in darktable as something that will happen.

Yes, I expect most of these "AI models" would require Python to run, because that's the easiest way to do it today. However, that means a lot more dependencies (e.g., packages to install). So don't try to solve the user's problem: just define the interface (i.e., how to send the current image and mouse position from darktable, and whether the mask should be read in XML or JSON format) and provide a generic way of calling an external program.
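To make that concrete, a JSON response from the external program might look like the fragment below. Every field name here is a hypothetical example of what such an interface could standardize, not an existing format:

```json
{
  "width": 4,
  "height": 2,
  "encoding": "raster",
  "mask": [
    [0, 1, 1, 0],
    [0, 1, 1, 0]
  ]
}
```

Alternative `encoding` values (run-length, polygon) could be added later without touching the calling convention; darktable would only need one small parser regardless of which model produced the mask.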

> This will probably require an API change to get the mask imported and available for use.
> I can help/do the API change and help with the script if you need it.

Yes, please. I tried to read the code, but it would be a time-saver if you could point to where the most relevant functions are.

@fabiangebert

@righthandabacus are you experienced with writing Lua scripts for darktable? I haven't done that yet, but I would be capable of reviewing or following up on a piece of code.

@righthandabacus

@fabiangebert I know some Lua, but have not used it with darktable so far. I think the pain point right now is implementing the C-Lua interface for the masking functions.

@jenshannoschwalm
Collaborator

For me, the first point would be a design decision: what would be the best way to integrate any "content controlled" masks into the pixelpipe? As I see it, it could be an addition to the masks as we have them.

Any module supporting masks could

  1. present the image RGB data plus an initial mask created with the usual dt mask tools, for example a small circle on a person's face;
  2. a clever algorithm would then take the image and mask data and return a "processed mask".
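A toy stand-in for step 2, assuming nothing about darktable internals: grow the seed mask into neighbouring pixels of similar colour with a plain flood fill. A real implementation would use a proper segmentation model, but the input/output contract (image + seed mask in, processed mask out) is the same. `grow_mask` and its threshold are invented for this sketch:

```python
from collections import deque

def grow_mask(rgb, seed, threshold=30):
    """rgb: rows of (r, g, b) tuples; seed: rows of 0/1 seed-mask values.
    Returns a processed mask grown from the seed by colour similarity."""
    h, w = len(rgb), len(rgb[0])
    out = [row[:] for row in seed]
    queue = deque((x, y) for y in range(h) for x in range(w) if seed[y][x])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if 0 <= nx < w and 0 <= ny < h and not out[ny][nx]:
                # Manhattan distance in RGB between neighbour and current pixel.
                dist = sum(abs(a - b) for a, b in zip(rgb[ny][nx], rgb[y][x]))
                if dist <= threshold:
                    out[ny][nx] = 1
                    queue.append((nx, ny))
    return out

# Synthetic 1x4 image: two red pixels, then two blue ones. Seeding the first
# pixel grows the mask across the red region and stops at the colour edge.
demo_rgb = [[(200, 0, 0), (200, 0, 0), (0, 0, 200), (0, 0, 200)]]
demo_seed = [[1, 0, 0, 0]]
print(grow_mask(demo_rgb, demo_seed))  # [[1, 1, 0, 0]]
```

In darktable terms, `rgb` would be the module's input buffer and `seed` the raster of the user's drawn shape; the returned raster would then feed the module's blending as any other mask does.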

@TurboGit
Member

TurboGit commented Jan 3, 2025

Also, bear with me, but I still think AI masks are not that important. I see people using them to select an object or a person, but applying an effect with a sharp mask is terrible. So at the least we need to do better...

To me, an interface to an AI mask should be run on an image and return a set of named masks which would be populated into the mask manager. I also think that the AI mask button should be in the mask manager. From there, users should be able to pick a mask and use it in any module.

@ga-it

ga-it commented Jan 3, 2025

I think the mask would allow feathering etc. in the module it's used in (so not a "sharp mask")?

But 100% agree that the mask should exist in the mask manager and not in an originating module.

I was actually going to request this on its own as a feature for all masks. I find it incredibly irritating to go back in the history stack to a previous action and lose masks; i.e., mask definitions should not be part of the history, only the application of masks should be.
