Feature/fixed rand generator #25
Conversation
When `use_rand` is enabled, the fact that token indices are generated using the default generator results in future rand calls being offset. This appears to be the reason why, in some cases, ToMe generates images that are slightly different from what SD would generate with the same random seed. While some non-determinism is unavoidable on CUDA specifically, this particular offset can be avoided by using an independent rand generator for that one step.

With this commit, the behavior remains unchanged if `use_rand` is enabled, just to ensure consistency. However, if `rand_seed` is set to any integer, the independent generator is used and the images start to look more like what SD would produce.

Here are a few examples. From right to left: the image generated after applying ToMe, the image generated without ToMe, the image generated with ToMe without randomness, the image generated with this commit (unchanged behavior), and the image generated with this commit with the random seed set to an integer.
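For illustration, a minimal standalone sketch of the offset described above; the shapes and seed values are arbitrary and none of this is the actual ToMe code:

```python
import torch

torch.manual_seed(42)
# Drawing token indices from the default generator advances the global RNG
# state, so every random call the model makes afterwards is offset.
rand_idx = torch.randint(4, size=(8, 8, 1))
noise_a = torch.randn(4)  # offset by the randint above

torch.manual_seed(42)
# An independent generator leaves the global RNG state untouched, so the
# model sees exactly the randomness it would have seen without the randint.
gen = torch.Generator().manual_seed(123)
rand_idx = torch.randint(4, size=(8, 8, 1), generator=gen)
noise_b = torch.randn(4)  # matches a run without the extra randint

print(torch.equal(noise_a, noise_b))  # False: the first path drifted away
```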
Thanks for making the PR! What's the reason for needing one generator per layer? In the examples you have, you were using a single generator for the entire network and it worked fine. As far as I understand, it's not like the generator is reseeded every image, so it's still going to generate different randomness at each layer even if there's only one generator.
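A quick standalone check of that point (not repo code, just the underlying torch behavior):

```python
import torch

gen = torch.Generator().manual_seed(0)

# The generator's state advances with every call, so successive layers
# sharing one generator still see different random indices.
idx_layer1 = torch.randint(4, size=(3,), generator=gen)
idx_layer2 = torch.randint(4, size=(3,), generator=gen)
print(idx_layer1.tolist(), idx_layer2.tolist())  # two separate draws from the same generator
```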
Not a problem, happy to help. Actually, I don't have a good response, because you're right. One generator will do.
Yeah, that way the generator can still be created once for the whole patched model.
I guess I missed the fact that the generator isn't reseeded for every image, so one generator still gives different randomness at each layer.
Cool, latest edit looks good to me. I wonder, are there any downsides to having a separate generator? If there are none, maybe the separate generator could be used whenever `use_rand` is enabled? The only issue I can foresee with that is if you don't set `rand_seed`, the new generator would produce the same random numbers every time. Maybe if the seed is None we could just seed the new generator with the state of the current generator. That way, it's controlled by the same seed that stable diffusion itself uses. I think the code for that would look like:

```python
if rand_seed is None:
    generator.set_state(torch.get_rng_state())
else:
    generator.manual_seed(rand_seed)
```

Then the generator gets created and used no matter what, and `rand_seed` only controls whether it follows the global seed or a fixed one.
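As a self-contained sketch of that logic (the `make_generator` helper name is made up here, and it is CPU-only for brevity):

```python
import torch

def make_generator(rand_seed=None):
    # Build the independent generator either from a fixed seed or from the
    # current global RNG state, so that with rand_seed=None it is still
    # controlled by whatever seed stable diffusion itself was given.
    generator = torch.Generator()
    if rand_seed is None:
        generator.set_state(torch.get_rng_state())
    else:
        generator.manual_seed(rand_seed)
    return generator

torch.manual_seed(42)
state_before = torch.get_rng_state()
gen = make_generator()  # follows the global seed (42)
rand_idx = torch.randint(4, size=(8, 8, 1), generator=gen)
print(torch.equal(state_before, torch.get_rng_state()))  # True: global RNG untouched
```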
That is actually an excellent way of handling it. However, while searching for an answer I came across `torch.random.fork_rng()`, which would change

```python
rand_idx = torch.randint(sy*sx, size=(hsy, wsx, 1), device=metric.device)
```

to:

```python
with torch.random.fork_rng():
    rand_idx = torch.randint(sy*sx, size=(hsy, wsx, 1), device=metric.device)
```

Let me know if you'd like to switch to this, because it seems like the most convenient approach. If there is a caveat to this, it's just that it forks the generator per call, which means more calls to the backend. I don't really expect this to result in any noticeable increase in latency, but I thought I'd bring it up.
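For reference, a small standalone demonstration of what `fork_rng` does to the surrounding state (CPU only, arbitrary shapes):

```python
import torch

torch.manual_seed(42)
state_before = torch.get_rng_state()

# Inside the context the default generator is free to advance; on exit its
# previous state is restored, so later calls are not offset by the randint.
with torch.random.fork_rng():
    rand_idx = torch.randint(4, size=(8, 8, 1))

print(torch.equal(state_before, torch.get_rng_state()))  # True: state restored
```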
Hmm, the documentation says:

> By default, `fork_rng()` operates on all devices, but will emit a warning if your machine has a lot of devices, since this function will run very slowly in that case.

This sounds like the forking process isn't free and could potentially be expensive on some systems. We can mitigate that by passing in `devices`, but the main issue is that I don't have a bunch of different devices to test this on. Does passing `devices` actually avoid the slowdown in practice?
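A sketch of that mitigation, assuming we only care about the device the metric tensor lives on (the wrapper name is made up):

```python
import torch

def draw_rand_idx(metric, sy, sx, hsy, wsx):
    # Fork only the RNG of the device we actually draw from (plus the CPU RNG,
    # which fork_rng always forks) instead of every visible CUDA device.
    devices = [metric.device] if metric.device.type == "cuda" else []
    with torch.random.fork_rng(devices=devices):
        return torch.randint(sy * sx, size=(hsy, wsx, 1), device=metric.device)
```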
Yeah, I don't really think it would affect latency all that much, but having one separate generator and just sticking to that is definitely the approach that's least likely to introduce additional latency. Forking acts as a store/restore in some ways, so it would just end up being more work, even if the extra work is insignificant. So yeah, I'd recommend keeping the generator as well. I figured out the reason behind the size mismatch issue as well. Any thoughts on how you'd want this handled?
Ah, that's not fun. We could just create a function that returns the correct generator, but that reminds me of a bigger issue: we can't create the generator at patch time, because the user might change the device of the model after the patch. Instead, we need some kind of lazy generator that changes based on the current device, and constructs a new generator when it gets an unexpected device / stores a generator per device. All of a sudden, I wonder if torch has any built-in functionality for this?
Yeah, it is a bit of a pickle apparently. I guess we could handle it just by bringing the Generator back into the matching method and setting its state to the current state given the input device. It should account for all of those scenarios, but the catch is that this will be called at every single hook... I'm pretty sure the latency will still be very small, but it's difficult to guarantee that will remain the case for all devices. Or alternatively, we could have the generator be constructed in the first forward call by adding an extra hook?
I think the solution with the least issues would be to do this:

- Don't create the generator at patch time; create it lazily inside the forward hook, on whatever device the input is on.
- Seed it there from the current RNG state of that device, so it still follows the same seed stable diffusion uses.
- If the device ever changes after that, rebuild the generator the same way.

This removes the rand seed argument, but I'm not sure it's that useful (and it could be added back later by letting the user pass in their own pre-seeded generator).
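Loosely following that outline, a sketch of what the lazy, device-aware generator could look like; the helper and function names here are illustrative, it only handles CPU and CUDA, and it is not necessarily the code that was merged:

```python
import torch

def init_generator_for(device: torch.device) -> torch.Generator:
    # Build an independent generator on the given device, seeded from that
    # device's default RNG state so it still follows the user's global seed.
    if device.type == "cuda":
        return torch.Generator(device=device).set_state(torch.cuda.get_rng_state(device))
    return torch.Generator(device="cpu").set_state(torch.get_rng_state())

def draw_token_indices(x, generator, sy, sx, hsy, wsx):
    # Lazily (re)build the generator if it doesn't exist yet or if the model
    # has moved to a different device since the last call, then draw from it.
    if generator is None or generator.device != x.device:
        generator = init_generator_for(x.device)
    rand_idx = torch.randint(sy * sx, size=(hsy, wsx, 1),
                             device=x.device, generator=generator)
    return rand_idx, generator
```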
Sweet, thanks for doing this! Checked the code and looks good to me.
Glad I could help. Amazing work again!