Initial ACE-Step model implementation. #7972
|
(Automated Bot Message) CI Tests are running, you can view the results at https://ci.comfy.org/?branch=7972%2Fmerge |
|
This imports torchaudio without a guard, i.e. in an environment where torchaudio isn't available, ComfyUI fails to boot. |
|
torchaudio has been in the requirements.txt for 11 months now, which environment doesn't have torchaudio? |
|
DirectML mainly (I know, that's the worst way to run anything, but people do it sometimes). To my understanding, basically any "modified torch version" is usually missing torchaudio, i.e. other non-NVIDIA GPU setups tend to be missing it too. I think even the early Blackwell torch builds had audio wonked? Not sure on that bit, secondhand memory. |
|
Should be fixed now. |
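
For readers following the torchaudio discussion above, a guarded import can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the helper names `optional_import` and `require_torchaudio` are hypothetical, and the broad `except Exception` is an assumption made so that a broken native build does not stop startup either.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if importing it fails.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except Exception:
        # Covers a missing package (ImportError) as well as builds whose
        # native libraries fail to load at import time.
        return None


# Import at startup without crashing when the package is unavailable.
torchaudio = optional_import("torchaudio")


def require_torchaudio():
    """Fail only at the point where audio support is actually needed."""
    if torchaudio is None:
        raise RuntimeError(
            "torchaudio could not be imported; audio features are unavailable"
        )
    return torchaudio
```

With this pattern the application still boots without torchaudio, and the error surfaces only when an audio feature calls `require_torchaudio()`.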
|
I found the issue: ace-step/ACE-Step#54 |
|
With the default workflow settings, I can't generate songs with Chinese lyrics, but the same lyrics work in the Gradio interface of ACE-Step. |
|
I tried the workflow and it works fine, but I have two problems. First, it seems to use more RAM in the VAE than the official implementation (or it doesn't release caches beforehand?) and then falls back to tiled VAE for generations that the Gradio UI can do without tiling. Second, the quality of longer songs is worse than with the official implementation. Could this be an effect of using the tiled VAE for longer songs, or should it have the same quality? |
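
As background for the tiling question above, here is a minimal sketch of how overlapped tiling with a crossfade generally works on a 1D sequence. It is not ComfyUI's tiled VAE code: `tiled_apply`, its parameters, and the linear ramp are assumptions chosen for illustration, and `process` is assumed to return an output of the same length as its input.

```python
def tiled_apply(samples, process, tile=8, overlap=2):
    """Run `process` on overlapping tiles and crossfade the seams.

    Illustrative only; assumes `process` preserves the chunk length.
    """
    if overlap < 0 or overlap >= tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    n = len(samples)
    out = [0.0] * n
    weight = [0.0] * n
    step = tile - overlap
    start = 0
    while True:
        end = min(start + tile, n)
        chunk = process(samples[start:end])
        for i, value in enumerate(chunk):
            w = 1.0
            # Ramp up on the left edge of every tile except the first.
            if start > 0 and i < overlap:
                w = min(w, (i + 1) / (overlap + 1))
            # Ramp down on the right edge of every tile except the last.
            if end < n and i >= len(chunk) - overlap:
                w = min(w, (len(chunk) - i) / (overlap + 1))
            out[start + i] += w * value
            weight[start + i] += w
        if end >= n:
            break
        start += step
    # Normalize by the accumulated weights so overlaps average correctly.
    return [o / w for o, w in zip(out, weight)]
```

If `process` treats every sample independently, the tiled result matches the untiled one. If it depends on surrounding context, each tile sees less context than a full pass would, so the output can differ near the seams. Whether that accounts for the quality difference reported here is not something this sketch can settle.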
|
my nodes aren't loading... |
|
@planb788 I had the same issue earlier, CJK characters don't seem to pass through right. EDIT: 5d3cc85, looks like specific custom hacks are needed? This commit added Japanese in particular by just converting it on the fly to Latin characters. |
|
@mcmonkey4eva I'm following both discussions and would like to combine the best of both approaches to get longer generations with the best quality. At the moment I'm probably waiting for the Gradio app to get the multires scheduler, as it exposes more control, but we'll probably soon see workflows that also control the parameters exposed in the official UI. My main question at the moment is whether the tiled VAE affects the quality. It's hard to tell which artifacts come from the model and which may come from such workarounds for low VRAM. On my system, the Gradio app works with 16 GB VRAM (almost full when the VAE is loaded), while Comfy needs to tile the VAE and also seems to need more VRAM for longer generations. In the Gradio app, the VRAM requirement during generation seems to be almost the same for longer generations as for shorter ones. |
Put in ComfyUI/models/checkpoints: https://huggingface.co/Comfy-Org/ACE-Step_ComfyUI_repackaged/tree/main/all_in_one
Copy paste to ComfyUI for workflow: