Initial ACE-Step model implementation. by comfyanonymous · Pull Request #7972 · Comfy-Org/ComfyUI

comfyanonymous · 2025-05-07T11:52:52Z

Put the checkpoint file in ComfyUI/models/checkpoints: https://huggingface.co/Comfy-Org/ACE-Step_ComfyUI_repackaged/tree/main/all_in_one

Copy and paste this into ComfyUI to load the workflow:

{
  "id": "88ac5dad-efd7-40bb-84fe-fbaefdee1fa9",
  "revision": 0,
  "last_node_id": 45,
  "last_link_id": 112,
  "nodes": [
    {
      "id": 44,
      "type": "ConditioningZeroOut",
      "pos": [
        785,
        459
      ],
      "size": [
        197.712890625,
        26
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "conditioning",
          "type": "CONDITIONING",
          "link": 108
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            109
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ConditioningZeroOut"
      },
      "widgets_values": []
    },
    {
      "id": 40,
      "type": "CheckpointLoaderSimple",
      "pos": [
        179.5068359375,
        87.76739501953125
      ],
      "size": [
        375,
        98
      ],
      "flags": {},
      "order": 0,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            111
          ]
        },
        {
          "name": "CLIP",
          "type": "CLIP",
          "links": [
            80
          ]
        },
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            83
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "CheckpointLoaderSimple"
      },
      "widgets_values": [
        "ace_step_v1_3.5b.safetensors"
      ]
    },
    {
      "id": 18,
      "type": "VAEDecodeAudio",
      "pos": [
        1370,
        100
      ],
      "size": [
        150.93612670898438,
        46
      ],
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 101
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 83
        }
      ],
      "outputs": [
        {
          "name": "AUDIO",
          "type": "AUDIO",
          "links": [
            26
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecodeAudio"
      },
      "widgets_values": []
    },
    {
      "id": 17,
      "type": "EmptyAceStepLatentAudio",
      "pos": [
        710,
        540
      ],
      "size": [
        270,
        82
      ],
      "flags": {},
      "order": 1,
      "mode": 0,
      "inputs": [],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "links": [
            23
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "EmptyAceStepLatentAudio"
      },
      "widgets_values": [
        120,
        1
      ]
    },
    {
      "id": 19,
      "type": "SaveAudio",
      "pos": [
        1539,
        100
      ],
      "size": [
        295.0655212402344,
        112
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "audio",
          "type": "AUDIO",
          "link": 26
        }
      ],
      "outputs": [],
      "properties": {},
      "widgets_values": [
        "audio/ComfyUI"
      ]
    },
    {
      "id": 3,
      "type": "KSampler",
      "pos": [
        1040,
        90
      ],
      "size": [
        315,
        262
      ],
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 112
        },
        {
          "name": "positive",
          "type": "CONDITIONING",
          "link": 110
        },
        {
          "name": "negative",
          "type": "CONDITIONING",
          "link": 109
        },
        {
          "name": "latent_image",
          "type": "LATENT",
          "link": 23
        }
      ],
      "outputs": [
        {
          "name": "LATENT",
          "type": "LATENT",
          "slot_index": 0,
          "links": [
            101
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "KSampler"
      },
      "widgets_values": [
        315277030015967,
        "randomize",
        50,
        4,
        "res_multistep",
        "simple",
        1
      ]
    },
    {
      "id": 45,
      "type": "ModelSamplingSD3",
      "pos": [
        716.4029541015625,
        -30.665313720703125
      ],
      "size": [
        270,
        58
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "inputs": [
        {
          "name": "model",
          "type": "MODEL",
          "link": 111
        }
      ],
      "outputs": [
        {
          "name": "MODEL",
          "type": "MODEL",
          "links": [
            112
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "ModelSamplingSD3"
      },
      "widgets_values": [
        4.000000000000001
      ]
    },
    {
      "id": 14,
      "type": "TextEncodeAceStepAudio",
      "pos": [
        580,
        83
      ],
      "size": [
        410.834716796875,
        305.39215087890625
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "inputs": [
        {
          "name": "clip",
          "type": "CLIP",
          "link": 80
        }
      ],
      "outputs": [
        {
          "name": "CONDITIONING",
          "type": "CONDITIONING",
          "links": [
            108,
            110
          ]
        }
      ],
      "properties": {
        "Node name for S&R": "TextEncodeAceStepAudio"
      },
      "widgets_values": [
        "female, electronic, vocals, singing, upbeat, fast, fennec core ",
        "[verse]\ncute fennec girl\nmassive fennec ears\nbig fluffy tail\nlong blonde wavy hair\nlarge blue eyes\nI love fennec girl\n"
      ]
    }
  ],
  "links": [
    [
      23,
      17,
      0,
      3,
      3,
      "LATENT"
    ],
    [
      26,
      18,
      0,
      19,
      0,
      "AUDIO"
    ],
    [
      80,
      40,
      1,
      14,
      0,
      "CLIP"
    ],
    [
      83,
      40,
      2,
      18,
      1,
      "VAE"
    ],
    [
      101,
      3,
      0,
      18,
      0,
      "LATENT"
    ],
    [
      108,
      14,
      0,
      44,
      0,
      "CONDITIONING"
    ],
    [
      109,
      44,
      0,
      3,
      2,
      "CONDITIONING"
    ],
    [
      110,
      14,
      0,
      3,
      1,
      "CONDITIONING"
    ],
    [
      111,
      40,
      0,
      45,
      0,
      "MODEL"
    ],
    [
      112,
      45,
      0,
      3,
      0,
      "MODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "frontendVersion": "1.18.9"
  },
  "version": 0.4
}
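
For readers who want to inspect the graph without opening ComfyUI, here is a minimal sketch that lists the connections in a workflow of this shape. It assumes each entry in "links" is laid out as [link_id, source_node, source_slot, target_node, target_slot, type], which is consistent with the workflow above (link 23 runs from node 17 to input slot 3 of node 3, the KSampler's latent_image). The function name describe_links and the trimmed SAMPLE are illustrative only and not part of ComfyUI.

```python
import json

# Trimmed, hypothetical excerpt of a workflow in the format shown above.
SAMPLE = """
{
  "nodes": [
    {"id": 17, "type": "EmptyAceStepLatentAudio"},
    {"id": 3, "type": "KSampler"},
    {"id": 18, "type": "VAEDecodeAudio"}
  ],
  "links": [
    [23, 17, 0, 3, 3, "LATENT"],
    [101, 3, 0, 18, 0, "LATENT"]
  ]
}
"""


def describe_links(workflow):
    """Return one readable line per link, assuming the
    [link_id, src_node, src_slot, dst_node, dst_slot, type] layout."""
    node_types = {node["id"]: node["type"] for node in workflow["nodes"]}
    lines = []
    for link_id, src, src_slot, dst, dst_slot, kind in workflow["links"]:
        lines.append(
            f"link {link_id}: {node_types[src]}[{src_slot}] -> "
            f"{node_types[dst]}[{dst_slot}] ({kind})"
        )
    return lines


if __name__ == "__main__":
    for line in describe_links(json.loads(SAMPLE)):
        print(line)
```

To run it against the full workflow, save the JSON above to a file and pass the parsed result to describe_links instead of SAMPLE.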

github-actions · 2025-05-07T12:06:56Z

(Automated Bot Message) CI Tests are running, you can view the results at https://ci.comfy.org/?branch=7972%2Fmerge

mcmonkey4eva · 2025-05-07T22:12:43Z

This uses torchaudio in an unchecked load, i.e. in an environment where torchaudio isn't available, Comfy fails to boot.

comfyanonymous · 2025-05-08T01:07:44Z

torchaudio has been in the requirements.txt for 11 months now, which environment doesn't have torchaudio?

mcmonkey4eva · 2025-05-08T01:21:00Z

DirectML mainly (I know, that's the worst way to run anything, but people do it sometimes). To my understanding, basically any "modified torch version" is usually missing torchaudio, i.e. other non-NVIDIA GPU setups tend to be missing it too. I think even the early Blackwell torch builds had audio wonked? Not sure on that bit, it's a secondhand memory.

comfyanonymous · 2025-05-08T01:37:32Z

Should be fixed now.
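
The fix itself is not shown in this thread. As a sketch of the general pattern for an optional dependency like this, and not necessarily the change that was made here, the import can be guarded so that startup succeeds and the error surfaces only when an audio feature is actually used. The helper names optional_import and require_torchaudio are hypothetical.

```python
import importlib
import logging


def optional_import(module_name):
    """Return the imported module, or None if it cannot be loaded."""
    try:
        return importlib.import_module(module_name)
    except (ImportError, OSError):
        # ImportError: the package is not installed.
        # OSError: the package is installed but its native libraries fail to load.
        logging.warning("%s is not available; audio features are disabled.", module_name)
        return None


# Resolve the dependency once at startup instead of failing to boot.
torchaudio = optional_import("torchaudio")


def require_torchaudio():
    """Raise a clear error only when an audio feature is actually used."""
    if torchaudio is None:
        raise RuntimeError("torchaudio is required for audio nodes but is not installed.")
    return torchaudio
```

With this shape, code paths that never touch audio keep working on setups without torchaudio, and audio nodes fail with an explicit message rather than a crash at boot.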

ChuxiJ · 2025-05-08T03:58:24Z

I found the issue: ace-step/ACE-Step#54
I am figuring out what the difference is.

planb788 · 2025-05-08T17:00:59Z

With the default workflow settings, I can't get it to sing Chinese songs, but it can sing them in the Gradio interface of ACE-Step.

allo- · 2025-05-08T21:06:47Z

I tried the workflow and it works fine, but I have the following problems. First, it seems to use more RAM in the VAE (or doesn't release caches beforehand?) than the official implementation, and then falls back to tiled VAE for generations that the Gradio UI can do without tiled VAE. Second, the quality of longer songs is worse than with the official one. Could this be an effect of using the tiled VAE for longer songs, or should it have the same quality?

agustincaniglia · 2025-05-08T23:21:43Z

my nodes aren't loading...

mcmonkey4eva · 2025-05-09T06:47:15Z

@planb788 I had the same issue earlier: C/J/K characters don't seem to pass through correctly. EDIT: judging by 5d3cc85, it looks like specific custom hacks are needed? That commit added Japanese in particular by just converting it on the fly to Latin characters.
@allo- see the above link ace-step/ACE-Step#54, it has some discussion about the differences in parameters between Comfy and the Gradio app; Comfy's workflow went for different defaults than the Gradio app uses.
@agustincaniglia sounds like a support issue, not specific to this PR. Open an issue on the issues tab, or join https://discord.gg/comfyorg and ask for help there.
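
To check whether a set of lyrics contains C/J/K characters that might need this kind of special handling, a minimal sketch follows. It only detects such characters by Unicode block and does not romanize anything; the name find_cjk_blocks and the (non-exhaustive) block list are illustrative assumptions, not ComfyUI's implementation.

```python
# Unicode blocks covering the most common C/J/K characters (not exhaustive).
CJK_BLOCKS = {
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "Hiragana": (0x3040, 0x309F),
    "Katakana": (0x30A0, 0x30FF),
    "Hangul Syllables": (0xAC00, 0xD7AF),
}


def find_cjk_blocks(text):
    """Return the sorted names of the CJK blocks that appear in text."""
    found = set()
    for ch in text:
        code = ord(ch)
        for name, (start, end) in CJK_BLOCKS.items():
            if start <= code <= end:
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    print(find_cjk_blocks("[verse]\ncute fennec girl"))
    print(find_cjk_blocks("[verse]\nこんにちは 世界"))
```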

allo- · 2025-05-09T08:45:13Z

@mcmonkey4eva I'm following both discussions and would like to combine the best of both approaches to get longer generations with the best quality. At the moment I'm probably waiting for the gradio app to get the multires scheduler as it exposes more control, but we'll probably see workflows soon that also control the parameters exposed in the official UI.

My main question at the moment is whether the tiled VAE affects the quality. It's hard to tell which artifacts come from the model and which may come from such workarounds for low VRAM.

On my system, the Gradio app works with 16 GB VRAM (almost full when the VAE is loaded), while Comfy needs to tile the VAE and also seems to need more VRAM for longer generations; in the Gradio app, the VRAM requirement during generation seems to be almost the same for longer generations as for shorter ones.
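
As a toy illustration of why tiling can matter in principle, the sketch below processes a 1-D signal in overlapping tiles and crossfades the overlaps. It is not ComfyUI's tiled VAE code: tiled_apply, smooth, and the tile sizes are all hypothetical. It shows that a per-sample function is reproduced exactly by tiling, while a function that looks at neighbouring samples (as a convolutional decoder does) gives slightly different values near tile edges. Whether that is audible for ACE-Step is not something this sketch can answer.

```python
def smooth(chunk):
    """Toy 'decoder' whose output depends on neighbours inside the chunk."""
    out = []
    for i in range(len(chunk)):
        window = chunk[max(0, i - 1):i + 2]
        out.append(sum(window) / len(window))
    return out


def tiled_apply(samples, fn, tile=8, overlap=2):
    """Apply fn to overlapping tiles of a 1-D list and crossfade the overlaps.

    fn must return a list with the same length as the chunk it receives.
    """
    assert 0 <= overlap < tile
    step = tile - overlap
    out = [0.0] * len(samples)
    weight = [0.0] * len(samples)
    start = 0
    while start < len(samples):
        chunk = fn(samples[start:start + tile])
        n = len(chunk)
        for i, value in enumerate(chunk):
            # Linear ramp at tile edges so neighbouring tiles blend smoothly.
            w = min(i + 1, n - i, overlap + 1) / (overlap + 1)
            out[start + i] += w * value
            weight[start + i] += w
        if start + tile >= len(samples):
            break
        start += step
    # Normalise by the accumulated blend weights.
    return [o / w for o, w in zip(out, weight)]


if __name__ == "__main__":
    signal = [float(i * i) for i in range(20)]
    full = smooth(signal)
    tiled = tiled_apply(signal, smooth)
    print("max deviation:", max(abs(a - b) for a, b in zip(full, tiled)))
```

Larger overlaps reduce the edge effect at the cost of more compute, which is the usual trade-off with tiled decoding.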

comfyanonymous and others added 5 commits May 7, 2025 07:52

Initial ACE-Step model implementation.

1ecdc6b

Fix ruff.

9cc8636

Fix.

7705dcc

Fix

1f0814a

Forgot this.

c4d9905

comfyanonymous requested review from Kosinkadink, huchenlei, ltdrdata, pythongosssss, robinjhuang, webfiltered and yoland68 as code owners May 7, 2025 12:04

comfyanonymous added the Run-CI-Test label May 7, 2025

comfyanonymous merged commit 16417b4 into master May 7, 2025
19 checks passed

comfyanonymous deleted the temp_pr branch May 7, 2025 12:34

christian-byrne mentioned this pull request May 7, 2025

[Bug] Audio workflow Tuple index out of range during decode tiled #7976

Open

alisson-anjos mentioned this pull request May 7, 2025

ComfyUI Integration ace-step/ACE-Step#19

Closed

comfyanonymous merged 5 commits into master from temp_pr

6 participants
