Allow all supported audio formats in LoadAudio #3941

christian-byrne · 2024-07-04T04:51:39Z

This changes LoadAudio to allow you to use all audio files supported by torchaudio.

This is the function torchaudio (2.3.1) is using to load audio files:

    def load(
        uri: Union[BinaryIO, str, os.PathLike],
        frame_offset: int = 0,
        num_frames: int = -1,
        normalize: bool = True,
        channels_first: bool = True,
        format: Optional[str] = None,
        buffer_size: int = 4096,
        backend: Optional[str] = None,
    ) -> Tuple[torch.Tensor, int]:
        """Load audio data from source.

        By default (``normalize=True``, ``channels_first=True``), this function returns Tensor with
        ``float32`` dtype, and the shape of `[channel, time]`.

        Note:
            The formats this function can handle depend on the availability of backends.
            Please use the following functions to fetch the supported formats.

            - FFmpeg: :py:func:`torchaudio.utils.ffmpeg_utils.get_audio_decoders`
            - Sox: :py:func:`torchaudio.utils.sox_utils.list_read_formats`
            - SoundFile: Refer to `the official document <https://pysoundfile.readthedocs.io/>`__.
        rest of docstring...
        """

Here are alternatives to hardcoding the supported formats, one per backend:

soundfile: soundfile.available_formats()
sox: torchaudio.utils.sox_utils.list_read_formats()
ffmpeg: torchaudio.utils.ffmpeg_utils.get_audio_decoders(), but this returns a list of codecs, not the actual file extensions. To get the extensions, you can run ffmpeg -formats in a subprocess and parse the output.

Those are the techniques I used to generate the hardcoded lists. Sometimes the audio player widget doesn't support a format (e.g., aiff), but everything else still works.
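The ffmpeg -formats parsing step could look roughly like the sketch below. It is not the script used for this PR. The function name demuxer_extensions and the SAMPLE text are hypothetical, and the parser assumes the two-column flag layout (D for demuxing, E for muxing) that follows a dashed separator line.

```python
from typing import Set


def demuxer_extensions(formats_output: str) -> Set[str]:
    """Collect '.name' entries for every demuxer listed in `ffmpeg -formats` output."""
    extensions: Set[str] = set()
    in_table = False
    for line in formats_output.splitlines():
        stripped = line.strip()
        if not in_table:
            # Assumption: the legend ends at a separator line made only of dashes.
            in_table = bool(stripped) and set(stripped) == {"-"}
            continue
        fields = stripped.split()
        if len(fields) < 2:
            continue
        flags, names = fields[0], fields[1]
        if "D" not in flags:
            # Muxer-only entry: ffmpeg can write it but not read it.
            continue
        # One row may list several comma-separated names, e.g. "matroska,webm".
        extensions.update("." + name for name in names.split(","))
    return extensions


# Hypothetical excerpt in the assumed layout, used instead of a live ffmpeg call.
SAMPLE = """File formats:
 D. = Demuxing supported
 .E = Muxing supported
 --
 D  3dostr          3DO STR
  E adts            ADTS AAC (Advanced Audio Coding)
 DE flac            raw FLAC
 D  matroska,webm   Matroska / WebM
"""

print(sorted(demuxer_extensions(SAMPLE)))
# ['.3dostr', '.flac', '.matroska', '.webm']
```

For a live list, the text would come from something like subprocess.run(["ffmpeg", "-hide_banner", "-formats"], capture_output=True, text=True).stdout. The names are demuxer names, which only approximate file extensions: Matroska files normally end in .mkv, not .matroska.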

mcmonkey4eva · 2024-07-04T06:50:27Z

comfy_extras/nodes_audio.py

+            sox_formats = ['.8svx', '.aif', '.aifc', '.aiff', '.aiffc', '.al', '.amb', '.amr-nb', '.amr-wb', '.anb', '.au', '.avr', '.awb', '.caf', '.cdda', '.cdr', '.cvs', '.cvsd', '.cvu', '.dat', '.dvms', '.f32', '.f4', '.f64', '.f8', '.fap', '.flac', '.fssd', '.gsm', '.gsrt', '.hcom', '.htk', '.ima', '.ircam', '.la', '.lpc', '.lpc10', '.lu', '.mat', '.mat4', '.mat5', '.maud', '.nist', '.ogg', '.paf', '.prc', '.pvf', '.raw', '.s1', '.s16', '.s2', '.s24', '.s3', '.s32', '.s4', '.s8', '.sb', '.sd2', '.sds', '.sf', '.sl', '.sln', '.smp', '.snd', '.sndfile', '.sndr', '.sndt', '.sou', '.sox', '.sph', '.sw', '.txw', '.u1', '.u16', '.u2', '.u24', '.u3', '.u32', '.u4', '.u8', '.ub', '.ul', '.uw', '.vms', '.voc', '.vorbis', '.vox', '.w64', '.wav', '.wavpcm', '.wv', '.wve', '.xa', '.xi']
+            supported.update(sox_formats)
+        if "ffmpeg" in available_backends:
+            ffmpeg_formats = ['.3dostr', '.4xm', '.aa', '.aac', '.aax', '.ace', '.acm', '.act', '.adf', '.adp', '.ads', '.aea', '.afc', '.aix', '.alias_pix', '.amrnb', '.amrwb', '.anm', '.apac', '.apc', '.ape', '.aqtitle', '.argo_brp', '.asf_o', '.av1', '.avr', '.avs', '.bethsoftvid', '.bfi', '.bfstm', '.bin', '.bink', '.binka', '.bitpacked', '.bmp_pipe', '.bmv', '.boa', '.bonk', '.brender_pix', '.brstm', '.c93', '.cdg', '.cdxl', '.cine', '.concat', '.cri_pipe', '.dcstr', '.dds_pipe', '.derf', '.dfa', '.dhav', '.dpx_pipe', '.dsf', '.dsicin', '.dss', '.dtshd', '.dvbsub', '.dvbtxt', '.dxa', '.ea', '.ea_cdata', '.epaf', '.exr_pipe', '.flic', '.frm', '.fsb', '.fwse', '.g729', '.gdv', '.gem_pipe', '.genh', '.gif_pipe', '.hca', '.hcom', '.hdr_pipe', '.hnm', '.idcin', '.idf', '.iec61883', '.iff', '.ifv', '.imf', '.ingenient', '.ipmovie', '.ipu', '.iss', '.iv8', '.ivr', '.j2k_pipe', '.jack', '.jpeg_pipe', '.jpegls_pipe', '.jpegxl_pipe', '.jv', '.kmsgrab', '.kux', '.laf', '.lavfi', '.libcdio', '.libdc1394', '.libgme', '.libopenmpt', '.live_flv', '.lmlm4', '.loas', '.luodat', '.lvf', '.lxf', '.matroska', '.webm', '.mca', '.mcc', '.mgsts', '.mjpeg_2000', '.mlv', '.mm', '.mods', '.moflex', '.mov', '.mp4', '.m4a', '.3gp', '.3g2', '.mj2', '.mpc', '.mpc8', '.mpegtsraw', '.mpegvideo', '.mpl2', '.mpsub', '.msf', '.msnwctcp', '.msp', '.mtaf', '.mtv', '.musx', '.mv', '.mvi', '.mxg', '.nc', '.nistsphere', '.nsp', '.nsv', '.nuv', '.openal', '.paf', '.pam_pipe', '.pbm_pipe', '.pcx_pipe', '.pfm_pipe', '.pgm_pipe', '.pgmyuv_pipe', '.pgx_pipe', '.phm_pipe', '.photocd_pipe', '.pictor_pipe', '.pjs', '.pmp', '.png_pipe', '.pp_bnk', '.ppm_pipe', '.psd_pipe', '.psxstr', '.pva', '.pvf', '.qcp', '.qdraw_pipe', '.qoi_pipe', '.r3d', '.realtext', '.redspark', '.rka', '.rl2', '.rpl', '.rsd', '.s337m', '.sami', '.sbg', '.scd', '.sdns', '.sdp', '.sdr2', '.sds', '.sdx', '.ser', '.sga', '.sgi_pipe', '.shn', '.siff', '.simbiosis_imx', '.sln', '.smk', '.smush', '.sol', '.stl', '.subviewer', '.subviewer1', 
'.sunrast_pipe', '.svag', '.svg_pipe', '.svs', '.tak', '.tedcaptions', '.thp', '.tiertexseq', '.tiff_pipe', '.tmv', '.tty', '.txd', '.ty', '.v210', '.v210x', '.vag', '.vbn_pipe', '.vividas', '.vivo', '.vmd', '.vobsub', '.vpk', '.vplayer', '.vqf', '.wady', '.wavarc', '.wc3movie', '.webp_pipe', '.wsd', '.wsvqa', '.wve', '.x11grab', '.xa', '.xbin', '.xbm_pipe', '.xmd', '.xmv', '.xpm_pipe', '.xvag', '.xwd_pipe', '.xwma', '.yop']


Is this really the best way to do this? Is there no variable to read, or some other way to get this list dynamically?

The methods I mentioned in the PR description can generate the lists dynamically. Here are the trade-offs as far as I can tell:

For soundfile, you have to import soundfile, which may or may not match the version used by torchaudio.

For ffmpeg, you need a way to map codecs to file extensions if using torchaudio.utils.ffmpeg_utils.get_audio_decoders(), or use subprocess.

The reason I hard-coded the lists was because I didn't think the supported formats in each library were volatile enough to warrant that overhead. What do you think?
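If the lookup cost is the main concern, it could be paid once per process by caching the result. The sketch below is not the code in this PR. FORMATS_BY_BACKEND is a hypothetical, heavily abridged table whose entries are copied from the lists in the diff, and the function signature is an assumption.

```python
from functools import lru_cache
from typing import FrozenSet, Tuple

# Hypothetical, abridged table; the real lists in the diff are far longer.
FORMATS_BY_BACKEND = {
    "sox": (".aiff", ".flac", ".wav"),
    "ffmpeg": (".m4a", ".mov", ".mp4", ".webm"),
}


@lru_cache(maxsize=None)
def get_supported_formats(available_backends: Tuple[str, ...]) -> FrozenSet[str]:
    """Return the union of extensions for every backend that is present."""
    supported = set()
    for backend in available_backends:
        # Unknown backend names contribute nothing instead of raising.
        supported.update(FORMATS_BY_BACKEND.get(backend, ()))
    # A frozenset keeps callers from mutating the cached value.
    return frozenset(supported)


formats = get_supported_formats(("sox", "ffmpeg"))
print(".wav" in formats, ".mp4" in formats)  # True True
```

The argument is a tuple because lru_cache needs hashable arguments. With torchaudio 2.x it could be built as tuple(torchaudio.list_audio_backends()), and the table lookup could be swapped for the dynamic queries listed in the PR description.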

Maybe it'd be better to not list formats at all, and instead just try to load regardless of file extension and raise an error if loading fails.

I don't think loading blindly regardless of extension is a good idea, as video and image files would also be recognized and mixed into the selection choices.

christian-byrne · 2024-07-04T08:45:43Z

torchaudio can also load and extract audio from video files if ffmpeg is available, which is why video formats are included in the ffmpeg list.

I have been using the LoadAudio node with videos and can confirm it works.

christian-byrne · 2024-09-16T19:42:44Z

Implemented by #4054.
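For readers following along, a content-type filter of the kind the title of #4054 describes could look roughly like this. It is a sketch using the standard-library mimetypes module, not the code merged in #4054, and filter_by_content_type is a hypothetical name.

```python
import mimetypes
from typing import Iterable, List


def filter_by_content_type(filenames: Iterable[str], content_types: Iterable[str]) -> List[str]:
    """Keep files whose guessed MIME top-level type (e.g. 'audio') is wanted."""
    wanted = set(content_types)
    kept = []
    for name in filenames:
        # guess_type looks only at the file name; it does not open the file.
        mime, _encoding = mimetypes.guess_type(name)
        if mime is None:
            continue
        if mime.split("/", 1)[0] in wanted:
            kept.append(name)
    return kept


files = ["song.mp3", "clip.mp4", "cover.png", "notes"]
print(filter_by_content_type(files, ["audio", "video"]))
# ['song.mp3', 'clip.mp4']
```

Allowing both "audio" and "video" keeps the video files that torchaudio can read through ffmpeg, while images stay out of the selection list.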

Allow all supported audio formats in LoadAudio

4842382

christian-byrne requested a review from comfyanonymous as a code owner July 4, 2024 04:51

mcmonkey4eva reviewed Jul 4, 2024

huchenlei reviewed Jul 4, 2024

Cache get_supported_formats result. Use dict for formats.

bbcbe53

christian-byrne mentioned this pull request Jul 18, 2024

Add content-type filter method to folder_paths #4054

Merged

mcmonkey4eva added and removed the User Support label Sep 12, 2024

christian-byrne closed this Sep 16, 2024

christian-byrne deleted the audio-filetypes branch September 16, 2024 19:42
