Allow all supported audio formats in LoadAudio #3941

Closed

christian-byrne
Contributor

This changes LoadAudio to allow you to use all audio files supported by torchaudio.

This is the function torchaudio (2.3.1) uses to load audio files:

    def load(
        uri: Union[BinaryIO, str, os.PathLike],
        frame_offset: int = 0,
        num_frames: int = -1,
        normalize: bool = True,
        channels_first: bool = True,
        format: Optional[str] = None,
        buffer_size: int = 4096,
        backend: Optional[str] = None,
    ) -> Tuple[torch.Tensor, int]:
        """Load audio data from source.

        By default (``normalize=True``, ``channels_first=True``), this function returns Tensor with
        ``float32`` dtype, and the shape of `[channel, time]`.

        Note:
            The formats this function can handle depend on the availability of backends.
            Please use the following functions to fetch the supported formats.

            - FFmpeg: :py:func:`torchaudio.utils.ffmpeg_utils.get_audio_decoders`
            - Sox: :py:func:`torchaudio.utils.sox_utils.list_read_formats`
            - SoundFile: Refer to `the official document <https://pysoundfile.readthedocs.io/>`__.
        rest of docstring...
        """

Here are alternatives to hardcoding the supported formats:

  • soundfile: soundfile.available_formats()
  • sox: torchaudio.utils.sox_utils.list_read_formats()
  • ffmpeg: torchaudio.utils.ffmpeg_utils.get_audio_decoders(), but this returns a list of codecs, not the actual file extensions. To get the extensions, you can run ffmpeg -formats in a subprocess and parse the output.

Those are the techniques I used to generate the hardcoded lists. Sometimes the audio player widget doesn't support the format, but everything else still works (e.g., aiff).
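
For reference, the ffmpeg -formats approach could look roughly like the sketch below. It assumes the usual table layout (a legend, a dashed separator line, then one entry per line with a flags column followed by comma-separated format names). The function names `parse_ffmpeg_demuxers` and `ffmpeg_demuxer_extensions` are made up for illustration, and ffmpeg format names are not always real file extensions, so the result is only an approximation.

```python
from __future__ import annotations

import re
import subprocess

# Flags column (D = demux, E = mux, optional d = device), then the
# comma-separated format names. Assumed layout, not a guaranteed contract.
_ENTRY = re.compile(r"^\s?([D ])([E ])([d ]?)\s+(\S+)")


def parse_ffmpeg_demuxers(formats_output: str) -> list[str]:
    """Return dotted names for every demuxable entry in `ffmpeg -formats` text."""
    extensions = set()
    in_table = False
    for line in formats_output.splitlines():
        if not in_table:
            # Everything before the dashed separator is the legend.
            in_table = line.strip().startswith("--")
            continue
        match = _ENTRY.match(line)
        if match is None or match.group(1) != "D":
            continue  # not an entry line, or muxing only
        for name in match.group(4).split(","):
            extensions.add("." + name)
    return sorted(extensions)


def ffmpeg_demuxer_extensions(ffmpeg_bin: str = "ffmpeg") -> list[str]:
    """Run ffmpeg and parse its format table (requires ffmpeg on PATH)."""
    result = subprocess.run(
        [ffmpeg_bin, "-hide_banner", "-formats"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ffmpeg_demuxers(result.stdout)
```

Keeping the parser separate from the subprocess call means it can be checked against a pasted sample of the output without ffmpeg being installed.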

sox_formats = ['.8svx', '.aif', '.aifc', '.aiff', '.aiffc', '.al', '.amb', '.amr-nb', '.amr-wb', '.anb', '.au', '.avr', '.awb', '.caf', '.cdda', '.cdr', '.cvs', '.cvsd', '.cvu', '.dat', '.dvms', '.f32', '.f4', '.f64', '.f8', '.fap', '.flac', '.fssd', '.gsm', '.gsrt', '.hcom', '.htk', '.ima', '.ircam', '.la', '.lpc', '.lpc10', '.lu', '.mat', '.mat4', '.mat5', '.maud', '.nist', '.ogg', '.paf', '.prc', '.pvf', '.raw', '.s1', '.s16', '.s2', '.s24', '.s3', '.s32', '.s4', '.s8', '.sb', '.sd2', '.sds', '.sf', '.sl', '.sln', '.smp', '.snd', '.sndfile', '.sndr', '.sndt', '.sou', '.sox', '.sph', '.sw', '.txw', '.u1', '.u16', '.u2', '.u24', '.u3', '.u32', '.u4', '.u8', '.ub', '.ul', '.uw', '.vms', '.voc', '.vorbis', '.vox', '.w64', '.wav', '.wavpcm', '.wv', '.wve', '.xa', '.xi']
supported.update(sox_formats)
if "ffmpeg" in available_backends:
ffmpeg_formats = ['.3dostr', '.4xm', '.aa', '.aac', '.aax', '.ace', '.acm', '.act', '.adf', '.adp', '.ads', '.aea', '.afc', '.aix', '.alias_pix', '.amrnb', '.amrwb', '.anm', '.apac', '.apc', '.ape', '.aqtitle', '.argo_brp', '.asf_o', '.av1', '.avr', '.avs', '.bethsoftvid', '.bfi', '.bfstm', '.bin', '.bink', '.binka', '.bitpacked', '.bmp_pipe', '.bmv', '.boa', '.bonk', '.brender_pix', '.brstm', '.c93', '.cdg', '.cdxl', '.cine', '.concat', '.cri_pipe', '.dcstr', '.dds_pipe', '.derf', '.dfa', '.dhav', '.dpx_pipe', '.dsf', '.dsicin', '.dss', '.dtshd', '.dvbsub', '.dvbtxt', '.dxa', '.ea', '.ea_cdata', '.epaf', '.exr_pipe', '.flic', '.frm', '.fsb', '.fwse', '.g729', '.gdv', '.gem_pipe', '.genh', '.gif_pipe', '.hca', '.hcom', '.hdr_pipe', '.hnm', '.idcin', '.idf', '.iec61883', '.iff', '.ifv', '.imf', '.ingenient', '.ipmovie', '.ipu', '.iss', '.iv8', '.ivr', '.j2k_pipe', '.jack', '.jpeg_pipe', '.jpegls_pipe', '.jpegxl_pipe', '.jv', '.kmsgrab', '.kux', '.laf', '.lavfi', '.libcdio', '.libdc1394', '.libgme', '.libopenmpt', '.live_flv', '.lmlm4', '.loas', '.luodat', '.lvf', '.lxf', '.matroska', '.webm', '.mca', '.mcc', '.mgsts', '.mjpeg_2000', '.mlv', '.mm', '.mods', '.moflex', '.mov', '.mp4', '.m4a', '.3gp', '.3g2', '.mj2', '.mpc', '.mpc8', '.mpegtsraw', '.mpegvideo', '.mpl2', '.mpsub', '.msf', '.msnwctcp', '.msp', '.mtaf', '.mtv', '.musx', '.mv', '.mvi', '.mxg', '.nc', '.nistsphere', '.nsp', '.nsv', '.nuv', '.openal', '.paf', '.pam_pipe', '.pbm_pipe', '.pcx_pipe', '.pfm_pipe', '.pgm_pipe', '.pgmyuv_pipe', '.pgx_pipe', '.phm_pipe', '.photocd_pipe', '.pictor_pipe', '.pjs', '.pmp', '.png_pipe', '.pp_bnk', '.ppm_pipe', '.psd_pipe', '.psxstr', '.pva', '.pvf', '.qcp', '.qdraw_pipe', '.qoi_pipe', '.r3d', '.realtext', '.redspark', '.rka', '.rl2', '.rpl', '.rsd', '.s337m', '.sami', '.sbg', '.scd', '.sdns', '.sdp', '.sdr2', '.sds', '.sdx', '.ser', '.sga', '.sgi_pipe', '.shn', '.siff', '.simbiosis_imx', '.sln', '.smk', '.smush', '.sol', '.stl', '.subviewer', '.subviewer1', 
'.sunrast_pipe', '.svag', '.svg_pipe', '.svs', '.tak', '.tedcaptions', '.thp', '.tiertexseq', '.tiff_pipe', '.tmv', '.tty', '.txd', '.ty', '.v210', '.v210x', '.vag', '.vbn_pipe', '.vividas', '.vivo', '.vmd', '.vobsub', '.vpk', '.vplayer', '.vqf', '.wady', '.wavarc', '.wc3movie', '.webp_pipe', '.wsd', '.wsvqa', '.wve', '.x11grab', '.xa', '.xbin', '.xbm_pipe', '.xmd', '.xmv', '.xpm_pipe', '.xvag', '.xwd_pipe', '.xwma', '.yop']
Contributor

Is this really the best way to do this? Is there no variable to read or something to get this list dynamically?

Contributor Author

The methods I mentioned in the PR description can generate the lists dynamically. Here are the trade-offs as far as I can tell:

  • For soundfile, you have to import soundfile, which may or may not match the version used by torchaudio.
  • For ffmpeg, you need a way to map codecs to file extensions if using torchaudio.utils.ffmpeg_utils.get_audio_decoders(), or use subprocess.

The reason I hard-coded the lists was because I didn't think the supported formats in each library were volatile enough to warrant that overhead. What do you think?
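
If the dynamic route were preferred, one way to limit the overhead would be to query once, cache the result, and fall back to a hard-coded set when soundfile cannot be imported. This is only a sketch: `FALLBACK_EXTENSIONS` is an illustrative subset rather than the list from this PR, `supported_extensions` is a hypothetical name, and it does not address the concern that the imported soundfile may differ from the one torchaudio uses.

```python
from functools import lru_cache

# Illustrative subset only; a real fallback would be the full hard-coded list.
FALLBACK_EXTENSIONS = frozenset({".wav", ".flac", ".ogg"})


@lru_cache(maxsize=1)
def supported_extensions() -> frozenset:
    """Return soundfile's formats as dotted extensions, computed once."""
    try:
        import soundfile  # optional dependency; OSError if libsndfile is missing
    except (ImportError, OSError):
        return FALLBACK_EXTENSIONS
    # available_formats() maps format names such as "WAV" to descriptions.
    return frozenset("." + name.lower() for name in soundfile.available_formats())
```

Repeated calls return the cached set, so the import and query cost is paid at most once per process.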

Contributor

maybe it'd be better to just not list at all, and instead just try to load blindly regardless of file ext, and just error if it errors
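
A rough sketch of what loading blindly could look like, assuming the loader behaves like torchaudio.load (takes a path, returns a waveform and a sample rate, and raises on unsupported input). `load_audio_or_fail` and its `loader` parameter are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Optional, Tuple


def load_audio_or_fail(
    path: str,
    loader: Optional[Callable[[str], Tuple[Any, int]]] = None,
) -> Tuple[Any, int]:
    """Try to load `path` regardless of its extension; raise a clear error on failure."""
    if loader is None:
        import torchaudio  # imported lazily so the sketch runs without it

        loader = torchaudio.load
    try:
        waveform, sample_rate = loader(path)
    except Exception as exc:
        # Surface the backend's message instead of pre-filtering by extension.
        raise RuntimeError(f"Could not load {path!r} as audio: {exc}") from exc
    return waveform, sample_rate
```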

Collaborator

I don't think blindly loading regardless of extension is a good idea, as video and image files would also be recognized and mixed into the selection choices.
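
To illustrate the concern, filtering the selection choices by extension could be as small as the sketch below. `filter_by_extension` is a hypothetical helper, and the `supported` set is whatever list ends up being used (hard-coded or generated).

```python
import os
from typing import Iterable, List, Set


def filter_by_extension(filenames: Iterable[str], supported: Set[str]) -> List[str]:
    """Keep only names whose extension (case-insensitive) is in `supported`."""
    return [
        name
        for name in filenames
        if os.path.splitext(name)[1].lower() in supported
    ]
```

With `{".wav", ".mp4"}` as the supported set, `["song.WAV", "clip.mp4", "photo.png"]` is narrowed to `["song.WAV", "clip.mp4"]`, so images stay out of the list while video containers can be kept if they are deliberately included.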

@christian-byrne
Contributor Author

torchaudio can also load and extract audio from video files if ffmpeg is available, which is why video formats are included in the ffmpeg list.

I have been using the LoadAudio node with videos and can confirm it works.

@mcmonkey4eva mcmonkey4eva added User Support A user needs help with something, probably not a bug. and removed User Support A user needs help with something, probably not a bug. labels Sep 12, 2024
@christian-byrne
Contributor Author

Implemented by #4054.

@christian-byrne christian-byrne deleted the audio-filetypes branch September 16, 2024 19:42