finding out the stride of the model #35

Open
@albluc24

Description

Hello, I would like to use a model's codebook to label a dataset of audio files in chunks. The first problem I am encountering is finding out the actual stride of the model, that is, how much audio it labels per codebook entry.

So I tried the following. First, on the 24 kHz model, I divided model.hop_length / 24000 and got 0.013333333333333334 (which should be a value in seconds), but that doesn't make much sense to me, as it's not close to an integer.

Then I tried labeling an audio file 79.55990929705216 seconds long. In the resulting object, chunk_length is 72, so I divided the number of resulting representations (result.codes.shape[-1]), which is 8928, by 72, and got 124.0. Dividing the total length in seconds by this value gives 0.6416121717504206 seconds, which is not even close to my first value, even when I try to adjust the numbers to account for some sort of padding.

This might be a stupid question, but I think it's worth asking to save me and everyone else some time, as I'm quite lost. I also tried looking through the very clear source code and paper, but I think I'm missing something obvious. Thanks!

P.S. If necessary, please include verbal descriptions of any images in your comments, as I am totally blind.
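For what it's worth, here is a small arithmetic sketch of the numbers above. It assumes hop_length is 320 samples (0.013333... × 24000) and takes the reported values (8928 frames, chunk_length 72) as given; the suggestion at the end that the gap comes from per-chunk padding is my guess, not something confirmed by the code:

```python
sample_rate = 24000
hop_length = 320  # assumed: 0.013333... * 24000, as reported above

# Duration of audio covered by one codebook entry (one frame of codes).
frame_duration = hop_length / sample_rate  # 0.013333... s, i.e. ~13.3 ms

n_frames = 8928      # result.codes.shape[-1], as reported
chunk_length = 72    # frames per chunk, as reported
n_chunks = n_frames / chunk_length  # 124.0

# Audio nominally covered per chunk, and by all codes combined.
chunk_duration = chunk_length * frame_duration       # 0.96 s per chunk
covered_duration = n_frames * frame_duration         # 119.04 s total

audio_duration = 79.55990929705216
extra = covered_duration - audio_duration            # ~39.5 s unaccounted for
print(frame_duration, n_chunks, chunk_duration, covered_duration, extra)
```

The point of the sketch: 8928 frames × 13.3 ms is about 119 s of nominal coverage for a 79.6 s file, so the frames cannot all map one-to-one onto input audio. If each of the 124 chunks is independently padded before encoding, that would account for the surplus frames, and it would also explain why dividing 79.56 s by 124 (≈ 0.64 s) does not match the 0.96 s a chunk of 72 frames nominally spans.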
