I am a beginner user of aeneas (MacBook 2021 Ventura 13.0.1) with a large amount of experience in natural language processing, audio, algorithms, and software. I understand the basic principles of aeneas and forced alignment algorithms.
I recently noticed that my configuration 'runs out of room' and the alignment begins to produce errors of the same type.
Can someone familiar with the aeneas package help me debug this? I will provide clearer code as we discuss.
Here is the basic outline of my usage:
phrases = [m["text_during"] for m in continuous[i]]
audio = MoviePyUtilities.concat_clips_as_list([AudioFileClip(a["filename"]) for a in grouped_audio[i]], composite=True)
tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=True)
audio.write_audiofile(tmp.name, codec="pcm_s32le", fps=MoviePyUtilities.get_fps(audio))
forced = ForcedAlignment.force_alignment(phrases, tmp.name)
if forced is None: raise RuntimeError(f"Error occurred during ForcedAlignment for continuous index {i}")
Nothing particularly unique in the above: I have a collection of phrases, each about one sentence long, and I have the associated audio. I write the audio to a temporary file, and inside the force_alignment function I write the phrases to disk.
text_file = tempfile.NamedTemporaryFile(delete=True)
json_file = tempfile.NamedTemporaryFile(delete=True)
with open(text_file.name, "w") as f:
    f.write("\n".join(shortened))
args = ["aeneas", audio_filename, text_file.name,
"task_language=eng|os_task_file_format=json|is_text_type=plain", json_file.name]
e = ExecuteTaskCLI()
e.use_sys = False
code = e.run(arguments=args, show_help=False)
if code != 0: raise RuntimeError()
with open(json_file.name) as f:
    results = json.load(f)
Here I execute the aeneas package using the configuration shown above. Typical results are published below. I have also tried varying the length of the phrases and the same problem persists.
You can see that the alignment for the first three phrases is roughly correct, and the fourth phrase is given essentially zero length. This is wrong. It almost appears as though the tempo of the alignment is wrong: in other words, the proportions of the first three phrases are correct, but each is 'too long,' and then aeneas simply runs out of audio.
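To make this symptom easier to quantify, here is a minimal sketch of a check over the parsed JSON. It assumes the usual aeneas JSON sync map layout (a top-level "fragments" list whose entries carry "id", "begin", and "end", with times stored as strings). The helper name summarize_alignment and the 0.2 second threshold are my own choices for illustration, not part of aeneas.

```python
def summarize_alignment(sync_map, audio_seconds, min_seconds=0.2):
    """Summarize fragment durations in a parsed aeneas JSON sync map.

    sync_map      -- dict loaded from the JSON output (assumed layout:
                     {"fragments": [{"id": ..., "begin": "0.000", "end": "1.234"}, ...]})
    audio_seconds -- duration of the audio file, obtained independently
    min_seconds   -- hypothetical threshold below which a fragment is
                     treated as "collapsed" (arbitrary, tune as needed)
    """
    rows = []
    for frag in sync_map["fragments"]:
        begin = float(frag["begin"])  # times are stored as strings
        end = float(frag["end"])
        rows.append({"id": frag["id"], "begin": begin, "end": end,
                     "duration": end - begin})

    # End time of the final fragment, or 0.0 if there are no fragments.
    last_end = rows[-1]["end"] if rows else 0.0

    return {
        "rows": rows,
        # Fragments that received (almost) no audio.
        "collapsed": [r["id"] for r in rows if r["duration"] < min_seconds],
        # Fraction of the audio reached by the end of the last fragment.
        "coverage": last_end / audio_seconds if audio_seconds > 0 else 0.0,
    }
```

With the snippets above this could be called as summarize_alignment(results, audio.duration), where audio.duration is the MoviePy clip length in seconds. A coverage close to 1.0 together with a non-empty "collapsed" list would match what I describe: the earlier fragments consume the whole file and the last one is squeezed to nothing.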
This package is very important, and its algorithm and implementation are very streamlined and an excellent baseline for many more sophisticated audio applications.
Can we debug?
Same problem here. It seems weird to me that the errors accumulate instead of each overlong part just chipping off the start of the next one. After all, the start and finish times are what matter most and what the tool should analyze, not the duration.
@Oleg-A-LLIto @changyr66 Can you post your code and data file examples?
The feature of aeneas is that the underlying technology is simple bigram matched filters.
It should be robust. Or at least straightforward to diagnose. Though I believe the binary is pre-compiled?