Welcome to the cutting edge of AI innovation! This tool transcribes spoken audio into text and then uses that text to generate images.
audio -> audio chunks -> raw text -> image prompt -> image
The number of images generated is determined by the LLM, which refines, summarizes, and splits the text into sentences while respecting the limit of 77 tokens per prompt. Each image contains the text used to generate it, and that text is also saved to a text file.
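To make the splitting step concrete, here is a minimal sketch of how text could be broken into prompts that respect the 77-token limit. It is not the tool's actual code: the names `split_into_prompts` and `count_tokens` are hypothetical, the LLM refine and summarize steps are left out, and tokens are approximated by whitespace-separated words. A real tokenizer usually produces more tokens than words, so a production version would likely need a stricter count.

```python
import re

MAX_TOKENS = 77  # per-prompt limit mentioned above


def count_tokens(text: str) -> int:
    # Assumption: approximate tokens by whitespace-separated words.
    # A real tokenizer will usually report a higher count.
    return len(text.split())


def split_into_prompts(text: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    prompts = []
    for sentence in sentences:
        words = sentence.split()
        # Break an over-long sentence into consecutive pieces that fit the limit.
        for start in range(0, len(words), max_tokens):
            prompts.append(" ".join(words[start:start + max_tokens]))
    return prompts


if __name__ == "__main__":
    for prompt in split_into_prompts("The sun rises. Birds sing! Is it morning?"):
        print(count_tokens(prompt), prompt)
```

In this sketch each returned prompt would correspond to one generated image, so the number of prompts is the number of images.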
Consider the following podcast file: