
watch308/dl


AudioCLIP

Source

AndreyGuzhov/AudioCLIP: Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)

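Changes
  1. Dataset (10 classes in total)

  2. Increased the sampling rate and adjusted the audio and text weights

  3. Added accuracy as a metric for the classification task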

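    import os

    # assumes paths_to_audio, confidence and LABELS are already defined
    token = []             # top-1 label per clip, reused as prompts in step 4
    cnt, truecnt = 0, 0    # clips evaluated / clips classified correctly

    for audio_idx in range(len(paths_to_audio)):
        # acquire Top-3 most similar results
        conf_values, ids = confidence[audio_idx].topk(3)

        # format output strings
        query = f'{os.path.basename(paths_to_audio[audio_idx]):>30s} ->\t\t'
        results = ', '.join([f'{LABELS[i]:>15s} ({v:06.2%})' for v, i in zip(conf_values, ids)])
        print(query + results)

        # keep the top-1 prediction
        top_label = LABELS[ids[0]]
        token.append(top_label)
        cnt += 1

        # the true label is the file-name prefix before the first underscore
        true_label = os.path.basename(paths_to_audio[audio_idx]).split('_')[0]
        if true_label == top_label:
            truecnt += 1

    print("Accuracy:", truecnt / cnt)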
  4. Use a Stable Diffusion model to generate images from the results of the previous step

    from diffusers import DiffusionPipeline
    import torch

    # load the SDXL base model in half precision and move it to the GPU
    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    )
    pipe.to("cuda")

    # token holds the top-1 labels collected in step 3
    for idx, prompt_token in enumerate(token):
        prompt = prompt_token  # the predicted label is used directly as the prompt

        img = pipe(
            prompt=prompt,
            num_inference_steps=50
        ).images[0]

        out_path = f"generated_image_{idx + 1}_{prompt_token}.png"
        img.save(out_path)
        print(f"Generated image for '{prompt_token}' saved as '{out_path}'")
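The accuracy check from change 3 can also be pulled out into a small helper that does not need the model. This is a minimal sketch, not part of the repository: the function names `label_from_path` and `topk_accuracy`, the `k` parameter, and the sample file names are assumptions made for illustration. As in the loop above, it takes the true label from the file-name prefix before the first underscore.

```python
import os
from typing import Sequence


def label_from_path(path: str) -> str:
    """Return the part of the file name before the first underscore."""
    return os.path.basename(path).split('_')[0]


def topk_accuracy(paths: Sequence[str],
                  ranked_labels: Sequence[Sequence[str]],
                  k: int = 1) -> float:
    """Fraction of files whose true label is among the first k predictions."""
    if not paths:
        return 0.0  # avoid dividing by zero on an empty evaluation set
    hits = sum(
        label_from_path(p) in preds[:k]
        for p, preds in zip(paths, ranked_labels)
    )
    return hits / len(paths)


# Hypothetical file names and ranked predictions, for illustration only.
paths = ["audio/cat_001.wav", "audio/dog_002.wav", "audio/rain_003.wav"]
ranked = [
    ["cat", "dog", "rain"],
    ["cat", "dog", "rain"],   # true label "dog" is only ranked second
    ["rain", "cat", "dog"],
]

print("top-1:", topk_accuracy(paths, ranked, k=1))
print("top-3:", topk_accuracy(paths, ranked, k=3))
```

With `k=1` this matches the accuracy printed in step 3. The prefix rule only works when class names contain no underscore; a label such as `door_knock` would be truncated to `door` and never match.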
Cite
@misc{guzhov2021audioclip,
      title={AudioCLIP: Extending CLIP to Image, Text and Audio}, 
      author={Andrey Guzhov and Federico Raue and Jörn Hees and Andreas Dengel},
      year={2021},
      eprint={2106.13043},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
