diff --git a/LICENSE b/LICENSE
index fda2785..e947739 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2023 Hao Hao Tan
+Copyright (c) 2024 Hao Hao Tan
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
diff --git a/README.md b/README.md
index 5bb3b76..35831c2 100644
--- a/README.md
+++ b/README.md
@@ -1,17 +1,24 @@
 ## Frechet Audio Distance in PyTorch
 
-A lightweight library of Frechet Audio Distance calculation.
+A lightweight library of Frechet Audio Distance (FAD) calculation.
 
-Currently, we support embedding from:
-- `VGGish` by [S. Hershey et al.](https://arxiv.org/abs/1812.08466)
-- `PANN` by [Kong et al.](https://arxiv.org/abs/1912.10211)
-- `CLAP` by [Wu et al.](https://arxiv.org/abs/2211.06687)
+Currently, we support:
+- FAD score, with embeddings from:
+  - `VGGish` by [S. Hershey et al.](https://arxiv.org/abs/1812.08466)
+
+  - `PANN` by [Kong et al.](https://arxiv.org/abs/1912.10211)
+  - `CLAP` by [Wu et al.](https://arxiv.org/abs/2211.06687)
+  - `EnCodec` by [Defossez et al.](https://arxiv.org/pdf/2210.13438.pdf)
+
+- CLAP score, for text and audio matching
 
 ### Installation
 
 `pip install frechet_audio_distance`
 
-### Demo
+### Example
+
+#### For FAD:
 
 ```python
 from frechet_audio_distance import FrechetAudioDistance
@@ -40,12 +47,43 @@ frechet = FrechetAudioDistance(
     verbose=False,
     enable_fusion=False, # for CLAP only
 )
-fad_score = frechet.score("/path/to/background/set", "/path/to/eval/set", dtype="float32")
+# to use `EnCodec`
+frechet = FrechetAudioDistance(
+    model_name="encodec",
+    sample_rate=48000,
+    channels=2,
+    verbose=False,
+)
+fad_score = frechet.score(
+    "/path/to/background/set",
+    "/path/to/eval/set",
+    dtype="float32"
+)
 ```
 
 You can also have a look at [this notebook](https://github.com/gudgud96/frechet-audio-distance/blob/main/test/test_all.ipynb) for a better understanding of how each model is used.
 
 
+#### For CLAP score:
+
+```python
+from frechet_audio_distance import CLAPScore
+
+clap = CLAPScore(
+    submodel_name="630k-audioset",
+    verbose=True,
+    enable_fusion=False,
+)
+
+clap_score = clap.score(
+    text_path="./text1/text.csv",
+    audio_dir="./audio1",
+    text_column="caption",
+)
+```
+
+For more info, kindly refer to [this notebook](https://github.com/gudgud96/frechet-audio-distance/blob/main/test/test_clap_score.ipynb).
+
 ### Save pre-computed embeddings
 
 When computing the Frechet Audio Distance, you can choose to save the embeddings for future use.
diff --git a/pyproject.toml b/pyproject.toml
index 2755621..2077549 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "frechet_audio_distance"
-version = "0.2.0"
+version = "0.3.0"
 authors = [
     { name="Hao Hao Tan", email="helloharry66@gmail.com" },
 ]
@@ -32,4 +32,7 @@ dependencies = [
 ]
 
 [project.urls]
-"Homepage" = "https://github.com/gudgud96/frechet-audio-distance"
\ No newline at end of file
+"Homepage" = "https://github.com/gudgud96/frechet-audio-distance"
+
+[tool.setuptools]
+py-modules = []
\ No newline at end of file