Skip to content

Commit

Permalink
AudioCraft v1.0.0 release with training code, AudioGen, MultiBandDiff…
Browse files Browse the repository at this point in the history
…usion etc.
  • Loading branch information
JadeCopet committed Aug 2, 2023
1 parent 539ec04 commit bf70522
Show file tree
Hide file tree
Showing 204 changed files with 15,938 additions and 936 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/audiocraft_docs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -23,9 +23,9 @@ jobs:
- name: Make docs
run: |
. env/bin/activate
make docs
git add -f docs
git commit -m docs
make api_docs
git add -f api_docs
git commit -m api_docs
- name: Push branch
run: |
Expand Down
7 changes: 6 additions & 1 deletion .github/workflows/audiocraft_tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,11 @@ jobs:
steps:
- uses: actions/checkout@v2
- uses: ./.github/actions/audiocraft_build
- run: |
- name: Run unit tests
run: |
. env/bin/activate
make tests
- name: Run integration tests
run: |
. env/bin/activate
make tests_integ
9 changes: 8 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ wheels/
.coverage

# docs
/docs
/api_docs

# dotenv
.env
Expand All @@ -46,6 +46,13 @@ wheels/
venv/
ENV/

# egs with manifest files
egs/*
!egs/example
# local datasets
dataset/*
!dataset/example

# personal notebooks & scripts
*/local_scripts
*/notes
Expand Down
5 changes: 5 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,11 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [1.0.0] - 2023-08-02

Major revision, added training code for EnCodec, AudioGen, MusicGen, and MultiBandDiffusion.
Added pretrained model for AudioGen and MultiBandDiffusion.

## [0.0.2] - 2023-08-01

Improved demo, fixed top p (thanks @jnordberg).
Expand Down
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
# Contributing to Audiocraft
# Contributing to AudioCraft

We want to make contributing to this project as easy and transparent as
possible.

## Pull Requests

Audiocraft is the implementation of a research paper.
AudioCraft is the implementation of a research paper.
Therefore, we do not plan on accepting many pull requests for new features.
We certainly welcome them for bug fixes.

Expand Down
1 change: 1 addition & 0 deletions MANIFEST.in
Original file line number Diff line number Diff line change
Expand Up @@ -6,3 +6,4 @@ include *.ini
include requirements.txt
include audiocraft/py.typed
include assets/*.mp3
recursive-include conf *.yaml
27 changes: 23 additions & 4 deletions Makefile
Original file line number Diff line number Diff line change
@@ -1,3 +1,15 @@
INTEG=AUDIOCRAFT_DORA_DIR="/tmp/magma_$(USER)" python3 -m dora -v run --clear device=cpu dataset.num_workers=0 optim.epochs=1 \
dataset.train.num_samples=10 dataset.valid.num_samples=10 \
dataset.evaluate.num_samples=10 dataset.generate.num_samples=2 sample_rate=16000 \
logging.level=DEBUG
INTEG_COMPRESSION = $(INTEG) solver=compression/debug rvq.n_q=2 rvq.bins=48 checkpoint.save_last=true # SIG is 616d7b3c
INTEG_MUSICGEN = $(INTEG) solver=musicgen/debug dset=audio/example compression_model_checkpoint=//sig/5091833e \
transformer_lm.n_q=2 transformer_lm.card=48 transformer_lm.dim=16 checkpoint.save_last=false # Using compression model from 616d7b3c
INTEG_AUDIOGEN = $(INTEG) solver=audiogen/debug dset=audio/example compression_model_checkpoint=//sig/5091833e \
transformer_lm.n_q=2 transformer_lm.card=48 transformer_lm.dim=16 checkpoint.save_last=false # Using compression model from 616d7b3c
INTEG_MBD = $(INTEG) solver=diffusion/debug dset=audio/example \
checkpoint.save_last=false # Using compression model from 616d7b3c

default: linter tests

install:
Expand All @@ -10,12 +22,19 @@ linter:

tests:
coverage run -m pytest tests
coverage report --include 'audiocraft/*'
coverage report

tests_integ:
$(INTEG_COMPRESSION)
$(INTEG_MBD)
$(INTEG_MUSICGEN)
$(INTEG_AUDIOGEN)


docs:
pdoc3 --html -o docs -f audiocraft
api_docs:
pdoc3 --html -o api_docs -f audiocraft

dist:
python setup.py sdist

.PHONY: linter tests docs dist
.PHONY: linter tests api_docs dist
171 changes: 39 additions & 132 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,30 +1,14 @@
# Audiocraft
# AudioCraft
![docs badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_docs/badge.svg)
![linter badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_linter/badge.svg)
![tests badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_tests/badge.svg)

Audiocraft is a PyTorch library for deep learning research on audio generation. At the moment, it contains the code for MusicGen, a state-of-the-art controllable text-to-music model.
AudioCraft is a PyTorch library for deep learning research on audio generation. AudioCraft contains inference and training code
for two state-of-the-art AI generative models producing high-quality audio: AudioGen and MusicGen.

## MusicGen

Audiocraft provides the code and models for MusicGen, [a simple and controllable model for music generation][arxiv]. MusicGen is a single stage auto-regressive
Transformer model trained over a 32kHz <a href="https://github.com/facebookresearch/encodec">EnCodec tokenizer</a> with 4 codebooks sampled at 50 Hz. Unlike existing methods like [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen doesn't require a self-supervised semantic representation, and it generates
all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict
them in parallel, thus having only 50 auto-regressive steps per second of audio.
Check out our [sample page][musicgen_samples] or test the available demo!

<a target="_blank" href="https://colab.research.google.com/drive/1-Xe9NCdIs2sCUbiSmwHXozK6AAhMm7_i?usp=sharing">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
<a target="_blank" href="https://huggingface.co/spaces/facebook/MusicGen">
<img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in HugginFace"/>
</a>
<br>

We use 20K hours of licensed music to train MusicGen. Specifically, we rely on an internal dataset of 10K high-quality music tracks, and on the ShutterStock and Pond5 music data.

## Installation
Audiocraft requires Python 3.9, PyTorch 2.0.0, and a GPU with at least 16 GB of memory (for the medium-sized model). To install Audiocraft, you can run the following:
AudioCraft requires Python 3.9, PyTorch 2.0.0. To install AudioCraft, you can run the following:

```shell
# Best to make sure you have torch installed first, in particular before installing xformers.
Expand All @@ -33,143 +17,66 @@ pip install 'torch>=2.0'
# Then proceed to one of the following
pip install -U audiocraft # stable release
pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft # bleeding edge
pip install -e . # or if you cloned the repo locally
```

## Usage
We offer a number of way to interact with MusicGen:
1. A demo is also available on the [`facebook/MusicGen` HuggingFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
2. You can run the Gradio demo in Colab: [colab notebook](https://colab.research.google.com/drive/1-Xe9NCdIs2sCUbiSmwHXozK6AAhMm7_i?usp=sharing).
3. You can use the gradio demo locally by running `python app.py`.
4. You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally (if you have a GPU).
5. Checkout [@camenduru Colab page](https://github.com/camenduru/MusicGen-colab) which is regularly
updated with contributions from @camenduru and the community.
6. Finally, MusicGen is available in 🤗 Transformers from v4.31.0 onwards, see section [🤗 Transformers Usage](#-transformers-usage) below.

## API

We provide a simple API and 4 pre-trained models. The pre trained models are:
- `small`: 300M model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-small)
- `medium`: 1.5B model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-medium)
- `melody`: 1.5B model, text to music and text+melody to music - [🤗 Hub](https://huggingface.co/facebook/musicgen-melody)
- `large`: 3.3B model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-large)

We observe the best trade-off between quality and compute with the `medium` or `melody` model.
In order to use MusicGen locally **you must have a GPU**. We recommend 16GB of memory, but smaller
GPUs will be able to generate short sequences, or longer sequences with the `small` model.

**Note**: Please make sure to have [ffmpeg](https://ffmpeg.org/download.html) installed when using newer version of `torchaudio`.
You can install it with:
```
apt-get install ffmpeg
```

See after a quick example for using the API.

```python
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('melody')
model.set_generation_params(duration=8) # generate 8 seconds.
wav = model.generate_unconditional(4) # generates 4 unconditional audio samples
descriptions = ['happy rock', 'energetic EDM', 'sad jazz']
wav = model.generate(descriptions) # generates 3 samples.

melody, sr = torchaudio.load('./assets/bach.mp3')
# generates using the melody from the given audio and the provided descriptions.
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)

for idx, one_wav in enumerate(wav):
# Will save under {idx}.wav, with loudness normalization at -14 db LUFS.
audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
```

## 🤗 Transformers Usage

MusicGen is available in the 🤗 Transformers library from version 4.31.0 onwards, requiring minimal dependencies
and additional packages. Steps to get started:

1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) from main:

```
pip install git+https://github.com/huggingface/transformers.git
pip install -e . # or if you cloned the repo locally (mandatory if you want to train).
```

2. Run the following Python code to generate text-conditional audio samples:

```py
from transformers import AutoProcessor, MusicgenForConditionalGeneration


processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

inputs = processor(
text=["80s pop track with bassy drums and synth", "90s rock song with loud guitars and heavy drums"],
padding=True,
return_tensors="pt",
)

audio_values = model.generate(**inputs, max_new_tokens=256)
We also recommend having `ffmpeg` installed, either through your system or Anaconda:
```bash
sudo apt-get install ffmpeg
# Or if you are using Anaconda or Miniconda
conda install 'ffmpeg<5' -c conda-forge
```

3. Listen to the audio samples either in an ipynb notebook:
## Models

```py
from IPython.display import Audio
At the moment, AudioCraft contains the training code and inference code for:
* [MusicGen](./docs/MUSICGEN.md): A state-of-the-art controllable text-to-music model.
* [AudioGen](./docs/AUDIOGEN.md): A state-of-the-art text-to-sound model.
* [EnCodec](./docs/ENCODEC.md), a state-of-the-art high fidelity neural audio codec.
* [Multi Band Diffusion](./docs/MBD.md): EnCodec compatible decoder using diffusion.

sampling_rate = model.config.audio_encoder.sampling_rate
Audio(audio_values[0].numpy(), rate=sampling_rate)
```
## Training code

Or save them as a `.wav` file using a third-party library, e.g. `scipy`:
AudioCraft contains PyTorch components for deep learning research in audio and training pipelines for the developed models.
For a general introduction of AudioCraft design principles and instructions to develop your own training pipeline, refer to
the [AudioCraft training documentation](./docs/TRAINING.md).

```py
import scipy
For reproducing existing work and using the developed training pipelines, refer to the instructions for each specific model
that provides pointers to configuration, example grids and model/task-specific information and FAQ.

sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())
```

For more details on using the MusicGen model for inference using the 🤗 Transformers library, refer to the
[MusicGen docs](https://huggingface.co/docs/transformers/main/en/model_doc/musicgen) or the hands-on
[Google Colab](https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/MusicGen.ipynb).
## API documentation

## Model Card
We provide some [API documentation](https://facebookresearch.github.io/audiocraft/api_docs/audiocraft/index.html) for AudioCraft.

See [the model card page](./MODEL_CARD.md).

## FAQ

#### Will the training code be released?

Yes. We will soon release the training code for MusicGen and EnCodec.
#### Is the training code available?

Yes! We provide the training code for [EnCodec](./docs/ENCODEC.md), [MusicGen](./docs/MUSICGEN.md) and [Multi Band Diffusion](./docs/MBD.md).

#### I need help on Windows
#### Where are the models stored?

@FurkanGozukara made a complete tutorial for [Audiocraft/MusicGen on Windows](https://youtu.be/v-YpvPkhdO4)
Hugging Face stored the model in a specific location, which can be overriden by setting the `AUDIOCRAFT_CACHE_DIR` environment variable.

#### I need help for running the demo on Colab

Check [@camenduru tutorial on Youtube](https://www.youtube.com/watch?v=EGfxuTy9Eeo).
## License
* The code in this repository is released under the MIT license as found in the [LICENSE file](LICENSE).
* The models weights in this repository are released under the CC-BY-NC 4.0 license as found in the [LICENSE_weights file](LICENSE_weights).


## Citation

For the general framework of AudioCraft, please cite the following.
```
@article{copet2023simple,
title={Simple and Controllable Music Generation},
author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
year={2023},
journal={arXiv preprint arXiv:2306.05284},
title={Simple and Controllable Music Generation},
author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
year={2023},
journal={arXiv preprint arXiv:2306.05284},
}
```

## License
* The code in this repository is released under the MIT license as found in the [LICENSE file](LICENSE).
* The weights in this repository are released under the CC-BY-NC 4.0 license as found in the [LICENSE_weights file](LICENSE_weights).

[arxiv]: https://arxiv.org/abs/2306.05284
[musicgen_samples]: https://ai.honu.io/papers/musicgen/
When referring to a specific model, please cite as mentioned in the model specific README, e.g
[./docs/MUSICGEN.md](./docs/MUSICGEN.md), [./docs/AUDIOGEN.md](./docs/AUDIOGEN.md), etc.
Binary file not shown.
Binary file not shown.
18 changes: 17 additions & 1 deletion audiocraft/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,24 @@
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
"""
AudioCraft is a general framework for training audio generative models.
At the moment we provide the training code for:
- [MusicGen](https://arxiv.org/abs/2306.05284), a state-of-the-art
text-to-music and melody+text autoregressive generative model.
For the solver, see `audiocraft.solvers.musicgen.MusicGenSolver`, and for the model,
`audiocraft.models.musicgen.MusicGen`.
- [AudioGen](https://arxiv.org/abs/2209.15352), a state-of-the-art
text-to-general-audio generative model.
- [EnCodec](https://arxiv.org/abs/2210.13438), efficient and high fidelity
neural audio codec which provides an excellent tokenizer for autoregressive language models.
See `audiocraft.solvers.compression.CompressionSolver`, and `audiocraft.models.encodec.EncodecModel`.
- [MultiBandDiffusion](TODO), alternative diffusion-based decoder compatible with EnCodec that
improves the perceived quality and reduces the artifacts coming from adversarial decoders.
"""

# flake8: noqa
from . import data, modules, models

__version__ = '0.0.2'
__version__ = '1.0.0'
22 changes: 22 additions & 0 deletions audiocraft/adversarial/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
"""Adversarial losses and discriminator architectures."""

# flake8: noqa
from .discriminators import (
MultiPeriodDiscriminator,
MultiScaleDiscriminator,
MultiScaleSTFTDiscriminator
)
from .losses import (
AdversarialLoss,
AdvLossType,
get_adv_criterion,
get_fake_criterion,
get_real_criterion,
FeatLossType,
FeatureMatchingLoss
)
10 changes: 10 additions & 0 deletions audiocraft/adversarial/discriminators/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

# flake8: noqa
from .mpd import MultiPeriodDiscriminator
from .msd import MultiScaleDiscriminator
from .msstftd import MultiScaleSTFTDiscriminator
Loading

0 comments on commit bf70522

Please sign in to comment.