Add MusicGen Melody #28819

Merged on Mar 18, 2024 (72 commits)
Showing changes from 1 commit
Commits (72):
853d2c0 first modeling code (ylacombe, Jan 3, 2024)
2ff2f3d make repository (ylacombe, Jan 3, 2024)
a3fa21f still WIP (ylacombe, Jan 16, 2024)
4c02db4 update model (ylacombe, Jan 30, 2024)
b141703 add tests (ylacombe, Feb 1, 2024)
2b19612 add latest change (ylacombe, Feb 1, 2024)
eae18da clean docstrings and copied from (ylacombe, Feb 1, 2024)
2285db3 update docstrings md and readme (ylacombe, Feb 1, 2024)
cb8f4c5 correct chroma function (ylacombe, Feb 1, 2024)
0ab4623 Merge branch 'main' into add-musicgen-melody (ylacombe, Feb 1, 2024)
c1e196d correct copied from and remove unreleated test (ylacombe, Feb 1, 2024)
c8bf6c5 add doc to toctree (ylacombe, Feb 1, 2024)
f015753 correct imports (ylacombe, Feb 1, 2024)
c8c5a4e add convert script to notdoctested (ylacombe, Feb 1, 2024)
2cf5cfb Add suggestion from Sanchit (ylacombe, Feb 5, 2024)
bce1aaf Merge branch 'huggingface:main' into add-musicgen-melody (ylacombe, Feb 5, 2024)
0e944af correct get_uncoditional_inputs docstrings (ylacombe, Feb 5, 2024)
1a03cd9 modify README according to SANCHIT feedback (ylacombe, Feb 5, 2024)
fded84d add chroma to audio utils (ylacombe, Feb 5, 2024)
133e486 clean librosa and torchaudio hard dependencies (ylacombe, Feb 5, 2024)
a70d0da fix FE (ylacombe, Feb 5, 2024)
34c8270 refactor audio decoder -> audio encoder for consistency with previous… (ylacombe, Feb 5, 2024)
fdd1743 refactor conditional -> encoder (ylacombe, Feb 6, 2024)
b13cbcf modify sampling rate logics (ylacombe, Feb 6, 2024)
2bb0adb modify license at the beginning (ylacombe, Feb 6, 2024)
d06b327 refactor all_self_attns->all_attentions (ylacombe, Feb 6, 2024)
7842840 remove ignore copy from causallm generate (ylacombe, Feb 6, 2024)
8e7c128 add copied from for from_sub_models (ylacombe, Feb 6, 2024)
8e1bc88 fix make copies (ylacombe, Feb 6, 2024)
61eb704 add warning if audio is truncated (ylacombe, Feb 6, 2024)
e761acc add copied from where relevant (ylacombe, Feb 6, 2024)
96baf7d remove artefact (ylacombe, Feb 6, 2024)
357b416 fix convert script (ylacombe, Feb 6, 2024)
ebe4cde fix torchaudio and FE (ylacombe, Feb 6, 2024)
aacf7ee modify chroma method according to feedback-> better naming (ylacombe, Feb 6, 2024)
3838361 refactor input_values->input_features (ylacombe, Feb 6, 2024)
a68c1a0 refactor input_values->input_features and fix import fe (ylacombe, Feb 6, 2024)
b174155 add input_features to docstrigs (ylacombe, Feb 6, 2024)
f9620b9 correct inputs_embeds logics (ylacombe, Feb 6, 2024)
6b6d7cb remove dtype conversion (ylacombe, Feb 6, 2024)
8c1d8f8 refactor _prepare_conditional_hidden_states_kwargs_for_generation ->_… (ylacombe, Feb 6, 2024)
4eface6 change warning for chroma length (ylacombe, Feb 6, 2024)
2109479 Update src/transformers/models/musicgen_melody/convert_musicgen_melod… (ylacombe, Feb 6, 2024)
3bfc793 change way to save wav, using soundfile (ylacombe, Feb 6, 2024)
9cd463a correct docs and change to soundfile (ylacombe, Feb 6, 2024)
9c4aee1 fix import (ylacombe, Feb 7, 2024)
0fa0274 Merge branch 'huggingface:main' into add-musicgen-melody (ylacombe, Feb 7, 2024)
0535b57 fix init proj layers (ylacombe, Feb 7, 2024)
87f4cf7 Merge branch 'huggingface:main' into add-musicgen-melody (ylacombe, Feb 7, 2024)
b36e802 remove line breaks from md (ylacombe, Feb 19, 2024)
3fd2839 fix issue with docstrings (ylacombe, Feb 19, 2024)
9f15d02 add FE suggestions (ylacombe, Feb 19, 2024)
48c2c3f improve is in logics and remove useless imports (ylacombe, Feb 19, 2024)
9a43be0 remove custom from_pretrained (ylacombe, Feb 19, 2024)
cf89389 simplify docstring code (ylacombe, Feb 19, 2024)
bb69817 add suggestions for modeling tests (ylacombe, Feb 19, 2024)
fc33efb make style (ylacombe, Feb 19, 2024)
ba4d732 update converting script with sanity check (ylacombe, Feb 19, 2024)
5166259 remove encoder attention mask from conditional generation (ylacombe, Feb 19, 2024)
755960a Merge branch 'main' into add-musicgen-melody (ylacombe, Feb 26, 2024)
8b9177f Merge branch 'main' into add-musicgen-melody (ylacombe, Mar 4, 2024)
ad26dc9 replace musicgen melody checkpoints with official orga (ylacombe, Mar 4, 2024)
7595256 rename ylacombe->facebook in checkpoints (ylacombe, Mar 4, 2024)
2576806 fix copies (ylacombe, Mar 4, 2024)
379d70b remove unecessary warning (ylacombe, Mar 4, 2024)
9795c6f add shape in code docstrings (ylacombe, Mar 4, 2024)
b03b36d add files to slow doc tests (ylacombe, Mar 4, 2024)
b434f8a fix md bug and add md to not_tested (ylacombe, Mar 5, 2024)
ebeca43 Merge branch 'main' into add-musicgen-melody (ylacombe, Mar 18, 2024)
604a4c8 make fix-copies (ylacombe, Mar 18, 2024)
7bda3c3 Merge branch 'huggingface:main' into add-musicgen-melody (ylacombe, Mar 18, 2024)
5863cf9 fix hidden states test and batching (ylacombe, Mar 18, 2024)
fix copies
ylacombe committed Mar 4, 2024
commit 2576806ab235befc8bfa8bae0d49ce9f7dbd5608
Collaborator: I think it makes sense to have a separate file for the generation part, like we do for Whisper, no?

Contributor: This could make the modelling and generation code a lot cleaner for the MusicGen series! Long-term, though, the issue would be fully resolved by refactoring `generate` to make it more composable for audio models (as suggested by @gante).

Contributor: How about we do this as a follow-up PR for MusicGen and MusicGen Melody, so as not to mix two features in one PR?

Collaborator: Agreed.
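The reviewers defer this split to a follow-up PR. As a rough illustration of the layout being proposed, here is a minimal, framework-free sketch in which the generation loop lives in a mixin (meant for its own module) and the model class only defines how to pick the next token. All class and method names are hypothetical, and a toy greedy loop stands in for real decoding; the actual MusicGen models are PyTorch modules with a far more involved `generate` method.

```python
# --- Would live in a separate generation module (hypothetical) ---
class MusicgenMelodyGenerationMixin:
    """Hypothetical mixin that holds only the generation loop.

    Any model class inheriting from it must implement `_next_token`.
    """

    def _next_token(self, tokens):
        raise NotImplementedError("the modeling class must implement _next_token")

    def generate(self, input_ids, max_new_tokens=4):
        # Greedy autoregressive loop: repeatedly append the model's next token.
        tokens = list(input_ids)
        for _ in range(max_new_tokens):
            tokens.append(self._next_token(tokens))
        return tokens


# --- Would stay in the modeling module (hypothetical toy model) ---
class ToyMusicgenMelodyModel(MusicgenMelodyGenerationMixin):
    """Toy stand-in for the real model: 'predicts' the previous token plus one."""

    vocab_size = 2048  # arbitrary toy vocabulary size

    def _next_token(self, tokens):
        return (tokens[-1] + 1) % self.vocab_size
```

With this split, the modeling file only describes how the next token is produced, while changes to the decoding loop touch the generation module alone, which is the kind of separation the reviewers hope will keep the MusicGen modeling code cleaner.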

@@ -1507,26 +1507,20 @@ def from_sub_models_pretrained(
Information necessary to initiate the text encoder. Can be either:

- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
Valid model ids can be located at the root-level, like `t5-base`, or namespaced under a user or
organization name, like `google/flan-t5-base`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.

audio_encoder_pretrained_model_name_or_path (`str`, *optional*):
Information necessary to initiate the audio encoder. Can be either:

- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
Valid model ids can be located at the root-level, like `bert-base-uncased`, or namespaced under a
user or organization name, like `facebook/encodec_24khz`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.

decoder_pretrained_model_name_or_path (`str`, *optional*, defaults to `None`):
Information necessary to initiate the decoder. Can be either:

- A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
Valid model ids can be located at the root-level, like `gpt2`, or namespaced under a user or
organization name, like `facebook/musicgen-melody`.
- A path to a *directory* containing model weights saved using
[`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.

@@ -1553,7 +1547,7 @@ def from_sub_models_pretrained(

>>> # initialize a musicgen model from a t5 text encoder, encodec audio encoder, and musicgen decoder
>>> model = MusicgenMelodyForConditionalGeneration.from_sub_models_pretrained(
-... text_encoder_pretrained_model_name_or_path="t5-base",
+... text_encoder_pretrained_model_name_or_path="google-t5/t5-base",
... audio_encoder_pretrained_model_name_or_path="facebook/encodec_24khz",
... decoder_pretrained_model_name_or_path="facebook/musicgen-melody",
... )
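The docstring in this diff says each `*_pretrained_model_name_or_path` argument can be a model id on huggingface.co, either root-level (like `t5-base`) or namespaced under a user or organization (like `google-t5/t5-base`), or a local directory written by `save_pretrained`. The sketch below shows one way to tell those three cases apart. `classify_pretrained_source` is a hypothetical helper, not part of `transformers`, and its model-id pattern is a simplified assumption rather than the Hub's exact validation rules.

```python
import os
import re

# Simplified approximation of a Hub model id: an optional "namespace/" prefix plus a
# repository name built from letters, digits, "_", "." and "-". The Hub applies
# stricter rules (for example on length); this pattern is only illustrative.
_MODEL_ID = re.compile(
    r"^(?:(?P<namespace>[A-Za-z0-9][A-Za-z0-9_.-]*)/)?(?P<name>[A-Za-z0-9][A-Za-z0-9_.-]*)$"
)


def classify_pretrained_source(name_or_path: str) -> str:
    """Hypothetical helper, not part of transformers.

    Returns "local_directory", "hub_namespaced" or "hub_root_level" for a value
    such as `text_encoder_pretrained_model_name_or_path`.
    """
    # An existing directory (for instance one written by save_pretrained) wins.
    if os.path.isdir(name_or_path):
        return "local_directory"
    match = _MODEL_ID.match(name_or_path)
    if match is None:
        raise ValueError(f"neither a local directory nor a model id: {name_or_path!r}")
    # A namespace means the model is hosted under a user or organization.
    return "hub_namespaced" if match.group("namespace") else "hub_root_level"
```

For example, `classify_pretrained_source("google-t5/t5-base")` returns `"hub_namespaced"`, while `"t5-base"` returns `"hub_root_level"` as long as no directory named `t5-base` exists in the working directory. Checking the filesystem first means a local directory shadows a model id with the same name.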