diff --git a/README.md b/README.md
index 604b817..182735e 100644
--- a/README.md
+++ b/README.md
@@ -78,10 +78,38 @@ poetry run python app.py
 3. Use it via [Hugging Face](https://huggingface.co/metavoiceio)
 4. [Google Colab Demo](https://colab.research.google.com/github/metavoiceio/metavoice-src/blob/main/colab_demo.ipynb)
+## Finetuning
+We support finetuning the first stage LLM (see the [Architecture section](#Architecture)).
+
+To finetune, we expect a "|"-delimited CSV dataset of the following format:
+
+```csv
+audio_files|captions
+./data/audio.wav|./data/caption.txt
+```
+
+Note that we don't perform any dataset overlap checks, so ensure that your train and val datasets are disjoint.
+
+Try it out using our sample datasets via:
+```bash
+poetry run finetune --train ./datasets/sample_dataset.csv --val ./datasets/sample_val_dataset.csv
+```
+
+### Configuration
+
+To set hyperparameters such as the learning rate or which layers to freeze, edit the
+[finetune_params.py](./fam/llm/config/finetune_params.py) file.
+
+We also ship a lightweight, optional W&B integration: enable it by setting `wandb_log = True`
+and installing the extra dependencies via
+
+```bash
+poetry install -E observable
+```
 
 ## Upcoming
 - [x] Faster inference ⚡
-- [ ] Fine-tuning code
+- [x] Fine-tuning code 📐
 - [ ] Synthesis of arbitrary length text
diff --git a/data/caption.txt b/data/caption.txt
new file mode 100644
index 0000000..a2b99bc
--- /dev/null
+++ b/data/caption.txt
@@ -0,0 +1 @@
+Please call Stella.
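The dataset file described in the README hunk above can be produced with a few lines of stdlib Python. This is a minimal sketch (the `write_finetune_csv` helper, the `train.csv` output name, and the row paths are illustrative, not part of the repo):

```python
import csv
from pathlib import Path

def write_finetune_csv(rows, out_path):
    """Write (audio_path, caption_path) pairs in the '|'-delimited
    format expected by the finetuning script."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="|")
        writer.writerow(["audio_files", "captions"])  # required header row
        writer.writerows(rows)

write_finetune_csv([("./data/audio.wav", "./data/caption.txt")], "train.csv")
print(Path("train.csv").read_text())
```

Keeping the header exactly `audio_files|captions` matters, since the loader reads it as column names.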
\ No newline at end of file diff --git a/datasets/sample_dataset.csv b/datasets/sample_dataset.csv new file mode 100644 index 0000000..a483222 --- /dev/null +++ b/datasets/sample_dataset.csv @@ -0,0 +1,321 @@ +audio_files|captions +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt 
+./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt 
+./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt 
+./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt 
+./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt 
+./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt 
+./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt diff --git a/datasets/sample_val_dataset.csv b/datasets/sample_val_dataset.csv new file mode 100644 index 0000000..dd63be1 --- /dev/null +++ b/datasets/sample_val_dataset.csv @@ -0,0 +1,81 @@ +audio_files|captions +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt 
+./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt +./data/audio.wav|./data/caption.txt diff --git a/fam/llm/config/__init__.py b/fam/llm/config/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/fam/llm/config/finetune_params.py b/fam/llm/config/finetune_params.py new file mode 100644 index 0000000..a7f705e --- /dev/null +++ b/fam/llm/config/finetune_params.py @@ -0,0 +1,72 @@ +from contextlib import nullcontext +import os +import uuid +import pathlib +from typing import Literal, Optional +import torch + +batch_size = 2 +dataset_size: int = 400 +batched_ds_size = dataset_size // batch_size +val_train_ratio = 0.2 + +epochs: int = 2 +max_iters = batched_ds_size * epochs +learning_rate = 3e-5 +last_n_blocks_to_finetune = 1 +decay_lr = False +lr_decay_iters = 0 # 
decay learning rate after this many iterations
+min_lr = 3e-6
+
+eval_interval = batched_ds_size
+eval_iters = int(batched_ds_size * val_train_ratio)
+eval_only: bool = False  # if True, script exits right after the first eval
+log_interval = batched_ds_size  # don't print too often
+save_interval: int = batched_ds_size * (epochs // 2)  # save a checkpoint every this many iterations
+assert save_interval % eval_interval == 0, "save_interval must be divisible by eval_interval."
+seed = 1337
+grad_clip: float = 1.0  # clip gradients at this value, or disable if == 0.0
+
+wandb_log = False
+wandb_project = "project-name"
+wandb_run_name = "run-name"
+wandb_tags = ["tag1", "tag2"]
+
+gradient_accumulation_steps = 1
+block_size = 2_048
+audio_token_mode = "flattened_interleaved"
+num_max_audio_tokens_timesteps = 1_024
+
+n_layer = 24
+n_head = 16
+n_embd = 2048
+dropout = 0.1
+
+weight_decay = 1e-1
+beta1 = 0.9
+beta2 = 0.95
+
+warmup_iters: int = 0  # how many steps to warm up for
+out_dir = f"finetune-{epochs=}-{learning_rate=}-{batch_size=}-{last_n_blocks_to_finetune=}-{dropout=}-{uuid.uuid4()}"
+
+compile = True
+num_codebooks = None
+norm_type = "rmsnorm"
+rmsnorm_eps = 1e-5
+nonlinearity_type = "swiglu"
+swiglu_multiple_of = 256
+attn_kernel_type = "torch_attn"
+meta_target_vocab_sizes: Optional[list[int]] = None
+speaker_emb_size: int = 256
+speaker_cond = True
+
+# always running finetuning on a single GPU
+master_process = True
+device: str = "cuda"  # examples: 'cpu', 'cuda', 'cuda:0', 'cuda:1' etc., or try 'mps' on macbooks
+ddp = False
+ddp_world_size = 1
+tokens_per_iter = gradient_accumulation_steps * ddp_world_size * batch_size * block_size
+
+causal = True
+bias: bool = False  # do we use bias inside LayerNorm and Linear layers?
+spk_emb_on_text: bool = True  # whether to add speaker embedding conditioning to text tokens or not
diff --git a/fam/llm/fast_inference_utils.py b/fam/llm/fast_inference_utils.py
index 6ed7b9e..97c2d95 100644
--- a/fam/llm/fast_inference_utils.py
+++ b/fam/llm/fast_inference_utils.py
@@ -225,8 +225,8 @@ def generate(
     return seq
 
 
-def encode_tokens(tokenizer, string, device="cuda"):
-    tokens = tokenizer.encode(string)
+def encode_tokens(tokenizer: TrainedBPETokeniser, text: str, device="cuda") -> torch.Tensor:
+    tokens = tokenizer.encode(text)
     return torch.tensor(tokens, dtype=torch.int, device=device)
@@ -301,7 +301,6 @@ def _load_model(checkpoint_path, spk_emb_ckpt_path, device, precision):
     tokenizer = TrainedBPETokeniser(**tokenizer_info)
 
     ###### SPEAKER EMBEDDER
-    # TODO: fix!
     smodel = SpeakerEncoder(
         weights_fpath=spk_emb_ckpt_path,
         device=device,
diff --git a/fam/llm/finetune.py b/fam/llm/finetune.py
new file mode 100644
index 0000000..f5f8a37
--- /dev/null
+++ b/fam/llm/finetune.py
@@ -0,0 +1,357 @@
+"""
+Module responsible for finetuning the first stage LLM.
+""" + +import itertools +import math +from pathlib import Path +import time +from typing import Any, Dict, Optional + +import click +import torch +from huggingface_hub import snapshot_download +from torch.utils.data import DataLoader +from tqdm import tqdm + +from fam.llm.config.finetune_params import * +from fam.llm.loaders.training_data import DynamicComputeDataset +from fam.llm.model import GPT, GPTConfig +from fam.llm.preprocessing.audio_token_mode import get_params_for_mode +from fam.llm.preprocessing.data_pipeline import get_training_tuple + + +dtype: Literal["bfloat16", "float16", "tfloat32", "float32"] = ( + "bfloat16" if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else "float16" +) # 'float32', 'bfloat16', or 'float16', the latter will auto implement a GradScaler +seed_offset = 0 + +torch.manual_seed(seed + seed_offset) +torch.backends.cuda.matmul.allow_tf32 = True if dtype != "float32" else False +torch.backends.cudnn.allow_tf32 = True if dtype != "float32" else False +device_type = "cuda" if "cuda" in device else "cpu" # for later use in torch.autocast +# note: float16 data type will automatically use a GradScaler +ptdtype = {"float32": torch.float32, "tfloat32": torch.float32, "bfloat16": torch.bfloat16, "float16": torch.float16}[ + dtype +] +ctx = nullcontext() if device_type == "cpu" else torch.amp.autocast(device_type=device_type, dtype=ptdtype) + +print(f"tokens per iteration will be: {tokens_per_iter:,}") + +ckpts_base_dir = pathlib.Path(__file__).resolve().parent / "ckpts" +if not os.path.exists(ckpts_base_dir) and master_process: + raise Exception(f"ckpts dir {ckpts_base_dir} does not exist!") + +if master_process: + if "/" in out_dir: + raise Exception("out_dir should be just a name, not a path with slashes") + + ckpts_save_dir = ckpts_base_dir / out_dir + os.makedirs(ckpts_save_dir, exist_ok=True) + +def get_globals_state(): + """ Return entirety of configuration global state which can be used for logging. 
""" + config_keys = [k for k, v in globals().items() if not k.startswith("_") and isinstance(v, (int, float, bool, str))] + return {k: globals()[k] for k in config_keys} # will be useful for logging + +model_args: dict = dict( + n_layer=n_layer, + n_head=n_head, + n_embd=n_embd, + block_size=block_size, + bias=bias, + vocab_sizes=None, + dropout=dropout, + causal=causal, + norm_type=norm_type, + rmsnorm_eps=rmsnorm_eps, + nonlinearity_type=nonlinearity_type, + spk_emb_on_text=spk_emb_on_text, + attn_kernel_type=attn_kernel_type, + swiglu_multiple_of=swiglu_multiple_of, +) # start with model_args from command line + +def strip_prefix(state_dict: Dict[str, Any], unwanted_prefix: str): + # TODO: this also appears in fast_inference_utils._load_model, it should be moved to a common place. + for k, v in list(state_dict.items()): + if k.startswith(unwanted_prefix): + state_dict[k[len(unwanted_prefix) :]] = state_dict.pop(k) + return state_dict + + +def force_ckpt_args(model_args, checkpoint_model_args) -> None: + # force these config attributes to be equal otherwise we can't even resume training + # the rest of the attributes (e.g. dropout) can stay as desired from command line + for k in ["n_layer", "n_head", "n_embd", "block_size", "bias", "vocab_sizes", "causal"]: + model_args[k] = checkpoint_model_args[k] + # this enables backward compatability with previously saved checkpoints. + for k in [ + "target_vocab_sizes", + "norm_type", + "rmsnorm_eps", + "nonlinearity_type", + "attn_kernel_type", + "spk_emb_on_text", + "swiglu_multiple_of", + ]: + if k in checkpoint_model_args: + model_args[k] = checkpoint_model_args[k] + if attn_kernel_type != model_args["attn_kernel_type"]: + print( + f'Found {model_args["attn_kernel_type"]} kernel type inside model,', + f"but expected {attn_kernel_type}. 
Manually replacing it.", + ) + model_args["attn_kernel_type"] = attn_kernel_type + + +@click.command() +@click.option("--train", type=click.Path(exists=True, path_type=Path), required=True) +@click.option("--val", type=click.Path(exists=True, path_type=Path), required=True) +@click.option("--model-id", type=str, required=False, default="metavoiceio/metavoice-1B-v0.1") +@click.option("--ckpt", type=click.Path(exists=True, path_type=Path)) +@click.option("--spk-emb-ckpt", type=click.Path(exists=True, path_type=Path)) +def main(train: Path, val: Path, model_id: str, ckpt: Optional[Path], spk_emb_ckpt: Optional[Path]): + if ckpt and spk_emb_ckpt: + checkpoint_path, spk_emb_ckpt_path = ckpt, spk_emb_ckpt + else: + _model_dir = snapshot_download(repo_id=model_id) + checkpoint_path = Path(f"{_model_dir}/first_stage.pt") + spk_emb_ckpt_path = Path(f"{_model_dir}/speaker_encoder.pt") + + mode_params = get_params_for_mode(audio_token_mode, num_max_audio_tokens_timesteps=num_max_audio_tokens_timesteps) + config = get_globals_state() + + checkpoint = torch.load(str(checkpoint_path), mmap=True, map_location=device) + iter_num = checkpoint.get("iter_num", 0) + best_val_loss = checkpoint.get("best_val_loss", 1e9) + checkpoint_model_args = checkpoint["model_args"] + tokenizer_info = checkpoint.get("meta", {}).get("tokenizer", {}) + force_ckpt_args(model_args, checkpoint_model_args) + gptconf = GPTConfig(**model_args) # type: ignore + model = GPT(gptconf, speaker_emb_dim=speaker_emb_size if speaker_cond else None) + + # removing torch.compile module prefixes for pre-compile loading + state_dict = strip_prefix(checkpoint["model"], "_orig_mod.") + model.load_state_dict(state_dict) + model.to(device) + # initialize a GradScaler. If enabled=False scaler is a no-op + scaler = torch.cuda.amp.GradScaler(enabled=(dtype == "float16")) + optimizer = model.configure_optimizers(weight_decay, learning_rate, (beta1, beta2), device_type) + if compile: + print("Compiling the model... 
(takes a ~minute)") + # requires PyTorch 2.0 + from einops._torch_specific import allow_ops_in_compiled_graph + + allow_ops_in_compiled_graph() + model = torch.compile(model) # type: ignore + + def estimate_loss(dataset, iters: int=eval_iters): + """ Estimate loss on a dataset by running on `iters` batches. """ + if dataset is None: + return torch.nan + losses = [] + for _, batch in zip(tqdm(range(iters)), dataset): + X, Y, SE = get_training_tuple( + batch, + causal, + num_codebooks, + speaker_cond, + device + ) + with ctx: + _, loss = model(X, Y, speaker_embs=SE, speaker_emb_mask=None) + losses.append(loss.item()) + return torch.tensor(losses).mean() + + # learning rate decay scheduler (cosine with warmup) + def get_lr(it): + # 1) linear warmup for warmup_iters steps + if it < warmup_iters: + return learning_rate * it / warmup_iters + # 2) if it > lr_decay_iters, return min learning rate + if it > lr_decay_iters: + return min_lr + # 3) in between, use cosine decay down to min learning rate + decay_ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters) + assert 0 <= decay_ratio <= 1 + coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio)) # coeff ranges 0..1 + return min_lr + coeff * (learning_rate - min_lr) + + if wandb_log and master_process: + import wandb + + if os.environ.get("WANDB_RUN_ID", None) is not None: + resume = "must" + else: + resume = None + + wandb.init(project=wandb_project, name=wandb_run_name, tags=wandb_tags, config=config, resume=resume) + + train_dataset = DynamicComputeDataset.from_meta( + tokenizer_info, + mode_params["combine_func"], + spk_emb_ckpt_path, + train, + mode_params["pad_token"], + mode_params["ctx_window"], + device, + ) + val_dataset = DynamicComputeDataset.from_meta( + tokenizer_info, + mode_params["combine_func"], + spk_emb_ckpt_path, + val, + mode_params["pad_token"], + mode_params["ctx_window"], + device, + ) + train_dataloader = itertools.cycle( + DataLoader(train_dataset, batch_size, shuffle=True) + ) + train_data 
= iter(train_dataloader) + # we do not perform any explicit checks for dataset overlap & leave it to the user + # to handle this + eval_val_data = DataLoader(val_dataset, batch_size, shuffle=True) + # we can use the same Dataset object given it is a mapped dataset & not an iterable + # one that can be exhausted. This implies we will be needlessly recomputing, fine + # for now. + eval_train_data = DataLoader(train_dataset, batch_size, shuffle=True) + + batch = next(train_data) + X, Y, SE = get_training_tuple( + batch, + causal, + num_codebooks, + speaker_cond, + device + ) + + t0 = time.time() + local_iter_num = 0 # number of iterations in the lifetime of this process + raw_model = model.module if ddp else model # unwrap DDP container if needed + running_mfu = -1.0 + total_norm = 0.0 + save_checkpoint = False + if master_process: + progress = tqdm(total=max_iters, desc="Training", initial=iter_num) + else: + progress = None + + # finetune last X transformer blocks and the ln_f layer + trainable_count = lambda model: sum(p.numel() for p in model.parameters() if p.requires_grad) + print(f"Before layer freezing {trainable_count(model)=}...") + for param in model.parameters(): + param.requires_grad = False + for param in itertools.chain( + model.transformer.ln_f.parameters(), model.transformer.h[last_n_blocks_to_finetune*-1:].parameters() + ): + param.requires_grad = True + print(f"After freezing excl. 
last {last_n_blocks_to_finetune} transformer blocks: {trainable_count(model)=}...") + + while True: + lr = get_lr(iter_num) if decay_lr else learning_rate + for param_group in optimizer.param_groups: + param_group["lr"] = lr + if master_process: + if iter_num % eval_interval == 0 and master_process: + ckpt_save_name = f"ckpt_{iter_num:07d}.pt" + with torch.no_grad(): + model.eval() + losses = { + "train": estimate_loss(eval_train_data), + "val": estimate_loss(eval_val_data), + } + model.train() + print(f"step {iter_num}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}") + if wandb_log: + wandb.log( + { + "iter": iter_num, + "train/loss": losses["train"], + "val/loss": losses["val"], + "lr": lr, + "mfu": running_mfu * 100, # convert to percentage + "stats/total_norm": total_norm, + } + ) + if losses["val"] < best_val_loss: + best_val_loss = losses["val"] + if iter_num > 0: + ckpt_save_name = ckpt_save_name.replace(".pt", f"_bestval_{best_val_loss}".replace(".", "_") + ".pt") + save_checkpoint = True + + save_checkpoint = save_checkpoint or iter_num % save_interval == 0 + if save_checkpoint and iter_num > 0: + checkpoint = { + "model": raw_model.state_dict(), # type: ignore + "optimizer": optimizer.state_dict(), + "model_args": model_args, + "iter_num": iter_num, + "best_val_loss": best_val_loss, + "config": config, + "meta": { + "speaker_cond": speaker_cond, + "speaker_emb_size": speaker_emb_size, + "tokenizer": tokenizer_info, + }, + } + torch.save(checkpoint, os.path.join(ckpts_save_dir, ckpt_save_name)) + print(f"saving checkpoint to {ckpts_save_dir}") + save_checkpoint = False + if iter_num == 0 and eval_only: + break + # forward backward update, with optional gradient accumulation to simulate larger batch size + # and using the GradScaler if data type is float16 + for micro_step in range(gradient_accumulation_steps): + if ddp: + # in DDP training we only need to sync gradients at the last micro step. 
+ # the official way to do this is with model.no_sync() context manager, but + # I really dislike that this bloats the code and forces us to repeat code + # looking at the source of that context manager, it just toggles this variable + model.require_backward_grad_sync = micro_step == gradient_accumulation_steps - 1 # type: ignore + with ctx: # type: ignore + logits, loss = model(X, Y, speaker_embs=SE, speaker_emb_mask=None) + loss = loss / gradient_accumulation_steps # scale the loss to account for gradient accumulation + # immediately async prefetch next batch while model is doing the forward pass on the GPU + batch = next(train_data) + X, Y, SE = get_training_tuple( + batch, + causal, + num_codebooks, + speaker_cond, + device, + ) + # backward pass, with gradient scaling if training in fp16 + scaler.scale(loss).backward() + # clip the gradient + if grad_clip != 0.0: + scaler.unscale_(optimizer) + total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip) + # step the optimizer and scaler if training in fp16 + scaler.step(optimizer) + scaler.update() + # flush the gradients as soon as we can, no need for this memory anymore + optimizer.zero_grad(set_to_none=True) + + # timing and logging + t1 = time.time() + dt = t1 - t0 + t0 = t1 + if master_process: + # get loss as float. 
note: this is a CPU-GPU sync point + # scale up to undo the division above, approximating the true total loss (exact would have been a sum) + lossf = loss.item() * gradient_accumulation_steps + progress.update(1) + progress.set_description(f"Training: loss {lossf:.4f}, time {dt*1000:.2f}ms") + if iter_num % log_interval == 0: + print(f"iter {iter_num}: loss {lossf:.4f}, time {dt*1000:.2f}ms") + + iter_num += 1 + local_iter_num += 1 + + # termination conditions + if iter_num > max_iters: + break + +if __name__ == "__main__": + main() diff --git a/fam/llm/inference.py b/fam/llm/inference.py index 57f44da..975f5d9 100644 --- a/fam/llm/inference.py +++ b/fam/llm/inference.py @@ -647,7 +647,7 @@ class SamplingControllerConfig: """Guidance scale for sampling: (speaker conditioning guidance_scale, prompt conditioning guidance scale).""" batch_size: int = 128 - """Batch size to use for sampling. Note that the batch size gets doubled when guidance is used. For H100, and 1B model, + """Batch size to use for sampling. Note that the batch size gets doubled when guidance is used. For H100, and 1B model, 1 w/ guidance and 1 w/o guidance work well (without kv-caching). 
With kv-caching, 128 (w/o guidance) and 64 (w/ guidance) works well.""" diff --git a/fam/llm/loaders/__init__.py b/fam/llm/loaders/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/fam/llm/loaders/training_data.py b/fam/llm/loaders/training_data.py new file mode 100644 index 0000000..7b4f84e --- /dev/null +++ b/fam/llm/loaders/training_data.py @@ -0,0 +1,116 @@ +from pathlib import Path +from typing import Any, Mapping + +import julius +import torch +import math +import numpy as np +import pandas as pd +from audiocraft.data.audio import audio_read +from encodec import EncodecModel +from torch.utils.data import DataLoader, Dataset + +from fam.llm.fast_inference_utils import encode_tokens +from fam.llm.preprocessing.audio_token_mode import CombinerFuncT +from fam.llm.preprocessing.data_pipeline import pad_tokens +from fam.llm.utils import normalize_text +from fam.quantiser.audio.speaker_encoder.model import SpeakerEncoder +from fam.quantiser.text.tokenise import TrainedBPETokeniser + +MBD_SAMPLE_RATE = 24000 +ENCODEC_BANDWIDTH = 6 + + +class DynamicComputeDataset(Dataset): + def __init__( + self, + dataset_dir: Path | str, + encodec_model: EncodecModel, + tokenizer: TrainedBPETokeniser, + spkemb_model: SpeakerEncoder, + combiner: CombinerFuncT, + pad_token: int, + ctx_window: int, + device: str, + ): + self.dataset_dir = dataset_dir + self.encodec_model = encodec_model + self.tokenizer = tokenizer + self.spkemb_model = spkemb_model + self.device = device + self.combiner = combiner + self.pad_token = pad_token + self.ctx_window = ctx_window + self.df = pd.read_csv(dataset_dir, delimiter="|", index_col=False) + + @classmethod + def from_meta( + cls, + tokenizer_info: Mapping[str, Any], + combiner: CombinerFuncT, + speaker_embedding_ckpt_path: Path | str, + dataset_dir: Path | str, + pad_token: int, + ctx_window: int, + device: str + ): + encodec = EncodecModel.encodec_model_24khz().to(device) +
encodec.set_target_bandwidth(ENCODEC_BANDWIDTH) + smodel = SpeakerEncoder( + weights_fpath=str(speaker_embedding_ckpt_path), + eval=True, + device=device, + verbose=False, + ) + tokeniser = TrainedBPETokeniser(**tokenizer_info) + + return cls( + dataset_dir, + encodec, + tokeniser, + smodel, + combiner, + pad_token, + ctx_window, + device + ) + + def __len__(self): + return len(self.df) + + def __getitem__(self, idx): + audio_path, text = self.df.iloc[idx].values.tolist() + with torch.no_grad(): + text_tokens = self._extract_text_tokens(text) + encodec_tokens = self._extract_encodec_tokens(audio_path) + speaker_embedding = self._extract_speaker_embedding(audio_path) + combined = self.combiner(encodec_tokens, text_tokens) + padded_combined_tokens = pad_tokens(combined, self.ctx_window, self.pad_token) + + return {"tokens": padded_combined_tokens, "spkemb": speaker_embedding} + + def _extract_text_tokens(self, text: str): + _text = normalize_text(text) + _tokens = encode_tokens(self.tokenizer, _text, self.device) + + return _tokens.detach().cpu().numpy() + + def _extract_encodec_tokens(self, audio_path: str): + wav, sr = audio_read(audio_path) + if sr != MBD_SAMPLE_RATE: + wav = julius.resample_frac(wav, sr, MBD_SAMPLE_RATE) + + # Convert to mono and fix dimensionality + if wav.ndim == 2: + wav = wav.mean(axis=0, keepdims=True) + wav = wav.unsqueeze(0) # Add batch dimension + + wav = wav.to(self.device) + tokens = self.encodec_model.encode(wav) + _tokens = tokens[0][0][0].detach().cpu().numpy() # shape = [8, T] + + return _tokens + + def _extract_speaker_embedding(self, audio_path: str): + emb = self.spkemb_model.embed_utterance_from_file(audio_path, numpy=False) # shape = [256,] + return emb.unsqueeze(0).detach() diff --git a/fam/llm/preprocessing/__init__.py b/fam/llm/preprocessing/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/fam/llm/preprocessing/audio_token_mode.py b/fam/llm/preprocessing/audio_token_mode.py new file mode 100644 index 
0000000..dcf940d --- /dev/null +++ b/fam/llm/preprocessing/audio_token_mode.py @@ -0,0 +1,51 @@ +from functools import partial +from typing import Any, Callable, Literal, Optional + +import numpy as np + +AudioTokenModeT = Literal["flattened_interleaved"] +CombinerWithOffsetFuncT = Callable[[np.ndarray, np.ndarray, int], np.ndarray] +CombinerFuncT = Callable[[np.ndarray, np.ndarray], np.ndarray] + + +def combine_tokens_flattened_interleaved( + audio_tokens: np.ndarray, text_tokens: np.ndarray, second_hierarchy_flattening_offset: int +) -> np.ndarray: + """ + Flattens & interleaves the first two audio token hierarchies. Note that the tokens for the second hierarchy + are also offset by second_hierarchy_flattening_offset as part of this transform to avoid conflict with values for the + first hierarchy. + """ + assert np.issubdtype(audio_tokens.dtype, np.integer) + assert np.issubdtype(text_tokens.dtype, np.integer) + + num_hierarchies = audio_tokens.shape[0] + assert num_hierarchies >= 2, f"Unexpected number of hierarchies: {num_hierarchies}. Expected at least 2." + + # fill with a -5 sentinel so any position left unassigned is easy to detect
+ interleaved_audio_tokens = np.full((len(audio_tokens[0]) + len(audio_tokens[1]),), -5) + interleaved_audio_tokens[::2] = audio_tokens[0] + interleaved_audio_tokens[1::2] = audio_tokens[1] + second_hierarchy_flattening_offset + + tokens = np.concatenate([text_tokens, interleaved_audio_tokens]) + + return np.expand_dims(tokens, axis=0) + + +def get_params_for_mode( + audio_token_mode: AudioTokenModeT, num_max_audio_tokens_timesteps: Optional[int] = None +) -> dict[str, Any]: + if audio_token_mode == "flattened_interleaved": + return { + "text_tokenisation_offset": 1024 * 2 + 1, + "pad_token": 1024 * 2, + "ctx_window": num_max_audio_tokens_timesteps * 2 if num_max_audio_tokens_timesteps else None, + "second_hierarchy_flattening_offset": 1024, + # TODO: fix the repeat of `second_hierarchy_flattening_offset` + "combine_func": partial( + combine_tokens_flattened_interleaved, + second_hierarchy_flattening_offset=1024, + ), + } + else: + raise Exception(f"Unknown mode {audio_token_mode}") diff --git a/fam/llm/preprocessing/data_pipeline.py b/fam/llm/preprocessing/data_pipeline.py new file mode 100644 index 0000000..a6e8fcf --- /dev/null +++ b/fam/llm/preprocessing/data_pipeline.py @@ -0,0 +1,58 @@ +from typing import Any, Dict, Optional, Tuple + +import torch +import numpy as np + + +def pad_tokens(tokens: np.ndarray, context_window: int, pad_token: int) -> np.ndarray: + """Pads or truncates a single example to the context_window + 1 size. 
+ + tokens: (..., example_length) + """ + example_length = tokens.shape[-1] + if example_length > context_window + 1: + # Truncate + tokens = tokens[..., : context_window + 1] + elif example_length < context_window + 1: + # Pad + padding = np.full(tokens.shape[:-1] + (context_window + 1 - example_length,), pad_token) + tokens = np.concatenate([tokens, padding], axis=-1) + assert tokens.shape[-1] == context_window + 1 + return tokens + + +def get_training_tuple( + batch: Dict[str, Any], + causal: bool, + num_codebooks: Optional[int], + speaker_cond: bool, + device: torch.device, +) -> Tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]: + # batch contains combined tokens as specified by audio_token_mode + if causal: + num_codebooks = batch["tokens"].shape[1] if num_codebooks is None else num_codebooks + x = batch["tokens"][:, :num_codebooks, :-1] + y = batch["tokens"][:, :num_codebooks, 1:] + + se = batch["spkemb"] + + x = x.to(device, non_blocking=True) + y = y.to(device, non_blocking=True) + se = se.to(device, non_blocking=True) if speaker_cond else None + + return x, y, se + + +def pad_with_values(tensor, batch_size, value): + """Pads the tensor up to batch_size with values.""" + if tensor.shape[0] < batch_size: + return torch.cat( + [ + tensor, + torch.full( + (batch_size - tensor.shape[0], *tensor.shape[1:]), value, dtype=tensor.dtype, device=tensor.device + ), + ] + ) + else: + return tensor diff --git a/poetry.lock b/poetry.lock index 82e072c..d6bcd8a 100644 --- a/poetry.lock +++ b/poetry.lock @@ -746,6 +746,20 @@ tqdm = "*" [package.extras] dev = ["diffq (>=0.2.1)", "dora-search (>=0.1.12)", "einops", "flake8", "hydra-colorlog (>=1.1)", "hydra-core (>=1.1)", "julius (>=0.2.3)", "lameenc (>=1.2)", "museval", "mypy", "openunmix", "pyyaml", "soundfile (>=0.10.3)", "submitit", "torch (>=1.8.1)", "torchaudio (>=0.8)", "tqdm", "treetable"] +[[package]] +name = "docker-pycreds" +version = "0.4.0" +description = "Python bindings for the docker credentials 
store API" +optional = true +python-versions = "*" +files = [ + {file = "docker-pycreds-0.4.0.tar.gz", hash = "sha256:6ce3270bcaf404cc4c3e27e4b6c70d3521deae82fb508767870fdbf772d584d4"}, + {file = "docker_pycreds-0.4.0-py2.py3-none-any.whl", hash = "sha256:7266112468627868005106ec19cd0d722702d2b7d5912a28e19b826c3d37af49"}, +] + +[package.dependencies] +six = ">=1.4.0" + [[package]] name = "docopt" version = "0.6.2" @@ -995,6 +1009,37 @@ smb = ["smbprotocol"] ssh = ["paramiko"] tqdm = ["tqdm"] +[[package]] +name = "gitdb" +version = "4.0.11" +description = "Git Object Database" +optional = true +python-versions = ">=3.7" +files = [ + {file = "gitdb-4.0.11-py3-none-any.whl", hash = "sha256:81a3407ddd2ee8df444cbacea00e2d038e40150acfa3001696fe0dcf1d3adfa4"}, + {file = "gitdb-4.0.11.tar.gz", hash = "sha256:bf5421126136d6d0af55bc1e7c1af1c397a34f5b7bd79e776cd3e89785c2b04b"}, +] + +[package.dependencies] +smmap = ">=3.0.1,<6" + +[[package]] +name = "gitpython" +version = "3.1.42" +description = "GitPython is a Python library used to interact with Git repositories" +optional = true +python-versions = ">=3.7" +files = [ + {file = "GitPython-3.1.42-py3-none-any.whl", hash = "sha256:1bf9cd7c9e7255f77778ea54359e54ac22a72a5b51288c457c881057b7bb9ecd"}, + {file = "GitPython-3.1.42.tar.gz", hash = "sha256:2d99869e0fef71a73cbd242528105af1d6c1b108c60dfabd994bf292f76c3ceb"}, +] + +[package.dependencies] +gitdb = ">=4.0.1,<5" + +[package.extras] +test = ["black", "coverage[toml]", "ddt (>=1.1.1,!=1.4.3)", "mock", "mypy", "pre-commit", "pytest (>=7.3.1)", "pytest-cov", "pytest-instafail", "pytest-mock", "pytest-sugar"] + [[package]] name = "gradio" version = "4.20.1" @@ -2410,6 +2455,34 @@ files = [ {file = "protobuf-4.25.3.tar.gz", hash = "sha256:25b5d0b42fd000320bd7830b349e3b696435f3b329810427a6bcce6a5492cc5c"}, ] +[[package]] +name = "psutil" +version = "5.9.8" +description = "Cross-platform lib for process and system monitoring in Python." 
+optional = true +python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*" +files = [ + {file = "psutil-5.9.8-cp27-cp27m-macosx_10_9_x86_64.whl", hash = "sha256:26bd09967ae00920df88e0352a91cff1a78f8d69b3ecabbfe733610c0af486c8"}, + {file = "psutil-5.9.8-cp27-cp27m-manylinux2010_i686.whl", hash = "sha256:05806de88103b25903dff19bb6692bd2e714ccf9e668d050d144012055cbca73"}, + {file = "psutil-5.9.8-cp27-cp27m-manylinux2010_x86_64.whl", hash = "sha256:611052c4bc70432ec770d5d54f64206aa7203a101ec273a0cd82418c86503bb7"}, + {file = "psutil-5.9.8-cp27-cp27mu-manylinux2010_i686.whl", hash = "sha256:50187900d73c1381ba1454cf40308c2bf6f34268518b3f36a9b663ca87e65e36"}, + {file = "psutil-5.9.8-cp27-cp27mu-manylinux2010_x86_64.whl", hash = "sha256:02615ed8c5ea222323408ceba16c60e99c3f91639b07da6373fb7e6539abc56d"}, + {file = "psutil-5.9.8-cp27-none-win32.whl", hash = "sha256:36f435891adb138ed3c9e58c6af3e2e6ca9ac2f365efe1f9cfef2794e6c93b4e"}, + {file = "psutil-5.9.8-cp27-none-win_amd64.whl", hash = "sha256:bd1184ceb3f87651a67b2708d4c3338e9b10c5df903f2e3776b62303b26cb631"}, + {file = "psutil-5.9.8-cp36-abi3-macosx_10_9_x86_64.whl", hash = "sha256:aee678c8720623dc456fa20659af736241f575d79429a0e5e9cf88ae0605cc81"}, + {file = "psutil-5.9.8-cp36-abi3-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:8cb6403ce6d8e047495a701dc7c5bd788add903f8986d523e3e20b98b733e421"}, + {file = "psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d06016f7f8625a1825ba3732081d77c94589dca78b7a3fc072194851e88461a4"}, + {file = "psutil-5.9.8-cp36-cp36m-win32.whl", hash = "sha256:7d79560ad97af658a0f6adfef8b834b53f64746d45b403f225b85c5c2c140eee"}, + {file = "psutil-5.9.8-cp36-cp36m-win_amd64.whl", hash = "sha256:27cc40c3493bb10de1be4b3f07cae4c010ce715290a5be22b98493509c6299e2"}, + {file = "psutil-5.9.8-cp37-abi3-win32.whl", hash = 
"sha256:bc56c2a1b0d15aa3eaa5a60c9f3f8e3e565303b465dbf57a1b730e7a2b9844e0"}, + {file = "psutil-5.9.8-cp37-abi3-win_amd64.whl", hash = "sha256:8db4c1b57507eef143a15a6884ca10f7c73876cdf5d51e713151c1236a0e68cf"}, + {file = "psutil-5.9.8-cp38-abi3-macosx_11_0_arm64.whl", hash = "sha256:d16bbddf0693323b8c6123dd804100241da461e41d6e332fb0ba6058f630f8c8"}, + {file = "psutil-5.9.8.tar.gz", hash = "sha256:6be126e3225486dff286a8fb9a06246a5253f4c7c53b475ea5f5ac934e64194c"}, +] + +[package.extras] +test = ["enum34", "ipaddress", "mock", "pywin32", "wmi"] + [[package]] name = "pycparser" version = "2.21" @@ -2573,13 +2646,13 @@ diagrams = ["jinja2", "railroad-diagrams"] [[package]] name = "pytest" -version = "8.1.1" +version = "8.0.2" description = "pytest: simple powerful testing with Python" optional = false python-versions = ">=3.8" files = [ - {file = "pytest-8.1.1-py3-none-any.whl", hash = "sha256:2a8386cfc11fa9d2c50ee7b2a57e7d898ef90470a7a34c4b949ff59662bb78b7"}, - {file = "pytest-8.1.1.tar.gz", hash = "sha256:ac978141a75948948817d360297b7aae0fcb9d6ff6bc9ec6d514b85d5a65c044"}, + {file = "pytest-8.0.2-py3-none-any.whl", hash = "sha256:edfaaef32ce5172d5466b5127b42e0d6d35ebbe4453f0e3505d96afd93f6b096"}, + {file = "pytest-8.0.2.tar.gz", hash = "sha256:d4051d623a2e0b7e51960ba963193b09ce6daeb9759a451844a21e4ddedfc1bd"}, ] [package.dependencies] @@ -2587,11 +2660,11 @@ colorama = {version = "*", markers = "sys_platform == \"win32\""} exceptiongroup = {version = ">=1.0.0rc8", markers = "python_version < \"3.11\""} iniconfig = "*" packaging = "*" -pluggy = ">=1.4,<2.0" -tomli = {version = ">=1", markers = "python_version < \"3.11\""} +pluggy = ">=1.3.0,<2.0" +tomli = {version = ">=1.0.0", markers = "python_version < \"3.11\""} [package.extras] -testing = ["argcomplete", "attrs (>=19.2)", "hypothesis (>=3.56)", "mock", "pygments (>=2.7.2)", "requests", "setuptools", "xmlschema"] +testing = ["argcomplete", "attrs (>=19.2.0)", "hypothesis (>=3.56)", "mock", "nose", "pygments 
(>=2.7.2)", "requests", "setuptools", "xmlschema"] [[package]] name = "python-dateutil" @@ -3289,6 +3362,151 @@ files = [ {file = "sentencepiece-0.2.0.tar.gz", hash = "sha256:a52c19171daaf2e697dc6cbe67684e0fa341b1248966f6aebb541de654d15843"}, ] +[[package]] +name = "sentry-sdk" +version = "1.41.0" +description = "Python client for Sentry (https://sentry.io)" +optional = true +python-versions = "*" +files = [ + {file = "sentry-sdk-1.41.0.tar.gz", hash = "sha256:4f2d6c43c07925d8cd10dfbd0970ea7cb784f70e79523cca9dbcd72df38e5a46"}, + {file = "sentry_sdk-1.41.0-py2.py3-none-any.whl", hash = "sha256:be4f8f4b29a80b6a3b71f0f31487beb9e296391da20af8504498a328befed53f"}, +] + +[package.dependencies] +certifi = "*" +urllib3 = {version = ">=1.26.11", markers = "python_version >= \"3.6\""} + +[package.extras] +aiohttp = ["aiohttp (>=3.5)"] +arq = ["arq (>=0.23)"] +asyncpg = ["asyncpg (>=0.23)"] +beam = ["apache-beam (>=2.12)"] +bottle = ["bottle (>=0.12.13)"] +celery = ["celery (>=3)"] +chalice = ["chalice (>=1.16.0)"] +clickhouse-driver = ["clickhouse-driver (>=0.2.0)"] +django = ["django (>=1.8)"] +falcon = ["falcon (>=1.4)"] +fastapi = ["fastapi (>=0.79.0)"] +flask = ["blinker (>=1.1)", "flask (>=0.11)", "markupsafe"] +grpcio = ["grpcio (>=1.21.1)"] +httpx = ["httpx (>=0.16.0)"] +huey = ["huey (>=2)"] +loguru = ["loguru (>=0.5)"] +opentelemetry = ["opentelemetry-distro (>=0.35b0)"] +opentelemetry-experimental = ["opentelemetry-distro (>=0.40b0,<1.0)", "opentelemetry-instrumentation-aiohttp-client (>=0.40b0,<1.0)", "opentelemetry-instrumentation-django (>=0.40b0,<1.0)", "opentelemetry-instrumentation-fastapi (>=0.40b0,<1.0)", "opentelemetry-instrumentation-flask (>=0.40b0,<1.0)", "opentelemetry-instrumentation-requests (>=0.40b0,<1.0)", "opentelemetry-instrumentation-sqlite3 (>=0.40b0,<1.0)", "opentelemetry-instrumentation-urllib (>=0.40b0,<1.0)"] +pure-eval = ["asttokens", "executing", "pure-eval"] +pymongo = ["pymongo (>=3.1)"] +pyspark = ["pyspark (>=2.4.4)"] +quart = 
["blinker (>=1.1)", "quart (>=0.16.1)"] +rq = ["rq (>=0.6)"] +sanic = ["sanic (>=0.8)"] +sqlalchemy = ["sqlalchemy (>=1.2)"] +starlette = ["starlette (>=0.19.1)"] +starlite = ["starlite (>=1.48)"] +tornado = ["tornado (>=5)"] + +[[package]] +name = "setproctitle" +version = "1.3.3" +description = "A Python module to customize the process title" +optional = true +python-versions = ">=3.7" +files = [ + {file = "setproctitle-1.3.3-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:897a73208da48db41e687225f355ce993167079eda1260ba5e13c4e53be7f754"}, + {file = "setproctitle-1.3.3-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:8c331e91a14ba4076f88c29c777ad6b58639530ed5b24b5564b5ed2fd7a95452"}, + {file = "setproctitle-1.3.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:bbbd6c7de0771c84b4aa30e70b409565eb1fc13627a723ca6be774ed6b9d9fa3"}, + {file = "setproctitle-1.3.3-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:c05ac48ef16ee013b8a326c63e4610e2430dbec037ec5c5b58fcced550382b74"}, + {file = "setproctitle-1.3.3-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:1342f4fdb37f89d3e3c1c0a59d6ddbedbde838fff5c51178a7982993d238fe4f"}, + {file = "setproctitle-1.3.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:fc74e84fdfa96821580fb5e9c0b0777c1c4779434ce16d3d62a9c4d8c710df39"}, + {file = "setproctitle-1.3.3-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:9617b676b95adb412bb69645d5b077d664b6882bb0d37bfdafbbb1b999568d85"}, + {file = "setproctitle-1.3.3-cp310-cp310-musllinux_1_1_i686.whl", hash = "sha256:6a249415f5bb88b5e9e8c4db47f609e0bf0e20a75e8d744ea787f3092ba1f2d0"}, + {file = "setproctitle-1.3.3-cp310-cp310-musllinux_1_1_ppc64le.whl", hash = "sha256:38da436a0aaace9add67b999eb6abe4b84397edf4a78ec28f264e5b4c9d53cd5"}, + {file = 
"setproctitle-1.3.3-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:da0d57edd4c95bf221b2ebbaa061e65b1788f1544977288bdf95831b6e44e44d"}, + {file = "setproctitle-1.3.3-cp310-cp310-win32.whl", hash = "sha256:a1fcac43918b836ace25f69b1dca8c9395253ad8152b625064415b1d2f9be4fb"}, + {file = "setproctitle-1.3.3-cp310-cp310-win_amd64.whl", hash = "sha256:200620c3b15388d7f3f97e0ae26599c0c378fdf07ae9ac5a13616e933cbd2086"}, + {file = "setproctitle-1.3.3-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:334f7ed39895d692f753a443102dd5fed180c571eb6a48b2a5b7f5b3564908c8"}, + {file = "setproctitle-1.3.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:950f6476d56ff7817a8fed4ab207727fc5260af83481b2a4b125f32844df513a"}, + {file = "setproctitle-1.3.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:195c961f54a09eb2acabbfc90c413955cf16c6e2f8caa2adbf2237d1019c7dd8"}, + {file = "setproctitle-1.3.3-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:f05e66746bf9fe6a3397ec246fe481096664a9c97eb3fea6004735a4daf867fd"}, + {file = "setproctitle-1.3.3-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:b5901a31012a40ec913265b64e48c2a4059278d9f4e6be628441482dd13fb8b5"}, + {file = "setproctitle-1.3.3-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:64286f8a995f2cd934082b398fc63fca7d5ffe31f0e27e75b3ca6b4efda4e353"}, + {file = "setproctitle-1.3.3-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:184239903bbc6b813b1a8fc86394dc6ca7d20e2ebe6f69f716bec301e4b0199d"}, + {file = "setproctitle-1.3.3-cp311-cp311-musllinux_1_1_i686.whl", hash = "sha256:664698ae0013f986118064b6676d7dcd28fefd0d7d5a5ae9497cbc10cba48fa5"}, + {file = "setproctitle-1.3.3-cp311-cp311-musllinux_1_1_ppc64le.whl", hash = "sha256:e5119a211c2e98ff18b9908ba62a3bd0e3fabb02a29277a7232a6fb4b2560aa0"}, + {file = 
"setproctitle-1.3.3-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:417de6b2e214e837827067048f61841f5d7fc27926f2e43954567094051aff18"}, + {file = "setproctitle-1.3.3-cp311-cp311-win32.whl", hash = "sha256:6a143b31d758296dc2f440175f6c8e0b5301ced3b0f477b84ca43cdcf7f2f476"}, + {file = "setproctitle-1.3.3-cp311-cp311-win_amd64.whl", hash = "sha256:a680d62c399fa4b44899094027ec9a1bdaf6f31c650e44183b50d4c4d0ccc085"}, + {file = "setproctitle-1.3.3-cp312-cp312-macosx_10_9_universal2.whl", hash = "sha256:d4460795a8a7a391e3567b902ec5bdf6c60a47d791c3b1d27080fc203d11c9dc"}, + {file = "setproctitle-1.3.3-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:bdfd7254745bb737ca1384dee57e6523651892f0ea2a7344490e9caefcc35e64"}, + {file = "setproctitle-1.3.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:477d3da48e216d7fc04bddab67b0dcde633e19f484a146fd2a34bb0e9dbb4a1e"}, + {file = "setproctitle-1.3.3-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:ab2900d111e93aff5df9fddc64cf51ca4ef2c9f98702ce26524f1acc5a786ae7"}, + {file = "setproctitle-1.3.3-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:088b9efc62d5aa5d6edf6cba1cf0c81f4488b5ce1c0342a8b67ae39d64001120"}, + {file = "setproctitle-1.3.3-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a6d50252377db62d6a0bb82cc898089916457f2db2041e1d03ce7fadd4a07381"}, + {file = "setproctitle-1.3.3-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:87e668f9561fd3a457ba189edfc9e37709261287b52293c115ae3487a24b92f6"}, + {file = "setproctitle-1.3.3-cp312-cp312-musllinux_1_1_i686.whl", hash = "sha256:287490eb90e7a0ddd22e74c89a92cc922389daa95babc833c08cf80c84c4df0a"}, + {file = "setproctitle-1.3.3-cp312-cp312-musllinux_1_1_ppc64le.whl", hash = "sha256:4fe1c49486109f72d502f8be569972e27f385fe632bd8895f4730df3c87d5ac8"}, + {file = 
"setproctitle-1.3.3-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:4a6ba2494a6449b1f477bd3e67935c2b7b0274f2f6dcd0f7c6aceae10c6c6ba3"}, + {file = "setproctitle-1.3.3-cp312-cp312-win32.whl", hash = "sha256:2df2b67e4b1d7498632e18c56722851ba4db5d6a0c91aaf0fd395111e51cdcf4"}, + {file = "setproctitle-1.3.3-cp312-cp312-win_amd64.whl", hash = "sha256:f38d48abc121263f3b62943f84cbaede05749047e428409c2c199664feb6abc7"}, + {file = "setproctitle-1.3.3-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:816330675e3504ae4d9a2185c46b573105d2310c20b19ea2b4596a9460a4f674"}, + {file = "setproctitle-1.3.3-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:68f960bc22d8d8e4ac886d1e2e21ccbd283adcf3c43136161c1ba0fa509088e0"}, + {file = "setproctitle-1.3.3-cp37-cp37m-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:00e6e7adff74796ef12753ff399491b8827f84f6c77659d71bd0b35870a17d8f"}, + {file = "setproctitle-1.3.3-cp37-cp37m-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:53bc0d2358507596c22b02db079618451f3bd720755d88e3cccd840bafb4c41c"}, + {file = "setproctitle-1.3.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ad6d20f9541f5f6ac63df553b6d7a04f313947f550eab6a61aa758b45f0d5657"}, + {file = "setproctitle-1.3.3-cp37-cp37m-musllinux_1_1_aarch64.whl", hash = "sha256:c1c84beab776b0becaa368254801e57692ed749d935469ac10e2b9b825dbdd8e"}, + {file = "setproctitle-1.3.3-cp37-cp37m-musllinux_1_1_i686.whl", hash = "sha256:507e8dc2891021350eaea40a44ddd887c9f006e6b599af8d64a505c0f718f170"}, + {file = "setproctitle-1.3.3-cp37-cp37m-musllinux_1_1_ppc64le.whl", hash = "sha256:b1067647ac7aba0b44b591936118a22847bda3c507b0a42d74272256a7a798e9"}, + {file = "setproctitle-1.3.3-cp37-cp37m-musllinux_1_1_x86_64.whl", hash = "sha256:2e71f6365744bf53714e8bd2522b3c9c1d83f52ffa6324bd7cbb4da707312cd8"}, + {file = "setproctitle-1.3.3-cp37-cp37m-win32.whl", hash = 
"sha256:7f1d36a1e15a46e8ede4e953abb104fdbc0845a266ec0e99cc0492a4364f8c44"}, + {file = "setproctitle-1.3.3-cp37-cp37m-win_amd64.whl", hash = "sha256:c9a402881ec269d0cc9c354b149fc29f9ec1a1939a777f1c858cdb09c7a261df"}, + {file = "setproctitle-1.3.3-cp38-cp38-macosx_10_9_universal2.whl", hash = "sha256:ff814dea1e5c492a4980e3e7d094286077054e7ea116cbeda138819db194b2cd"}, + {file = "setproctitle-1.3.3-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:accb66d7b3ccb00d5cd11d8c6e07055a4568a24c95cf86109894dcc0c134cc89"}, + {file = "setproctitle-1.3.3-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:554eae5a5b28f02705b83a230e9d163d645c9a08914c0ad921df363a07cf39b1"}, + {file = "setproctitle-1.3.3-cp38-cp38-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:a911b26264dbe9e8066c7531c0591cfab27b464459c74385b276fe487ca91c12"}, + {file = "setproctitle-1.3.3-cp38-cp38-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:2982efe7640c4835f7355fdb4da313ad37fb3b40f5c69069912f8048f77b28c8"}, + {file = "setproctitle-1.3.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:df3f4274b80709d8bcab2f9a862973d453b308b97a0b423a501bcd93582852e3"}, + {file = "setproctitle-1.3.3-cp38-cp38-musllinux_1_1_aarch64.whl", hash = "sha256:af2c67ae4c795d1674a8d3ac1988676fa306bcfa1e23fddb5e0bd5f5635309ca"}, + {file = "setproctitle-1.3.3-cp38-cp38-musllinux_1_1_i686.whl", hash = "sha256:af4061f67fd7ec01624c5e3c21f6b7af2ef0e6bab7fbb43f209e6506c9ce0092"}, + {file = "setproctitle-1.3.3-cp38-cp38-musllinux_1_1_ppc64le.whl", hash = "sha256:37a62cbe16d4c6294e84670b59cf7adcc73faafe6af07f8cb9adaf1f0e775b19"}, + {file = "setproctitle-1.3.3-cp38-cp38-musllinux_1_1_x86_64.whl", hash = "sha256:a83ca086fbb017f0d87f240a8f9bbcf0809f3b754ee01cec928fff926542c450"}, + {file = "setproctitle-1.3.3-cp38-cp38-win32.whl", hash = 
"sha256:059f4ce86f8cc92e5860abfc43a1dceb21137b26a02373618d88f6b4b86ba9b2"}, + {file = "setproctitle-1.3.3-cp38-cp38-win_amd64.whl", hash = "sha256:ab92e51cd4a218208efee4c6d37db7368fdf182f6e7ff148fb295ecddf264287"}, + {file = "setproctitle-1.3.3-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:c7951820b77abe03d88b114b998867c0f99da03859e5ab2623d94690848d3e45"}, + {file = "setproctitle-1.3.3-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:5bc94cf128676e8fac6503b37763adb378e2b6be1249d207630f83fc325d9b11"}, + {file = "setproctitle-1.3.3-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1f5d9027eeda64d353cf21a3ceb74bb1760bd534526c9214e19f052424b37e42"}, + {file = "setproctitle-1.3.3-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:2e4a8104db15d3462e29d9946f26bed817a5b1d7a47eabca2d9dc2b995991503"}, + {file = "setproctitle-1.3.3-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:c32c41ace41f344d317399efff4cffb133e709cec2ef09c99e7a13e9f3b9483c"}, + {file = "setproctitle-1.3.3-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:cbf16381c7bf7f963b58fb4daaa65684e10966ee14d26f5cc90f07049bfd8c1e"}, + {file = "setproctitle-1.3.3-cp39-cp39-musllinux_1_1_aarch64.whl", hash = "sha256:e18b7bd0898398cc97ce2dfc83bb192a13a087ef6b2d5a8a36460311cb09e775"}, + {file = "setproctitle-1.3.3-cp39-cp39-musllinux_1_1_i686.whl", hash = "sha256:69d565d20efe527bd8a9b92e7f299ae5e73b6c0470f3719bd66f3cd821e0d5bd"}, + {file = "setproctitle-1.3.3-cp39-cp39-musllinux_1_1_ppc64le.whl", hash = "sha256:ddedd300cd690a3b06e7eac90ed4452348b1348635777ce23d460d913b5b63c3"}, + {file = "setproctitle-1.3.3-cp39-cp39-musllinux_1_1_x86_64.whl", hash = "sha256:415bfcfd01d1fbf5cbd75004599ef167a533395955305f42220a585f64036081"}, + {file = "setproctitle-1.3.3-cp39-cp39-win32.whl", hash = 
"sha256:21112fcd2195d48f25760f0eafa7a76510871bbb3b750219310cf88b04456ae3"},
+    {file = "setproctitle-1.3.3-cp39-cp39-win_amd64.whl", hash = "sha256:5a740f05d0968a5a17da3d676ce6afefebeeeb5ce137510901bf6306ba8ee002"},
+    {file = "setproctitle-1.3.3-pp310-pypy310_pp73-macosx_10_9_x86_64.whl", hash = "sha256:6b9e62ddb3db4b5205c0321dd69a406d8af9ee1693529d144e86bd43bcb4b6c0"},
+    {file = "setproctitle-1.3.3-pp310-pypy310_pp73-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:9e3b99b338598de0bd6b2643bf8c343cf5ff70db3627af3ca427a5e1a1a90dd9"},
+    {file = "setproctitle-1.3.3-pp310-pypy310_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:38ae9a02766dad331deb06855fb7a6ca15daea333b3967e214de12cfae8f0ef5"},
+    {file = "setproctitle-1.3.3-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:200ede6fd11233085ba9b764eb055a2a191fb4ffb950c68675ac53c874c22e20"},
+    {file = "setproctitle-1.3.3-pp37-pypy37_pp73-macosx_10_9_x86_64.whl", hash = "sha256:0d3a953c50776751e80fe755a380a64cb14d61e8762bd43041ab3f8cc436092f"},
+    {file = "setproctitle-1.3.3-pp37-pypy37_pp73-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:e5e08e232b78ba3ac6bc0d23ce9e2bee8fad2be391b7e2da834fc9a45129eb87"},
+    {file = "setproctitle-1.3.3-pp37-pypy37_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f1da82c3e11284da4fcbf54957dafbf0655d2389cd3d54e4eaba636faf6d117a"},
+    {file = "setproctitle-1.3.3-pp37-pypy37_pp73-win_amd64.whl", hash = "sha256:aeaa71fb9568ebe9b911ddb490c644fbd2006e8c940f21cb9a1e9425bd709574"},
+    {file = "setproctitle-1.3.3-pp38-pypy38_pp73-macosx_10_9_x86_64.whl", hash = "sha256:59335d000c6250c35989394661eb6287187854e94ac79ea22315469ee4f4c244"},
+    {file = "setproctitle-1.3.3-pp38-pypy38_pp73-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:c3ba57029c9c50ecaf0c92bb127224cc2ea9fda057b5d99d3f348c9ec2855ad3"},
+    {file = "setproctitle-1.3.3-pp38-pypy38_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:d876d355c53d975c2ef9c4f2487c8f83dad6aeaaee1b6571453cb0ee992f55f6"},
+    {file = "setproctitle-1.3.3-pp38-pypy38_pp73-win_amd64.whl", hash = "sha256:224602f0939e6fb9d5dd881be1229d485f3257b540f8a900d4271a2c2aa4e5f4"},
+    {file = "setproctitle-1.3.3-pp39-pypy39_pp73-macosx_10_9_x86_64.whl", hash = "sha256:d7f27e0268af2d7503386e0e6be87fb9b6657afd96f5726b733837121146750d"},
+    {file = "setproctitle-1.3.3-pp39-pypy39_pp73-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f5e7266498cd31a4572378c61920af9f6b4676a73c299fce8ba93afd694f8ae7"},
+    {file = "setproctitle-1.3.3-pp39-pypy39_pp73-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:33c5609ad51cd99d388e55651b19148ea99727516132fb44680e1f28dd0d1de9"},
+    {file = "setproctitle-1.3.3-pp39-pypy39_pp73-win_amd64.whl", hash = "sha256:eae8988e78192fd1a3245a6f4f382390b61bce6cfcc93f3809726e4c885fa68d"},
+    {file = "setproctitle-1.3.3.tar.gz", hash = "sha256:c913e151e7ea01567837ff037a23ca8740192880198b7fbb90b16d181607caae"},
+]
+
+[package.extras]
+test = ["pytest"]
+
 [[package]]
 name = "setuptools"
 version = "69.1.1"
@@ -3362,6 +3580,17 @@ ssh = ["paramiko"]
 test = ["azure-common", "azure-core", "azure-storage-blob", "boto3", "google-cloud-storage (>=2.6.0)", "moto[server]", "paramiko", "pytest", "pytest-rerunfailures", "requests", "responses"]
 webhdfs = ["requests"]
 
+[[package]]
+name = "smmap"
+version = "5.0.1"
+description = "A pure Python implementation of a sliding window memory map manager"
+optional = true
+python-versions = ">=3.7"
+files = [
+    {file = "smmap-5.0.1-py3-none-any.whl", hash = "sha256:e6d8668fa5f93e706934a62d7b4db19c8d9eb8cf2adbb75ef1b675aa332b69da"},
+    {file = "smmap-5.0.1.tar.gz", hash = "sha256:dceeb6c0028fdb6734471eb07c0cd2aae706ccaecab45965ee83f11c8d3b1f62"},
+]
+
 [[package]]
 name = "sniffio"
 version = "1.3.1"
@@ -4246,6 +4475,44 @@ typing-extensions = {version = ">=4.0", markers = "python_version < \"3.11\""}
 
 [package.extras]
 standard = ["colorama (>=0.4)", "httptools (>=0.5.0)", "python-dotenv (>=0.13)", "pyyaml (>=5.1)", "uvloop (>=0.14.0,!=0.15.0,!=0.15.1)", "watchfiles (>=0.13)", "websockets (>=10.4)"]
 
+[[package]]
+name = "wandb"
+version = "0.16.4"
+description = "A CLI and library for interacting with the Weights & Biases API."
+optional = true
+python-versions = ">=3.7"
+files = [
+    {file = "wandb-0.16.4-py3-none-any.whl", hash = "sha256:bb9eb5aa2c2c85e11c76040c4271366f54d4975167aa6320ba86c3f2d97fe5fa"},
+    {file = "wandb-0.16.4.tar.gz", hash = "sha256:8752c67d1347a4c29777e64dc1e1a742a66c5ecde03aebadf2b0d62183fa307c"},
+]
+
+[package.dependencies]
+appdirs = ">=1.4.3"
+Click = ">=7.1,<8.0.0 || >8.0.0"
+docker-pycreds = ">=0.4.0"
+GitPython = ">=1.0.0,<3.1.29 || >3.1.29"
+protobuf = {version = ">=3.19.0,<4.21.0 || >4.21.0,<5", markers = "python_version > \"3.9\" or sys_platform != \"linux\""}
+psutil = ">=5.0.0"
+PyYAML = "*"
+requests = ">=2.0.0,<3"
+sentry-sdk = ">=1.0.0"
+setproctitle = "*"
+setuptools = "*"
+
+[package.extras]
+async = ["httpx (>=0.23.0)"]
+aws = ["boto3"]
+azure = ["azure-identity", "azure-storage-blob"]
+gcp = ["google-cloud-storage"]
+importers = ["filelock", "mlflow", "polars", "rich", "tenacity"]
+kubeflow = ["google-cloud-storage", "kubernetes", "minio", "sh"]
+launch = ["PyYAML (>=6.0.0)", "awscli", "azure-containerregistry", "azure-identity", "azure-storage-blob", "boto3", "botocore", "chardet", "google-auth", "google-cloud-aiplatform", "google-cloud-artifact-registry", "google-cloud-compute", "google-cloud-storage", "iso8601", "kubernetes", "kubernetes-asyncio", "nbconvert", "nbformat", "optuna", "pydantic", "tomli", "typing-extensions"]
+media = ["bokeh", "moviepy", "numpy", "pillow", "plotly (>=5.18.0)", "rdkit-pypi", "soundfile"]
+models = ["cloudpickle"]
+perf = ["orjson"]
+reports = ["pydantic (>=2.0.0)"]
+sweeps = ["sweeps (>=0.2.0)"]
+
 [[package]]
 name = "wasabi"
 version = "1.1.2"
@@ -4397,7 +4664,10 @@ files = [
 numpy = "*"
 torch = "2.1.0"
 
+[extras]
+observable = ["wandb"]
+
 [metadata]
 lock-version = "2.0"
 python-versions = "^3.10"
-content-hash = "3e73aa02f5f2b1f150e2b5b5f302fc3c77b77dad2f17d342a20d5020291a59e3"
+content-hash = "8ba6b5f414064d328435a19d59b1e51656947aa772dcc991d1a838fe2f74166f"
diff --git a/pyproject.toml b/pyproject.toml
index 621bcd6..dface05 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -23,10 +23,17 @@ pydub = "^0.25.1"
 gradio = "^4.20.1"
 huggingface_hub = "^0.21.4"
 click = "^8.1.7"
+wandb = { version = "^0.16.4", optional = true }
 
 [tool.poetry.dev-dependencies]
 pytest = "^8.0.2"
 
+[tool.poetry.extras]
+observable = ["wandb"]
+
+[tool.poetry.scripts]
+finetune = "fam.llm.finetune:main"
+
 [build-system]
 requires = ["poetry-core"]
 build-backend = "poetry.core.masonry.api"
diff --git a/tests/llm/loaders/test_dataloader.py b/tests/llm/loaders/test_dataloader.py
new file mode 100644
index 0000000..7730d99
--- /dev/null
+++ b/tests/llm/loaders/test_dataloader.py
@@ -0,0 +1,41 @@
+import itertools
+from pathlib import Path
+
+import pytest
+import torch
+from huggingface_hub import snapshot_download
+from torch.utils.data import DataLoader
+
+from fam.llm.config.finetune_params import audio_token_mode as atm
+from fam.llm.config.finetune_params import num_max_audio_tokens_timesteps
+from fam.llm.loaders.training_data import DynamicComputeDataset
+from fam.llm.preprocessing.audio_token_mode import get_params_for_mode
+
+
+@pytest.mark.parametrize("dataset", ["tests/resources/datasets/sample_dataset.csv"])
+@pytest.mark.skip(reason="Requires ckpt download, not feasible as test suite")
+def test_dataset_preprocess_e2e(dataset):
+    model_name = "metavoiceio/metavoice-1B-v0.1"
+    device = "cuda"
+    mode_params = get_params_for_mode(atm, num_max_audio_tokens_timesteps=num_max_audio_tokens_timesteps)
+
+    _model_dir = snapshot_download(repo_id=model_name)
+    checkpoint_path = Path(f"{_model_dir}/first_stage.pt")
+    spk_emb_ckpt_path = Path(f"{_model_dir}/speaker_encoder.pt")
+    checkpoint = torch.load(str(checkpoint_path), mmap=True, weights_only=False)
+    tokenizer_info = checkpoint.get("meta", {}).get("tokenizer", {})
+
+    dataset = DynamicComputeDataset.from_meta(
+        tokenizer_info,
+        mode_params["combine_func"],
+        spk_emb_ckpt_path,
+        dataset,
+        mode_params["pad_token"],
+        mode_params["ctx_window"],
+        device
+    )
+    dataloader = DataLoader(dataset, batch_size=1, shuffle=False, num_workers=0)
+    result = next(iter(dataloader))
+
+    # TODO: better assertions based on sample input dims
+    assert len(result) == 2
diff --git a/tests/resources/data/caption.txt b/tests/resources/data/caption.txt
new file mode 100644
index 0000000..a2b99bc
--- /dev/null
+++ b/tests/resources/data/caption.txt
@@ -0,0 +1 @@
+Please call Stella.
\ No newline at end of file
diff --git a/tests/resources/datasets/sample_dataset.csv b/tests/resources/datasets/sample_dataset.csv
new file mode 100644
index 0000000..52c0d31
--- /dev/null
+++ b/tests/resources/datasets/sample_dataset.csv
@@ -0,0 +1,401 @@
+audio_files,captions
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
+./data/audio.wav,./data/caption.txt
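The sample dataset files added in this change are a single `(audio, caption)` row repeated hundreds of times. A minimal sketch for regenerating such a fixture — `write_sample_dataset` is a hypothetical helper, not part of this PR; note that `datasets/sample_dataset.csv` is "|"-delimited while the test fixture under `tests/resources` uses commas:

```python
import csv

def write_sample_dataset(path, audio_path, caption_path, n_rows, delimiter="|"):
    """Write a delimited dataset file with one (audio, caption) row repeated n_rows times."""
    with open(path, "w", newline="") as f:
        # lineterminator="\n" avoids csv's default CRLF line endings
        writer = csv.writer(f, delimiter=delimiter, lineterminator="\n")
        writer.writerow(["audio_files", "captions"])
        for _ in range(n_rows):
            writer.writerow([audio_path, caption_path])

# 320 rows plus a header gives the 321-line file seen in datasets/sample_dataset.csv
write_sample_dataset("sample_dataset.csv", "./data/audio.wav", "./data/caption.txt", 320)
```

Passing `delimiter=","` instead would produce the comma-separated variant used by the test fixture.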