Auto training #48

Merged · 28 commits · Oct 12, 2023

Commits

6aef18d
Updated/added functions to support auto training of new wakeword mode…
dscripka Aug 16, 2023
a831e38
covered more permutations in adversarial phrase generation
dscripka Aug 17, 2023
82b2418
Fixed mypy and flake8 issues, added train.py [skip ci]
dscripka Sep 3, 2023
ef46734
Updated requirements for full installation [skip ci]
dscripka Sep 4, 2023
1815ca5
Added example YML file for training new models
dscripka Sep 4, 2023
d7e2626
Partial draft of automatic model training example notebook, updates t…
dscripka Sep 4, 2023
f3e74cd
Working example of automatic model training complete [skip ci]
dscripka Sep 4, 2023
6a64b19
Updated requirements for full install [skip ci]
dscripka Sep 4, 2023
e0f0c0e
Updated requirements to fix gaps, edits to example notebook to work w…
dscripka Sep 4, 2023
7cafd26
Edits to example notebook for auto training [skip ci]
dscripka Sep 4, 2023
8ad5248
More edits to example auto training notebook [skip ci]
dscripka Sep 5, 2023
0620cd0
Fixed merge conflicts [skip ci]
dscripka Sep 5, 2023
83d8bae
Convert clips to 16-bit PCM before saving [skip ci]
dscripka Sep 5, 2023
8ecb493
Moved tflite conversion to its own function [skip ci]
dscripka Sep 6, 2023
fdab81f
Moved tflite conversion outside of training class for more portabilit…
dscripka Sep 11, 2023
6700161
fixed bad arg [skip ci]
dscripka Sep 11, 2023
7d27b9b
Edits to README and example training notebooks [skip ci]
dscripka Sep 11, 2023
dfdeaa2
Small adjustments to training config file and train.py
dscripka Oct 2, 2023
fd36a56
Adjusted warning message
dscripka Oct 2, 2023
a070061
Passing flake8 and mypy tests locally [skip ci]
dscripka Oct 2, 2023
4833873
Updated autotraining function to save checkpoints based on percentile…
dscripka Oct 3, 2023
62818e5
Fixed argparse issue [skip ci]
dscripka Oct 6, 2023
3594e59
added missing dependency [skip ci]
dscripka Oct 7, 2023
ef50fcf
Fixed bugs in auto-training process, removed deprecated arguments [sk…
dscripka Oct 8, 2023
1ae41de
Adjusted training config example [skip ci]
dscripka Oct 8, 2023
ace1473
Added links to updated colab notebooks in Readme [skip ci]
dscripka Oct 11, 2023
b318bfe
Added acknowledgements to readme [skip ci]
dscripka Oct 12, 2023
b2a3ee6
Fixed merge conflicts
dscripka Oct 12, 2023
20 changes: 18 additions & 2 deletions README.md
@@ -210,7 +210,13 @@ While the models are trained with background noise to increase robustness, in so

# Training New Models

Training new models is conceptually simple, and the entire process is demonstrated in a [tutorial notebook](notebooks/training_models.ipynb).
openWakeWord includes an automated utility that greatly simplifies the process of training custom models. This can be used in two ways:

1) In a simple [Google Colab](https://colab.research.google.com/drive/1q1oe2zOyZp7UsB3jJiQ1IFn8z5YfjwEb?usp=sharing) notebook with an easy-to-use interface and a simple end-to-end process. This allows anyone to produce a custom model very quickly (<1 hour) and doesn't require any development experience, but the performance of the model may be low in some deployment scenarios.

2) A more detailed [notebook](notebooks/automatic_model_training.ipynb) (also on [Google Colab](https://colab.research.google.com/drive/1yyFH-fpguX2BTAW8wSQxTrJnJTM-0QAd?usp=sharing)) that describes the training process in more detail and enables more customization. This can produce high-quality models, but requires more development experience.

For users interested in understanding the fundamental concepts behind model training, a more detailed, educational [tutorial notebook](notebooks/training_models.ipynb) is also available. However, this notebook is not intended for training production models; the automated process above is recommended for that purpose.
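
Once trained, the resulting `.tflite` (or `.onnx`) file can be loaded like any other openWakeWord model. The snippet below is a minimal sketch, assuming a trained `my_model.tflite` exists (matching the `model_name` in the example config) and that the `Model` constructor accepts a `wakeword_models` list as in recent releases:

```python
# Minimal inference sketch; "my_model.tflite" is assumed to exist, and the
# Model(wakeword_models=[...]) signature should be checked against the
# installed openwakeword version.
import numpy as np
from openwakeword.model import Model

oww = Model(wakeword_models=["my_model.tflite"])

# openWakeWord expects 16 kHz, 16-bit mono audio; 1280 samples = 80 ms
frame = np.zeros(1280, dtype=np.int16)  # placeholder audio frame
print(oww.predict(frame))               # e.g. {"my_model": 0.003}
```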

Fundamentally, a new model requires two data generation and collection steps:

@@ -233,14 +239,24 @@ Future release road maps may have non-english support. In particular, [Mycroft.A
- While the ONNX runtime [does support javascript](https://onnxruntime.ai/docs/get-started/with-javascript.html), much of the other functionality required for openWakeWord models would need to be ported. This is not currently on the roadmap, but please open an issue/start a discussion if this feature is of particular interest.

**Is there a C++ version of openWakeWord?**
- While the ONNX runtime [also has a C++ API](https://onnxruntime.ai/docs/get-started/with-cpp.html), there isn't an official C++ implementation of the full openWakeWord library. However, [@synesthesiam](https://github.com/synesthesiam) has created a [C++ version](https://github.com/rhasspy/openWakeWord-cpp) of openWakeWord with the essential functionality implemented.
- While the ONNX runtime [also has a C++ API](https://onnxruntime.ai/docs/get-started/with-cpp.html), there isn't an official C++ implementation of the full openWakeWord library. However, [@synesthesiam](https://github.com/synesthesiam) has created a [C++ version](https://github.com/rhasspy/openWakeWord-cpp) of openWakeWord with basic functionality implemented.

**Why are there three separate models instead of just one?**
- Separating the models was an intentional choice to provide flexibility and optimize the efficiency of the end-to-end prediction process. For example, with separate melspectrogram, embedding, and prediction models, each one can operate on different sizes of audio input to optimize overall latency and share computations between models. A combined model with all of the steps integrated is certainly possible, though, if a particular use case required it.
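
As a purely conceptual sketch of that design (the function names below are hypothetical stand-ins, not the openWakeWord API):

```python
import numpy as np

# Hypothetical stand-ins for the three stages; NOT the openWakeWord API.
def melspectrogram_stage(chunk):         # runs once per incoming audio chunk
    return np.abs(np.fft.rfft(chunk))[:32]

def embedding_stage(mels):               # runs once on the shared mel features
    return mels / (np.linalg.norm(mels) + 1e-9)

def wakeword_stage(emb, threshold):      # tiny classifier, duplicated per model
    return float(emb.sum() > threshold)

chunk = np.random.randn(1280)            # one 80 ms chunk at 16 kHz
mels = melspectrogram_stage(chunk)       # stage 1: computed once
emb = embedding_stage(mels)              # stage 2: computed once
# stage 3: only this cheap step repeats for each loaded wake word model
scores = {name: wakeword_stage(emb, t)
          for name, t in [("alexa", 0.5), ("hey_jarvis", 0.7)]}
```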

**I still get a large number of false activations when I use the pre-trained models, how can I reduce these?**
- First, review the [recommendations for usage](#recommendations-for-usage) and check whether those options improve overall system accuracy. Second, experiment with [custom verifier models](#user-specific-models), if possible. If neither of these approaches helps, please open an issue with details of the deployment environment and the types of false activations that you are experiencing. We certainly appreciate feedback & requests on how to improve the base pre-trained models!
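
For reference, a hedged sketch of enabling those mitigations at load time; the parameter names below are assumptions from memory of the library and should be verified against the installed version:

```python
# Hedged sketch; verify these parameter names against the installed
# openwakeword version before relying on them.
from openwakeword.model import Model

oww = Model(
    wakeword_models=["hey_jarvis"],       # pre-trained name or path to a model file
    vad_threshold=0.5,                    # suppress activations without voice activity
    enable_speex_noise_suppression=True,  # denoise input (needs speexdsp-ns installed)
    custom_verifier_models={"hey_jarvis": "my_verifier.pkl"},  # hypothetical verifier file
    custom_verifier_threshold=0.3,
)
```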

# Acknowledgements

I am very grateful for the encouraging and positive response from the open-source community since the release of openWakeWord in January 2023. In particular, I want to acknowledge and thank the following individuals and groups for their feedback, collaboration, and development support:

- [synesthesiam](https://github.com/synesthesiam)
- [SecretSauceAI](https://github.com/secretsauceai)
- [OpenVoiceOS](https://github.com/OpenVoiceOS)
- [Nabu Casa](https://github.com/NabuCasa)
- [Home Assistant](https://github.com/home-assistant)

# License

All of the code in this repository is licensed under the **Apache 2.0** license. All of the included pre-trained models are licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-nc-sa/4.0/) license due to the inclusion of datasets with unknown or restrictive licensing as part of the training data. If you are interested in pre-trained models with more permissive licensing, please raise an issue and we will try to add them to a future release.
101 changes: 101 additions & 0 deletions examples/custom_model.yml
@@ -0,0 +1,101 @@
## Configuration file to be used with `train.py` to create custom wake word/phrase models

# The name of the model (will be used when creating directories and when saving the final .onnx and .tflite files)
model_name: "my_model"

# The target word/phrase to be detected by the model. Adding multiple unique words/phrases will
# still train only a single binary detection model, but it will activate on any of the provided words/phrases.
target_phrase:
- "hey jarvis"

# Specific phrases that you do *not* want the model to activate on, outside of those generated automatically via phoneme overlap
# This can be a good way to reduce false positives if you notice that, in practice, certain words or phrases are problematic
custom_negative_phrases: []

# The total number of positive samples to generate for training (minimum of 20,000 recommended, often 100,000+ is best)
n_samples: 10000

# The total number of positive samples to generate for validation and early stopping of model training
n_samples_val: 2000

# The batch size to use with Piper TTS when generating synthetic training data
tts_batch_size: 50

# The batch size to use when performing data augmentation on generated clips prior to training
# It's recommended that this not be too large to ensure that there is enough variety in the augmentation
augmentation_batch_size: 16

# The path to a fork of the piper-sample-generator repository for TTS (https://github.com/dscripka/piper-sample-generator)
piper_sample_generator_path: "./piper-sample-generator"

# The output directory for the generated synthetic clips, openwakeword features, and trained models
# Sub-directories will be automatically created for train and test clips for both positive and negative examples
output_dir: "./my_custom_model"

# The directories containing Room Impulse Response recordings
rir_paths:
- "./mit_rirs"

# The directories containing background audio files to mix with training data
background_paths:
- "./background_clips"

# The duplication rate for the background audio clips listed above (1 or higher). Can be useful as a way to oversample
# a particular type of background noise more relevant to a given deployment environment. Values apply in the same
# order as the background_paths list above. Only useful when multiple directories are provided above.
background_paths_duplication_rate:
- 1

# The location of pre-computed openwakeword features for false-positive validation data
# If you do not have deployment environment validation data, a good general-purpose dataset with
# a reasonable mix of ~11 hours of speech, noise, and music is available here: https://huggingface.co/datasets/davidscripka/openwakeword_features
false_positive_validation_data_path: "./validation_set_features.npy"

# The number of times to apply augmentations to the generated training data
# Values greater than 1 reuse each generated clip that many times; because the
# augmentations are random, each pass still produces unique training clips from
# the same underlying synthetic audio. This can be a useful way to increase model
# robustness without generating extremely large numbers of synthetic examples.
augmentation_rounds: 1

# Paths to pre-computed openwakeword features for positive and negative data. Each file must be a saved
# .npy array (see the example notebook on manually training new models for details on how to create these).
# There is no limit on the number of files but training speed will decrease as more
# data will need to be read from disk for each additional file.
# Also, there is a custom dataloader that uses memory-mapping when loading data, so the total size
# of the files is not limited by the amount of available system memory (though this will result
# in decreased training throughput depending on the speed of the underlying storage device). A fast
# NVME SSD is recommended for optimal performance.

feature_data_files:
"ACAV100M_sample": "./openwakeword_features_ACAV100M_2000_hrs_16bit.npy"

# Define the number of examples from each data file per batch. Note that the key names here
# must correspond to those define in the `feature_data_files` dictionary above (except for
# the `positive` and `adversarial_negative` keys, which are automatically defined). The sum
# of the values for each key define the total batch size for training. Initial testing indicates
# that batch sizes of 1024-4096 work well in practice.

batch_n_per_class:
"ACAV100M_sample": 1024
"adversarial_negative": 50
"positive": 50

# Define the type and size of the openwakeword model to train. Increasing the layer size
# may result in a more capable model, at the cost of decreased inference speed. The default
# value (32) seems to work well in practice for most wake words/phrases.

model_type: "dnn"
layer_size: 32

# Define training parameters. The values below are recommended defaults for most applications,
# but unique deployment environments will likely require testing to determine which values
# are the most appropriate.

# The maximum number of steps to train the model
steps: 50000

# The maximum negative weight and target false positives per hour, used to control the auto training process
# The target false positive rate may not be achieved, and adjusting the maximum negative weight may be necessary
max_negative_weight: 1500
target_false_positives_per_hour: 0.2
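
As a quick sanity check before starting a long training run, an edited copy of this file can be loaded with PyYAML; the sketch below simply mirrors the recommendations in the comments above and is not `train.py`'s actual validation logic:

```python
# Config sanity-check sketch using PyYAML; thresholds mirror the comments
# in the file above, not train.py's actual validation.
import yaml

with open("examples/custom_model.yml") as f:
    config = yaml.safe_load(f)

if config["n_samples"] < 20000:
    print("warning: fewer than the recommended 20,000 positive samples")

total_batch = sum(config["batch_n_per_class"].values())
if not 1024 <= total_batch <= 4096:
    print(f"warning: total batch size {total_batch} is outside the tested 1024-4096 range")

print(f"model '{config['model_name']}' will be written to {config['output_dir']}")
```
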
4 changes: 2 additions & 2 deletions examples/detect_from_microphone.py
@@ -22,10 +22,10 @@
parser=argparse.ArgumentParser()
parser.add_argument(
"--chunk_size",
help="How much audio (in samples) to predict on at once",
help="How much audio (in number of samples) to predict on at once",
type=int,
default=1280,
required=True
required=False
)
parser.add_argument(
"--model_path",
… (remaining diff not shown)
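
With `--chunk_size` now optional, a minimal detection loop in the spirit of this example looks roughly like the sketch below; the PyAudio settings are assumptions consistent with openWakeWord's 16 kHz, 16-bit mono input requirement, not a copy of the script:

```python
# Hedged sketch of a microphone detection loop; the audio settings are
# assumptions matching openWakeWord's input requirements, not the exact script.
import numpy as np
import pyaudio
from openwakeword.model import Model

CHUNK = 1280  # the script's new default --chunk_size: 80 ms at 16 kHz

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                 input=True, frames_per_buffer=CHUNK)
oww = Model()  # loads the pre-trained models by default

while True:
    frame = np.frombuffer(stream.read(CHUNK), dtype=np.int16)
    for name, score in oww.predict(frame).items():
        if score > 0.5:  # example threshold; tune per deployment
            print(f"Detected '{name}' (score={score:.2f})")
```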