avoid num_workers errors
- point to model zoo 0.11.0.dev1
- give user instructions for installing bmz as package if num_workers>0 gives error
- update notebooks
sammlapp committed Oct 6, 2024
1 parent c5fa81c commit 575792e
Showing 4 changed files with 68 additions and 61 deletions.
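
The workaround named in the commit message can be approximated in code. The following is a minimal sketch and not part of the commit: `pick_num_workers` is a hypothetical helper that falls back to `num_workers=0` when the `bioacoustics_model_zoo` package is not importable in the current environment, on the assumption that worker processes would fail to import it for the same reason.

```python
import importlib.util


def pick_num_workers(requested, module_name="bioacoustics_model_zoo"):
    """Return `requested` workers if `module_name` is importable, else 0.

    Hypothetical helper: assumes that a package missing from the main
    process's environment would also be missing in worker processes.
    """
    if requested > 0 and importlib.util.find_spec(module_name) is None:
        # Package not installed: stay single-process to avoid the
        # "module not found" error described in the commit message.
        return 0
    return requested


# Example: ask for 4 workers, fall back to 0 if the package is absent.
num_workers = pick_num_workers(4)
```

If the helper returns 0 and multiprocessing is wanted, installing the model zoo with the `pip install` command quoted in the notebooks below is the fix the commit itself recommends.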
11 changes: 11 additions & 0 deletions docs/tutorials/predict_with_cnn.ipynb
@@ -677,6 +677,17 @@
"scores.head()"
]
},
{
"cell_type": "markdown",
"id": "6219c20f",
"metadata": {},
"source": [
"> Note on the error \"module not found: bioacoustics_model_zoo\" when using multiprocessing (num_workers>0):\n",
"> If you get this error, install bioacoustics_model_zoo as a package in your Python environment:\n",
"> `pip install git+https://github.com/kitzeslab/bioacoustics-model-zoo@0.11.0.dev1`\n",
"> The torch.hub API seems to have trouble with multiprocessing for some model classes."
]
},
{
"cell_type": "markdown",
"id": "93f2d2aa",
46 changes: 20 additions & 26 deletions docs/tutorials/training_birdnet_and_perch.ipynb
@@ -14,6 +14,13 @@
"\n",
"Note that in this tutorial, all classifiers are trained as multi-target (each class is predicted independently, such that any sample can have 0, 1, or >1 classes present). Most bioacoustics classification tasks are multi-target. \n",
"\n",
"\n",
"> Note on the error \"module not found: bioacoustics_model_zoo\" when using multiprocessing (num_workers>0):\n",
"> If you get this error, install bioacoustics_model_zoo as a package in your Python environment:\n",
"> `pip install git+https://github.com/kitzeslab/bioacoustics-model-zoo@0.11.0.dev1`\n",
"> The torch.hub API seems to have trouble with multiprocessing for some model classes.\n",
"\n",
"\n",
"[1] Ghani, B., T. Denton, S. Kahl, H. Klinck, T. Denton, S. Kahl, and H. Klinck. 2023. Global birdsong embeddings enable superior transfer learning for bioacoustic classification. Scientific Reports 13:22876.\n",
"\n",
"[2] Kahl, Stefan, et al. \"BirdNET: A deep learning solution for avian diversity monitoring.\" Ecological Informatics 61 (2021): 101236.\n"
@@ -35,7 +42,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
@@ -63,7 +70,7 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
@@ -100,7 +107,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
@@ -178,7 +185,7 @@
},
{
"cell_type": "code",
"execution_count": 40,
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
@@ -211,7 +218,7 @@
},
{
"cell_type": "code",
"execution_count": 41,
"execution_count": 64,
"metadata": {},
"outputs": [
{
@@ -226,7 +233,7 @@
"dtype: int64"
]
},
"execution_count": 41,
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
@@ -245,7 +252,7 @@
},
{
"cell_type": "code",
"execution_count": 42,
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
@@ -272,22 +279,23 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 66,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Using cache found in /Users/SML161/.cache/torch/hub/kitzeslab_bioacoustics-model-zoo_birdnet_train\n"
"Downloading: \"https://github.com/kitzeslab/bioacoustics-model-zoo/zipball/birdnet_train\" to /Users/SML161/.cache/torch/hub/birdnet_train.zip\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloaded completed: BirdNET_GLOBAL_6K_V2.4_Labels_af.txt\n",
"downloading model from URL...\n"
"File BirdNET_GLOBAL_6K_V2.4_Labels_af.txt already exists; skipping download.\n",
"downloading model from URL...\n",
"File BirdNET_GLOBAL_6K_V2.4_Model_FP16.tflite already exists; skipping download.\n"
]
},
{
@@ -309,26 +317,12 @@
"/Users/SML161/miniconda3/envs/tensorflow/lib/python3.9/site-packages/opensoundscape/ml/cnn.py:630: UserWarning: Failed to detect expected # input channels of this architecture.Make sure your architecture expects the number of channels equal to `channels` argument 1). Pytorch architectures generally expect 3 channels by default.\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Downloaded completed: BirdNET_GLOBAL_6K_V2.4_Model_FP16.tflite\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO: Created TensorFlow Lite XNNPACK delegate for CPU.\n"
]
}
],
"source": [
"tag = \"birdnet_train\" # the branch of the model zoo with compatible models\n",
"birdnet = torch.hub.load(\n",
" f\"kitzeslab/bioacoustics-model-zoo:{tag}\", 'BirdNET', trust_repo=True, \n",
" f\"kitzeslab/bioacoustics-model-zoo:{tag}\", 'BirdNET', trust_repo=True, force_reload=True\n",
" )"
]
},
63 changes: 30 additions & 33 deletions docs/tutorials/transfer_learning.ipynb
@@ -14,6 +14,11 @@
"\n",
"Users can develop flexible and customizable transfer-learning workflow by generating embeddings then using PyTorch or sklearn directly. This notebook demonstrates both (1) high-level functions and classes in OpenSoundscape that simplify the code needed to perform transfer learning; and (2) examples demonstrating the embedding and model fitting steps explicitly line-by-line.\n",
"\n",
"> Note on the error \"module not found: bioacoustics_model_zoo\" when using multiprocessing (num_workers>0):\n",
"> If you get this error, install bioacoustics_model_zoo as a package in your Python environment:\n",
"> `pip install git+https://github.com/kitzeslab/bioacoustics-model-zoo@0.11.0.dev1`\n",
"> The torch.hub API seems to have trouble with multiprocessing for some model classes.\n",
"\n",
"[1] Ghani, B., T. Denton, S. Kahl, H. Klinck, T. Denton, S. Kahl, and H. Klinck. 2023. Global birdsong embeddings enable superior transfer learning for bioacoustic classification. Scientific Reports 13:22876.\n",
"\n",
"[2] Kahl, Stefan, et al. \"BirdNET: A deep learning solution for avian diversity monitoring.\" Ecological Informatics 61 (2021): 101236.\n"
@@ -35,7 +40,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
@@ -44,7 +49,9 @@
" %pip install git+https://github.com/kitzeslab/opensoundscape@develop ipykernel==5.5.6 ipython==7.34.0 pillow==9.4.0\n",
" num_workers=0\n",
"else:\n",
" num_workers=4"
"    # can use >0 (e.g. 4), but you might need to install the bioacoustics model zoo as a package:\n",
" # `pip install git+https://github.com/kitzeslab/bioacoustics-model-zoo@0.11.0.dev1`\n",
" num_workers=0 "
]
},
{
@@ -63,7 +70,7 @@
},
{
"cell_type": "code",
"execution_count": 42,
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
@@ -101,7 +108,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
@@ -179,7 +186,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 46,
"metadata": {},
"outputs": [
{
@@ -221,7 +228,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 47,
"metadata": {},
"outputs": [
{
@@ -236,7 +243,7 @@
"dtype: int64"
]
},
"execution_count": 8,
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
@@ -255,7 +262,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
@@ -273,7 +280,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 49,
"metadata": {},
"outputs": [
{
@@ -308,6 +315,11 @@
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/SML161/opensoundscape/opensoundscape/preprocess/preprocessors.py:504: DeprecationWarning: sample_shape argument is deprecated. Please use height, width, channels arguments instead. \n",
" The current behavior is to override height, width, channels with sample_shape \n",
" when sample_shape is not None.\n",
" \n",
" warnings.warn(\n",
"/Users/SML161/opensoundscape/opensoundscape/ml/cnn.py:606: UserWarning: \n",
" This architecture is not listed in opensoundscape.ml.cnn_architectures.ARCH_DICT.\n",
" It will not be available for loading after saving the model with .save() (unless using pickle=True). \n",
@@ -341,7 +353,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
@@ -471,7 +483,7 @@
" validation_df=labels_val,\n",
" steps=1000,\n",
" embedding_batch_size=128,\n",
" embedding_num_workers=0,\n",
" embedding_num_workers=num_workers,\n",
")"
]
},
@@ -557,7 +569,7 @@
"source": [
"Alternatively, we can embed the training and validation sets first, then train as many different variants as we want.\n",
"\n",
"(note that the `fit_classifier_on_embeddings` returns the embeddings on the training and validation set, so if you've already run that functino you don't need to re-generate the embeddings)\n",
"(note that the `fit_classifier_on_embeddings` returns the embeddings on the training and validation set, so if you've already run that function you don't need to re-generate the embeddings)\n",
"\n",
"Generally, embedding may take a while for large datasets, but training the shallow classifier will be very fast because the network is small and there is no preprocessing or data loading. \n",
"\n",
@@ -566,13 +578,13 @@
},
{
"cell_type": "code",
"execution_count": 76,
"execution_count": 51,
"metadata": {},
"outputs": [],
"source": [
"# uncomment to generate training and validation set embeddings, if you don't have them from the previous cells\n",
"# emb_train = hawk.embed(labels_train, return_dfs=False, batch_size=128, num_workers=0)\n",
"# emb_val = hawk.embed(labels_val, return_dfs=False, batch_size=128, num_workers=0)"
"# uncomment the lines below to generate training and validation set embeddings, if you don't have them from the previous cells\n",
"# emb_train = hawk.embed(labels_train, return_dfs=False, batch_size=128, num_workers=num_workers)\n",
"# emb_val = hawk.embed(labels_val, return_dfs=False, batch_size=128, num_workers=num_workers)"
]
},
{
@@ -723,24 +735,9 @@
},
{
"cell_type": "code",
"execution_count": 82,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "703cfbd664ba4e6f864ab9b0719fef6d",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/4 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"outputs": [],
"source": [
"from opensoundscape.ml.shallow_classifier import augmented_embed\n",
"train_emb_aug, train_label_aug = augmented_embed(hawk,labels_train.sample(512), batch_size=128, num_workers=num_workers,n_augmentation_variants=4)"
9 changes: 7 additions & 2 deletions opensoundscape/ml/bioacoustics_model_zoo.py
@@ -14,7 +14,7 @@ def list_models(**kwargs):
see also: load(model)
"""
tag = "0.11.0" # in the future, might be based on opensoundscape.__version__
tag = "0.11.0.dev1" # in the future, might be based on opensoundscape.__version__
return torch.hub.list(
f"kitzeslab/bioacoustics-model-zoo:{tag}", trust_repo=True, **kwargs
)
@@ -45,10 +45,15 @@ def load(model, tag=None, **kwargs):
(see https://github.com/kitzeslab/bioacoustics-model-zoo landing page for
detailed instructions)
> Note on the error "module not found: bioacoustics_model_zoo" when using multiprocessing (num_workers>0):
> If you get this error, install bioacoustics_model_zoo as a package in your Python environment:
> `pip install git+https://github.com/kitzeslab/bioacoustics-model-zoo@0.11.0.dev1`
> The torch.hub API seems to have trouble with multiprocessing for some model classes.
"""
if tag is None:
# in the future, might be based on opensoundscape.__version__
tag = "0.11.0"
tag = "0.11.0.dev1"
return torch.hub.load(
f"kitzeslab/bioacoustics-model-zoo:{tag}", model, trust_repo=True, **kwargs
)
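
As a rough illustration of how the `tag` default in `load` feeds the `torch.hub` repository string (of the form `owner/repo:ref`), the sketch below reproduces only the string-building step. `hub_repo_spec` and `DEFAULT_TAG` are hypothetical names introduced here, not part of OpenSoundscape, and no network call is made.

```python
# Hypothetical constant mirroring the default tag set in this commit.
DEFAULT_TAG = "0.11.0.dev1"


def hub_repo_spec(tag=None, repo="kitzeslab/bioacoustics-model-zoo"):
    """Build the 'owner/repo:ref' string that would be passed to torch.hub.

    Illustrative only: mirrors the `if tag is None` fallback in `load`.
    """
    if tag is None:
        tag = DEFAULT_TAG
    return f"{repo}:{tag}"


# With no tag given, the default release tag is used.
default_spec = hub_repo_spec()
# An explicit tag (here the branch name used in one of the notebooks) overrides it.
branch_spec = hub_repo_spec("birdnet_train")
```

Passing an explicit `tag` is how the BirdNET training notebook targets the `birdnet_train` branch instead of the default release.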
