avoid num_workers errors

- point to model zoo 0.11.0.dev1
- give user instructions for installing bmz as package if num_workers>0 gives error
- update notebooks
kitzeslab · Oct 6, 2024 · 575792e
1 parent c5fa81c

Showing 4 changed files with 68 additions and 61 deletions.
diff --git a/docs/tutorials/predict_with_cnn.ipynb b/docs/tutorials/predict_with_cnn.ipynb
@@ -677,6 +677,17 @@
     "scores.head()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "6219c20f",
+   "metadata": {},
+   "source": [
+    "> Note on the error \"module not found: bioacoustics_model_zoo\" when using multiprocessing (`num_workers>0`):\n",
+    "> If you get an error to this effect, install bioacoustics_model_zoo as a package in your Python environment:\n",
+    "> `pip install git+https://github.com/kitzeslab/bioacoustics-model-zoo@0.11.0.dev1`\n",
+    "> This is needed because the torch.hub API seems to have trouble with multiprocessing for some model classes. "
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "93f2d2aa",

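For illustration only (not part of the commit): the note above does not say why torch.hub and multiprocessing clash. One plausible reading, which is an assumption here and not something this commit confirms, is that worker processes receive pickled objects, and pickle records a class by its module name only, so a process that cannot import that module fails. The self-contained sketch below reproduces that failure mode with a throwaway in-memory module; `hub_only_module` and `Model` are hypothetical names, and no torch.hub code is involved.

```python
import pickle
import sys
import types

# Build a throwaway module in memory. It stands in for code that is
# importable in the parent process only (hypothetical name).
mod = types.ModuleType("hub_only_module")
exec("class Model:\n    pass", mod.__dict__)
sys.modules["hub_only_module"] = mod

# Pickle stores a reference ("hub_only_module", "Model"), not the class body.
payload = pickle.dumps(mod.Model())

# Simulate a fresh worker process in which the module cannot be imported.
del sys.modules["hub_only_module"]
try:
    pickle.loads(payload)
    error_name = None
except ModuleNotFoundError as exc:
    error_name = exc.name  # name of the module that could not be imported

print(error_name)
```

If this is indeed the mechanism, installing the package with pip would help because the module becomes importable in every process, not only in the one that called torch.hub.
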
diff --git a/docs/tutorials/training_birdnet_and_perch.ipynb b/docs/tutorials/training_birdnet_and_perch.ipynb
@@ -14,6 +14,13 @@
     "\n",
     "Note that in this tutorial, all classifiers are trained as multi-target (each class is predicted independently, such that any sample can have 0, 1, or >1 classes present). Most bioacoustics classification tasks are multi-target. \n",
     "\n",
+    "\n",
+    "> Note on the error \"module not found: bioacoustics_model_zoo\" when using multiprocessing (`num_workers>0`):\n",
+    "> If you get an error to this effect, install bioacoustics_model_zoo as a package in your Python environment:\n",
+    "> `pip install git+https://github.com/kitzeslab/bioacoustics-model-zoo@0.11.0.dev1`\n",
+    "> This is needed because the torch.hub API seems to have trouble with multiprocessing for some model classes. \n",
+    "\n",
+    "\n",
     "[1] Ghani, B., T. Denton, S. Kahl, and H. Klinck. 2023. Global birdsong embeddings enable superior transfer learning for bioacoustic classification. Scientific Reports 13:22876.\n",
     "\n",
     "[2] Kahl, Stefan, et al. \"BirdNET: A deep learning solution for avian diversity monitoring.\" Ecological Informatics 61 (2021): 101236.\n"
@@ -35,7 +42,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 60,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -63,7 +70,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 24,
+   "execution_count": 61,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -100,7 +107,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 62,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -178,7 +185,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 40,
+   "execution_count": 63,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -211,7 +218,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 41,
+   "execution_count": 64,
    "metadata": {},
    "outputs": [
     {
@@ -226,7 +233,7 @@
        "dtype: int64"
       ]
      },
-     "execution_count": 41,
+     "execution_count": 64,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -245,7 +252,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 42,
+   "execution_count": 65,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -272,22 +279,23 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 66,
    "metadata": {},
    "outputs": [
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "Using cache found in /Users/SML161/.cache/torch/hub/kitzeslab_bioacoustics-model-zoo_birdnet_train\n"
+      "Downloading: \"https://github.com/kitzeslab/bioacoustics-model-zoo/zipball/birdnet_train\" to /Users/SML161/.cache/torch/hub/birdnet_train.zip\n"
      ]
     },
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Downloaded completed: BirdNET_GLOBAL_6K_V2.4_Labels_af.txt\n",
-      "downloading model from URL...\n"
+      "File BirdNET_GLOBAL_6K_V2.4_Labels_af.txt already exists; skipping download.\n",
+      "downloading model from URL...\n",
+      "File BirdNET_GLOBAL_6K_V2.4_Model_FP16.tflite already exists; skipping download.\n"
      ]
     },
     {
@@ -309,26 +317,12 @@
       "/Users/SML161/miniconda3/envs/tensorflow/lib/python3.9/site-packages/opensoundscape/ml/cnn.py:630: UserWarning: Failed to detect expected # input channels of this architecture.Make sure your architecture expects the number of channels equal to `channels` argument 1). Pytorch architectures generally expect 3 channels by default.\n",
       "  warnings.warn(\n"
      ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Downloaded completed: BirdNET_GLOBAL_6K_V2.4_Model_FP16.tflite\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "INFO: Created TensorFlow Lite XNNPACK delegate for CPU.\n"
-     ]
     }
    ],
    "source": [
     "tag = \"birdnet_train\" # the branch of the model zoo with compatible models\n",
     "birdnet = torch.hub.load(\n",
-    "        f\"kitzeslab/bioacoustics-model-zoo:{tag}\", 'BirdNET', trust_repo=True, \n",
+    "        f\"kitzeslab/bioacoustics-model-zoo:{tag}\", 'BirdNET', trust_repo=True, force_reload=True\n",
     "    )"
    ]
   },

diff --git a/docs/tutorials/transfer_learning.ipynb b/docs/tutorials/transfer_learning.ipynb
@@ -14,6 +14,11 @@
     "\n",
     "Users can develop flexible and customizable transfer-learning workflows by generating embeddings and then using PyTorch or sklearn directly. This notebook demonstrates both (1) high-level functions and classes in OpenSoundscape that simplify the code needed to perform transfer learning; and (2) examples demonstrating the embedding and model fitting steps explicitly line-by-line.\n",
     "\n",
+    "> Note on the error \"module not found: bioacoustics_model_zoo\" when using multiprocessing (`num_workers>0`):\n",
+    "> If you get an error to this effect, install bioacoustics_model_zoo as a package in your Python environment:\n",
+    "> `pip install git+https://github.com/kitzeslab/bioacoustics-model-zoo@0.11.0.dev1`\n",
+    "> This is needed because the torch.hub API seems to have trouble with multiprocessing for some model classes. \n",
+    "\n",
     "[1] Ghani, B., T. Denton, S. Kahl, and H. Klinck. 2023. Global birdsong embeddings enable superior transfer learning for bioacoustic classification. Scientific Reports 13:22876.\n",
     "\n",
     "[2] Kahl, Stefan, et al. \"BirdNET: A deep learning solution for avian diversity monitoring.\" Ecological Informatics 61 (2021): 101236.\n"
@@ -35,7 +40,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 52,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -44,7 +49,9 @@
     "  %pip install git+https://github.com/kitzeslab/opensoundscape@develop ipykernel==5.5.6 ipython==7.34.0 pillow==9.4.0\n",
     "  num_workers=0\n",
     "else:\n",
-    "  num_workers=4"
+    "  # can use >0 (e.g. 4), but you might need to install the bioacoustics model zoo as a package:\n",
+    "  # `pip install git+https://github.com/kitzeslab/bioacoustics-model-zoo@0.11.0.dev1`\n",
+    "  num_workers=0 "
    ]
   },
   {
@@ -63,7 +70,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 42,
+   "execution_count": 44,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -101,7 +108,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 45,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -179,7 +186,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": 46,
    "metadata": {},
    "outputs": [
     {
@@ -221,7 +228,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 47,
    "metadata": {},
    "outputs": [
     {
@@ -236,7 +243,7 @@
        "dtype: int64"
       ]
      },
-     "execution_count": 8,
+     "execution_count": 47,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -255,7 +262,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 48,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -273,7 +280,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 49,
    "metadata": {},
    "outputs": [
     {
@@ -308,6 +315,11 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
+      "/Users/SML161/opensoundscape/opensoundscape/preprocess/preprocessors.py:504: DeprecationWarning: sample_shape argument is deprecated. Please use height, width, channels arguments instead. \n",
+      "                The current behavior is to override height, width, channels with sample_shape \n",
+      "                when sample_shape is not None.\n",
+      "                \n",
+      "  warnings.warn(\n",
       "/Users/SML161/opensoundscape/opensoundscape/ml/cnn.py:606: UserWarning: \n",
       "                    This architecture is not listed in opensoundscape.ml.cnn_architectures.ARCH_DICT.\n",
       "                    It will not be available for loading after saving the model with .save() (unless using pickle=True). \n",
@@ -341,7 +353,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": 50,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -471,7 +483,7 @@
     "    validation_df=labels_val,\n",
     "    steps=1000,\n",
     "    embedding_batch_size=128,\n",
-    "    embedding_num_workers=0,\n",
+    "    embedding_num_workers=num_workers,\n",
     ")"
    ]
   },
@@ -557,7 +569,7 @@
    "source": [
     "Alternatively, we can embed the training and validation sets first, then train as many different variants as we want.\n",
     "\n",
-    "(note that the `fit_classifier_on_embeddings` returns the embeddings on the training and validation set, so if you've already run that functino you don't need to re-generate the embeddings)\n",
+    "(note that the `fit_classifier_on_embeddings` returns the embeddings on the training and validation set, so if you've already run that function you don't need to re-generate the embeddings)\n",
     "\n",
     "Generally, embedding may take a while for large datasets, but training the shallow classifier will be very fast because the network is small and there is no preprocessing or data loading. \n",
     "\n",
@@ -566,13 +578,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 76,
+   "execution_count": 51,
    "metadata": {},
    "outputs": [],
    "source": [
-    "# uncomment to generate training and validation set embeddings, if you don't have them from the previous cells\n",
-    "# emb_train = hawk.embed(labels_train, return_dfs=False, batch_size=128, num_workers=0)\n",
-    "# emb_val = hawk.embed(labels_val, return_dfs=False, batch_size=128, num_workers=0)"
+    "# uncomment the lines below to generate training and validation set embeddings, if you don't have them from the previous cells\n",
+    "# emb_train = hawk.embed(labels_train, return_dfs=False, batch_size=128, num_workers=num_workers)\n",
+    "# emb_val = hawk.embed(labels_val, return_dfs=False, batch_size=128, num_workers=num_workers)"
    ]
   },
   {
@@ -723,24 +735,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 82,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "application/vnd.jupyter.widget-view+json": {
-       "model_id": "703cfbd664ba4e6f864ab9b0719fef6d",
-       "version_major": 2,
-       "version_minor": 0
-      },
-      "text/plain": [
-       "  0%|          | 0/4 [00:00<?, ?it/s]"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    }
-   ],
+   "outputs": [],
    "source": [
     "from opensoundscape.ml.shallow_classifier import augmented_embed\n",
     "train_emb_aug, train_label_aug = augmented_embed(hawk,labels_train.sample(512), batch_size=128, num_workers=num_workers,n_augmentation_variants=4)"

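The notebook change above sets `num_workers=0` by default and leaves the choice of a higher value to the reader. A minimal sketch of automating that choice follows, assuming the importable module name is `bioacoustics_model_zoo`, as the error message in the note suggests. `pick_num_workers` is a hypothetical helper and not part of OpenSoundscape.

```python
import importlib.util


def pick_num_workers(requested: int, package: str = "bioacoustics_model_zoo") -> int:
    """Return `requested` if `package` is importable, otherwise fall back to 0.

    Assumption: multiprocessing only works reliably when the model zoo is
    installed as a regular package, so a missing package means no workers.
    """
    if requested > 0 and importlib.util.find_spec(package) is None:
        return 0
    return requested


# Falls back to 0 unless the model zoo has been pip-installed.
num_workers = pick_num_workers(4)
print(num_workers)
```

The result can be passed wherever the notebook uses `num_workers`, for example `embedding_num_workers=num_workers`.
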
diff --git a/opensoundscape/ml/bioacoustics_model_zoo.py b/opensoundscape/ml/bioacoustics_model_zoo.py
@@ -14,7 +14,7 @@ def list_models(**kwargs):
 
     see also: load(model)
     """
-    tag = "0.11.0"  # in the future, might be based on opensoundscape.__version__
+    tag = "0.11.0.dev1"  # in the future, might be based on opensoundscape.__version__
     return torch.hub.list(
         f"kitzeslab/bioacoustics-model-zoo:{tag}", trust_repo=True, **kwargs
     )
@@ -45,10 +45,15 @@ def load(model, tag=None, **kwargs):
     (see https://github.com/kitzeslab/bioacoustics-model-zoo landing page for
     detailed instructions)
 
+    > Note on the error "module not found: bioacoustics_model_zoo" when using multiprocessing (num_workers>0):
+    > If you get an error to this effect, install bioacoustics_model_zoo as a package in your Python environment:
+    > `pip install git+https://github.com/kitzeslab/bioacoustics-model-zoo@0.11.0.dev1`
+    > This is needed because the torch.hub API seems to have trouble with multiprocessing for some model classes.
+
     """
     if tag is None:
         # in the future, might be based on opensoundscape.__version__
-        tag = "0.11.0"
+        tag = "0.11.0.dev1"
     return torch.hub.load(
         f"kitzeslab/bioacoustics-model-zoo:{tag}", model, trust_repo=True, **kwargs
     )
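
The tag `0.11.0.dev1` now appears in `list_models`, in `load`, and in every pip command in the notes. As a sketch only, the pin could be kept in one place so the torch.hub reference and the pip target cannot drift apart. The constant and helper names below are hypothetical and do not exist in `opensoundscape/ml/bioacoustics_model_zoo.py`; only the repository name and the tag come from this commit.

```python
from typing import Optional

# Values taken from this commit; the names themselves are hypothetical.
MODEL_ZOO_REPO = "kitzeslab/bioacoustics-model-zoo"
DEFAULT_TAG = "0.11.0.dev1"


def hub_spec(tag: Optional[str] = None) -> str:
    """Return the 'owner/repo:ref' string in the form used with torch.hub."""
    return f"{MODEL_ZOO_REPO}:{DEFAULT_TAG if tag is None else tag}"


def pip_target(tag: Optional[str] = None) -> str:
    """Return the matching argument for `pip install`."""
    ref = DEFAULT_TAG if tag is None else tag
    return f"git+https://github.com/{MODEL_ZOO_REPO}@{ref}"


print(hub_spec())
print(pip_target())
```

With helpers like these, `load` could call `torch.hub.load(hub_spec(tag), model, trust_repo=True, **kwargs)`, and the docstring note could quote `pip_target()` instead of repeating the literal tag.
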