Extracting phenological information from digitized herbarium specimens.
Over the last few decades there has been a large effort to digitize herbarium specimens by photographing them and recording their annotations in databases. However, this work has been mostly manual and labor-intensive, so only a fraction of herbarium specimens are fully annotated.
This project uses neural networks to automate the annotation of one set of biologically significant traits related to phenology: flowering, fruiting, and leaf-out.
The basic steps are:
- Obtain a database of herbarium images with corresponding annotations.
  - We are using the iDigBio database for this.
- Clean and filter the iDigBio database to contain only angiosperm records with images.
- Find a subset of these records for training that meet the following criteria:
  - The record has an annotation of the presence or absence of at least one of the phenological traits. We use the spaCy library to mine the database's free-text fields for these traits.
  - The record has an annotation of the specimen's phylogenetic order.
  - The record has exactly one image associated with the specimen. More than one image creates confusion as to which image contains the trait.
- Train one or more neural networks to recognize the traits. We are using the PyTorch library to build the neural networks.
- Use the networks to annotate records.
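The three training-set criteria above can be sketched as a small filter. This is only an illustration under stated assumptions: the `Record` class and its field names (`order`, `traits`, `image_urls`) are hypothetical stand-ins, not the iDigBio schema or this project's actual code.

```python
from dataclasses import dataclass, field
from typing import Optional

# Trait names taken from the project description above.
TRAITS = ("flowering", "fruiting", "leaf_out")


@dataclass
class Record:
    """Hypothetical, simplified stand-in for one cleaned database record."""

    record_id: str
    order: Optional[str] = None  # phylogenetic order, e.g. "Asterales"
    # trait name -> True (present) or False (absent); a missing key means
    # the trait was not annotated at all
    traits: dict = field(default_factory=dict)
    image_urls: list = field(default_factory=list)


def is_training_candidate(rec: Record) -> bool:
    """Return True if the record meets all three training criteria."""
    has_trait = any(t in rec.traits for t in TRAITS)  # presence OR absence
    has_order = bool(rec.order)
    has_one_image = len(rec.image_urls) == 1
    return has_trait and has_order and has_one_image


records = [
    Record("a", "Asterales", {"flowering": True}, ["a.jpg"]),
    Record("b", None, {"fruiting": False}, ["b.jpg"]),  # no order
    Record("c", "Rosales", {}, ["c.jpg"]),  # no trait annotation
    Record("d", "Fabales", {"leaf_out": True}, ["d1.jpg", "d2.jpg"]),  # two images
]
training = [r.record_id for r in records if is_training_candidate(r)]
print(training)
```

Note that an annotated absence (`False`) counts as an annotation here, since the criteria ask for the presence or absence of a trait.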
This project extends Brian Stucky's work located here.
TODO: This is a bit complicated. Make an install script.
- Create a virtual environment
- Make sure you have a virtual environment manager installed. I use virtualenv.
pip install --user virtualenv
- Check out a tag.
cd /path/to/herbarium_phenology
git checkout v0.1
(or another tag)
- Create a virtual environment.
1. virtualenv -p python3.9 .venv
   (You may use Python 3.9 or later.)
2. source ./.venv/bin/activate
- Install module requirements.
python -m pip install --upgrade pip setuptools wheel
python -m pip install -r requirements
- Download a vocabulary for spaCy.
python -m spacy download en_core_web_sm
- I have a module for common spaCy functions; install that too.
python -m pip install git+https://github.com/rafelafrance/traiter.git@master#egg=traiter
- Install the appropriate versions of PyTorch and torchvision. If your computer has an NVIDIA GPU, I recommend the first command. If it does not, use the second.
python -m pip install -U torch==1.10.1+cu113 torchvision==0.11.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
python -m pip install torch torchvision
- Create some useful directories.
mkdir -p data/models/flowering
mkdir -p data/models/fruiting
mkdir -p data/models/leaf_out
mkdir -p data/output
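On systems without a POSIX shell, the same directories can be created from Python. The following is a minimal sketch, not part of the project; the function name `make_project_dirs` is hypothetical, and the directory names are copied from the commands above.

```python
from pathlib import Path

# Trait subdirectories, matching the mkdir commands above.
TRAITS = ("flowering", "fruiting", "leaf_out")


def make_project_dirs(root: Path) -> list:
    """Create the model and output directories under root and return them."""
    dirs = [root / "data" / "models" / trait for trait in TRAITS]
    dirs.append(root / "data" / "output")
    for path in dirs:
        # parents=True and exist_ok=True mirror the behavior of `mkdir -p`
        path.mkdir(parents=True, exist_ok=True)
    return dirs


if __name__ == "__main__":
    # Run from the repository root so the paths match the shell commands.
    for created in make_project_dirs(Path(".")):
        print(created)
```

Running it a second time is harmless, because existing directories are left as they are.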
- Activate the virtual environment. You need to do this every time you start using this module.
cd /path/to/herbarium_phenology
source ./.venv/bin/activate
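If you are unsure whether the environment is active, a quick check like the one below may help. It is a sketch rather than part of the project: the helper names are hypothetical, and the comparison of `sys.prefix` with `sys.base_prefix` assumes the standard `venv` module or a recent virtualenv release (older virtualenv versions marked the environment differently).

```python
import importlib.util
import sys


def in_virtual_env() -> bool:
    """Return True when running inside a venv/virtualenv environment."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix points at the interpreter it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_packages(names=("spacy", "torch", "torchvision")) -> list:
    """Return the packages from names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_env())
    print("missing packages:", missing_packages() or "none")
```

The default package list only covers the libraries named in the setup steps; extend it if you want to check others.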