An open source AI model and interface for Earth.
Launch into a JupyterLab environment on Binder, Planetary Computer, or SageMaker Studio Lab.
To help out with development, start by cloning this repository:
git clone <repo-url>
Then we recommend using mamba to install the dependencies. A virtual environment will also be created with Python and JupyterLab installed.
cd model
mamba env create --file environment.yml
Activate the virtual environment first.
mamba activate claymodel
Finally, double-check that the libraries have been installed.
mamba list
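As a complement to reading the `mamba list` output by eye, a short script can report which modules fail to resolve in the active environment. This is a minimal sketch: the helper name `missing_packages` and the module names in the usage section are assumptions for illustration, not a list taken from environment.yml.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not importable
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical module names; replace them with the ones you care about.
    wanted = ["torch", "lightning", "jupyterlab"]
    print("Missing:", missing_packages(wanted) or "none")
```

Note that the importable module name can differ from the package name shown by `mamba list`, so the list passed in should use import names.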
The following steps are for those who want full reproducibility of the virtual environment. First, create a virtual environment with just Python and conda-lock installed.
mamba create --name claymodel python=3.11 conda-lock=2.5.1
mamba activate claymodel
Generate a unified conda-lock.yml file based on the dependency specification in environment.yml. Use this only when creating a new conda-lock.yml file or refreshing an existing one.
conda-lock lock --mamba --file environment.yml --platform linux-64 --with-cuda=12.0
Install or update a virtual environment from a lockfile. Use this to sync your dependencies to the exact versions in the conda-lock.yml file.
conda-lock install --mamba --name claymodel conda-lock.yml
See also https://conda.github.io/conda-lock/output/#unified-lockfile for more usage details.
mamba activate claymodel
python -m ipykernel install --user --name claymodel # register the virtual env as a Jupyter kernel
jupyter kernelspec list --json # see if kernel is installed
jupyter lab &
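The kernel check above (`jupyter kernelspec list --json`) can also be done programmatically. The following is a minimal sketch, assuming the command prints a JSON object with a top-level `kernelspecs` mapping keyed by kernel name; `kernel_installed` and `list_kernelspecs` are hypothetical helper names.

```python
import json
import subprocess


def kernel_installed(kernelspec_json, name="claymodel"):
    """Check whether `name` appears in the output of `jupyter kernelspec list --json`."""
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    return name in specs


def list_kernelspecs():
    """Run the jupyter CLI and return its JSON output as text."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise if the jupyter command fails
    )
    return result.stdout
```

With the environment activated, `kernel_installed(list_kernelspecs())` should return `True` once the ipykernel install step has succeeded.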
The neural network model can be run via LightningCLI v2. To see the available options and the hyperparameter configurations, run:
python trainer.py --help
python trainer.py test --print_config
To quickly test the model on one batch in the validation set:
python trainer.py validate --trainer.fast_dev_run=True
To train the model for a hundred epochs:
python trainer.py fit --trainer.max_epochs=100
To generate embeddings from the pretrained model's encoder on 1024 images (the embeddings are stored as a GeoParquet file with spatiotemporal metadata):
python trainer.py predict --ckpt_path=checkpoints/last.ckpt \
--data.batch_size=1024 \
--data.data_dir=s3://clay-tiles-02 \
--trainer.limit_predict_batches=1
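The figure of 1024 images follows from two flags in the command above: a batch size of 1024 and a limit of one prediction batch. A small sketch of that arithmetic, assuming `limit_predict_batches` is given as an integer batch count and the dataset holds at least that many samples (the helper name `images_predicted` is hypothetical):

```python
def images_predicted(batch_size, limit_predict_batches):
    """Upper bound on images processed when the batch limit is an integer count."""
    if isinstance(limit_predict_batches, bool) or not isinstance(limit_predict_batches, int):
        raise TypeError("this sketch only handles integer batch counts")
    return batch_size * limit_predict_batches


# Values taken from the predict command: 1024 images per batch, 1 batch.
total = images_predicted(batch_size=1024, limit_predict_batches=1)
```

Raising `--trainer.limit_predict_batches` to 4 with the same batch size would cover up to 4096 images under the same assumptions.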
More options can be found using python trainer.py fit --help, or in the LightningCLI docs.
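The commands in this section pass nested settings as dotted flags such as `--trainer.max_epochs=100` and `--data.batch_size=1024`. To illustrate how such flags group into per-section settings, here is a minimal sketch of a parser. It is an illustration only, not how LightningCLI (which relies on jsonargparse) implements parsing; `parse_overrides` is a hypothetical helper, and values are kept as strings rather than converted to their real types.

```python
def parse_overrides(args):
    """Turn ['--a.b=1', ...] into a nested dict {'a': {'b': '1'}}."""
    config = {}
    for arg in args:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --key=value, got {arg!r}")
        # Split only on the first '=' so values may contain '=' themselves
        dotted_key, value = arg[2:].split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Walk (or create) one nested dict per dotted section
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


predict_flags = parse_overrides([
    "--ckpt_path=checkpoints/last.ckpt",
    "--data.batch_size=1024",
    "--data.data_dir=s3://clay-tiles-02",
    "--trainer.limit_predict_batches=1",
])
```

Here `predict_flags["data"]` holds the two data module settings and `predict_flags["trainer"]` holds the trainer setting, which mirrors how the flags are grouped on the command line.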