One topic I am passionate about is the environment, especially the impact that climate change has on our natural world and standard of living. To get an idea of what kind of climate-related datasets were out there, I scoured Kaggle.com for high-quality datasets involving the environment. Two datasets caught my attention because they were so close to home: United States wildfires over a 24-year period, and United States droughts and soil conditions over a 20-year period. I live in the Central Valley of California (US), where every year the fires in the hills on either side of the valley get worse, creating horrible air quality and destroying homes and forests. I am interested in predicting when and where wildfires will occur next. Identifying these locations could lead to better fire preparation and population planning.
- Google Cloud Run at https://wildfire.eerichmond.com
- API Docs
- Install Anaconda
conda create --name wildfire python=3.9
conda activate wildfire
brew install cmake
pip install -r requirements.txt
- Install gcloud
gcloud auth application-default login
yarn --cwd ./app/ build
uvicorn app.main:app --reload
coverage run --source=./app/ -m pytest -v && coverage report
- Watch tests
ptw --runner pytest
- Generate coverage badge
coverage-badge -f -o coverage.svg
- Download fires.sqlite from Google Cloud Storage (19GB) to `./data/fires.sqlite`
- `conda activate wildfire`
- `python -m app.trainer.export` to generate the `X_train.npy`, `X_test.npy`, `y_train.npy`, `y_test.npy`, and `scalar.pickle` numpy array binaries. This is a separate step because it takes 3+ hours to turn the ~27 million geolocated weather points into a 13GB `X_train.npy`.
- `python -m app.trainer.train xgb` to generate `app/models/xgb_model.pickle`
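Because the download is 19GB and the export takes hours, it can be worth sanity-checking the files on either side of the export step. The snippet below is a minimal sketch, not part of the project's code: the helper names `list_tables` and `describe_arrays` are hypothetical, the directory holding the `.npy` files is an assumption you pass in, and no table names are assumed (they are read from `sqlite_master`). The arrays are opened with NumPy's memory-mapped mode so the 13GB `X_train.npy` is not read into memory.

```python
import sqlite3
from pathlib import Path

import numpy as np

# Array file stems named in the export step above.
ARRAY_NAMES = ("X_train", "X_test", "y_train", "y_test")


def list_tables(db_path):
    """Return the table names in a SQLite file (hypothetical helper)."""
    path = Path(db_path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} does not exist")
    # Open read-only via a file: URI so the check can never modify the database.
    conn = sqlite3.connect(path.resolve().as_uri() + "?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def describe_arrays(data_dir):
    """Map each exported array name to (shape, dtype) without loading it fully."""
    summary = {}
    for name in ARRAY_NAMES:
        # mmap_mode="r" maps the file read-only; data is paged in only on access.
        arr = np.load(Path(data_dir) / f"{name}.npy", mmap_mode="r")
        summary[name] = (arr.shape, str(arr.dtype))
    return summary
```

For example, `list_tables("./data/fires.sqlite")` could be called before the export, and `describe_arrays(...)` afterwards with whichever directory the `.npy` files were written to, to confirm that the `X_*` and `y_*` arrays have matching row counts.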
- Google Cloud Dashboard
- Edit `gcp_setup.sh` and `build.yml`
  - Replace the Google account number (`644348144159`) and project ID (`strong-maker-345805`) with your own.
  - Update the Docker registry locations `ghcr.io/eerichmond/ml-wildfire-prediction:latest` and `us-west1-docker.pkg.dev/strong-maker-345805/ml-wildfire/ml-wildfire:latest` with your own.
- Run `sh ./gcp_setup.sh` to create the `ml-wildfire` Google Cloud Run service and Google Artifact Registry
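The identifier replacement in the edit step above could be scripted rather than done by hand, before running the setup script. This is a minimal sketch under assumptions: the function names `replace_identifiers` and `update_files` are hypothetical, the file paths must be supplied by you (the location of build.yml within the repository is not stated here), and it does plain string substitution of the two values quoted above. It does not touch the `ghcr.io` registry path, which still needs to be edited manually.

```python
from pathlib import Path

# Values quoted in the steps above; both belong to the original author's project.
OLD_ACCOUNT_NUMBER = "644348144159"
OLD_PROJECT_ID = "strong-maker-345805"


def replace_identifiers(text, account_number, project_id):
    """Return text with the original account number and project ID swapped out."""
    return (
        text.replace(OLD_ACCOUNT_NUMBER, account_number)
        .replace(OLD_PROJECT_ID, project_id)
    )


def update_files(paths, account_number, project_id):
    """Rewrite each file in place and return the paths that actually changed."""
    changed = []
    for path in map(Path, paths):
        original = path.read_text()
        updated = replace_identifiers(original, account_number, project_id)
        if updated != original:
            path.write_text(updated)
            changed.append(path)
    return changed
```

Returning the list of changed paths makes it easy to review the result with `git diff` before pushing, since a push triggers the deployment workflow.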
On every git push, GitHub Actions build.yml will:
- Install and test the Python app
- Build and push the Docker image to GitHub Container Registry and Google Artifact Registry
- Deploy the `:latest` Docker image to Google Cloud Run
- 3 zipped CSV files with 23,841,471 records
- License CC0: Public Domain
- Drought notebooks
- Harmonized World Soil Database
- Newest version of the data (up to 2018) at US Forest Service
- License CC0: Public Domain