
Streamline DeepBedMap model tuning, evaluation and Antarctic-wide DEM generation #156

Closed
weiji14 wants to merge 5 commits from model/retune_on_round_grids


weiji14 (Owner) commented Jun 24, 2019

Towards making a new full DeepBedMap of Antarctica, replacing the one in #136. There have been many significant changes to our input training datasets (e.g. #146, #150, #155) and to our ESRGAN model (e.g. #151), plus software-related changes since v0.8.0. Some of the scripts now run slower and struggle to handle our big datasets, but they have become more geographically accurate and are ready for prime time. Let's iron out those problems and make a new 250 m spatial resolution bed elevation map of Antarctica!!!

[Image: deepbedmap3_300dpi_v0.9.2]

TODO:

Enable more concurrent runs on 4 GPUs instead of just 2! To do so, srgan_train.get_deepbedmap_test_result was refactored so that we no longer risk saving and reloading the model weights to/from the same file (which could clash when 2 processes are running, and even more so when 4 are)! The best RMSE_test result from the last hyperparameter tuning frenzy is 29.90 at https://www.comet.ml/weiji14/deepbedmap/fd658ce06e81492ea5a6f4b5e1afa028, but that model might be severely overfitted. The second best is 38.50 at https://www.comet.ml/weiji14/deepbedmap/abc3af8e9abc4080a6b5b44b33c537c2, which might be the one we actually use.
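A minimal sketch of the collision-avoidance idea, assuming each training process only needs a private location to write its weights: `tempfile.mkdtemp` creates a fresh, uniquely named directory on every call, so two (or four) concurrent runs cannot overwrite each other's file. The helper name `save_weights_to_unique_dir` and the use of `pickle` are illustrative stand-ins, not the project's actual `srgan_train` code or weight format.

```python
import os
import pickle
import tempfile


def save_weights_to_unique_dir(weights, prefix="deepbedmap_"):
    """Save `weights` into a brand-new temporary directory and return the file path.

    Hypothetical helper: pickle stands in for whatever serializer the
    project really uses (e.g. an .npz writer).
    """
    # mkdtemp creates the directory with a unique name, so concurrent
    # processes (one per GPU) never share a target path.
    run_dir = tempfile.mkdtemp(prefix=prefix)
    path = os.path.join(run_dir, "generator_weights.pkl")
    with open(path, "wb") as f:
        pickle.dump(weights, f)
    return path


# Usage: two "runs" saving at the same time end up in different files.
path_a = save_weights_to_unique_dir({"conv1": [0.1, 0.2]})
path_b = save_weights_to_unique_dir({"conv1": [0.3, 0.4]})
```

As the commit message notes, evaluation then works with the already-trained in-memory model instead of reading these files back, which sidesteps the shared-file problem entirely.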

This commit extends the dual-GPU functionality introduced in a2866b6. Since our GPUs are on different servers (2x Tesla V100s on tara, 2x Tesla P100s on kahutea), this relies on having GMT==6.0.0rc1 installed from conda (1edb16e) rather than compiled from source, as the latter would mean GMT only works on one server. The model weights are saved to unique temporary folders during training and are not actually reloaded by get_deepbedmap_test_result (i.e. we just use the model directly since it is already trained...). I also made up a different TPE seed for each device/GPU based on len(hostname) + $CUDA_VISIBLE_DEVICES; very hacky, I know. I tried a lot of hyperparameter settings over the weekend on the new training tiles (quilt hash 0734959aa4f4903a17ed2acdfd53b3c0c826aadfc718e5fdd3c1b04963e1206e). The final tuning frenzy involved ~25 experiments on each of the 4 GPUs with this configuration: residual_scaling between 0.15 and 0.30, learning_rate between 6.5e-5 and 8.5e-5, and num_epochs between 60 and 90. These floating point hyperparameters are actually a problem for Optuna because of floating point representation errors, see https://0.30000000000000004.com/.
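A minimal sketch of the two tricks mentioned here, under stated assumptions: `tpe_seed` reproduces the "len(hostname) + GPU index" recipe (taking the first device listed in `CUDA_VISIBLE_DEVICES`), and `snap_to_step` is one hypothetical way to keep sampled floats such as `residual_scaling` on a clean decimal grid so that values like `0.30000000000000004` do not appear. Both names are illustrative, not functions from the repository.

```python
import os
import socket


def tpe_seed(hostname=None, cuda_visible_devices=None):
    """Derive a per-GPU seed as len(hostname) + first visible CUDA device index."""
    if hostname is None:
        hostname = socket.gethostname()
    if cuda_visible_devices is None:
        cuda_visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES", "0")
    # CUDA_VISIBLE_DEVICES may list several devices ("0,1"); use the first one.
    first_device = int(cuda_visible_devices.split(",")[0])
    return len(hostname) + first_device


def snap_to_step(value, low, step, ndigits=10):
    """Snap a sampled float onto the grid low, low+step, ... and strip float noise."""
    k = round((value - low) / step)
    return round(low + k * step, ndigits)


# Seeds for GPUs 0 and 1 on the two servers named in this PR.
seeds = {(h, d): tpe_seed(h, d) for h in ("tara", "kahutea") for d in ("0", "1")}
print(seeds)
print(0.1 + 0.2, snap_to_step(0.1 + 0.2, low=0.15, step=0.01))  # 0.30000000000000004 0.3
```

The hostname-length trick only yields distinct seeds because "tara" (4 characters) and "kahutea" (7 characters) differ by more than the number of GPUs per server; two hosts with equal-length names would collide. In Optuna itself, `TPESampler` accepts a `seed` argument, and recent versions let `trial.suggest_float` take a `step` argument, which addresses the same rounding issue more directly.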
weiji14 added enhancement ✨ New feature or request model 🏗️ Pull requests that update neural network model labels Jun 24, 2019
weiji14 added this to the v0.9.2 milestone Jun 24, 2019
weiji14 self-assigned this Jun 24, 2019

So that we don't have to worry about using the wrong Generator model hyperparameter settings, we now directly download the ESRGAN model's weights and hyperparameter information from Comet.ML in deepbedmap.ipynb! This commit builds upon 77b4fe1, where we refactored features/environment.py's _download_deepbedmap_model_weights_from_comet to get the model weights of an arbitrary Comet.ML experiment. The function's name is now shortened to _download_model_weights_from_comet, and it now also returns the num_residual_blocks and residual_scaling hyperparameters so we know how to build the model.
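Once the weights and hyperparameters are fetched, the remaining step is turning the logged hyperparameters into constructor arguments. A minimal sketch, assuming the download step has already produced a plain dict mapping parameter names to (possibly string) values; `parse_generator_hyperparameters` and `build_generator` are hypothetical names, and the Comet.ML download itself is not shown.

```python
def parse_generator_hyperparameters(params):
    """Pull the Generator's architecture hyperparameters out of a name -> value dict.

    Logged experiment parameters may come back as strings, so cast explicitly.
    A missing key raises KeyError rather than silently building the wrong model.
    """
    num_residual_blocks = int(params["num_residual_blocks"])
    residual_scaling = float(params["residual_scaling"])
    return num_residual_blocks, residual_scaling


# Example values only; not taken from any real experiment.
logged = {"num_residual_blocks": "12", "residual_scaling": "0.2", "learning_rate": "7.5e-05"}
num_residual_blocks, residual_scaling = parse_generator_hyperparameters(logged)
# A hypothetical model builder would then be called as, e.g.:
# build_generator(num_residual_blocks=num_residual_blocks, residual_scaling=residual_scaling)
```

Only the two architecture hyperparameters are extracted because those determine the network's shape; training-only settings such as `learning_rate` are irrelevant when loading trained weights.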

Also updated the snapshot of the 2007tx.nc test area prediction (see previous ones at 77b4fe1 and 75266fc) using the trained model at https://www.comet.ml/weiji14/deepbedmap/abc3af8e9abc4080a6b5b44b33c537c2, which gives an RMSE_test of 38.50.
weiji14 force-pushed the model/retune_on_round_grids branch from dcb0145 to 8362dbd June 25, 2019 09:18
Not sure why the deepbedmap.feature integration test is failing (it just gets stuck for >10 min), since it works on the server and even on my old laptop in a docker container! Maybe because we're downloading the .npz model weights file twice? Removed the download part from the deepbedmap integration test fixture. Also re-uploaded the test tiles covering the new 2007tx.nc area (rounded to 250 m).
weiji14 force-pushed the model/retune_on_round_grids branch from 8362dbd to 8ed1273 June 25, 2019 09:49
weiji14 changed the title from Re-tune, Re-evaluate, Re-create DeepBedMap model of Antarctica to Streamline DeepBedMap model tuning, evaluation and Antarctic-wide DEM generation Jun 25, 2019
weiji14 added 2 commits June 25, 2019 13:27
Quick update of the Pine Island Glacier prediction from e8ae274. Evaluated on the new rounded grids, the bicubic baseline RMSE has dropped from 72.66 to 67.12, making our new ESRGAN model's RMSE of 63.46 look pretty insignificant. The 2007tx/2010tr/istarxx grid combination has a new slice now, and since all these grids have nicely rounded coordinates, we can pygmt.grdtrack the merged xarray.DataArray grid with all the points in one go! This opens up the possibility of evaluating on other groundtruth tracks crossing the Pine Island Glacier area, but having peeked at those results, I feel like there's really really really a lot more work to do...
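Rounded grids help because, when every grid's corner sits on a multiple of the shared 250 m cell size, their cells line up exactly, so the grids can be merged and sampled at all track points in a single pass. A minimal, dependency-free sketch of that alignment check and of the RMSE metric quoted above, assuming corner coordinates in projected metres and a common 250 m spacing; `on_common_lattice` and `rmse` are illustrative helpers, not functions from the repository (the actual workflow uses xarray and `pygmt.grdtrack`).

```python
import math


def on_common_lattice(xmin, ymin, spacing=250.0, tol=1e-6):
    """True if a grid's lower-left corner lies on a multiple of `spacing` (metres)."""
    def aligned(coord):
        remainder = coord % spacing  # Python's % is non-negative when spacing > 0
        return min(remainder, spacing - remainder) <= tol
    return aligned(xmin) and aligned(ymin)


def rmse(predicted, groundtruth):
    """Root mean square error between two equal-length sequences of elevations."""
    if len(predicted) != len(groundtruth) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - g) ** 2 for p, g in zip(predicted, groundtruth)]
    return math.sqrt(sum(squared) / len(squared))
```

The tolerance-based comparison guards against corners stored as, say, `-1593499.9999999` after a projection round trip, which a plain `== 0` check on the remainder would reject.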
New DeepBedMap DEM! Compare this version with the EGU version at 7a5d223 or the better v0.8.0 version at 58d8ebd. A couple of hacks were needed to get the full continent to come out nicely, notably clipping the MEASURES_Ice_Velocity/W2_tile layer to a minimum value of 0.0. The newer data_prep.selective_tile script from 4a074d9 was too memory- and CPU-intensive for handling REMA/W1_tile (believe me, I've tried dask.distributed, all sorts of parallelization, etc. on an 80-core, 200 GB RAM server), so we're bringing back the old selective_tile just for that crazy layer.
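The velocity clipping hack amounts to a one-liner. A minimal NumPy sketch on a toy array, assuming the layer has been read into a float array with NaN as nodata and that it stores speed magnitudes (so negative values are artifacts rather than physical); the actual tile I/O is not shown.

```python
import numpy as np

# Toy velocity grid (m/yr) with a spurious negative value and a NaN nodata cell.
velocity = np.array([[12.5, -0.3], [np.nan, 450.0]])

# Clip to a minimum of 0.0 with no upper bound; NaN cells pass through unchanged.
clipped = np.clip(velocity, 0.0, None)
```

Keeping NaN intact means the gaps can still be handled separately (e.g. by a gapfilling step) rather than being silently turned into zero velocities.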

Also note that the refactored data_prep.selective_tile function's gapfill_raster_filepath parameter is now renamed to 'gapfiller', as it can take either a string filepath to a raster file or a floating point number used to fill in the blank spaces! In deepbedmap.ipynb, we use this 'gapfiller' to arbitrarily fill BEDMAP2/X_tile with -5000.0 and Arthern Accumulation/W3_tile with 0.0. Note, though, that this is just for cosmetic purposes, as the data gaps are in the ocean area outside DeepBedMap's intended domain (within the grounding line). Another cosmetic tweak is changing the colormap from BrBG_r to Blues_r, which matches the one in the README (produced using QGIS with an additional hillshading layer). Oh, and yes, we've updated the README.md DeepBedMap DEM snapshot too!
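The 'gapfiller' behaviour boils down to dispatching on the argument's type. A minimal sketch, assuming NaN marks the blank cells; to keep it self-contained, the string-filepath case is replaced by an already-loaded array of the same shape, since reading the raster (and any reprojection or resampling) is out of scope here. `fill_gaps` is a hypothetical name, not `data_prep.selective_tile`'s real signature.

```python
import numpy as np


def fill_gaps(array, gapfiller):
    """Replace NaN cells in `array` using `gapfiller` (hypothetical helper).

    gapfiller may be a number (used wherever there is a gap) or an array of
    the same shape, standing in for a raster already read from a filepath.
    """
    gaps = np.isnan(array)
    if isinstance(gapfiller, (int, float)):
        # Single-value fill, e.g. -5000.0 for BEDMAP2 or 0.0 for accumulation.
        return np.where(gaps, float(gapfiller), array)
    filler = np.asarray(gapfiller, dtype=float)
    if filler.shape != array.shape:
        raise ValueError("gapfiller array must match the input shape")
    # Raster-style fill: take values from the filler only where there are gaps.
    return np.where(gaps, filler, array)
```

Dispatching on type keeps one parameter for both cases, mirroring the rename from gapfill_raster_filepath to 'gapfiller' described above.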
weiji14 closed this in 293aea9 Jul 15, 2019
weiji14 deleted the model/retune_on_round_grids branch July 15, 2019 20:57
weiji14 restored the model/retune_on_round_grids branch July 15, 2019 20:57
weiji14 deleted the model/retune_on_round_grids branch July 15, 2019 21:00
weiji14 added a commit that referenced this pull request Aug 29, 2019
Setting fire to all that code for gapfilling a raster with another raster, as it's very messy and we only need it for REMA now that we are no longer gapfilling MEaSUREs with #165 merged in. Still keeping the option to gapfill with a single floating point number, but we're removing the selective_tile_old function that has sat alongside selective_tile since aac21fb in #156 of v0.9.2. Temporarily using a bilinear resampled 200 m REMA in deepbedmap.ipynb. Will follow up with code to produce a gapfilled 100 m resolution REMA geotiff!