cc @rabernat @jhamman @mrocklin @arokem @shoyer
Today, we spent some time in the weekly check-in meeting (#527) discussing how pangeo can support machine learning workflows. There is a lot of interest overall, and we only scratched the surface. To continue the discussion, we agreed to start a machine learning working group, and I think we should set up a video call within the next couple of weeks. If you would like to participate, please fill in your availability on this Doodle poll: https://doodle.com/poll/5f6x8dinmu5eetmn
I was not able to find GitHub usernames for everyone on today's call. Could someone @ the people I missed (e.g. Alando, Jim)?
For an agenda, we could build on the topics we mentioned today:
- Example workflows for climate ML: Jupyter vs. command line
- Data preprocessing and data loading
- Data structures (e.g. labeled arrays for deep learning, interoperability of numpy-like libraries)
- Infrastructure - GPUs, data storage
- Automation and reproducibility:
  - hyperparameter tuning
  - workflow engines: snakemake, pachyderm
  - experiment loggers: sacred
As @mrocklin brought up, I think we should figure out which of these items are general enough to fall within the scope of pangeo. A list of existing or envisioned machine learning workflows would help us do this. It seems that Anaconda and NASA have compiled a nice set of tutorials in the EarthML project. For my part, I recently gave an ML tutorial using binder.pangeo.io, and it went very smoothly.
What else do you all think we should discuss?