First meeting of the machine learning working group #532

@nbren12

cc @rabernat @jhamman @mrocklin @arokem @shoyer

Today, we spent some time in the weekly check-in meeting (#527) discussing how Pangeo can support machine learning workflows. There is a lot of interest, and we barely scratched the surface. To continue the discussion, we agreed to start a machine learning working group, and I think we should set up a video call within the next couple of weeks. If you would like to participate, please fill in your availability on this Doodle poll: https://doodle.com/poll/5f6x8dinmu5eetmn

I was not able to find GitHub usernames for everyone on today's call. Could someone @-mention the people I missed (e.g. Alando, Jim)?

As an agenda, we could build off the topics we mentioned today:

  • Example workflows for climate ML - Jupyter vs. command line
  • Data preprocessing and data loading
  • Data structures (e.g. labeled arrays for deep learning, interoperability of numpy-like libraries)
  • Infrastructure - GPUs, data storage
  • Automation and reproducibility:
    • hyper-parameter tuning
    • workflow engines: Snakemake, Pachyderm
    • experiment loggers: Sacred

As @mrocklin brought up, I think we should figure out which of these items are general enough to lie within the scope of Pangeo. Having a list of existing or envisioned machine learning workflows would help us do this. It seems that Anaconda and NASA have compiled a nice set of tutorials in the EarthML project. For my part, I recently gave an ML tutorial using binder.pangeo.io that went really smoothly.

What else do you all think we should discuss?
