Pangeo: Scalable Geoscience Tools in Python — Xarray, Dask, and Jupyter

Date and Time: Thursday, May 9, 2019

Location: Canberra, ACT

Speaker: James Munroe (@jmunroe)

Abstract: Earth scientists face serious challenges when working with large datasets. Pangeo is a rapidly growing community and software ecosystem for scalable geoscience based on open source scientific Python. Pangeo’s three core packages are 1) Jupyter, a web-based tool for interactive computing, 2) xarray, a data model and toolkit for working with N-dimensional labeled arrays, and 3) Dask, a flexible parallel computing library. When combined with distributed computing, these tools can help geoscientists perform interactive analysis on datasets up to petabytes in size. In this interactive tutorial, we will demonstrate how to employ this platform using real science examples from physical oceanography and hydrology. Participants will follow along using Jupyter notebooks to interact with xarray and Dask running on Amazon Web Services.

Workshop Agenda

(Tentative)

  • 0900-0930: Introduction to Pangeo Project and Software Ecosystem
  • 0930-1030: Hands-on interactive tutorial of xarray
  • 1030-1100: Break / Discussion
  • 1100-1200: Hands-on interactive tutorial of Dask
  • 1200-1330: Lunch
  • 1330-1430: Getting started with cloud-native data analysis
  • 1430-1500: Break / Discussion
  • 1500-1600: Deploying your own Pangeo platform on cloud or HPC computing resources

Learning Objectives: Participants will learn how to:

  • Recognize the software packages that comprise the Pangeo platform and explain how they work together
  • Load datasets using xarray from netCDF files, OPeNDAP endpoints, and Zarr stores
  • Analyze data using xarray's label-based operations and groupby feature
  • Work with very large xarray datasets using Dask
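
As a rough preview of the xarray and Dask objectives above, the following minimal sketch builds a small synthetic array in memory and applies label-based selection, a groupby reduction, and a chunked (Dask-backed) computation. The variable name `sst`, the coordinate ranges, and the chunk size are illustrative assumptions and are not taken from the workshop materials. In the tutorial itself, data would more likely be opened from a netCDF file, OPeNDAP endpoint, or Zarr store (for example with `xr.open_dataset` or `xr.open_zarr`) rather than generated. The sketch assumes numpy, pandas, xarray, and dask are installed.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a real dataset (hypothetical values, not workshop data).
time = pd.date_range("2018-01-01", periods=365, freq="D")
lat = np.linspace(-45.0, -10.0, 8)
lon = np.linspace(110.0, 155.0, 10)
rng = np.random.default_rng(0)

sst = xr.DataArray(
    15.0 + rng.standard_normal((len(time), len(lat), len(lon))),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)

# Label-based selection: nearest grid point to a location, and a date range.
point = sst.sel(lat=-35.0, lon=150.0, method="nearest")
january = sst.sel(time=slice("2018-01-01", "2018-01-31"))

# groupby: average all time steps that fall in the same calendar month.
monthly_mean = sst.groupby("time.month").mean("time")

# Dask: split the array into chunks along time; operations become lazy
# and are only evaluated when .compute() is called.
chunked = sst.chunk({"time": 73})
lazy_mean = chunked.mean("time")
result = lazy_mean.compute()
```

The same `sel`, `groupby`, and `chunk` calls apply unchanged to much larger arrays; the chunk size then becomes the main tuning choice, since it determines how much data each Dask task holds in memory.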