
@maxrjones commented Nov 21, 2025

Not quite ready for a review yet, will resume work on this ~Dec 1


🔍 Preview: https://geojupyter-workshop-open-source-geospatial--44.org.readthedocs.build/
Note: This Pull Request preview is provided by ReadTheDocs. Our production website, however, is currently deployed with GitHub Pages.

A member commented:

I like the idea of having dependencies specified for each module, but logistically I think it makes the most sense to toss these into one big conda environment that participants will use for the whole workshop.

@mfisher87 commented Nov 24, 2025

A couple notes as I'm reading:

  • Let's explain the %%time magic, maybe using a glossary entry?
  • Let's define "data-proximate computing" in the glossary and provide some AKAs, like "edge computing" or "near-data computing".
  • "Cloud native data are structured for efficient querying across the network" -- better to say "over the Internet"? "Network" is a very general term, and while it's probably more correct (sometimes we're querying cloud native data over a private network), it may be harder to internalize. Not really sure, just a feeling.
  • "This took a lot of time to open the file." How much time? Maybe give an order of magnitude, like "It took over 1 second to open this file", and contrast it with the faster approach: "Now we can open the file in milliseconds". I observed about a 4x wall-time speedup on a tiny 2 GB / 0.5 CPU node.
  • Let's define "globbing". In the glossary perhaps?
  • We could state more explicitly that reading the entire file into memory is triggered by switching from ObstoreReader to ObstoreMemCacheReader. That's shown in the code example, but to notice it the reader has to compare the two examples line by line. It's not hard, but it's a tiny bit of cognitive load we can save.
  • How do you feel about removing any assignments that are only used once, e.g. bucket and path? I think this would also save a tiny bit of cognitive load.

I made some tweaks that I felt confident were uncontroversial, and I'm going to merge what we have!

The prompt "List the files available following at this prefix on AWS S3
storage" left me expecting the list of files to be output. If we don't
intend to output it, we could instead say "Create a list of the files..."?
@mfisher87 commented Nov 24, 2025

Do you want this notebook to show up as executed in the course materials website? Or do you want the participants to execute it themselves to see the results?

@mfisher87

> Not quite ready for a review yet, will resume work on this ~Dec 1

I apologize, I didn't read this carefully :D I'll make it a draft PR.

@mfisher87 marked this pull request as draft on November 24, 2025 20:04

Labels: None yet
Projects: None yet


3 participants