Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change syntax to get NC data #395

Merged
merged 3 commits into from
Nov 10, 2023
Merged

Change syntax to get NC data #395

merged 3 commits into from
Nov 10, 2023

Conversation

charleskawczynski
Copy link
Member

Closes #394.

@charleskawczynski
Copy link
Member Author

bors try

bors bot added a commit that referenced this pull request Nov 2, 2023
Copy link
Contributor

bors bot commented Nov 2, 2023

try

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@charleskawczynski charleskawczynski marked this pull request as ready for review November 2, 2023 16:34
@Sbozzolo
Copy link
Member

Sbozzolo commented Nov 2, 2023

My understanding is that the interface was designed to be agnostic of the dataloader. Is this still the case with this change? E.g., would if work with HDF5 instead of NCDatasets?

@Sbozzolo
Copy link
Member

Sbozzolo commented Nov 2, 2023

Moreover, this comment may be important:

While fixing up my code I noticed that Array(v_cf) is not equivalent to v_cf.var[:] even in the vector case, because Array(v_cf) does Missing and DateTime conversion etc. The equivalent would be Array(v_cf.var). 

Originally posted by @mwlidar in Alexander-Barth/NCDatasets.jl#233 (comment)

I don't know if it applies to us.

@charleskawczynski
Copy link
Member Author

My understanding is that the interface was designed to be agnostic of the dataloader. Is this still the case with this change? E.g., would if work with HDF5 instead of NCDatasets?

These changes are orthogonal to RRTMGP, since it (thankfully) doesn't depend on NCDatasets. If we decide to use HDF5 on the user side (in ClimaAtmos), then RRTMGP doesn't need to know. We could change it in RRTMGP's test/perf environments, but I figured the simple thing for now is to make RRTMGP compatible with the latest NCDatasets version so that things don't get stale.

@charleskawczynski
Copy link
Member Author

because Array(v_cf) does Missing and DateTime conversion etc.... I don't know if it applies to us.

Yeah, I don't think we need to worry about that, since this really depends on the data, and our NC data apparently doesn't have missing / datetime stuff.

@Sbozzolo
Copy link
Member

Sbozzolo commented Nov 2, 2023

My understanding is that the interface was designed to be agnostic of the dataloader. Is this still the case with this change? E.g., would if work with HDF5 instead of NCDatasets?

These changes are orthogonal to RRTMGP, since it (thankfully) doesn't depend on NCDatasets. If we decide to use HDF5 on the user side (in ClimaAtmos), then RRTMGP doesn't need to know. We could change it in RRTMGP's test/perf environments, but I figured the simple thing for now is to make RRTMGP compatible with the latest NCDatasets version so that things don't get stale.

I don't think the problems are completely orthogonal.

Consider for example the lines

kmajor = FTA4D(Array(ds["kmajor"]))
kminor_lower = FTA3D(Array(ds["kminor_lower"]))
kminor_upper = FTA3D(Array(ds["kminor_upper"]))
kminor_start_lower = IA1D(Array(ds["kminor_start_lower"]))
kminor_start_upper = IA1D(Array(ds["kminor_start_upper"]))

Whatever ds is, Array(ds["kmajor"]) has to return a four dimensional array of floats. Similarly for the other lines. So, there's a coupling between the reader and RRTMGP (which is why it broke when NCDatasets changed its API). My comment was more along the lines of "is this syntax unique to NCDatasets, or it agreed upon in the community"? If this syntax is unique to NCDatasets, then we are essentially boxing ourselves to using it. However, as I understand it, NCDatasets changed the APIs to be more compatible with the DiskArray package, so hopefully this change will give us more compatibility. It'd be nice to check if Array(HDF5["var"]) implements the same behavior.

@charleskawczynski
Copy link
Member Author

Whatever ds is, Array(ds["kmajor"]) has to return a four dimensional array of floats...

Ah, whoops, I didn't even realize I was making changes to src. I see. I haven't looked closely enough, but it would be nice if we could move this logic outside of RRTMGP.

My comment was more along the lines of "is this syntax unique to NCDatasets, or it agreed upon in the community"? If this syntax is unique to NCDatasets, then we are essentially boxing ourselves to using it. However, as I understand it, NCDatasets changed the APIs to be more compatible with the DiskArray package, so hopefully this change will give us more compatibility. It'd be nice to check if Array(HDF5["var"]) implements the same behavior.

Agreed.

@Sbozzolo
Copy link
Member

Sbozzolo commented Nov 2, 2023

My comment was more along the lines of "is this syntax unique to NCDatasets, or it agreed upon in the community"? If this syntax is unique to NCDatasets, then we are essentially boxing ourselves to using it. However, as I understand it, NCDatasets changed the APIs to be more compatible with the DiskArray package, so hopefully this change will give us more compatibility. It'd be nice to check if Array(HDF5["var"]) implements the same behavior.

Agreed.

I checked with HDF5.jl and found that Array(ds["var"]) returns a multidimensional array as in NCDatasets.

@Sbozzolo Sbozzolo self-requested a review November 2, 2023 23:08
@Sbozzolo
Copy link
Member

Any reason not to merge this?

@charleskawczynski
Copy link
Member Author

No, i think we can merge

@Sbozzolo
Copy link
Member

Buildkite didn't start, so we probably haven't set it right. Let me look at that

@Sbozzolo
Copy link
Member

Okay, it was filtering the branches. Now it works. I also bumped the version so that we can release the fixed version.

@Sbozzolo Sbozzolo merged commit 16f5e2b into main Nov 10, 2023
9 checks passed
@charleskawczynski charleskawczynski deleted the ck/nc branch May 8, 2024 02:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update NCDatasets compats in environments
2 participants