Skip to content

Conversation

@kysolvik
Copy link
Collaborator

  • Added type hints for all public and most non-public methods.
  • Updated docstrings to remove types (handled by hints) and fix some lingering outdated variable descriptions.
  • Did not update pyqg (non-jax) and barotropic data generators, since pyqg install is still broken. I would suggest that we remove them for now and revisit in the future if pyqg becomes actively maintained again. That can be a separate pull request.

…and specify when object is xarray dataset (_ds) or data array (_ar)
…ng until other parts of those are cleaned up
) -> ArrayLike:
if H is not None:
yb = H @ xb
Yb = H @ X0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did xb change to x0 here?

Args:
xb (ndarray): Forecast/background ensemble with shape
X0: Forecast/background ensemble with shape
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the forecast/background ensemble now called X0 instead of Xb?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. I updated it as follows. ETKF and 3DVar are pretty straightforward:

  • xb/Xb = Background state
  • xa/Xa = Analysis

For 4dvar and 4dvar-BP, there's also:

  • xb0 = Background state at time 0 (in cycle).
  • xa0 = Analysis state at time 0.
  • x0 = Within the outer loop, this is the current best guess for x0. In the first loop, it is the background. After that, it is xa0 from the last loop.
  • X = Within the outer loop, this is the current forecast based on x0. One question: should this be upper-case, since it is a matrix of size (time_dim, system_dim)? Or does that make it seem like it is an ensemble? For example, see _var4d.py inner loop around line 140.

Copy link
Owner

@StevePny StevePny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, I have a couple questions/comments, but am ok with it being merged at your discretion.


def _step_forecast(self, xa, n_steps):
def _step_forecast(self,
Xa: XarrayDatasetLike,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is an xarray dataset, then it might be good to keep a convention where we use "_ds" at the end to make sure that is clear, e.g. Xa_ds.

for i in range(self.ensemble_dim):
cur_inputs, cur_forecast = self.model_obj.forecast(
xa.isel(ensemble=i),
Xa.isel(ensemble=i),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd prefer the index to be "member=i" instead of "ensemble=i", but if that requires lots of changes then no worries on that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a pretty quick fix! The only issue is that the whole xarray dimension would have to be renamed to "member", which in my opinion isn't as descriptive. Currently, the dimensions for an ensemble state like this are usually something like: ['time', 'system', 'ensemble'].

Another option could be calling the dimension "ensemble_member" or something like that. The final option might be to create an alias of sorts for the "ensemble" dimension which allows you to select from it using either "ensemble" or "member". I haven't fully looked into that, but there's a thread about that here: pydata/xarray#8570.

For now, I'll merge as is. But I can create a separate PR with those changes if we decide to rename the dimension.

) -> ArrayLike:
if H is not None:
yb = H @ xb
Yb = H @ Xb
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checking - these are matrices, correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are. Xb is (system_dim, ensemble_dim) and H is (observation_dim, system_dim)

system_dim: System dimension.
delta_t: The timestep of the model (assumed uniform)
model_obj: Forecast model object.
in_4d: True for 4D data assimilation techniques (e.g. 4DVar).
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little confusing to me to see this in the Var3D method. How does this get used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this from the docstring to avoid confusion. All of the DA classes have this attribute, but it's set automatically. The parent class, DACycler, checks this attribute when your run the main .cycle() method, since for the 4D methods there's some extra calculations.

@kysolvik kysolvik merged commit f5f4473 into main Apr 4, 2025
1 check passed
@kysolvik kysolvik deleted the docs/type-hints branch April 4, 2025 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants