Replies: 4 comments · 13 replies
-
Nobody is dumb. Bugs exist! Thanks for bringing this up.
-
Agree this is not dumb. I think the problem is that your
-
Something that has been bothering me a little is the confusion between the nested simulations. E.g., here both

When we designed this feature, we rationalized that this is the right design because it makes it possible for the ocean model to evolve on a different time step than the coupled model. In other words

It still seems like the right design as I write this, but it's clear that we still have some work to do to make checkpointing and output make sense with
-
As noted before, I think this is a bona fide missing feature. However, you can work around it by using

    latest_checkpoint = "model_checkpoint_iteration13920.jld2"
    set!(ocean.model, latest_checkpoint)

For the coupled simulation, I think you will also need to manually update the clock. This might do:

    simulation.model.clock = ocean.model.clock

Or you can update the

Thank you for bringing this up!
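The workaround above hardcodes the checkpoint filename. As a minimal sketch in plain Julia, the newest checkpoint could instead be found by parsing the iteration number out of each filename. The helper name `latest_checkpoint_file` and its `prefix` keyword are hypothetical (they are not part of Oceananigans), and the sketch assumes files follow the `model_checkpoint_iteration<N>.jld2` pattern seen in this thread.

```julia
# Hypothetical helper (not an Oceananigans function): return the path of the
# checkpoint in `dir` with the largest iteration number, or `nothing` if no
# file matches. Assumes names of the form "<prefix>_iteration<N>.jld2".
function latest_checkpoint_file(dir::AbstractString; prefix::AbstractString = "model_checkpoint")
    pattern = Regex("^" * prefix * "_iteration(\\d+)\\.jld2\$")
    best_file, best_iteration = nothing, -1
    for name in readdir(dir)
        m = match(pattern, name)
        m === nothing && continue          # skip files that are not checkpoints
        iteration = parse(Int, m.captures[1])
        if iteration > best_iteration
            best_file, best_iteration = joinpath(dir, name), iteration
        end
    end
    return best_file
end
```

The returned path could then be passed to `set!(ocean.model, ...)` in place of the hardcoded string. Comparing iteration numbers as integers avoids the pitfall of sorting filenames as strings, where `iteration960` would sort after `iteration13920`.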
-
This will work great for now! Thanks!
-
When I do
I get
-
I still tried to run (without updating the model clock, since this failed):
and it looks like
However, the overwritten snapshot fields still have the correct time stamp, starting with
This behavior was probably clear to you anyway, but I'm still documenting it here (partly for myself).
-
Can you show the whole stacktrace of the error? I have a vague idea of what to do to fix this bug, but it would help to see the whole error!
-
The error I get after
is
-
The error I get after
is
-
I think @simone-silvestri must have some secrets to share, because I believe he does some kind of manual checkpointing. Also @simone-silvestri, do you have a system for checkpointing when running distributed across multiple GPUs?
-
You mean you add
-
Yep, `filename * "_$(arch.local_rank).jld2"`. It is not really a sustainable solution. I was thinking of revamping CliMA/Oceananigans.jl#3429 so we do not need to do that.
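To make the per-rank naming concrete, here is a small self-contained sketch of the pattern quoted above. The function name `rank_checkpoint_path` and the `FakeArch` struct are hypothetical stand-ins introduced only for illustration; in a real run, `arch` would be the distributed architecture object, and only its `local_rank` field is assumed here.

```julia
# Hypothetical helper: build one checkpoint filename per rank, following the
# `filename * "_$(arch.local_rank).jld2"` pattern from this thread.
rank_checkpoint_path(filename::AbstractString, local_rank::Integer) =
    filename * "_$(local_rank).jld2"

# Stand-in for a distributed architecture (assumption: only `local_rank` is used).
struct FakeArch
    local_rank::Int
end

arch = FakeArch(3)
path = rank_checkpoint_path("model_checkpoint_iteration13920", arch.local_rank)
```

Each rank then writes to and reads from its own file, which is why restoring a run requires the same number of ranks and the same rank-to-file mapping as the run that wrote the checkpoints.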
-
Isn't it a pretty simple change just to add
-
I don't think we need any revamping. I think just a few extra lines are needed to add something like
-
I think so. That should be enough.
-
The answer is probably yes, and I am just being dumb. Here is what I do:
I run this notebook, but with checkpointing
This saves a series of checkpoint files in the working directory, with the latest being `model_checkpoint_iteration13920.jld2`. To pick up from this checkpoint, I re-run the same notebook up to the end of the section "Set up output writers" and then try a bunch of different things: First,
which gives me the error
even though I run the notebook from the same working directory. Next, I try
which gives me the error