Is your feature request related to a problem?
I'm using xarray.open_mfdataset() to open tens of thousands of (fairly small) netCDF files, and it's taking quite some time. Being of an impatient nature, I would like to at least be assured that something is happening, so a progress bar would be nice. I found an example of using a progress bar from dask here: #4000 (comment)
However, my attempt to adapt this solution doesn't show a progress bar. Any other options?
Here is the code I tried:
import xarray as xr
from dask.diagnostics import ProgressBar

with ProgressBar():
    d = xr.open_mfdataset('proc/*.nc')
Describe the solution you'd like
I'd like to see a nice and fairly minimal progress bar, for example telling me how many files have been dealt with so far.
Describe alternatives you've considered
Something based on tqdm would be nice, but could also be something else.
Additional context
No response
After discussion with a colleague, we ended up with this solution:
import xarray as xr
from dask.diagnostics import ProgressBar

with xr.open_mfdataset('proc/*.nc', chunks=dict(index=1)) as d, ProgressBar():
    d.load()
This works in the strict sense that it displays a progress bar, but unfortunately it does nothing (no progress bar visible) for a couple of minutes (for the set of files I tested), and then the progress bar shows up and runs through in a few seconds. In other words, not very useful for an impatient soul like me.
I should add that I'm testing this in a jupyter notebook.
Indeed, the progress bar shows nothing during the open_mfdataset call unless you pass parallel=True. That option parallelizes file access by creating one dask task per open_dataset call, i.e. one task per file. Without it, open_dataset is called on each file in sequence without going through dask, so dask has nothing to report.
The progress bar activity you do see is the loading of each chunk into memory, which happens when you call d.load(), that is, only after open_mfdataset has returned.
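To make the parallel=True suggestion concrete, here is a minimal sketch. The helper name open_mfdataset_with_progress is made up for illustration, and I'm assuming dask's default local scheduler (as far as I know, dask.diagnostics.ProgressBar does not report on a distributed cluster). The imports sit inside the function only so that the helper can be defined without either library being loaded.

```python
def open_mfdataset_with_progress(pattern):
    """Open many files with one dask task per file, under a progress bar.

    Hypothetical helper, not part of xarray.
    """
    # Local imports: defining this helper does not require xarray or dask.
    import xarray as xr
    from dask.diagnostics import ProgressBar

    # With parallel=True, each per-file open_dataset call becomes a dask
    # task, so the ProgressBar callback has something to report on while
    # the files are being opened.
    with ProgressBar():
        return xr.open_mfdataset(pattern, parallel=True)
```

Usage would be something like d = open_mfdataset_with_progress('proc/*.nc'). Whether the bar advances smoothly over tens of thousands of files is something I have not measured.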
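If the goal is just a count of files handled so far, one alternative is to skip open_mfdataset and open the files in a plain loop that reports progress. This is only a sketch: open_with_progress and open_many_netcdf are hypothetical names, and combining with xr.combine_by_coords is an assumption about how the files fit together (xr.concat along a known dimension may suit better).

```python
import sys


def open_with_progress(paths, opener, stream=sys.stderr):
    """Call opener(path) for every path, printing 'k/n files opened'."""
    paths = list(paths)
    total = len(paths)
    results = []
    for count, path in enumerate(paths, start=1):
        results.append(opener(path))
        # '\r' rewrites the same terminal line instead of scrolling.
        stream.write(f"\r{count}/{total} files opened")
        stream.flush()
    if total:
        stream.write("\n")
    return results


def open_many_netcdf(pattern):
    """Hypothetical wrapper: open each file lazily, then combine them."""
    import glob
    import xarray as xr  # local import, only needed when actually called

    paths = sorted(glob.glob(pattern))
    # chunks={} asks xarray for dask-backed (lazy) arrays.
    datasets = open_with_progress(
        paths, lambda p: xr.open_dataset(p, chunks={})
    )
    # Assumption: the files can be combined by their coordinates.
    return xr.combine_by_coords(datasets)
```

Wrapping the list of paths in tqdm would presumably do the same job as the manual counter, if tqdm is installed. Note that this opens the files sequentially, so it trades the speed of parallel=True for per-file feedback.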