-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CFTimeIndex calendar in repr #4092
CFTimeIndex calendar in repr #4092
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aaronspring, sorry for taking a while to get back to you. These are good questions.
inheritance is disencouraged, how should I extend the
__repr__
coming frompd.Index
? Should I try to rebuildpd.Index
?
I'll admit, I'm not an expert in writing array-like reprs. I agree though that we should do what we can to avoid relying on private pandas API. Have you looked into what this would entail? Is there a big web of internal methods we'd have to copy over?
how to deal with
calendar=365(6)_day
which get internally converted intonoleap/all_leap
?
I think I am ok with this. My sense is that it is more important that indexes with the same date type have the same calendar
attribute than that the calendar
argument passed to cftime_range
is propagated verbatim to the index it produces. What do you think?
if you define a |
Yes. I will do that. My question was whether should replicate the pd.Index.repr or try to import or inherit as much as I can from pandas? |
Ah now I understand your questions. Thanks for clarifying.
Maybe start by making the smallest possible change to the pandas repr? EDIT: If we want to build our own repr, there are some helpful functions in |
I was hoping to inherit from pandas like: def __repr__(self):
super().__repr__()
return self.__repr__().strip(")")+f", calendar={self.calendar}')" But I will now try to rebuild as in |
Hello @aaronspring! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-07-19 17:01:25 UTC |
This doesnt change the html repr yet: when clicking on the data symbol, calendar is not shown in the printout. I will further try to fix this |
I do understand the meaning of this last sentence @spencerkclark
Should I rather test on cftimeindex created in a different way? |
Currently only the cftimeindex repr shows the calendar property. I aim to get it into the dataset/dataarray repr. would it be a good idea to modify |
I found a workaround with EDIT: I ensure now that Ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the ping @aaronspring -- my apologies for being slow to respond! I appreciate the effort to go all the way toward reproducing the pandas repr. My remaining concern, and it is a little nit-picky, is that in its current form, there is no way to limit the width of the repr, e.g. with xarray.set_options(display_width=40)
(see below for example for an existing xarray repr). Do you think you might be able to enable that?
In [1]: import xarray as xr
In [2]: ds = xr.tutorial.open_dataset("rasm")
In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions: (time: 36, x: 275, y: 205)
Coordinates:
* time (time) object 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
xc (y, x) float64 ...
yc (y, x) float64 ...
Dimensions without coordinates: x, y
Data variables:
Tair (time, y, x) float64 ...
Attributes:
title: /workspace/jhamman/processed/R1002RBRxaaa01a/l...
institution: U.W.
source: RACM R1002RBRxaaa01a
output_frequency: daily
output_mode: averaged
convention: CF-1.4
references: Based on the initial model of Liang et al., 19...
comment: Output from the Variable Infiltration Capacity...
nco_openmp_thread_number: 1
NCO: "4.6.0"
history: Tue Dec 27 14:15:22 2016: ncatted -a dimension...
In [4]: xr.set_options(display_width=40)
Out[4]: <xarray.core.options.set_options at 0x7fbdc834a668>
In [5]: ds
Out[5]:
<xarray.Dataset>
Dimensions: (time: 36, x: 275, y: 205)
Coordinates:
* time (time) object 1980-09-16...
xc (y, x) float64 ...
yc (y, x) float64 ...
Dimensions without coordinates: x, y
Data variables:
Tair (time, y, x) float64 ...
Attributes:
title: /works...
institution: U.W.
source: RACM R...
output_frequency: daily
output_mode: averaged
convention: CF-1.4
references: Based ...
comment: Output...
nco_openmp_thread_number: 1
NCO: "4.6.0"
history: Tue De...
xarray/coding/cftimeindex.py
Outdated
len_item = 19 # length of one item in repr | ||
# shorten repr for more than 100 items | ||
max_width = (19 + 1) * 100 if len(self) <= 100 else 22 * len_item | ||
datastr = format_array_flat(self.values, max_width) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think format_array_flat
is a good choice if we want a 2-line repr, but for including more values it might be cleaner to write our own logic, rather than add commas and line breaks afterwards. As you've picked up on, I think the fact that we can treat the length of each element of the repr in a cftime array as a constant simplifies things greatly.
This is of course ignoring the potential for five-digit years; however, we already assume we won't see those in at least one other place in xarray (partial string indexing). At some point it might be good address that, but I think for now it's ok to stick with the four-digit year assumption. Particularly here, I think the worst that would happen is that the repr might potentially be a few characters wider than the imposed limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, for now I use format_array_flat
and insert commata and linebreaks manually. Are you suggestion I should rather write a new function? This function would probably use much of the code of format_array_flat
.
concerning the 5digit years: xr.cftime_range(start='10000',periods=2)
fails with ValueError: no ISO-8601 match for string: 10000
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now wrote a new function format_cftimeindex_array
like format_array_flat
. Hope this is what you were hoping to see. Both functions share much of the code especially in the beginning of the function. Should I refactor these shared code parts into a small function that both functions use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concerning the 5digit years: xr.cftime_range(start='10000',periods=2) fails with ValueError: no ISO-8601 match for string: 10000
Oh right, we use the same string parsing logic in cftime_range
as in partial datetime string indexing. I was thinking of dates one might read in from a file, which get decoded through cftime.num2date
. Anyway I acknowledge that is an issue we don't need to address at the moment!
I now wrote a new function format_cftimeindex_array like format_array_flat. Hope this is what you were hoping to see.
Sorry I was thinking something more along these lines for the code that formats the times (the rest of the repr can be added around what it generates):
CFTIME_REPR_LENGTH = 19
def format_row(times, indent=0, separator=", ", row_end=",\n"):
return indent * " " + separator.join(map(str, times)) + row_end
def format_times(
index,
max_width,
offset,
separator=", ",
first_row_offset=0,
intermediate_row_end=",\n",
last_row_end=""
):
n_per_row = max(max_width // (CFTIME_REPR_LENGTH + len(separator)), 1)
n_rows = int(np.ceil(len(index) / n_per_row))
representation = ""
for row in range(n_rows):
indent = first_row_offset if row == 0 else offset
row_end = last_row_end if row == n_rows - 1 else intermediate_row_end
times_for_row = index[row * n_per_row:(row + 1) * n_per_row]
representation = representation + format_row(
times_for_row,
indent=indent,
separator=separator,
row_end=row_end
)
return representation
In other words iteratively generating each row in the repr, inserting the separator as you build each row, and inserting line breaks at the end of each row. I just find it fits in my head better than adding those elements post-hoc. I think you should be able to leverage the code above to construct a "split" repr as well (e.g. one that shows only the first and last 10 elements of the index) by calling format_times
twice with the appropriate arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok. Now I think I get the idea...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the really nice template
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implemented thanks to your nice template given above. ready for review @spencerkclark
now aligns to for dw in [40,60,80,120]:
with xr.set_options(display_width=dw):
print(time[:2],dw,'\n')
|
dont understand why |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aaronspring; this is looking very close! Just a few more minor suggestions.
I hope this is the final one. all tests pass. implemented your suggestions. took me a few commits, but I learned a lot. thanks for the guidance @spencerkclark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aaronspring! I pushed a few more minor edits, mostly to spruce up the documentation a bit. Otherwise this looks good to me. Barring any comments from others I'll merge it sometime next week.
Co-authored-by: keewis <keewis@users.noreply.github.com>
Thanks again @aaronspring! |
Thanks @aaronspring this is a great contribution. |
My pleasure. Learnt a lot. And it was a long-standing issue that was mentioned in a medium article about xarray looking for more contributors. |
* upstream/master: Added xarrays-spatial and updated geoviews link (pydata#4262) update docs to point to xarray-contrib and xarray-tutorial (pydata#4252) Add release summary, some touch-ups (pydata#4217) CFTimeIndex calendar in repr (pydata#4092) fix the RTD timeouts (pydata#4254) update isort CI and pre-commit hook (pydata#4204)
isort -rc . && black . && mypy . && flake8
whats-new.rst
for all changes andapi.rst
for new APIDone:
calendar
property toCFTimeIndex