Description
What is your issue?
As part of the efforts described in #10039, I added #10088, and noticed the repo layout has arguably not kept up with the code growth over the past decade. This isn't the most pressing issue, but it does make the returns to refactors lower, since we're moving lines from 11K LOC files to 1K LOC files, rather than anything smaller.
(Even if you think LLMs aren't that useful, aren't going to get better, etc., these changes would still make the repo easier for people to navigate...)
In particular, about 2/3 of our code is in `xarray/core`: 66873 LOC vs 97118 LOC in `xarray`.
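For anyone who wants to check those numbers against their own checkout, here is a rough sketch of one way to count. It assumes LOC means raw lines in `.py` files (blank lines and comments included), which may not match how the figures above were produced; `count_loc` and `core_share` are names made up for this example.

```python
from pathlib import Path


def count_loc(root: Path) -> int:
    """Count raw lines in every .py file under root (blanks and comments included)."""
    total = 0
    for path in root.rglob("*.py"):
        with path.open(encoding="utf-8", errors="replace") as f:
            total += sum(1 for _ in f)
    return total


def core_share(repo_root: Path) -> float:
    """Fraction of the xarray package's lines that live under xarray/core."""
    package = repo_root / "xarray"
    total = count_loc(package)
    return count_loc(package / "core") / total if total else 0.0
```

Running `core_share(Path("."))` from the repo root gives the ratio; with the figures quoted above it would be 66873 / 97118, roughly 0.69.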
I can imagine splitting this up into a few categories:
- compat: `dask_array_*`, `npcompat`, `pdcompat`, `array_api_compat`
- compute / computation: `computation`, `arithmetic`, `nanops`, `weighted`, the `curvefit` that's currently in `dataset`, `rolling`, `rolling_exp`, maybe `missing`
- reshape / align / merge (need a better name): `merge`, `alignment`, `concat`
I'd propose having each of those be paths within `xarray/`. Then there's more freedom to make new files within those paths relative to the current state, where a new file means adding onto a very long list of files in `xarray/core`.
I'm not sure how much disruption this would cause to existing PRs. I think if we land the changes as commits that mostly just move files, git will handle most merges well. We can start slowly and see how it goes...
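One way to keep those commits to pure moves is to generate them from a table of the grouping proposed above. The sketch below is only an illustration: the package names `computation` and `reshape` are placeholders (the last group still needs a name), `missing` is included even though it was only a "maybe", and `curvefit` is left out because it lives inside `dataset` and would need a real refactor, not a file move. `destination` and `plan_moves` are hypothetical helpers, not anything in the repo.

```python
from fnmatch import fnmatch
from typing import Optional

# Proposed destinations, transcribed from the list in this issue.
# Package names are placeholders; "reshape" in particular needs a better name.
PROPOSED_LAYOUT = {
    "compat": ["dask_array_*", "npcompat", "pdcompat", "array_api_compat"],
    "computation": [
        "computation", "arithmetic", "nanops", "weighted",
        "rolling", "rolling_exp",
        "missing",  # listed as a "maybe" in the proposal
    ],
    "reshape": ["merge", "alignment", "concat"],
}


def destination(module: str) -> Optional[str]:
    """Return the proposed package for an xarray/core module, or None if it stays put."""
    for package, patterns in PROPOSED_LAYOUT.items():
        if any(fnmatch(module, pattern) for pattern in patterns):
            return package
    return None


def plan_moves(modules: list[str]) -> list[str]:
    """Build `git mv` commands for the modules that have a proposed destination."""
    commands = []
    for module in sorted(modules):
        package = destination(module)
        if package is not None:
            commands.append(
                f"git mv xarray/core/{module}.py xarray/{package}/{module}.py"
            )
    return commands
```

For example, `plan_moves(["merge", "dataset"])` yields a single `git mv` for `merge.py` and leaves `dataset` alone. Keeping each commit to moves like these, with import fixes in a follow-up commit, should give git's rename detection the best chance of carrying open PRs across the move.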