Partial indexing of a Panel #8906
Comments
I've just hit this too, on 0.16.2. Is this intended? Is it related to #11369?
In [8]: panel = pd.Panel(pd.np.random.rand(2,3,4))
In [10]: panel.shape
Out[10]: (2, 3, 4)
In [11]: panel[:, :, 0].shape
Out[11]: (3, 2)
In numpy:
In [15]: npanel=pd.np.random.rand(2,3,4)
In [16]: npanel.shape
Out[16]: (2, 3, 4)
In [18]: npanel[:,:,0].shape
Out[18]: (2, 3)
CC @jreback, as this seemed like an abandoned issue
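For reference, the NumPy side of the comparison above can be sketched on its own. This is a minimal illustration that assumes only `numpy`; the variable names are illustrative, and the Panel itself is not reproduced because it is no longer available in current pandas releases. An integer index on the last axis drops that axis and leaves the remaining axes in their original order, so the (3, 2) shape reported for the Panel has the shape of the transposed NumPy result.

```python
import numpy as np

arr = np.random.rand(2, 3, 4)

# An integer index on the last axis drops that axis; the remaining
# axes keep their original order.
sliced = arr[:, :, 0]
print(sliced.shape)    # (2, 3)

# The (3, 2) shape reported for the Panel matches the transpose.
print(sliced.T.shape)  # (3, 2)

# A length-1 slice keeps the axis instead of collapsing it.
kept = arr[:, :, 0:1]
print(kept.shape)      # (2, 3, 1)
```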
yes this has always been like this.
This is a bigger issue than one we're going to solve here. But regardless a couple of points: Panels generally
Panel indexing
xray mostly has the design I expected, I think, although it does remember the collapsed dimension:
In [22]: panel_x=xray.DataArray(pd.np.random.rand(4,3,2))
In [24]: panel_x
Out[24]:
<xray.DataArray (dim_0: 4, dim_1: 3, dim_2: 2)>
array([[[ 0.81499518, 0.73722039],
...
[ 0.21864764, 0.93710684]]])
Coordinates:
* dim_0 (dim_0) int64 0 1 2 3
* dim_1 (dim_1) int64 0 1 2
* dim_2 (dim_2) int64 0 1
In [25]: panel_x.loc[:,0,:]
Out[25]:
<xray.DataArray (dim_0: 4, dim_2: 2)>
array([[ 0.81499518, 0.73722039],
[ 0.41809174, 0.28529916],
[ 0.82198192, 0.14365383],
[ 0.55948113, 0.24809068]])
Coordinates:
* dim_0 (dim_0) int64 0 1 2 3
dim_1 int64 0
* dim_2 (dim_2) int64 0 1
Well, @shoyer and I had some discussions w.r.t. essentially making That's an option; more closely aligns pandas and x-ray. However, I think is a nice use case for a dense I happen to (well in the past), used
So maybe you can elaborate on where you think pandas is lacking (in docs/tests/etc). Pretty much everything is there. So aside from the indexing conventions, I'm not sure what issues there are.
Here are a couple of issues I've had in addition to the above; I can provide more on these / others if helpful:
In [56]: x
Out[56]:
<xray.DataArray (dim_0: 2, dim_1: 3, dim_2: 4)>
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
Coordinates:
* dim_0 (dim_0) int64 0 1
* dim_1 (dim_1) int64 0 1 2
* dim_2 (dim_2) int64 0 1 2 3
In [57]: x * pd.np.asarray([0,1])[:, pd.np.newaxis, pd.np.newaxis]
Out[57]:
<xray.DataArray (dim_0: 2, dim_1: 3, dim_2: 4)>
array([[[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
Coordinates:
* dim_1 (dim_1) int64 0 1 2
* dim_2 (dim_2) int64 0 1 2 3
* dim_0 (dim_0) int64 0 1
In [58]: x.to_pandas() * pd.np.asarray([0,1])[:, pd.np.newaxis, pd.np.newaxis]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-58-18d40558bcd9> in <module>()
----> 1 x.to_pandas() * pd.np.asarray([0,1])[:, pd.np.newaxis, pd.np.newaxis]
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/ops.py in f(self, other)
1050 raise ValueError('Simple arithmetic with %s can only be '
1051 'done with scalar values' %
-> 1052 self._constructor.__name__)
1053
1054 return self._combine(other, op)
ValueError: Simple arithmetic with Panel can only be done with scalar values
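The multiplication attempted above is ordinary NumPy broadcasting. The following is a minimal sketch on plain arrays, assuming only `numpy`; `values` and `weights` are illustrative names, and it does not show a Panel workaround.

```python
import numpy as np

values = np.arange(24).reshape(2, 3, 4)

# Shape (2,) becomes (2, 1, 1), so the weights broadcast along the
# first axis and are repeated over the other two.
weights = np.asarray([0, 1])[:, np.newaxis, np.newaxis]

scaled = values * weights
print(scaled.shape)  # (2, 3, 4)
```

The first block of `scaled` is all zeros and the second is unchanged, which is the same result as the xray output shown in Out[57].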
panel.loc[:, :, :] = pd.np.where(
    panel.notnull(),
    panel,
    fallback_df[:, :, pd.np.newaxis]
)
xray seems decent at this too:
In [61]: x.where(x>5)
Out[61]:
<xray.DataArray (dim_0: 2, dim_1: 3, dim_2: 4)>
array([[[ nan, nan, nan, nan],
[ nan, nan, 6., 7.],
[ 8., 9., 10., 11.]],
[[ 12., 13., 14., 15.],
[ 16., 17., 18., 19.],
[ 20., 21., 22., 23.]]])
Coordinates:
* dim_1 (dim_1) int64 0 1 2
* dim_2 (dim_2) int64 0 1 2 3
* dim_0 (dim_0) int64 0 1
In [62]: x.where(x[0]>5)
Out[62]:
<xray.DataArray (dim_0: 2, dim_1: 3, dim_2: 4)>
array([[[ nan, nan, nan, nan],
[ nan, nan, 6., 7.],
[ 8., 9., 10., 11.]],
[[ nan, nan, nan, nan],
[ nan, nan, 18., 19.],
[ 20., 21., 22., 23.]]])
Coordinates:
* dim_1 (dim_1) int64 0 1 2
* dim_2 (dim_2) int64 0 1 2 3
* dim_0 (dim_0) int64 0 1
Hope this is helpful - thanks for your engagement @jreback
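The fill-from-fallback and masking patterns in this comment can both be sketched with `np.where` on plain arrays. This is an illustration under stated assumptions, not the code from the thread: it assumes only `numpy`, and `data`, `fallback`, and `x` are hypothetical stand-ins for the Panel, `fallback_df`, and the DataArray.

```python
import numpy as np

data = np.arange(24, dtype=float).reshape(2, 3, 4)
data[0, 0, :] = np.nan  # pretend these values are missing

# Hypothetical 2-D fallback, indexed by the first two axes of `data`.
fallback = np.full((2, 3), -1.0)

# Fill missing entries; the fallback gains a trailing axis so that it
# broadcasts across the last dimension.
filled = np.where(np.isnan(data), fallback[:, :, np.newaxis], data)

# Masking with a lower-dimensional condition, analogous to
# x.where(x[0] > 5): the (3, 4) mask broadcasts to (2, 3, 4).
x = np.arange(24, dtype=float).reshape(2, 3, 4)
masked = np.where(x[0] > 5, x, np.nan)
```

Here `masked` keeps a value only where the corresponding entry of `x[0]` exceeds 5, which is the pattern shown in Out[62].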
Yes, these sorts of issues are exactly why we wrote xray in the first place. The pandas API and internals weren't really designed with n-dimensional data in mind, which makes panels and nd-panel quite awkward.
The collapsed dimension is essentially just metadata and can be safely ignored. I think @jreback was a little confused here, but scalar coordinates are not used for any sort of alignment. IMO the xray.DataArray is almost strictly more useful than panels. The main feature gap is that we currently don't support MultiIndex in xray, but hopefully that will change soon.
since I understand you recently switched from using if we deprecate
Sure - I'll give a short synthesis, and happy to answer any follow up questions you have. Good:
Bad - minor, and very specific to my experience:
Overall it's a beautiful library, both for exploratory work and for production. I'm very excited to be using it, and grateful to @shoyer for creating it. I don't have a strong view on whether we should make Let me know if I can help beyond this at all,
The good news is that almost all of @MaximilianR's issues should be fixable with a bit more work -- there are no fundamental design issues. For example, I just made a PR adding MultiIndex support (pydata/xarray#702).
Could you share an example where this fails? There may be a bug here -- we've had support for string indexing of datetime indexes since almost the beginning: http://xray.readthedocs.org/en/stable/time-series.html#datetime-indexing
That should read
In [51]: ds=xray.Dataset(coords={'date':pd.period_range(periods=10,start='2000')})
In [52]: ds['d']=('date', pd.np.random.rand(10))
In [53]: ds.sel(date='2000')
Out[53]:
<xray.Dataset>
Dimensions: ()
Coordinates:
date object 2000-01-01
Data variables:
d float64 0.8965
Confirming it works for
In [54]: ds=xray.Dataset(coords={'date':pd.date_range(periods=10,start='2000')})
In [55]: ds['d']=('date', pd.np.random.rand(10))
In [56]: ds.sel(date='2000')
Out[56]:
<xray.Dataset>
Dimensions: (date: 10)
Coordinates:
* date (date) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...
Data variables:
d (date) float64 0.09303 0.5456 0.4934 0.08438 0.1854 0.2823 ...
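For comparison, the partial-string selection that the `date_range` case reproduces can be sketched in pandas alone. This is a minimal illustration assuming `numpy` and `pandas`; it uses a Series rather than an xray Dataset, the variable names are illustrative, and it says nothing about how xray handles a period index.

```python
import numpy as np
import pandas as pd

# Ten daily timestamps starting at 2000-01-01.
dates = pd.date_range(start="2000", periods=10)
s = pd.Series(np.random.rand(10), index=dates)

# A year string selects every entry that falls within that year.
in_2000 = s.loc["2000"]
print(len(in_2000))  # 10
```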
closing as Panels are deprecated
See also: http://stackoverflow.com/questions/26736745/indexing-a-pandas-panel-counterintuitive-or-a-bug
These are actually two related(?) issues.
The first is that the DataFrame is transposed when you index with the major_indexer or minor_indexer:
This may be a design choice, but it seems counterintuitive to me and it is not in line with the way numpy indexing works.
On a related note, I would expect the following two commands to be equivalent:
INSTALLED VERSIONS
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: nl_NL
pandas: 0.15.1
nose: 1.3.3
Cython: 0.20.1
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.2.0
sphinx: 1.2.2
patsy: 0.2.1
dateutil: 1.5
pytz: 2014.9
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.2
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None