SparseArray is an ExtensionArray #22325

Merged
merged 236 commits into from
Oct 13, 2018
236 commits
ee187eb
wip
TomAugspurger Jul 12, 2018
32c1372
from scratch
TomAugspurger Jul 13, 2018
b265659
Updates
TomAugspurger Jul 13, 2018
8dfc898
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Jul 13, 2018
9c57725
WIP
TomAugspurger Jul 13, 2018
13952ab
wip
TomAugspurger Jul 13, 2018
7a6e7fa
wip take
TomAugspurger Jul 13, 2018
1016af1
wip take
TomAugspurger Jul 16, 2018
072abec
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Jul 22, 2018
0ad61cc
take
TomAugspurger Jul 22, 2018
5b0b524
take working
TomAugspurger Jul 22, 2018
224744a
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Jul 23, 2018
620b5fb
remove registry
TomAugspurger Jul 23, 2018
164c401
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Jul 24, 2018
65f83d6
missing
TomAugspurger Jul 24, 2018
0b3c682
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Jul 27, 2018
69a5d13
wip ops
TomAugspurger Jul 27, 2018
f2b5862
More ops wip
TomAugspurger Jul 27, 2018
fa80fc5
segfault!
TomAugspurger Jul 28, 2018
3f20890
wip
TomAugspurger Jul 28, 2018
484adb0
start docs
TomAugspurger Jul 28, 2018
1df1190
2 failing extension tests
TomAugspurger Jul 30, 2018
4246ac4
wip fillna
TomAugspurger Jul 30, 2018
a849699
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 1, 2018
c4da319
registry dtype, asarray
TomAugspurger Aug 1, 2018
a2f158f
astype interface
TomAugspurger Aug 1, 2018
26b671a
"passing" extension tests
TomAugspurger Aug 1, 2018
375e160
no sparse block
TomAugspurger Aug 1, 2018
0a37050
wip
TomAugspurger Aug 2, 2018
3c2cb0f
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 2, 2018
27c6378
wip
TomAugspurger Aug 3, 2018
e52dae9
a bit on concat
TomAugspurger Aug 3, 2018
b6d8430
revert concat changes
TomAugspurger Aug 3, 2018
640c4a5
passing again
TomAugspurger Aug 3, 2018
6b61597
More concat
TomAugspurger Aug 3, 2018
427234f
fillna...
TomAugspurger Aug 3, 2018
e055629
wip
TomAugspurger Aug 6, 2018
a79359c
wip
TomAugspurger Aug 6, 2018
de3aa71
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 6, 2018
21f4ee3
reductions, ufuncs
TomAugspurger Aug 6, 2018
c1e594a
failing on ufuncs
TomAugspurger Aug 6, 2018
dc7f93f
wipo
TomAugspurger Aug 6, 2018
eb09d21
concat is broken
TomAugspurger Aug 7, 2018
7dcf4b2
formatting failing
TomAugspurger Aug 7, 2018
b39658a
more wip
TomAugspurger Aug 7, 2018
a8b76bd
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 8, 2018
e041313
Extension test fixups
TomAugspurger Aug 8, 2018
595535e
some indexing, sparse string
TomAugspurger Aug 9, 2018
7700299
passing indexing
TomAugspurger Aug 9, 2018
f1ff7da
passing pivot
TomAugspurger Aug 9, 2018
33fa6f7
broken broken broken
TomAugspurger Aug 10, 2018
40c035e
sanitize
TomAugspurger Aug 10, 2018
1d49cc7
broken broken broken
TomAugspurger Aug 10, 2018
6f4b6b6
wip
TomAugspurger Aug 13, 2018
6f037b5
working through series
TomAugspurger Aug 13, 2018
7da220e
working through series
TomAugspurger Aug 13, 2018
bfbe4ab
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 13, 2018
c5666b6
series passing
TomAugspurger Aug 13, 2018
ff6037c
more tests
TomAugspurger Aug 13, 2018
5c362ef
wip
TomAugspurger Aug 13, 2018
55cac36
wip
TomAugspurger Aug 13, 2018
c4e8784
More test
TomAugspurger Aug 13, 2018
a00f987
skip internals tests
TomAugspurger Aug 13, 2018
a6d7eac
linting
TomAugspurger Aug 13, 2018
4b4f9bd
cleanup
TomAugspurger Aug 13, 2018
82801be
cleanup
TomAugspurger Aug 13, 2018
1a149dc
cleanup
TomAugspurger Aug 13, 2018
fde19d7
remove debug code
TomAugspurger Aug 13, 2018
a7ba8f6
API: dispatch to EA.astype
TomAugspurger Aug 13, 2018
5064217
API: ExtensionDtype._is_numeric
TomAugspurger Aug 14, 2018
e31e8aa
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 14, 2018
79c8e9c
update type
TomAugspurger Aug 14, 2018
26993fe
Merge remote-tracking branch 'upstream/master' into ea-astype-dispatch
TomAugspurger Aug 14, 2018
6eeec11
py2 compat
TomAugspurger Aug 14, 2018
50de326
fixed test
TomAugspurger Aug 14, 2018
5ef1747
test fill value
TomAugspurger Aug 14, 2018
f31970c
Test nbytes
TomAugspurger Aug 14, 2018
f1b860f
explainers
TomAugspurger Aug 14, 2018
5c44275
linting
TomAugspurger Aug 14, 2018
33bc8f8
Allow concatenating with different sparse dtypes
TomAugspurger Aug 14, 2018
9bf13ad
Linting
TomAugspurger Aug 14, 2018
de1fb5b
lint
TomAugspurger Aug 14, 2018
da580cd
Wip
TomAugspurger Aug 14, 2018
88b73c3
Merge branch 'ea-astype-dispatch' into ea-sparse-2
TomAugspurger Aug 14, 2018
afde64d
Merge branch 'ea-is-numeric' into ea-sparse-2
TomAugspurger Aug 14, 2018
e603d3d
fixup 33bc8f836
TomAugspurger Aug 15, 2018
ec5eb9a
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 15, 2018
a72ee1a
Fixed DataFrame.__setitem__ for updating to sparse.
TomAugspurger Aug 15, 2018
f147635
try removing
TomAugspurger Aug 15, 2018
c35c7c2
Merge branch 'ea-astype-dispatch' into ea-sparse-2
TomAugspurger Aug 15, 2018
e159ef2
wip
TomAugspurger Aug 16, 2018
d48a8fa
Fixup
TomAugspurger Aug 16, 2018
3bcf57e
astype works
TomAugspurger Aug 16, 2018
31d401f
Squashed commit of the following:
TomAugspurger Aug 16, 2018
a4369c2
Squashed commit of the following:
TomAugspurger Aug 16, 2018
608b499
Fixed Series[sparse].to_sparse
TomAugspurger Aug 16, 2018
14e60c9
Shift works
TomAugspurger Aug 16, 2018
550f163
parametrize shift test
TomAugspurger Aug 16, 2018
821cc91
Removed bogus test
TomAugspurger Aug 16, 2018
e21ed21
Un-xfail more
TomAugspurger Aug 16, 2018
aeb8c8c
scalar take raises
TomAugspurger Aug 16, 2018
34c90ed
Move fill_value to dtyep
TomAugspurger Aug 17, 2018
2103959
Move fill_value to dtyep
TomAugspurger Aug 17, 2018
26af959
Merge branch 'ea-sparse-dtype-fill-value' into ea-sparse-2
TomAugspurger Aug 18, 2018
e5920c2
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 18, 2018
084a967
cleanup
TomAugspurger Aug 18, 2018
bb17760
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 20, 2018
dde7852
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 20, 2018
f1b4e6b
Setting fill value (but that's bad)
TomAugspurger Aug 20, 2018
6a31077
Explicit fill value
TomAugspurger Aug 20, 2018
02aa7f7
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 20, 2018
3a7ee2d
Fixed merge conflicts
TomAugspurger Aug 20, 2018
d6fe191
subdtype -> subtype
TomAugspurger Aug 20, 2018
b1ea874
subdtype -> subtype
TomAugspurger Aug 20, 2018
2213b83
Fixed pickle
TomAugspurger Aug 21, 2018
94664c4
test dtype
TomAugspurger Aug 21, 2018
e54160c
astype update
TomAugspurger Aug 21, 2018
04a2dbb
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 21, 2018
fb01d1a
more
TomAugspurger Aug 21, 2018
f78ae81
lint
TomAugspurger Aug 21, 2018
11d5b40
py2 compat
TomAugspurger Aug 21, 2018
ba70753
dtype tests
TomAugspurger Aug 21, 2018
82bab3c
explainer
TomAugspurger Aug 21, 2018
2990124
Delete things
TomAugspurger Aug 21, 2018
a9d0f17
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 22, 2018
0c52c37
NumPy 1.9 compat
TomAugspurger Aug 22, 2018
998f113
implement divmod
TomAugspurger Aug 22, 2018
38b0356
Fix broken fill value setting
TomAugspurger Aug 22, 2018
7206d94
compare with lists
TomAugspurger Aug 22, 2018
fe771b5
clean
TomAugspurger Aug 22, 2018
12e424c
fixed index ctor fail
TomAugspurger Aug 22, 2018
3bd567f
New xfail
TomAugspurger Aug 22, 2018
f816346
Handle sparse reindex
TomAugspurger Aug 22, 2018
1a1dcf4
concat mixed
TomAugspurger Aug 22, 2018
e3d9173
take note
TomAugspurger Aug 22, 2018
2715cdb
Remove test.
TomAugspurger Aug 22, 2018
4e40599
concat NA and empty
TomAugspurger Aug 22, 2018
0aa3934
dum
TomAugspurger Aug 22, 2018
a3becb6
Fix lost fill value
TomAugspurger Aug 22, 2018
5660b9a
override
TomAugspurger Aug 22, 2018
dd3cba5
Handle fill in unique
TomAugspurger Aug 23, 2018
cc65b8a
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 23, 2018
06dce5f
Faster isna
TomAugspurger Aug 23, 2018
f7351d3
Support old numpy
TomAugspurger Aug 23, 2018
2055494
clean
TomAugspurger Aug 23, 2018
f310322
Simplified setter
TomAugspurger Aug 23, 2018
0008164
Inplace not supported.
TomAugspurger Aug 23, 2018
027f6d8
compat
TomAugspurger Aug 24, 2018
c0d9875
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 24, 2018
44b218c
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 28, 2018
47fa73a
32-bit compat
TomAugspurger Aug 28, 2018
c2c489f
Lint
TomAugspurger Aug 28, 2018
3729927
Test fixups
TomAugspurger Aug 28, 2018
9ba49e1
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 29, 2018
543ac7c
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Aug 30, 2018
f66ef6f
CI passing
TomAugspurger Aug 30, 2018
ba8fc9d
Right numpy version
TomAugspurger Aug 30, 2018
9185e33
linting
TomAugspurger Aug 30, 2018
11799ab
Try intp
TomAugspurger Aug 31, 2018
73e7626
32-bit compat
TomAugspurger Aug 31, 2018
ebece16
Doc cleanup
TomAugspurger Aug 31, 2018
7db6990
Simplify is_sparse
TomAugspurger Aug 31, 2018
be21f42
Updated factorize
TomAugspurger Sep 4, 2018
e857363
Use ABC
TomAugspurger Sep 4, 2018
d0ee038
simplify interleave_dtype
TomAugspurger Sep 4, 2018
54f4417
docstring, simplify
TomAugspurger Sep 4, 2018
2082d86
fixup supers
TomAugspurger Sep 4, 2018
f846606
Linting
TomAugspurger Sep 4, 2018
ce8e0ac
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 4, 2018
1f6590e
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 5, 2018
b758469
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 6, 2018
f6b0924
move and fix conflict
TomAugspurger Sep 6, 2018
232518c
doc note
TomAugspurger Sep 6, 2018
e8b37da
ENH: is_homogenous
TomAugspurger Sep 20, 2018
0197e0c
BUG: Preserve dtype on homogeneous EA xs
TomAugspurger Sep 20, 2018
62326ae
asarray test
TomAugspurger Sep 20, 2018
f008c38
Fixed asarray
TomAugspurger Sep 20, 2018
88c6126
Merge remote-tracking branch 'upstream/master' into ea-xs
TomAugspurger Sep 20, 2018
5c8662e
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 20, 2018
78798cf
is_homogeneous -> is_homogeneous_type
TomAugspurger Sep 20, 2018
b051424
lint
TomAugspurger Sep 20, 2018
78979b6
Squashed commit of the following:
TomAugspurger Sep 20, 2018
2333db1
Merge followup
TomAugspurger Sep 20, 2018
b41d473
Followup from merge
TomAugspurger Sep 20, 2018
d6a2479
lint
TomAugspurger Sep 20, 2018
a23c27c
Merge remote-tracking branch 'origin/ea-xs' into ea-sparse-2
TomAugspurger Sep 20, 2018
7372eb3
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 26, 2018
cab8c54
handle unary ops
TomAugspurger Sep 26, 2018
52ae275
linting
TomAugspurger Sep 26, 2018
9c9b49e
compat, lint
TomAugspurger Sep 26, 2018
f5d7492
SparseSeries unary ops
TomAugspurger Sep 26, 2018
b4b4cbc
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 26, 2018
bf98b9d
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 26, 2018
f3d2681
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Sep 29, 2018
7d4d3ba
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Oct 4, 2018
57c03c2
splib
TomAugspurger Oct 4, 2018
0dbc33e
collections -> compat
TomAugspurger Oct 4, 2018
c217cf5
updates
TomAugspurger Oct 8, 2018
2ea7a91
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Oct 8, 2018
8f2f228
Set dtype
TomAugspurger Oct 8, 2018
c83bed7
reveret
TomAugspurger Oct 8, 2018
53e494e
clarify fillna
TomAugspurger Oct 8, 2018
627b9ce
Remove old invert
TomAugspurger Oct 8, 2018
df0293a
some cleanup
TomAugspurger Oct 8, 2018
a590418
remove redundant whatsnew
TomAugspurger Oct 9, 2018
7821f19
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Oct 9, 2018
ee26c52
Update hashing, eq
TomAugspurger Oct 9, 2018
40390f1
wip-comments
TomAugspurger Oct 11, 2018
15a164d
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Oct 11, 2018
88432c8
hashing
TomAugspurger Oct 11, 2018
3e7ec90
dtype and datetime64
TomAugspurger Oct 11, 2018
7b0a179
Updates
TomAugspurger Oct 11, 2018
20d8815
index
TomAugspurger Oct 11, 2018
3e81c69
wip
TomAugspurger Oct 11, 2018
1098a7a
quantile test
TomAugspurger Oct 11, 2018
10d204a
merge conflict
TomAugspurger Oct 11, 2018
69075d8
use is_homogenous_type
TomAugspurger Oct 11, 2018
0764baa
use assert_frame_equal
TomAugspurger Oct 11, 2018
a4a47c5
merge exp construction
TomAugspurger Oct 11, 2018
a5b6c39
API: Allow ExtensionArray.isna to be an EA
TomAugspurger Oct 11, 2018
70d8268
document and test map
TomAugspurger Oct 11, 2018
7aed79f
table formatting
TomAugspurger Oct 11, 2018
11e55aa
fixup! API: Allow ExtensionArray.isna to be an EA
TomAugspurger Oct 11, 2018
11606af
Restore subclass test
TomAugspurger Oct 11, 2018
2f73179
Revert changes to test
TomAugspurger Oct 11, 2018
1b3058a
quote
TomAugspurger Oct 11, 2018
f4ec928
fixup! API: Allow ExtensionArray.isna to be an EA
TomAugspurger Oct 11, 2018
8c67ca2
lint
TomAugspurger Oct 11, 2018
cc89ec7
COMPAT: NumPy 1.9 bool-like indexing
TomAugspurger Oct 12, 2018
3f713d4
misc. comments
TomAugspurger Oct 12, 2018
886fe03
Merge remote-tracking branch 'upstream/master' into ea-sparse-2
TomAugspurger Oct 12, 2018
75099af
asarray on bool key for numpy compat
TomAugspurger Oct 12, 2018
731fc06
Raise for non-default values
TomAugspurger Oct 12, 2018
f91141d
groupby / reduce compat
TomAugspurger Oct 12, 2018
37a4b57
lint
TomAugspurger Oct 12, 2018
4aad8e1
fix docs
jorisvandenbossche Oct 13, 2018
broken broken broken
TomAugspurger committed Aug 10, 2018
commit 33fa6f762d205d2dc023d52bb794be23ab90b66b
4 changes: 3 additions & 1 deletion pandas/core/dtypes/concat.py
@@ -97,7 +97,9 @@ def _get_frame_result_type(result, objs):
otherwise, return 1st obj
"""

if result.blocks and all(is_sparse(b) for b in result.blocks):
if (result.blocks and (
all(is_sparse(b) for b in result.blocks) or
Contributor

Related to my comment above: can't is_sparse simply check whether it's an EA and whether it has a SparseDtype?

Then you would simply need to pass b.values here, yes?

Contributor Author

I'll give that a shot.
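
For readers following this thread, the suggestion is to decide sparseness from the dtype rather than from the container type. Below is a minimal sketch of that idea in plain Python. `ToySparseDtype`, `ToySparseArray`, and `is_sparse_like` are hypothetical stand-ins, not pandas APIs, and the real `is_sparse` may differ.

```python
class ToySparseDtype:
    """Hypothetical stand-in for a sparse extension dtype."""

    def __init__(self, subtype="float64"):
        self.subtype = subtype


class ToySparseArray:
    """Hypothetical stand-in for an array whose dtype marks it as sparse."""

    def __init__(self, values, dtype=None):
        self.values = list(values)
        self.dtype = dtype if dtype is not None else ToySparseDtype()


def is_sparse_like(obj):
    # Accept either a dtype or anything exposing a ``dtype`` attribute,
    # so a caller can pass ``block.values`` (or the dtype itself) directly.
    dtype = getattr(obj, "dtype", obj)
    return isinstance(dtype, ToySparseDtype)
```

With a check like this, the caller no longer needs to know which block or container class wraps the values.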

Contributor

Can you add a comment here? It's not obvious what you are doing.

Contributor

how can obj be a SparseFrame here? is this tested?

Contributor Author

I think a comment of mine may have been lost.

This is hit in several places (e.g. pandas/tests/sparse/test_combine_concat.py::TestSparseDataFrameConcat::test_concat).

What part can I clarify here?

all(isinstance(obj, ABCSparseDataFrame) for obj in objs))):
from pandas.core.sparse.api import SparseDataFrame
return SparseDataFrame
else:
13 changes: 13 additions & 0 deletions pandas/core/internals/managers.py
@@ -626,6 +626,16 @@ def _consolidate_check(self):
self._is_consolidated = len(ftypes) == len(set(ftypes))
self._known_consolidated = True

@property
def is_homogenous(self):
Contributor Author

This is to support slicing a row from a dataframe with all the same EA blocks. Previously this would be object.

This implicitly relies on hash(extension_dtype) being correct (equality).

Contributor

Isn't this now _is_homogeneous_type?

Contributor Author

We call it NDFrame._is_homogenous_type on master. Updating.

"""
Like is_mixed_type, but handles NonConsolidatable blocks
"""
if self.any_extension_types:
return len(set(block.dtype for block in self.blocks)) == 1
else:
return self.is_mixed_type
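
The review note on this property says the check implicitly relies on `hash(extension_dtype)` agreeing with equality. A small sketch of why, using toy classes in plain Python (`ToyDtype`, `IdentityHashDtype`, and `is_homogeneous` are hypothetical names, not the pandas implementation):

```python
class ToyDtype:
    """Dtype whose hash is consistent with its equality."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, ToyDtype) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


class IdentityHashDtype(ToyDtype):
    """Same equality, but the hash is per-object, so sets see duplicates."""

    def __hash__(self):
        return id(self)


def is_homogeneous(block_dtypes):
    # Mirrors ``len(set(block.dtype for block in blocks)) == 1``.
    return len(set(block_dtypes)) == 1
```

Two `IdentityHashDtype("sparse")` instances compare equal, yet a set keeps both, so the homogeneity check would wrongly report a mixed frame.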

@property
def is_mixed_type(self):
# Warning, consolidation needs to get checked upstairs
@@ -1593,6 +1603,9 @@ def _can_hold_na(self):
def is_consolidated(self):
return True

def is_homogenous(self):
Contributor

See above; shouldn't this be _is_mixed_type: False?

Contributor Author

NDFrame._is_homogenous

return True

def _consolidate_check(self):
pass

10 changes: 6 additions & 4 deletions pandas/core/ops.py
@@ -1918,16 +1918,18 @@ def _cast_sparse_series_op(left, right, opname):
left : SparseArray
right : SparseArray
"""
from pandas.core.sparse.api import SparseDtype

opname = opname.strip('_')

if is_integer_dtype(left) and is_integer_dtype(right):
# series coerces to float64 if result should have NaN/inf
if opname in ('floordiv', 'mod') and (right.values == 0).any():
left = left.astype(np.float64)
right = right.astype(np.float64)
left = left.astype(SparseDtype(np.float64))
Contributor Author

Lots of changes like this in the tests. Previously SparseArray.dtype was an np.dtype. Now it's our SparseDtype class. We could define SparseDtype.__eq__ to compare equal with the right numpy dtypes, but that's probably a bad idea. Maybe for backwards compat...

right = right.astype(SparseDtype(np.float64))
elif opname in ('rfloordiv', 'rmod') and (left.values == 0).any():
left = left.astype(np.float64)
right = right.astype(np.float64)
left = left.astype(SparseDtype(np.float64))
right = right.astype(SparseDtype(np.float64))

return left, right
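
The rule in this hunk is that integer operands are only promoted to float when the operation can produce NaN or inf, that is, when the divisor side contains a zero. A sketch of just that decision in plain Python, assuming both operands are already known to be integer-typed (`needs_float_cast` is a hypothetical helper, not part of pandas):

```python
def needs_float_cast(opname, left_values, right_values):
    # Dunder names such as "__floordiv__" are reduced to "floordiv".
    opname = opname.strip("_")
    if opname in ("floordiv", "mod"):
        # left // right or left % right: zeros on the right are the problem.
        return any(v == 0 for v in right_values)
    if opname in ("rfloordiv", "rmod"):
        # Reflected ops divide by the left operand instead.
        return any(v == 0 for v in left_values)
    return False
```

In the diff, a true result corresponds to casting both sides with `SparseDtype(np.float64)` rather than the bare `np.float64`.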

12 changes: 10 additions & 2 deletions pandas/core/reshape/reshape.py
@@ -427,7 +427,6 @@ def stack(frame, level=-1, dropna=True):
-------
stacked : Series
"""

def factorize(index):
if index.is_unique:
return index, np.arange(len(index))
@@ -461,7 +460,16 @@ def factorize(index):
names=[frame.index.name, frame.columns.name],
verify_integrity=False)

new_values = frame.values.ravel()
# For homogeneous EAs, self.values will coerce to object. So
# we concatenate instead.
if frame._data.any_extension_types and frame._data.is_homogenous:
# TODO: this needs to be unit tested.
arr = frame._data.blocks[0].dtype.construct_array_type()
new_values = arr._concat_same_type([
Contributor

This should be handled more generally, as it seems like quite a hack here. Maybe push this into internals (not necessary to do in this PR; just make a note / TODO for it).

Contributor Author

Would that be useful in places other than stack? It seems like this kind of .values.reshape is only useful there, but I'm not very familiar with this section of code.

If there are other places, I'll add an NDFrame._safe_stack or something similar that concatenates homogeneous EAs.

Contributor

The issue is that all ops like this are handled in internals rather than here. It just makes the code much more distributed for these special cases; I'd rather have the special cases grouped together.

Contributor Author

Sure, I see a few things in managers.py like _stack_arrays. Are you thinking of another little helper like that in managers.py?

Member

+1 on making fewer parts of the code internals-aware

Member

In principle, the line above can also be rewritten to not reach into the internals / block manager, as the dtype can also be accessed from the first column (since all are the same)

Contributor Author

I think this is resolved now.

    if frame._is_homogeneous_type and is_extension_array_dtype(dtype):
        arr = dtype.construct_array_type()
        new_values = arr._concat_same_type([
            col for _, col in frame.iteritems()
        ])

blk.values for blk in frame._data.blocks
])
else:
new_values = frame.values.ravel()
if dropna:
mask = notna(new_values)
new_values = new_values[mask]
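
One ordering detail is worth keeping in mind when comparing the two branches: `frame.values.ravel()` walks the frame row by row, while concatenating whole columns yields column-by-column order. The sketch below shows the difference in plain Python and an indexer that restores row order. The helper names are hypothetical, and the diff at this commit does not show whether such a reordering step is applied.

```python
def ravel_rows(columns):
    # Row-major flattening, like ``frame.values.ravel()`` on a 2-D array.
    n_rows = len(columns[0])
    return [col[r] for r in range(n_rows) for col in columns]


def concat_columns(columns):
    # Column-by-column concatenation, like ``_concat_same_type`` on columns.
    flat = []
    for col in columns:
        flat.extend(col)
    return flat


def row_major_indexer(n_rows, n_cols):
    # Positions into the concatenated columns that reproduce row-major order.
    return [c * n_rows + r for r in range(n_rows) for c in range(n_cols)]
```

For two columns of three rows, the indexer is `[0, 3, 1, 4, 2, 5]`; applying it to the concatenated columns gives the same sequence as `ravel_rows`.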
126 changes: 100 additions & 26 deletions pandas/core/sparse/array.py
@@ -4,6 +4,7 @@
from __future__ import division
# pylint: disable=E1101,E1103,W0231

import operator
import numpy as np
import warnings

@@ -66,6 +67,7 @@ def _get_fill(arr):


def _sparse_array_op(left, right, op, name):
# type: (SparseArray, SparseArray, Callable, str) -> Any
if name.startswith('__'):
# For lookups in _libs.sparse we need non-dunder op name
name = name[2:-2]
@@ -75,9 +77,10 @@ def _sparse_array_op(left, right, op, name):
rtype = right.dtype.subdtype

if not is_dtype_equal(ltype, rtype):
dtype = find_common_type([ltype, rtype])
dtype = SparseDtype(find_common_type([ltype, rtype]))
left = left.astype(dtype)
right = right.astype(dtype)
dtype = dtype.subdtype
else:
dtype = ltype

@@ -135,10 +138,14 @@ def _wrap_result(name, data, sparse_index, fill_value, dtype=None):
if name in ('eq', 'ne', 'lt', 'gt', 'le', 'ge'):
dtype = np.bool

if not is_scalar(fill_value):
Contributor

lib.item_from_zerodim

fill_value = fill_value.item()

if is_bool_dtype(dtype):
# fill_value may be np.bool_
fill_value = bool(fill_value)
return SparseArray(data, sparse_index=sparse_index, fill_value=fill_value)
return SparseArray(data, sparse_index=sparse_index, fill_value=fill_value,
dtype=dtype)


class SparseArray(PandasObject, ExtensionArray, ExtensionOpsMixin):
@@ -212,6 +219,7 @@ def __init__(self, data, sparse_index=None, index=None, fill_value=None,
if not is_array_like(data):
try:
# ajelijfalsejdataj0
# probably shared code in sanitize_series
data2 = np.atleast_1d(np.asarray(data, dtype=dtype))
if is_string_dtype(data2) and dtype is None:
# work around NumPy's coercion of non-strings to strings
@@ -511,7 +519,7 @@ def take(self, indices, allow_fill=False, fill_value=None):

def _take_with_fill(self, indices, fill_value=None):
Member

What is the reason that the take code here is so much larger than before?
(I haven't looked at this new implementation in detail yet.)

Contributor Author

I think take was the first thing I wrote back in June or July, so my memory is fuzzy ;) If I had to guess, I would say additional edge-case handling, but I haven't looked closely either.

if fill_value is None:
fill_value = self.fill_value
fill_value = self.dtype.na_value

if indices.min() < -1:
raise ValueError("Invalid value in 'indices'. Must be between -1 and the length of the array.")
@@ -532,24 +540,36 @@ def _take_with_fill(self, indices, fill_value=None):

if self.sp_index.npoints == 0:
# Avoid taking from the empty self.sp_values
taken = np.full(sp_indexer.shape, fill_value=self.fill_value)
taken = np.full(sp_indexer.shape, fill_value=fill_value)
else:
taken = self.sp_values.take(sp_indexer)
# Have to fill in two steps, since the user-passed fill value may be
# different from self.fill_value.

m1 = sp_indexer < 0
m2 = indices < 0
# sp_indexer may be -1 for two reasons
# 1.) we took for an index of -1 (new)
# 2.) we took a value that was self.fill_value (old)
new_fill_indices = indices == -1
old_fill_indices = (sp_indexer == -1) & ~new_fill_indices

result_type = np.result_type(taken, self.fill_value)
# Fill in two steps.
# Old fill values
# New fill values
# potentially coercing to a new dtype at each stage.

if m1.any():
m0 = sp_indexer[old_fill_indices] < 0
m1 = sp_indexer[new_fill_indices] < 0

result_type = taken.dtype

if m0.any():
result_type = np.result_type(result_type, self.fill_value)
taken = taken.astype(result_type)
taken[m1] = self.fill_value
taken[old_fill_indices] = self.fill_value

if m2.any():
if m1.any():
result_type = np.result_type(result_type, fill_value)
taken = taken.astype(result_type)
taken[indices < 0] = fill_value
taken[new_fill_indices] = fill_value

return taken
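
The two-step fill described in the comments above distinguishes two kinds of "missing" lookups: an index of -1 requested by the caller (the new fill value) and a valid position that simply is not stored because it holds the array's own fill value (the old fill value). A pure-Python sketch of that distinction, ignoring dtype coercion; `take_with_fill` and its parameters are hypothetical and only model the behaviour:

```python
def take_with_fill(sp_values, sp_positions, self_fill, indices, fill_value):
    # Map each stored dense position to its slot in ``sp_values``.
    lookup = {pos: slot for slot, pos in enumerate(sp_positions)}
    result = []
    for idx in indices:
        if idx < -1:
            raise ValueError("Invalid value in 'indices'.")
        if idx == -1:
            # "New" fill: the caller asked for a missing element.
            result.append(fill_value)
        elif idx in lookup:
            result.append(sp_values[lookup[idx]])
        else:
            # "Old" fill: the position exists but holds the array's fill value.
            result.append(self_fill)
    return result
```

In the real code each fill step may also widen the result dtype via `np.result_type`, which this sketch leaves out.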

def _take_without_fill(self, indices):
@@ -608,21 +628,50 @@ def _concat_same_type(cls, to_concat):
fill_value = list(fill_value)[0]

values = []
indices = []
length = 0

for arr in to_concat:
# TODO: avoid to_int_index? Is that expensive?
idx = arr.sp_index.to_int_index().indices.copy()
idx += length # TODO: wraparound
length += arr.sp_index.length
if to_concat:
sp_kind = to_concat[0].kind
else:
sp_kind = 'integer'

if sp_kind == 'integer':
indices = []

values.append(arr.sp_values)
indices.append(idx)
for arr in to_concat:
idx = arr.sp_index.to_int_index().indices.copy()
idx += length # TODO: wraparound
length += arr.sp_index.length

data = np.concatenate(values)
indices = np.concatenate(indices)
sp_index = IntIndex(length, indices)
values.append(arr.sp_values)
indices.append(idx)

data = np.concatenate(values)
indices = np.concatenate(indices)
sp_index = IntIndex(length, indices)

else:
# when concatenating block indices, we don't claim that you'll
# get an identical index as concating the values and then
# creating a new index. We don't want to spend the time trying
# to merge blocks across arrays in `to_concat`, so the resulting
# BlockIndex may have more blocs.
blengths = []
blocs = []

for arr in to_concat:
idx = arr.sp_index.to_block_index()

values.append(arr.sp_values)
blocs.append(idx.blocs.copy() + length)
blengths.append(idx.blengths)
length += arr.sp_index.length

data = np.concatenate(values)
blocs = np.concatenate(blocs)
blengths = np.concatenate(blengths)

sp_index = BlockIndex(length, blocs, blengths)

return cls(data, sparse_index=sp_index, fill_value=fill_value)
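
Both branches above follow the same pattern: shift each array's index by the running total length, then concatenate. The sketch below models that with plain lists; `concat_int_index` and `concat_block_index` are hypothetical helpers, and integer wraparound (the TODO in the diff) is not handled.

```python
def concat_int_index(parts):
    """parts: iterable of (length, indices) pairs, one per array."""
    offset = 0
    merged = []
    for length, indices in parts:
        merged.extend(i + offset for i in indices)
        offset += length
    return offset, merged


def concat_block_index(parts):
    """parts: iterable of (length, block_starts, block_lengths) triples."""
    offset = 0
    starts, lengths = [], []
    for length, blocs, blengths in parts:
        # Only the block starts move; block lengths are unchanged.
        starts.extend(b + offset for b in blocs)
        lengths.extend(blengths)
        offset += length
    return offset, starts, lengths
```

Note that `concat_block_index([(2, [1], [1]), (2, [0], [1])])` returns two adjacent blocks of length 1 rather than one block of length 2, which is the "may have more blocs" caveat in the code comment.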

@@ -800,6 +849,15 @@ def mean(self, axis=0, *args, **kwargs):
nsparse = self.sp_index.ngaps
return (sp_sum + self.fill_value * nsparse) / (ct + nsparse)

def transpose(self, *axes):
"""Returns the SparseArray."""
return self

@property
def T(self):
"""Returns the SparseArray."""
return self

# ------------------------------------------------------------------------
# Ufuncs
# ------------------------------------------------------------------------
@@ -812,13 +870,14 @@ def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
new_fill_values = []

special = {'add', 'sub', 'mul', 'pow', 'mod', 'floordiv', 'truediv',
'divmod', 'eq', 'ne', 'lt', 'gt', 'le', 'ge'}
'divmod', 'eq', 'ne', 'lt', 'gt', 'le', 'ge', 'remainder'}
aliases = {
'subtract': 'sub',
'multiply': 'mul',
'floor_divide': 'floordiv',
'true_divide': 'truediv',
'power': 'pow',
'remainder': 'mod',
}
op_name = ufunc.__name__
op_name = aliases.get(op_name, op_name)
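
The alias table translates NumPy ufunc names into the short operator names used elsewhere in the sparse code. A standalone sketch of that lookup, with the two tables copied from the hunk above and a hypothetical `normalize_op_name` wrapper:

```python
SPECIAL = {'add', 'sub', 'mul', 'pow', 'mod', 'floordiv', 'truediv',
           'divmod', 'eq', 'ne', 'lt', 'gt', 'le', 'ge', 'remainder'}
ALIASES = {
    'subtract': 'sub',
    'multiply': 'mul',
    'floor_divide': 'floordiv',
    'true_divide': 'truediv',
    'power': 'pow',
    'remainder': 'mod',
}


def normalize_op_name(ufunc_name):
    # Names without an alias (e.g. 'add') pass through unchanged.
    return ALIASES.get(ufunc_name, ufunc_name)
```

The `'remainder'` entry matters because, to my knowledge, `np.mod` and `np.remainder` are the same ufunc and both report the name `'remainder'`.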
@@ -892,17 +951,30 @@ def _create_comparison_method(cls, op):
def cmp_method(self, other):
op_name = op.__name__

if op_name in {'and_', 'or_'}:
op_name = op_name[:-1]

if isinstance(other, (ABCSeries, ABCIndexClass)):
other = getattr(other, 'values', other)

if isinstance(other, np.ndarray):
# TODO: make this more flexible than just ndarray...
if len(self) != len(other):
raise AssertionError("length mismatch: {self} vs. {other}"
.format(self=len(self), other=len(other)))
other = SparseArray(other, fill_value=self.fill_value)

if isinstance(other, SparseArray):
return _sparse_array_op(self, other, op, op_name)
else:
with np.errstate(all='ignore'):
fill_value = op(self.fill_value, other)
result = op(self.sp_values, other)

return type(self)(result, sparse_index=self.sp_index, fill_value=fill_value)
return type(self)(result,
sparse_index=self.sp_index,
fill_value=fill_value,
dtype=np.bool_)

name = '__{name}__'.format(name=op.__name__)
return compat.set_function_name(cmp_method, name, cls)
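
For the scalar branch above, the comparison is applied separately to the stored values and to the fill value, so the result stays sparse with the same index. A plain-Python sketch of that branch and of the name handling for `and_`/`or_` (the two function names are hypothetical):

```python
import operator


def compare_with_scalar(sp_values, fill_value, op, other):
    # Stored values and the fill value are compared independently;
    # the sparse index itself is reused unchanged.
    new_values = [op(v, other) for v in sp_values]
    new_fill = op(fill_value, other)
    return new_values, new_fill


def short_op_name(op):
    # operator.and_ / operator.or_ carry a trailing underscore.
    name = op.__name__
    if name in {"and_", "or_"}:
        name = name[:-1]
    return name
```

For example, comparing stored values `[1, 5]` with fill value `0` against `2` using `operator.gt` gives values `[False, True]` and a new fill value of `False`.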
@@ -918,6 +990,8 @@ def __unicode__(self):

SparseArray._add_arithmetic_ops()
SparseArray._add_comparison_ops()
SparseArray.__and__ = SparseArray._create_comparison_method(operator.and_)
SparseArray.__or__ = SparseArray._create_comparison_method(operator.or_)


# class SparseArray(PandasObject, np.ndarray, ExtensionArray):
7 changes: 7 additions & 0 deletions pandas/core/sparse/dtype.py
@@ -17,6 +17,13 @@ def __hash__(self):
# XXX: this needs to be part of the interface.
return hash(str(self))
Contributor

Really? This is not unique across sparse objects; does it need to be? Isn't this actually rendering the object?

Contributor Author (@TomAugspurger, Oct 9, 2018)

Yeah this is bad... Fixing. This is complicated a bit by NaNs being in the metadata.


def __eq__(self, other):
# TODO: test
if isinstance(other, type(self)):
return self.type == other.type
else:
return super(SparseDtype, self).__eq__(other)
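
The thread above notes that NaN in the dtype's metadata complicates hashing and equality, because `nan != nan`. One way to handle this is to normalise a NaN fill value to a sentinel key used by both `__eq__` and `__hash__`. This is a toy sketch under that assumption; `ToySparseDtype` is hypothetical and is not the fix that landed in the PR.

```python
import math


class ToySparseDtype:
    def __init__(self, subtype, fill_value):
        self.subtype = subtype
        self.fill_value = fill_value

    def _fill_key(self):
        fv = self.fill_value
        if isinstance(fv, float) and math.isnan(fv):
            # All NaN fill values share one key, since nan != nan.
            return ("nan",)
        return ("value", fv)

    def __eq__(self, other):
        if not isinstance(other, ToySparseDtype):
            return NotImplemented
        return ((self.subtype, self._fill_key())
                == (other.subtype, other._fill_key()))

    def __hash__(self):
        # Built from the same key as __eq__, so equal dtypes hash equal.
        return hash((self.subtype, self._fill_key()))
```

Building the hash from the same key as the equality check avoids both problems raised in review: hashing a rendered string and disagreeing with `__eq__`.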

@property
def _is_numeric(self):
from pandas.core.dtypes.common import is_object_dtype
6 changes: 6 additions & 0 deletions pandas/core/sparse/series.py
@@ -13,6 +13,7 @@
from pandas.compat.numpy import function as nv
from pandas.core.index import Index, ensure_index, InvalidIndexError
from pandas.core.series import Series
from pandas.core.dtypes.generic import ABCSeries, ABCSparseSeries
from pandas.core.internals import SingleBlockManager
from pandas.core import generic
import pandas.core.common as com
@@ -66,8 +67,13 @@ def __init__(self, data=None, index=None, sparse_index=None, kind='block',
fill_value=None, name=None, dtype=None, copy=False,
fastpath=False):
if isinstance(data, SingleBlockManager):
# TODO: share validation with Series
index = data.index
data = data.blocks[0].values
elif isinstance(data, (ABCSeries, ABCSparseSeries)):
index = data.index if index is None else index
dtype = data.dtype if dtype is None else dtype
name = data.name if name is None else name

super(SparseSeries, self).__init__(
SparseArray(data,