Skip to content

Conversation

@metllord
Copy link
Contributor

@pep8speaks
Copy link

pep8speaks commented Jul 15, 2017

Hello @metllord! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 17, 2017 at 14:18 Hours UTC

@gfyoung gfyoung added Bug Sparse Sparse Data Type labels Jul 15, 2017
Sparse
^^^^^^

- Bug in SparseSeries raises AttributeError when a dicitonary is passed as data (:issue:`16777`)
Copy link
Member

@gfyoung gfyoung Jul 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dicitonary --> dictionary
"passed as" --> "passed in as"

self.ziseries2 = SparseSeries(arr, index=index, kind='integer',
fill_value=0)
def test_constructor_data_input(self):
arr = SparseSeries({1: 1})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you reference the issue right above this line (e.g. "# see gh-16905)"?

self.ziseries2 = SparseSeries(arr, index=index, kind='integer',
fill_value=0)
def test_constructor_data_input(self):
arr = SparseSeries({1: 1})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice how you use {1: 1} multiple times. Why don't we refactor that as a variable and then use that variable instead throughout the tests. The same goes for your index=[0, 1, 2]. Set that as a variable.

assert arr.count() == len({1: 1})

arr = SparseSeries(pd.Series({1: 1}, index=[0, 1, 2]))
assert arr.count() == pd.Series({1: 1}, index=[0, 1, 2]).count()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would separate out your pd.Series({1: 1}, index=[0, 1, 2]).count() as a variable like "expected" so that it's a little clearer what the expected output is.

@codecov
Copy link

codecov bot commented Jul 16, 2017

Codecov Report

Merging #16960 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16960      +/-   ##
==========================================
- Coverage   90.98%   90.96%   -0.02%     
==========================================
  Files         161      161              
  Lines       49288    49288              
==========================================
- Hits        44846    44837       -9     
- Misses       4442     4451       +9
Flag Coverage Δ
#multiple 88.74% <100%> (ø) ⬆️
#single 40.2% <100%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/sparse/series.py 95.07% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.76% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 96168ef...19da337. Read the comment docs.

@codecov
Copy link

codecov bot commented Jul 16, 2017

Codecov Report

Merging #16960 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16960      +/-   ##
==========================================
- Coverage   90.98%   90.97%   -0.02%     
==========================================
  Files         161      161              
  Lines       49288    49292       +4     
==========================================
- Hits        44846    44844       -2     
- Misses       4442     4448       +6
Flag Coverage Δ
#multiple 88.74% <100%> (ø) ⬆️
#single 40.19% <100%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/sparse/series.py 95.06% <100%> (-0.02%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.76% <0%> (-0.1%) ⬇️
pandas/core/series.py 94.89% <0%> (ø) ⬆️
pandas/core/dtypes/cast.py 86.89% <0%> (ø) ⬆️
pandas/core/generic.py 92.29% <0%> (ø) ⬆️
pandas/core/config_init.py 96% <0%> (+0.03%) ⬆️
pandas/plotting/_core.py 82.72% <0%> (+0.2%) ⬆️
pandas/plotting/_timeseries.py 60.73% <0%> (+0.52%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 96168ef...5416762. Read the comment docs.

# see gh-16905
test_dict = {1: 1}
test_index = [0, 1, 2]
test_series = pd.Series({1: 1}, index=test_index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace the {1: 1} with test_dict.

Sparse
^^^^^^

- Bug in SparseSeries raises AttributeError when a dictionary is passed in as data (:issue:`16777`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add double ticks ("``") around both SparseSeries and `AttributeError`?

arr = SparseSeries(test_dict)
assert arr.count() == len(test_dict)

arr = SparseSeries(test_series, index=test_index))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops: got an extra right parenthesis there

@gfyoung
Copy link
Member

gfyoung commented Jul 16, 2017

@metllord : Ping when green, and we will merge this!

assert arr.count() == len(test_dict)

arr = SparseSeries(test_series, index=test_index)
assert arr.count() == test_series.count()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls use assert_sp_series_equal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to use assert_series_equal since assert_sp_series_equal requires two sparse series, even if check_series_type=False.
This appears to result from line 1523 in util.testing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you should do is construct your expected result via instantiation as a Series (i.e. pass the dict into the Series constructor). Then pass the Series object into the SparseSeries constructor. That becomes your expected output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second test is testing the creation of a SparseSeries from a Series. The SparseSeries constructor takes either a dict or a Series. Aside from class, the SparseSeries should be equal to the original Series.

def test_constructor_data_input(self):
# see gh-16905
test_dict = {1: 1}
test_index = [0, 1, 2]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't name things with test_

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would _test_dict, etc work? I saw that higher up in the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @jreback is saying to come up with different names that don't use "test" at all.

test_index = [0, 1, 2]
test_series = pd.Series(test_dict, index=test_index)

arr = SparseSeries(test_dict)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result =


arr = SparseSeries({1: 1})
assert arr.dtype == np.int64

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second look, I don't think you need this. You can remove.


result = SparseSeries(series, index=index)
tm.assert_series_equal(result, series, check_series_type=False)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay : in light of what you just said, I think you can condense this test a bit and rename as follows:

def test_constructor_dict_input(self):
...
series = pd.Series(constructor_dict, index=index)
expected = SparseSeries(series, index=index)

result = SparseSeries(constructor_dict)
tm.assert_sp_series_equal(result, expected)

And delete everything else below it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. I added this test and did find a case that was not handled correctly. I fixed the code and added tests for all options of SparseSeries creation.


elif isinstance(data, (Series, dict)):
data = Series(data, index=index)
if index is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I think you might be able to remove this if index is None bizeness (and simply use data.index if needed below)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see what if anything breaks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the full test suite with this change and everything is working. I also manually ran a bunch of SparseSeries with different inputs (Series, dicts, with/without indexes) and it does work as intended.

self.ziseries2 = SparseSeries(arr, index=index, kind='integer',
fill_value=0)

def test_constructor_data_input(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to test_constructor_dict_input and move your "see" comment to right below the definition.

result = SparseSeries(constructor_dict)
tm.assert_sp_series_equal(result, expected)

# Series with index and dictionary with no index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this being tested? The result variable is the same as in line 98.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was testing input with a series, but that is test turned out to be redundant. I removed it from the most recent commit.

@gfyoung
Copy link
Member

gfyoung commented Jul 17, 2017

@jreback if you have anything else to add here. Otherwise, LGTM!

@jreback
Copy link
Contributor

jreback commented Jul 18, 2017

this duplicates #16906

@gfyoung gfyoung added this to the 0.21.0 milestone Jul 19, 2017
@jreback jreback merged commit 47e909d into pandas-dev:master Jul 19, 2017
@jreback
Copy link
Contributor

jreback commented Jul 19, 2017

thanks @metllord

@metllord metllord deleted the bugfix/SparseSeries branch July 19, 2017 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Sparse Sparse Data Type

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: SparseSeries from dict inconsistency

4 participants