-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: added array #23581
API: added array #23581
Conversation
Hello @TomAugspurger! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on December 28, 2018 at 22:13 Hours UTC |
I'd like to restructure the documentation a bit. I want to collect similar things together, but we have two kinds of similarity: First, we could group by "topic" (the type of data).
and then eventually DatetimeArray & TimedeltaArray. Alternatively, we could group by "kind" first. So we'd have
Do people have a preference between "by topic" and "by kind" (cc. @jorisvandenbossche @jreback @jbrockmendel @datapythonista)? |
I suppose you are only talking about the api.rst page? |
Yes, sorry. |
Not a strong opinion, but for adding new docs I would go by kind. |
The nice thing about "by topic" is that we can give a high-level summary of what, e.g. "Period" is for. If we have things grouped "by kind" (dtypes, scalars, arrays), then we'd need to repeat that description, or just have it once. |
But in those places you would also put all the custom attributes and methods? I actually think we can be duplicated somewhat. Even if there are topical sections, I think it is still nice to have a short table of all dtypes and one for all array types. |
Codecov Report
@@ Coverage Diff @@
## master #23581 +/- ##
==========================================
+ Coverage 92.31% 92.32% +0.01%
==========================================
Files 165 166 +1
Lines 52194 52240 +46
==========================================
+ Hits 48182 48231 +49
+ Misses 4012 4009 -3
Continue to review full report at Codecov.
|
pd.IntervalArray.from_tuples([(1, 2), (3, 4)])), | ||
([0, 1], 'Sparse[int64]', pd.SparseArray([0, 1], dtype='int64')), | ||
([1, None], 'Int16', integer_array([1, None], dtype='Int16')), | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also send in pd.Series / pd.Index of these types (or is this tested below)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you test pass thru to ndarray? (e.g. maybe use any_numpy_dtype fixture with the dtype specified)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is tested below. Added a couple here as well though.
Not sure how any_numpy_dtype
would be used here. We would need data to go with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger right you would need to construct a dummy array and test that its passing in thru
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not sure what this would look like, or what the test is for. I'm not especially worried about specific numpy types not getting through, and we do test that path.
|
||
|
||
def test_registered(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be an extension test? (or maybe in test_dtypes below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure how to write a base test for that...
looks good @TomAugspurger mostly some docs & clarification comments, generally looks good though. |
From #23581 (comment) No, we don't because I hadn't really considered what to do about that. Do people have thoughts on how to handle 2-D (or n-d) input inside |
Added a doc note about just doing 1-d arrays. Holding off on NumPy / 2D changes for now, in case we decide we like |
pd.array([1, 2, 3]) I guess you have to raise on this until we have numpy backed EA (which I don't think we should try to put into 0.24.0). Though I am ok with just returning a numpy array for now. |
932e119 has the changes for This implies that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments
doc/source/whatsnew/v0.24.0.rst
Outdated
|
||
A new top-level method :func:`array` has been added for creating 1-dimensional arrays (:issue:`22860`). | ||
This can be used to create any :ref:`extension array <extending.extension-types>`, including | ||
extension arrays registered by :ref:`3rd party libraries <ecosystem.extensions>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't you now have a ref in basics where this should point?
@TomAugspurger I am ok with merging this on green. Can follow up from the prior @jorisvandenbossche comment, which was about how we are handling string dtypes I think? can you create an issue for that discussion |
Thanks for the review. I think that https://github.com/pandas-dev/pandas/pull/23581/files#diff-69ac57923b848af43df327c311b79db4R90 handles @jorisvandenbossche's comments regarding string aliases for dtypes. In a world where we have a pd.array(['a', 'b'], dtype=None/str) would return a StringArray. But pd.array(['a', 'b'], dtype=np.dtype(str)) would continue to return a We should maybe emphasize more that if the underlying memory layout really matters to you, then you shouldn't be using |
#23581 (comment) sounds ok to me. |
All green now. |
awesome as always @TomAugspurger |
|
||
# this returns None for not-found dtypes. | ||
if isinstance(dtype, compat.string_types): | ||
dtype = registry.find(dtype) or dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger do you remember if there was any particular reason for using this pattern instead of dtype = pandas_dtype(dtype)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall. I wonder if this predates pandas_dtype
handling extension dtypes.
Adds
pd.array
method for creating arraysTODO
Closes #22860
supersedes #23532.