adding static type checking with mypy #14468
Comments
See also wesm/pandas2#18 (an advantage of targeting this for 2.0 is that we can directly use the Python 3 syntax) |
Continuing the conversation from wesm/pandas2#18: I agree with @shoyer. Type hints should be implemented as early as possible. I've taken a couple of days off work, so I can get started on annotations for py2 tomorrow. Do you have any preference on how large pull requests should be? If it were up to me, given the relative safety of this sort of modification, I would probably do a PR per sub-module. Let me know if you'd like me to do it differently. After that I can move on to stubs |
@brmc sounds great! I think main things to add are support for you are wanting to add will need to add a build with are there other projects that do this (just for seeing examples), if so how are they adding it? |
A concern of mine is this: essentially you are taking a file, say Have others done this? |
We'll want to push these upstream to https://github.com/python/typeshed/ once we have reasonable coverage. |
Oh, and I hadn't heard of this, but https://github.com/python/mypy/blob/master/mypy/stubgen.py may be a good starting place? I guess it operates one module at a time, so:
mkdir out
stubgen pandas.core.series
stubgen pandas.core.frame
stubgen pandas.core.index |
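For readers who have not worked with stubs, here is a rough sketch of what a hand-refined stub for one module might contain. It is not actual stubgen output: the file name in the comment and the choice of methods are assumptions made for illustration, and the class below is a simplified stand-in, not the real pandas Series.

```python
# Sketch of a hypothetical stub file, e.g. out/pandas/core/series.pyi.
# A stub carries signatures only; every body and default is a bare "...".
from typing import Any


class Series:
    def __init__(self, data: Any = ...) -> None: ...

    # Return types are the main payoff: callers learn that nunique() is an int.
    def nunique(self, dropna: bool = ...) -> int: ...

    # A forward reference in quotes lets a method return its own class.
    def head(self, n: int = ...) -> "Series": ...
```

Because a stub lives in a separate file, it has to be kept in sync with the implementation by hand, which is the maintenance concern raised in this thread.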
@jreback, to be honest, I've never used stubs as a core part of a project I was working on. I've only used them to supplement underlying dependencies or to work around bugs in my IDE, so I've never thought too deeply about what consequences they might have |
ok, I'm about to get started. To be clear, I'm going to only concentrate on method signatures and return types (for now). I'm not going to annotate local variables unless I have some justifiable reason. Regarding the relevant parts of the contributing docs:
I've never contributed to pandas so please let me know if there's anything I'm overlooking |
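As a minimal sketch of what "signatures and return types only" looks like in py2-compatible code, the function below uses a PEP 484 type comment and leaves its local variables unannotated. The function name and its behaviour are made up for illustration and are not taken from the pandas code base.

```python
from typing import List, Optional


def clip_values(values, lower=None, upper=None):
    # type: (List[float], Optional[float], Optional[float]) -> List[float]
    """Clamp each value into [lower, upper]; a bound of None is ignored."""
    result = []  # local variable deliberately left unannotated
    for value in values:
        if lower is not None and value < lower:
            value = lower
        if upper is not None and value > upper:
            value = upper
        result.append(value)
    return result
```

The type comment is read by the type checker but is an ordinary comment to the interpreter, so nothing changes at runtime and the same source keeps working on Python 2.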
contributing docs are: http://pandas-docs.github.io/pandas-docs-travis/contributing.html
sounds good
that's fine
yes |
right. I had already read through it. just double-checking I didn't miss anything major |
Just to update, I'm still working on this, slowly but surely. I'm pretty much restricted to weekend work, but it'll get there. it's not nearly as trivial of a task as I expected :) |
@brmc No problem, great to hear you're making progress. |
@brmc - I was going to start on mypy types. Would you mind sharing what you have? I can start from scratch but that seems a bit silly if you already spent time on it :) |
Thanks! Type checking not exactly fast, even with Do you have a preference for inline types over pyi stub files? |
I think part of the slowness is from mypy following imports? The commit I just pushed disabled that in
Inline. |
I'm already running into some interesting problems. E.g. means that we'd have to add a self.nunique() == len(self) but we don't know yet at this point that An alternative could be that we ignore the pandas bowels for now and only focus on the user interface (e.g. Overall already far less trivial than I expected :/ |
For those types of problems, I'm defining an ABCMixins, like
I think you might be right. Do you want to spend a little time exploring this? |
so would focus solely on the user focused API |
Is anyone working on this? I have been hacking it by using my own bespoke types but I'd rather use official ones |
There's a (stalled) PR at #15866. You might want to look at #15866 (comment) Please feel free to take it up! |
Maybe useful for auto-generating some annotations: |
nice write up by @shoyer for numpy we have similar but in some sense more complicated issues as we would |
I'm currently using company time to develop separate mypy stubs ( I'm not convinced that it'll be possible to capture the full pandas behaviour in mypy-annotations in a meaningful way - meaningful as in don't use With that out of the way:
Apologies for the rather long comment! |
See also python/typing#478 (comment) for more on use cases for literal values in types for pandas. I agree that this would be very helpful. For pandas and DataFrame libraries more generally, we could really use a generic version of mypy's TypedDict to indicate dtypes per column. Something like:
class UserDataFrame(pandas.TypedDataFrame):
    name = str
    id = int
    address = str
For indexing, it might be easier to start with
Thanks! This is probably because my knowledge of formal type systems is pretty limited. If you have any other useful pointers, they would be appreciated! |
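For comparison, this is roughly how the existing TypedDict (available from the standard typing module since Python 3.8) declares a type per key. It describes a single row-like mapping, not a DataFrame, so it is only an analogy for the per-column idea above; the UserRecord name and the sample values are invented for the sketch.

```python
from typing import TypedDict, get_type_hints


class UserRecord(TypedDict):
    # One declared type per key, analogous to one dtype per column.
    name: str
    id: int
    address: str


# A type checker would flag a missing key or a wrongly typed value here.
row: UserRecord = {"name": "Ada", "id": 1, "address": "12 Example St"}

# The declared key types can also be read back at runtime.
column_types = get_type_hints(UserRecord)
```

A DataFrame-level equivalent would additionally have to carry these types through indexing and column selection, which TypedDict does not attempt.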
@shoyer that's exactly what I'd want out of a Pandas type system. Don't forget that Pandas is a data science tool, and that kind of typing would be a huge benefit to data science work. The rest is "extra" IMO. |
@gwerbin sorry, can you clarify exactly what I wrote that you'd want out of a pandas type system? :) |
@shoyer the ability to do this:
from collections import OrderedDict
from pathlib import Path
from typing import Text, Union

import pandas
from pandas.api.types import CategoricalDtype
from py._path.local import LocalPath
from typing_extensions import Protocol

class SupportsRead(Protocol):
    def read(self) -> Union[Text, bytes]:
        ...

# CategoricalDtype takes the list of categories as its first argument
RegionDtype = CategoricalDtype(['north', 'south', 'east', 'west'])

# pandas.TypedDataFrame is the wished-for class, not an existing pandas API
class MyDataFrameType(pandas.TypedDataFrame):
    name = str
    id = int
    address = str
    region = RegionDtype

PandasReadable = Union[Text, Path, LocalPath, SupportsRead]
# or better yet, having a PandasReadable class accessible in the Pandas hierarchy somewhere

def load_my_data(filepath_or_buffer: PandasReadable=None) -> MyDataFrameType:
    dtypes = OrderedDict([
        ('name', str),
        ('id', int),
        ('address', str),
        ('region', RegionDtype)
    ])
    # or ideally something more elegant like dtypes = MyDataFrameType.dtypes
    # read_csv spells the keyword "dtype", not "dtypes"
    return pandas.read_csv(filepath_or_buffer, usecols=list(dtypes.keys()), dtype=dtypes) |
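The SupportsRead part of that wish can already be tried on its own. The sketch below uses typing.Protocol from the standard library (Python 3.8+) instead of typing_extensions; the read_all helper is a hypothetical name added only to give the protocol something to type.

```python
import io
from typing import Protocol, Union, runtime_checkable


@runtime_checkable
class SupportsRead(Protocol):
    """Structural type: anything with a no-argument-callable read() matches."""

    def read(self) -> Union[str, bytes]: ...


def read_all(source: SupportsRead) -> Union[str, bytes]:
    # Accepts open files, StringIO, BytesIO, or any object exposing read().
    return source.read()


text = read_all(io.StringIO("name,id\nAda,1\n"))
```

With runtime_checkable, isinstance only checks that a read attribute exists, not its signature, so the static check by the type checker remains the stronger guarantee.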
PEP-484 Type Annotations tools:
EDIT: https://mypy.readthedocs.io/en/latest/existing_code.html |
@teh do you have your stubs somewhere? And do you plan to continue down this route? I would be interested in helping. |
Is there any way people can contribute to move this forward? I'm planning to start type checking a project and I'd definitely like to have support for pandas types as well. |
At this point I suspect we'll wait until we drop Python 2 (end of year) so
that we can use python3-style type annotations.
|
Thanks @TomAugspurger, I'd do the same. If help can be provided for the migration, I'm sure the community will be there, me included. |
But in principle, some work could already be started using type comments? |
Yep (and I have been in places, probably incorrectly). We'd just need to
convert those to annotations, though mypy is probably capable of using a
mixture?
I think the main benefit is that the `typing` module will be available.
|
So, this is marked done, is it possible to have dataframes with specified column types now? |
@josh-theorem if you are referring to static analysis then no, we don't currently support generic parametrization of Series / Index / DataFrame objects. We're not opposed to it in the long run, but we have a large part of the code base that just needs to be annotated first. If you would like to help out, there is #26766, which is a precursor to what you are asking for. We could definitely use community support to push that along if you'd like to submit PRs |
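To illustrate what "generic parametrization" would mean, here is a small sketch built on typing.Generic. TypedSeries is a hypothetical toy container written for this example; it is not a pandas class and says nothing about how pandas would implement the feature.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class TypedSeries(Generic[T]):
    """Hypothetical stand-in for a Series parametrized by its element type."""

    def __init__(self, values: List[T]) -> None:
        self._values = list(values)

    def first(self) -> T:
        # The checker infers int for TypedSeries[int], str for TypedSeries[str].
        return self._values[0]

    def to_list(self) -> List[T]:
        return list(self._values)


ids: TypedSeries[int] = TypedSeries([1, 2, 3])
names: TypedSeries[str] = TypedSeries(["a", "b"])
```

With this in place a checker such as mypy would reject something like names.first() + 1, which is the kind of error per-column typing is meant to surface.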
http://blog.zulip.org/2016/10/13/static-types-in-python-oh-mypy/
might be interesting if someone is looking for a project :>