fix overloads for read_csv and read_table #109

Merged (2 commits) on Oct 14, 2021

Conversation

@Dr-Irv Dr-Irv (Contributor) commented Oct 13, 2021

Closes #87 based on suggestions there

@jakebailey jakebailey (Member) left a comment

Thanks for this; I had tried to fix this but similarly scratched my head.

There are a few errors that show when I'm backporting these, noted in the review. We really should get some sort of CI running here.

@@ -147,7 +250,7 @@ def read_table(
date_parser: Optional[Callable] = ...,
dayfirst: bool = ...,
cache_dates: bool = ...,
iterator: bool = ...,
iterator: Literal[True],
jakebailey (Member):
And here.

Comment on lines +306 to +307
iterator: Literal[False],
chunksize: int,
jakebailey (Member):
And lastly these two.

Comment on lines +97 to +98
iterator: Literal[False],
chunksize: int,
jakebailey (Member):
Similarly here.

@Dr-Irv Dr-Irv (Contributor, Author) commented Oct 13, 2021

@jakebailey - what are you doing to see those error messages?

@jakebailey jakebailey (Member)

I'm just plopping them into our internal tree (i.e. what gets copied into Pylance's bundled stub directory), and the diagnostics show up in VS Code with Pylance enabled. These aren't type checking rules but language rules (the code would crash at runtime when the signature is processed), so nothing extra should be needed to see them.
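
For illustration, here is a minimal sketch of the language rule being tripped (the parameter names echo the diff above; this is not the actual stub):

from typing import Literal

# A positional parameter with no default may not follow parameters that do
# have defaults, so a definition like the commented one below is rejected as
# soon as the signature is parsed:
#
#     def read_table(cache_dates: bool = ..., iterator: Literal[True]): ...
#     # SyntaxError: non-default argument follows default argument
#
# Giving the parameter a default (= ...) or making it keyword-only avoids the
# error; the keyword-only route is what the * discussion below is about.
def read_table(cache_dates: bool = ..., *, iterator: Literal[True]) -> None:
    ...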

@jakebailey jakebailey (Member)

Let me test and see if they appear if I just open this repo for editing.

@jakebailey jakebailey (Member) commented Oct 13, 2021

Yeah, it reproduces when you open this repo, though you'll have to disable "strict"; the pandas stubs can't pass that standard (let alone "basic", at the moment).

@jakebailey jakebailey (Member)

FWIW I can add the requisite = ..., I just know that you'd probably want to test those changes before they go out.

@Dr-Irv Dr-Irv (Contributor, Author) commented Oct 13, 2021

Well, adding the = ... doesn't work. Going back to my simple example, which is similar to what is in read_csv(), I added arguments before and after i1 and cs, which are the ones we want to "trap" with @overload:

from typing import Literal, Optional, Union, overload


@overload
def myfun(
    fake: str,
    first: float = ...,
    i1: Literal[True] = ...,
    cs: Optional[int] = ...,
    ext: str = ...,
) -> str:
    ...


@overload
def myfun(
    fake: str,
    first: float = ...,
    i1: Literal[False] = ...,
    cs: int = ...,
    ext: str = ...,
) -> str:
    ...


@overload
def myfun(
    fake: str,
    first: float = ...,
    i1: Literal[False] = ...,
    cs: None = ...,
    ext: str = ...,
) -> int:
    ...


@overload
def myfun(
    fake: str, first: float = ..., i1: bool = ..., cs: int = ..., ext: str = ...
) -> str:
    ...


def myfun(
    fake: str,
    first: float = -33.0,
    i1: bool = False,
    cs: Optional[int] = None,
    ext: str = "",
) -> Union[str, int]:
    print(f"i1 is {i1} cs is {cs} result is ", end="")
    if i1:
        if cs is not None:
            return "TextFileReader"
        else:
            return "TextFileReader"
    else:
        if cs is not None:
            return "TextFileReader"
        else:
            return -1




res1: int = myfun("meh")
res2: int = myfun("meh", i1=False)
res3: int = myfun("meh", i1=False, cs=None)
res4: str = myfun("meh", i1=False, cs=23)
res5: str = myfun("meh", i1=True)
res6: str = myfun("meh", i1=True, cs=None)
res7: str = myfun("meh", i1=True, cs=23)
res8: int = myfun("meh", cs=None)
res9: str = myfun("meh", cs=23)

Just by adding the first and ext arguments to the main signature and to the 4 overloads suggested by @erictraut, we now get reports that signatures 1 and 3, 2 and 3, and 3 and 4 all overlap: for example, myfun("meh") is accepted by both the first overload (returning str) and the third (returning int). The assignments for res1, res2 and res8 then all come up with typing errors.

Looking for suggestions on how to handle that. It seems that when the two arguments that control the return type have default values, and there are other defaulted arguments before and after those two, that's where you run into difficulty.

@erictraut erictraut (Contributor) commented Oct 13, 2021

My assumption is that most people don't use read_csv with 50+ positional arguments; they typically specify the first few arguments positionally and then use keyword arguments for the remainder. If my assumption is correct, then we should model it like that in the overload list. It's important for the overloads that support keyword arguments to return the correct (specific) return type. A final catch-all overload can handle the positional argument case, perhaps with a return type of Any.

So my suggestion is to add the * parameter as I suggested previously but then add one more overload that includes all arguments positionally (i.e. no * parameter) that acts as a fallback.

@Dr-Irv Dr-Irv (Contributor, Author) commented Oct 13, 2021

> My assumption is that most people don't use read_csv with 50+ positional arguments; they typically specify the first few arguments positionally and then use keyword arguments for the remainder. If my assumption is correct, then we should model it like that in the overload list. It's important for the overloads that support keyword arguments to return the correct (specific) return type. A final catch-all overload can handle the positional argument case, perhaps with a return type of Any.

Actually most people use read_csv with just one positional argument (the first one - name of the file or buffer, which is required and has no default value), and then all other arguments are specified by keyword, and NOT positionally.

> So my suggestion is to add the "" parameter as I suggested previously but then add one more overload that includes all arguments positionally (i.e. no "" parameter) that acts as a fallback.

Little confused here - above you said add *, now you say add _ ??

Having said that, since I think your assumption is wrong, does that change things?

Or are you saying use 4 overloads, all with a *, and then a final overload without?

@erictraut erictraut (Contributor)

Sorry, the markdown processor ate the * in my response. I edited my previous response for clarity.

My assumption is in line with what you said. Place the * after the first parameter for the first 4 overloads. Then add a fifth overload that can act as a "catch all" in case someone actually passes many positional arguments.
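
For concreteness, here is a sketch of that shape applied to the myfun example above (an illustration under the assumptions in this thread, not the exact overloads committed for read_csv): the first four overloads make everything after the first parameter keyword-only via *, the parameters that drive the return type are required keywords wherever they must be present, and a fifth positional catch-all returns Any.

from typing import Any, Literal, Optional, Union, overload


# i1 passed explicitly as True -> str, whatever cs is
@overload
def myfun(
    fake: str, *, first: float = ..., i1: Literal[True], cs: Optional[int] = ..., ext: str = ...
) -> str:
    ...


# i1 False (or omitted) but cs given as an int -> str
@overload
def myfun(
    fake: str, *, first: float = ..., i1: Literal[False] = ..., cs: int, ext: str = ...
) -> str:
    ...


# i1 False (or omitted) and cs None (or omitted) -> int
@overload
def myfun(
    fake: str, *, first: float = ..., i1: Literal[False] = ..., cs: None = ..., ext: str = ...
) -> int:
    ...


# i1 an arbitrary bool but cs given as an int -> str
@overload
def myfun(
    fake: str, *, first: float = ..., i1: bool = ..., cs: int, ext: str = ...
) -> str:
    ...


# catch-all for callers who pass the optional arguments positionally
@overload
def myfun(
    fake: str, first: float = ..., i1: bool = ..., cs: Optional[int] = ..., ext: str = ...
) -> Any:
    ...


def myfun(
    fake: str,
    first: float = -33.0,
    i1: bool = False,
    cs: Optional[int] = None,
    ext: str = "",
) -> Union[str, int]:
    return "TextFileReader" if i1 or cs is not None else -1

With this arrangement a bare call like myfun("meh") can only match the int-returning overload, so the res1 through res9 assignments above should resolve as intended, while heavily positional calls fall through to the Any catch-all.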

@jakebailey jakebailey (Member)

Sorry to misinterpret the original * comment; I really thought adding = ... would work but clearly not!

@jakebailey jakebailey mentioned this pull request Oct 13, 2021
@Dr-Irv Dr-Irv (Contributor, Author) commented Oct 14, 2021

Just did a commit 28473c3 that I think works. Thanks for the suggestions @jakebailey and @erictraut. Tested with the following:

from io import StringIO
import pandas as pd
from pandas.io.parsers import TextFileReader

dio = StringIO("a,b\n 1,2\n 3,4\n")

df: pd.DataFrame = pd.read_csv(dio)

print(df)

dio.seek(0)
tr: pd.DataFrame = pd.read_csv(dio)

dio.seek(0)
tr2: TextFileReader = pd.read_csv(dio, iterator=True)

dio.seek(0)
tr3: pd.DataFrame = pd.read_csv(dio, iterator=False)

dio.seek(0)
tr7: pd.DataFrame = pd.read_csv(dio, header="infer")

dio.seek(0)
tr8: TextFileReader = pd.read_csv(dio, header="infer", iterator=True)


ftr1: TextFileReader = pd.read_csv("hey.csv")
ftr2: pd.DataFrame = pd.read_csv("hey.csv", iterator=True)
ftr3: TextFileReader = pd.read_csv("hey.csv", iterator=False)
ftr5: pd.DataFrame = pd.read_csv("hey.csv", chunksize=10)
ftr6: TextFileReader = pd.read_csv("he.csv", chunksize=None)
ftr7: TextFileReader = pd.read_csv("hey.csv", header="infer")
ftr8: pd.DataFrame = pd.read_csv("hey.csv", header="infer", iterator=True)

The last 7 examples all report incompatible assignments, as expected.

@jakebailey jakebailey (Member)

Thanks, I'll go import it and see. We (luckily) delayed our weekly release, so I should be able to sneak this one in.

@jakebailey jakebailey (Member) left a comment


Thank you! Looks good to me.

Successfully merging this pull request may close these issues:

Wrong return type of read_csv if using IO-Objects