fix overloads for read_csv and read_table #109

Merged (2 commits) on Oct 14, 2021

Conversation

@Dr-Irv Dr-Irv (Contributor) commented Oct 13, 2021

Closes #87 based on suggestions there

@jakebailey jakebailey (Member) left a comment

Thanks for this; I had tried to fix this but similarly scratched my head.

There are a few errors that show when I'm backporting these, noted in the review. We really should get some sort of CI running here.

@@ -147,7 +250,7 @@ def read_table(
date_parser: Optional[Callable] = ...,
dayfirst: bool = ...,
cache_dates: bool = ...,
iterator: bool = ...,
iterator: Literal[True],
jakebailey (Member):
And here.

Comment on lines +306 to +307
iterator: Literal[False],
chunksize: int,
jakebailey (Member):
And lastly these two.

Comment on lines +97 to +98
iterator: Literal[False],
chunksize: int,
jakebailey (Member):
Similarly here.

@Dr-Irv Dr-Irv (Contributor, Author) commented Oct 13, 2021

@jakebailey - what are you doing to see those error messages?

@jakebailey jakebailey (Member)

I'm just plopping them into our internal tree (i.e. what gets copied into Pylance's bundled stub directory), and the diagnostics show up in VS Code with Pylance enabled. These aren't type checking rules but language rules (the code would crash at runtime when the signature is processed), so nothing extra should be needed to see them.
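
For illustration, here is a minimal sketch of the language rule being tripped (the parameter names echo the diff above; this is not the actual stub):

from typing import Literal

# A positional parameter with no default may not follow parameters that do
# have defaults, so a definition like the commented one below is rejected as
# soon as the signature is parsed:
#
#     def read_table(cache_dates: bool = ..., iterator: Literal[True]): ...
#     # SyntaxError: non-default argument follows default argument
#
# Giving the parameter a default (= ...) or making it keyword-only avoids the
# error; the keyword-only route is what the * discussion below is about.
def read_table(cache_dates: bool = ..., *, iterator: Literal[True]) -> None:
    ...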

@jakebailey jakebailey (Member)

Let me test and see if they appear if I just open this repo for editing.

@jakebailey jakebailey (Member) commented Oct 13, 2021

Yeah, it reproduces when you open this repo, though you'll have to disable "strict"; the pandas stubs can't pass that standard (let alone "basic", at the moment).

@jakebailey jakebailey (Member)

FWIW I can add the requisite = ..., I just know that you'd probably want to test those changes before they go out.

@Dr-Irv Dr-Irv (Contributor, Author) commented Oct 13, 2021

Well, adding the = ... doesn't work. Going back to my simple example, which is similar to what is in read_csv(), I added arguments before and after i1 and cs, which are the ones we want to "trap" with @overload:

from typing import Literal, Optional, Union, overload


@overload
def myfun(
    fake: str,
    first: float = ...,
    i1: Literal[True] = ...,
    cs: Optional[int] = ...,
    ext: str = ...,
) -> str:
    ...


@overload
def myfun(
    fake: str,
    first: float = ...,
    i1: Literal[False] = ...,
    cs: int = ...,
    ext: str = ...,
) -> str:
    ...


@overload
def myfun(
    fake: str,
    first: float = ...,
    i1: Literal[False] = ...,
    cs: None = ...,
    ext: str = ...,
) -> int:
    ...


@overload
def myfun(
    fake: str, first: float = ..., i1: bool = ..., cs: int = ..., ext: str = ...
) -> str:
    ...


def myfun(
    fake: str,
    first: float = -33.0,
    i1: bool = False,
    cs: Optional[int] = None,
    ext: str = "",
) -> Union[str, int]:
    print(f"i1 is {i1} cs is {cs} result is ", end="")
    if i1:
        if cs is not None:
            return "TextFileReader"
        else:
            return "TextFileReader"
    else:
        if cs is not None:
            return "TextFileReader"
        else:
            return -1




res1: int = myfun("meh")
res2: int = myfun("meh", i1=False)
res3: int = myfun("meh", i1=False, cs=None)
res4: str = myfun("meh", i1=False, cs=23)
res5: str = myfun("meh", i1=True)
res6: str = myfun("meh", i1=True, cs=None)
res7: str = myfun("meh", i1=True, cs=23)
res8: int = myfun("meh", cs=None)
res9: str = myfun("meh", cs=23)

Just by adding the first and ext arguments to the main signature and to the 4 overloads suggested by @erictraut, we now get reports that signatures 1 and 3, 2 and 3, and 3 and 4 all overlap: for example, myfun("meh") is accepted by both the first overload (returning str) and the third (returning int). The assignments for res1, res2 and res8 then all come up with typing errors.

Looking for suggestions on how to handle that. It seems that when the two arguments that control the return type have default values, and there are other defaulted arguments before and after those two, that's where you run into difficulty.

@erictraut erictraut (Contributor) commented Oct 13, 2021

My assumption is that most people don't use read_csv with 50+ positional arguments; they typically specify the first few arguments positionally and then use keyword arguments for the remainder. If my assumption is correct, then we should model it like that in the overload list. It's important for the overloads that support keyword arguments to return the correct (specific) return type. A final catch-all overload can handle the positional argument case, perhaps with a return type of Any.

So my suggestion is to add the * parameter as I suggested previously but then add one more overload that includes all arguments positionally (i.e. no * parameter) that acts as a fallback.

@Dr-Irv Dr-Irv (Contributor, Author) commented Oct 13, 2021

> My assumption is that most people don't use read_csv with 50+ positional arguments; they typically specify the first few arguments positionally and then use keyword arguments for the remainder. If my assumption is correct, then we should model it like that in the overload list. It's important for the overloads that support keyword arguments to return the correct (specific) return type. A final catch-all overload can handle the positional argument case, perhaps with a return type of Any.

Actually most people use read_csv with just one positional argument (the first one - name of the file or buffer, which is required and has no default value), and then all other arguments are specified by keyword, and NOT positionally.

> So my suggestion is to add the "" parameter as I suggested previously but then add one more overload that includes all arguments positionally (i.e. no "" parameter) that acts as a fallback.

Little confused here - above you said add *, now you say add _ ??

Having said that, since I think your assumption is wrong, does that change things?

Or are you saying use 4 overloads, all with a *, and then a final overload without?

@erictraut erictraut (Contributor)

Sorry, the markdown processor ate the * in my response. I edited my previous response for clarity.

My assumption is in line with what you said. Place the * after the first parameter for the first 4 overloads. Then add a fifth overload that can act as a "catch all" in case someone actually passes many positional arguments.
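
For concreteness, here is a sketch of that shape applied to the myfun example above (an illustration under the assumptions in this thread, not the exact overloads committed for read_csv): the first four overloads make everything after the first parameter keyword-only via *, the parameters that drive the return type are required keywords wherever they must be present, and a fifth positional catch-all returns Any.

from typing import Any, Literal, Optional, Union, overload


# i1 passed explicitly as True -> str, whatever cs is
@overload
def myfun(
    fake: str, *, first: float = ..., i1: Literal[True], cs: Optional[int] = ..., ext: str = ...
) -> str:
    ...


# i1 False (or omitted) but cs given as an int -> str
@overload
def myfun(
    fake: str, *, first: float = ..., i1: Literal[False] = ..., cs: int, ext: str = ...
) -> str:
    ...


# i1 False (or omitted) and cs None (or omitted) -> int
@overload
def myfun(
    fake: str, *, first: float = ..., i1: Literal[False] = ..., cs: None = ..., ext: str = ...
) -> int:
    ...


# i1 an arbitrary bool but cs given as an int -> str
@overload
def myfun(
    fake: str, *, first: float = ..., i1: bool = ..., cs: int, ext: str = ...
) -> str:
    ...


# catch-all for callers who pass the optional arguments positionally
@overload
def myfun(
    fake: str, first: float = ..., i1: bool = ..., cs: Optional[int] = ..., ext: str = ...
) -> Any:
    ...


def myfun(
    fake: str,
    first: float = -33.0,
    i1: bool = False,
    cs: Optional[int] = None,
    ext: str = "",
) -> Union[str, int]:
    return "TextFileReader" if i1 or cs is not None else -1

With this arrangement a bare call like myfun("meh") can only match the int-returning overload, so the res1 through res9 assignments above should resolve as intended, while heavily positional calls fall through to the Any catch-all.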

@jakebailey jakebailey (Member)

Sorry to misinterpret the original * comment; I really thought adding = ... would work but clearly not!

@jakebailey jakebailey mentioned this pull request Oct 13, 2021
@Dr-Irv Dr-Irv (Contributor, Author) commented Oct 14, 2021

Just did a commit 28473c3 that I think works. Thanks for the suggestions @jakebailey and @erictraut. Tested with the following:

from io import StringIO
import pandas as pd
from pandas.io.parsers import TextFileReader

dio = StringIO("a,b\n 1,2\n 3,4\n")

df: pd.DataFrame = pd.read_csv(dio)

print(df)

dio.seek(0)
tr: pd.DataFrame = pd.read_csv(dio)

dio.seek(0)
tr2: TextFileReader = pd.read_csv(dio, iterator=True)

dio.seek(0)
tr3: pd.DataFrame = pd.read_csv(dio, iterator=False)

dio.seek(0)
tr7: pd.DataFrame = pd.read_csv(dio, header="infer")

dio.seek(0)
tr8: TextFileReader = pd.read_csv(dio, header="infer", iterator=True)


ftr1: TextFileReader = pd.read_csv("hey.csv")
ftr2: pd.DataFrame = pd.read_csv("hey.csv", iterator=True)
ftr3: TextFileReader = pd.read_csv("hey.csv", iterator=False)
ftr5: pd.DataFrame = pd.read_csv("hey.csv", chunksize=10)
ftr6: TextFileReader = pd.read_csv("he.csv", chunksize=None)
ftr7: TextFileReader = pd.read_csv("hey.csv", header="infer")
ftr8: pd.DataFrame = pd.read_csv("hey.csv", header="infer", iterator=True)

The last 7 examples all report incompatible assignments, as expected.

@jakebailey jakebailey (Member)

Thanks, I'll go import it and see. We (luckily) delayed our weekly release, so I should be able to sneak this one in.

@jakebailey jakebailey (Member) left a comment


Thank you! Looks good to me.

Successfully merging this pull request may close these issues:

Wrong return type of read_csv if using IO-Objects