Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add some type annotations io/formats/format.py #27418

Merged

Conversation

simonjayhawkins
Copy link
Member

@simonjayhawkins simonjayhawkins commented Jul 16, 2019

typing not added to all function signatures in io/formats/format.py in this pass to reduce the diff

this PR also contains a refactor to simplify the addition of type annotations; truncate_h and tr_col_num are dependant variables. truncate_h can be True or False. if True, tr_col_num is an int else if truncate_h is False, tr_col_num is None

same applies for truncate_v and tr_row_num

@simonjayhawkins simonjayhawkins added Refactor Internal refactoring of code Output-Formatting __repr__ of pandas objects, to_string Clean Typing type annotations, mypy/pyright type checking labels Jul 16, 2019
@@ -172,7 +182,7 @@ def to_string(self):

fmt_values = self._get_formatted_values()

result = ["{i}".format(i=i) for i in fmt_values]
result = ["{i}".format(i=i) for i in fmt_values] # type: Union[str, List[str]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be cleaner if this were just List[str] and then on 187 use a different variable name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored.

def __init__(
self,
categorical: ABCCategorical,
buf: Optional[StringIO] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor but I think should use typing.TextIO here instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably, but it's not yet failing mypy, so doing it 'gradually'. will change on next commit.

dtype: bool = True,
max_rows: Optional[int] = None,
min_rows: Optional[int] = None,
) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove this for init - pseudo-standard internally not to add (though was required by earlier mypy versions)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -237,13 +243,11 @@ def _chk_truncate(self):
else:
row_num = max_rows // 2
series = concat((series.iloc[:row_num], series.iloc[-row_num:]))
self.tr_row_num = row_num
else:
self.tr_row_num = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just not required anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. see OP.

@@ -498,9 +502,15 @@ def __init__(
else:
self.columns = frame.columns

self.truncate_h = None # type: Optional[int]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this gets assigned to an int subsequently and None isn't valid I think should just type as int.

There's some syntax trickery at play here due to py 3.5 support as in py36 forward you could declare this without initializing I think like:

self.truncate_h: int
self.truncate_v: int

But can't do that in 3.5

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None is valid, so that we get the same behaviour as False when it was bool

None is when the DataFrame is not truncated. the assigned int is the truncated row/column number.

frame = self.frame
if truncate_h:
if max_cols_adj == 0:
col_num = len(frame.columns)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not a valid path?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

truncate_h is defined on L548. if max_cols_adj==0, truncate_h is False?

@simonjayhawkins
Copy link
Member Author

@WillAyd to reduce the diff and confusion, i've added tr_row_num (and tr_col_num) back in, but defined truncate_v ( and truncate_h) using tr_row_num (and tr_col_num) to appease the static type checking

@@ -217,16 +223,19 @@ def __init__(

self._chk_truncate()

def _chk_truncate(self):
@property
def truncate_v(self) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have to excuse my lack of knowledge with this module but this isn't entirely consistent with the previous logic right? Any risk in implementing this differently?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed properties

@@ -32,6 +33,8 @@
is_timedelta64_dtype,
)
from pandas.core.dtypes.generic import (
ABCCategorical,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker but generally have been moving away from using the ABC classes in annotations as they don't actually represent anything statically and just get replaced with Any.

Does importing the actual object impact things negatively here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed ABCs

@simonjayhawkins simonjayhawkins changed the title add some type annotations and refactor io/formats/format.py add some type annotations io/formats/format.py Jul 20, 2019
@simonjayhawkins simonjayhawkins removed Clean Refactor Internal refactoring of code labels Jul 20, 2019
@jreback jreback added this to the 1.0 milestone Jul 20, 2019
@jreback
Copy link
Contributor

jreback commented Jul 20, 2019

lgtm. @WillAyd

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good. Minor stylistic nits and a question but otherwise lgtm

@@ -227,6 +238,7 @@ def _chk_truncate(self):
truncate_v = max_rows and (len(self.series) > max_rows)
series = self.series
if truncate_v:
max_rows = cast(int, max_rows)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required? Generally we try to avoid cast and I'm surprised MyPy can't narrow the type down here from None.

If it's fully required does making line 238 say max_rows is not None and ... get around that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's fully required does making line 238 say max_rows is not None and ... get around that?

no. gives pandas\io\formats\format.py:249: error: Unsupported operand types for // ("None" and "int") and there might be risk in implementing this differently anyway?

Generally we try to avoid cast

well I would have agreed with you when I started this, but after this PR, I think that adding type annotations should be just that and any refactoring or code cleanup should be deferred to a follow on PR.

I'm surprised MyPy can't narrow the type down here from None.

Agreed. hence the numerous iterations to try and avoid cast originally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm OK yea seems like a bug / gap with mypy inference then. Can always come back to clean up

@@ -79,7 +79,7 @@ def row_levels(self) -> int:
# not showing (row) index
return 0

def _get_columns_formatted_values(self) -> ABCIndex:
def _get_columns_formatted_values(self) -> Iterable[Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just make this Iterable - if you don't subscript the generics they default to inclusive of Any anyway

@@ -162,7 +162,7 @@ def _write_cell(

def write_tr(
self,
line: List[str],
line: Iterable[Any],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here for Iterable

@WillAyd WillAyd merged commit 88ccb25 into pandas-dev:master Jul 21, 2019
@WillAyd
Copy link
Member

WillAyd commented Jul 21, 2019

Thanks @simonjayhawkins !

@simonjayhawkins
Copy link
Member Author

thanks @WillAyd for the review.

@simonjayhawkins simonjayhawkins deleted the format-partial-typing-and-refactor branch July 22, 2019 11:48
quintusdias pushed a commit to quintusdias/pandas_dev that referenced this pull request Aug 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Output-Formatting __repr__ of pandas objects, to_string Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants