Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always split "complex" comprehensions into multiple lines #2841

Closed
freakboy3742 opened this issue Jan 31, 2022 · 3 comments
Closed

Always split "complex" comprehensions into multiple lines #2841

freakboy3742 opened this issue Jan 31, 2022 · 3 comments
Labels
F: linebreak How should we split up lines? R: duplicate This issue or pull request already exists T: style What do we want Blackened code to look like?

Comments

@freakboy3742
Copy link

Describe the style change

"Complex" comprehensions (for some measure of "complex") should always be treated as multiline expressions, analogous to long lists/dicts that won't fit on a single line. "Simple" comprehensions continue to be rendered as a single line.

I appreciate that "simple" and "complex" are handwave-y statements; but as a starting point for discussion, I'll assert a comprehension ceases to be "simple" if:

  • It has more than one attribute access in any single term of the overall expression
  • It has more than one operator in any single term of the overall expression
  • A single term in the expression contains both an attribute access and an operator
  • The entire expression has a length exceeding 40 chars
  • It has an if clause
  • It has more than 1 for clause

I won't claim this is a perfect definition of "simple", but it's a starting point.

Why make this change? As I understand it, one of the goals of Black is to produce small, easily readable diffs on future edits of the code. The simplest list/set comprehension is a dense expression containing at least 3 sub-expressions; the simplest dict comprehension has at least 4 sub-expressions. Each one of those sub-expressions could potentially be the subject of a meaningful code change. If there is any complexity in those terms, the code (and, by extension, a diff of that code) immediately becomes harder to read.

A straightforward key-value reversing comprehension:

{v: k for k, v in mydata.items()}

is nearing the simplest possible dictionary expression you can write, but to my eyes is also nearing the practical limit of what can be reasonably mentally consumed as a single line (especially if that expression is inlined as an argument to a function or something similar).

The slightly more complex expression:

{v.upper(): k.upper() for k, v in mydata.items()}

would exceed the 40 character "complexity" limit, and be transformed into:

{
    v.upper(): k.upper() 
    for k, v in mydata.items()
}

As an aside, this simplifies some cases that currently cause problems for Black. If a test assertion is a comparison of 2 comprehensions, each individual comprehension may not trigger the line length limit, but the overall expression will. Black currently takes the approach of splitting the second comprehension onto a separate line in a way that introduces problems of "symmetry" (This was the originally reported on #733). This change removes any question of "symmetry", and reduces the problem in #733 to one of whether the individual comprehensions are "complex"; if they are, whitespace improves readability.

Examples in the current Black style

# Simple cases
e1 = [value.upper() for value in mydata]

e2 = {k: v for k, v in mydata.items()}

assert [asdict(y) for y in getattr(y, param)] == [y for y in x[param]]

# Complex cases
e3 = [convert_value(value.upper()) for value in mydata if is_valid_value(value.upper())]

e4 = {convert_value(v) for subdata in mydata for v in subdata if is_valid_value(v)}

e5 = {convert_key(k): convert_value(v) for k, v in mydata.items() if is_valid_value(v)}

assert [asdict(y) for x in big for y in getattr(x, param)] == [
    y for x in big_j for y in x[param]
]

Desired style

# Simple cases
e1 = [value.upper() for value in mydata]

e2 = {k: v for k, v in mydata.items()}

assert [asdict(y) for y in getattr(y, param)] == [y for y in x[param]]

# Complex cases
e3 = [
    convert_value(value.upper())
    for value in mydata
    if is_valid_value(value.upper())
]

e4 = {
    convert_value(v)
    for subdata in mydata
    for v in subdata
    if is_valid_value(v)
}

e5 = {
    convert_key(k): convert_value(v)
    for k, v in mydata.items()
    if is_valid_value(v)
}

assert [
    asdict(y)
    for x in big
    for y in getattr(x, param)
] == [
    y
    for x in big_j
    for y in x[param]
]
    

Additional context

This proposal follows on from a discussion that is part of #733.

@freakboy3742 freakboy3742 added the T: style What do we want Blackened code to look like? label Jan 31, 2022
@JelleZijlstra JelleZijlstra added the F: linebreak How should we split up lines? label Jan 31, 2022
@felix-hilden
Copy link
Collaborator

felix-hilden commented Jan 31, 2022

Hi! Thanks for submitting. I think there's something to this request, even though it would be quite hard to define a good heuristic for "complex". But I'm not sure if the improvements are worth the misses yet, so let's discuss. I have some issues with the definition you proposed in some common cases:

More than one attribute access in any single term of the overall expression

[name.title().replace("-", " ") for name in names] - seems pretty simple to read and a common occurrence

More than one operator in any single term of the overall expression

[path / "docs" / "build" / "HTML" for path in projects] - again, seems simple enough

Single term in the expression contains both an attribute access and an operator

In a similar vein to above, I think there are many possible situations where I'd consider this simple. For example: [path.parent / "folder" ...]

The entire expression has a length exceeding 40 chars

I think if we have just some slightly long names (which are arguably often better because they carry more meaning), 40 chars is over quite quickly. Just a multiplication on minimal names takes 31 chars: [a * b for a, b in zip(la, lb)]. So having longer names is immediately too much, nevermind having an assignment.

Has an if clause

I don't quite agree, because a comprehension can be used as a substitute for filter in quite simple cases: [thing.attr for thing in things if isinstance(thing, Thing)]

It has more than 1 for clause

This seems quite reasonable, although one quite common use case is flattening a nested list: [item for sublist in nested for item in sublist].

I realise I'm reading your proposal here like the Devil reads a Bible, even though I tried to provide reasonable examples. They are a good starting point though! So thank you for coming up with quite a varied list of requirements 👍 I think overall they should be made stricter though, meaning less cases of "complex". But I'm not sure how. Maybe if multiple conditions apply simultaneously..?

Perhaps instead of discussing over every detail, we could see what the effect of this would be on diff-shades (our tool which gauges changes between commits that is automatically run on PRs). That would mean some work already, so maybe some discussion is in order. But particularly if you'd like to contribute, seeing the changes across a bunch of projects would be very helpful.


The proposal and its implementation should also be evaluated for performance, but at this stage it's not really as important.

@hukkin
Copy link
Contributor

hukkin commented Jan 31, 2022

I think this issue is a more thought-out dupe of #2121

I agree with Felix's points, the heuristics laid out in the original post are probably too eager to split into multiple lines.

One heuristic that I have yet to see a good counter example to is this, i.e. if there's an else clause in the comprehension, then explode. An else clause implies a preceding if so the comprehension is IMO always complex enough to warrant exploding.

@felix-hilden
Copy link
Collaborator

Nice catch! I think we'll continue the discussion there. I'll include a tl;dr of this discussion as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F: linebreak How should we split up lines? R: duplicate This issue or pull request already exists T: style What do we want Blackened code to look like?
Projects
None yet
Development

No branches or pull requests

4 participants