Always split "complex" comprehensions into multiple lines #2841

freakboy3742 · 2022-01-31T04:46:55Z

Describe the style change

"Complex" comprehensions (for some measure of "complex") should always be treated as multiline expressions, analogous to long lists/dicts that won't fit on a single line. "Simple" comprehensions continue to be rendered as a single line.

I appreciate that "simple" and "complex" are handwave-y statements; but as a starting point for discussion, I'll assert a comprehension ceases to be "simple" if:

It has more than one attribute access in any single term of the overall expression
It has more than one operator in any single term of the overall expression
A single term in the expression contains both an attribute access and an operator
The entire expression has a length exceeding 40 chars
It has an if clause
It has more than 1 for clause

I won't claim this is a perfect definition of "simple", but it's a starting point.

Why make this change? As I understand it, one of the goals of Black is to produce small, easily readable diffs on future edits of the code. The simplest list/set comprehension is a dense expression containing at least 3 sub-expressions; the simplest dict comprehension has at least 4 sub-expressions. Each one of those sub-expressions could potentially be the subject of a meaningful code change. If there is any complexity in those terms, the code (and, by extension, a diff of that code) immediately becomes harder to read.

A straightforward key-value reversing comprehension:

{v: k for k, v in mydata.items()}

is nearing the simplest possible dictionary expression you can write, but to my eyes is also nearing the practical limit of what can be reasonably mentally consumed as a single line (especially if that expression is inlined as an argument to a function or something similar).

The slightly more complex expression:

{v.upper(): k.upper() for k, v in mydata.items()}

would exceed the 40 character "complexity" limit, and be transformed into:

{
    v.upper(): k.upper() 
    for k, v in mydata.items()
}

As an aside, this simplifies some cases that currently cause problems for Black. If a test assertion is a comparison of 2 comprehensions, each individual comprehension may not trigger the line length limit, but the overall expression will. Black currently takes the approach of splitting the second comprehension onto a separate line in a way that introduces problems of "symmetry" (This was the originally reported on #733). This change removes any question of "symmetry", and reduces the problem in #733 to one of whether the individual comprehensions are "complex"; if they are, whitespace improves readability.

Examples in the current Black style

# Simple cases
e1 = [value.upper() for value in mydata]

e2 = {k: v for k, v in mydata.items()}

assert [asdict(y) for y in getattr(y, param)] == [y for y in x[param]]

# Complex cases
e3 = [convert_value(value.upper()) for value in mydata if is_valid_value(value.upper())]

e4 = {convert_value(v) for subdata in mydata for v in subdata if is_valid_value(v)}

e5 = {convert_key(k): convert_value(v) for k, v in mydata.items() if is_valid_value(v)}

assert [asdict(y) for x in big for y in getattr(x, param)] == [
    y for x in big_j for y in x[param]
]

Desired style

# Simple cases
e1 = [value.upper() for value in mydata]

e2 = {k: v for k, v in mydata.items()}

assert [asdict(y) for y in getattr(y, param)] == [y for y in x[param]]

# Complex cases
e3 = [
    convert_value(value.upper())
    for value in mydata
    if is_valid_value(value.upper())
]

e4 = {
    convert_value(v)
    for subdata in mydata
    for v in subdata
    if is_valid_value(v)
}

e5 = {
    convert_key(k): convert_value(v)
    for k, v in mydata.items()
    if is_valid_value(v)
}

assert [
    asdict(y)
    for x in big
    for y in getattr(x, param)
] == [
    y
    for x in big_j
    for y in x[param]
]

Additional context

This proposal follows on from a discussion that is part of #733.

The text was updated successfully, but these errors were encountered:

felix-hilden · 2022-01-31T08:59:27Z

Hi! Thanks for submitting. I think there's something to this request, even though it would be quite hard to define a good heuristic for "complex". But I'm not sure if the improvements are worth the misses yet, so let's discuss. I have some issues with the definition you proposed in some common cases:

More than one attribute access in any single term of the overall expression

[name.title().replace("-", " ") for name in names] - seems pretty simple to read and a common occurrence

More than one operator in any single term of the overall expression

[path / "docs" / "build" / "HTML" for path in projects] - again, seems simple enough

Single term in the expression contains both an attribute access and an operator

In a similar vein to above, I think there are many possible situations where I'd consider this simple. For example: [path.parent / "folder" ...]

The entire expression has a length exceeding 40 chars

I think if we have just some slightly long names (which are arguably often better because they carry more meaning), 40 chars is over quite quickly. Just a multiplication on minimal names takes 31 chars: [a * b for a, b in zip(la, lb)]. So having longer names is immediately too much, nevermind having an assignment.

Has an if clause

I don't quite agree, because a comprehension can be used as a substitute for filter in quite simple cases: [thing.attr for thing in things if isinstance(thing, Thing)]

It has more than 1 for clause

This seems quite reasonable, although one quite common use case is flattening a nested list: [item for sublist in nested for item in sublist].

I realise I'm reading your proposal here like the Devil reads a Bible, even though I tried to provide reasonable examples. They are a good starting point though! So thank you for coming up with quite a varied list of requirements 👍 I think overall they should be made stricter though, meaning less cases of "complex". But I'm not sure how. Maybe if multiple conditions apply simultaneously..?

Perhaps instead of discussing over every detail, we could see what the effect of this would be on diff-shades (our tool which gauges changes between commits that is automatically run on PRs). That would mean some work already, so maybe some discussion is in order. But particularly if you'd like to contribute, seeing the changes across a bunch of projects would be very helpful.

The proposal and its implementation should also be evaluated for performance, but at this stage it's not really as important.

hukkin · 2022-01-31T10:58:24Z

I think this issue is a more thought-out dupe of #2121

I agree with Felix's points, the heuristics laid out in the original post are probably too eager to split into multiple lines.

One heuristic that I have yet to see a good counter example to is this, i.e. if there's an else clause in the comprehension, then explode. An else clause implies a preceding if so the comprehension is IMO always complex enough to warrant exploding.

felix-hilden · 2022-01-31T11:08:49Z

Nice catch! I think we'll continue the discussion there. I'll include a tl;dr of this discussion as well.

freakboy3742 added the T: style What do we want Blackened code to look like? label Jan 31, 2022

freakboy3742 mentioned this issue Jan 31, 2022

comprehensions poor formatting #733

Closed

JelleZijlstra added the F: linebreak How should we split up lines? label Jan 31, 2022

felix-hilden closed this as completed Jan 31, 2022

felix-hilden added the R: duplicate This issue or pull request already exists label Jan 31, 2022

felix-hilden mentioned this issue Jan 31, 2022

Add linebreak in comprehension with complex element expression #2121

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Always split "complex" comprehensions into multiple lines #2841

Always split "complex" comprehensions into multiple lines #2841

freakboy3742 commented Jan 31, 2022

felix-hilden commented Jan 31, 2022 •

edited

Loading

hukkin commented Jan 31, 2022

felix-hilden commented Jan 31, 2022

Always split "complex" comprehensions into multiple lines #2841

Always split "complex" comprehensions into multiple lines #2841

Comments

freakboy3742 commented Jan 31, 2022

felix-hilden commented Jan 31, 2022 • edited Loading

hukkin commented Jan 31, 2022

felix-hilden commented Jan 31, 2022

felix-hilden commented Jan 31, 2022 •

edited

Loading