Skip to content

Conversation

@knz
Copy link
Contributor

@knz knz commented Jul 6, 2018

This is not yet in the domain of "bikeshedding" -- more alike to "general clean-up"

Cumulative perf improvements:

name           old time/op    new time/op    delta
PrettyData-16     4.48s ±21%     1.77s ±26%  -60.44%  (p=0.008 n=5+5)
PrettyData-16    3.20GB ± 2%    0.62GB ± 1%  -80.56%  (p=0.008 n=5+5)
PrettyData-16     10.5M ± 0%      4.2M ± 0%  -59.67%  (p=0.008 n=5+5)

@knz knz requested review from a team and madelynnblue July 6, 2018 12:56
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz knz force-pushed the 20180706-pp-improvs branch 2 times, most recently from c904bbd to 4389056 Compare July 6, 2018 14:15
Copy link
Contributor

@madelynnblue madelynnblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are great. Thanks.

@knz knz force-pushed the 20180706-pp-improvs branch 4 times, most recently from 550ce36 to 6bdba1f Compare July 6, 2018 19:52
knz added 8 commits July 6, 2018 22:43
The formatting on long lines remains unchanged.

When width-constrained, before:

```
---------------- width -----------------|
SELECT                                --|
    CASE x			      --|
    WHEN 'a'			      --|
    THEN 'aaaa'			      --|
    WHEN 'b'			      --|
    THEN 'bbbb'			      --|
    ELSE 'ccccc'		      --|
    END                               --|
```

After:

```
---------------- width -----------------|
SELECT                                --|
    CASE x			      --|
    WHEN 'a' THEN 'aaaa'	      --|
    WHEN 'b' THEN 'bbbb'	      --|
    ELSE 'ccccc'		      --|
    END                               --|
```

i.e. this patch advantages placing the WHEN and THEN on the same line.

Release note: None
Two fixes:

1) multi-operand expressions are flattened.

Before:

```
SELECT (1 AND 2) AND 3
```

After:

```
SELECT 1 AND 2 AND 3
```

2) the operands are aligned.

Before:

```
SELECT
    (
        (
            (
                (
                    (
                        (
                            (1 AND 2)
                            AND 3
                        )
                        AND 4
                    )
                    AND 5
                )
                AND 6
            )
            AND 7
        )
        AND 8
    )
    AND 9
```

After:

```
SELECT
    1
    AND 2
    AND 3
    AND 4
    AND 5
    AND 6
    AND 7
    AND 8
    AND 9
```

Note that this is careful to preserve operator precedence; compare:

```
-- this input:
select 1 and 2 and 3 or 4 and 5 and 6 or 7 and 8 and 9
-- is transformed to this:
SELECT
    (1 AND 2 AND 3)
    OR (4 AND 5 AND 6)
    OR (7 AND 8 AND 9)
```

```
-- this input:
select 1 or 2 or 3 and 4 or 5 or 6 and 7 or 8 or 9
-- is transformed to this:
SELECT
    1
    OR 2
    OR (3 AND 4)
    OR 5
    OR (6 AND 7)
    OR 8
    OR 9
```

Release note: None
When the "simplify" option is set, parentheses around scalar
sub-expressions are removed automatically when the result is
grammatically unambiguous and equivalent to the original syntax.

Before:

```
SELECT (a + b) * (c * d);
```

After:

```
SELECT (a + b) * c * d;
```

Or, more convincingly, from before:

```
SELECT
    (
        (
            (
                (
                    (
                        (
                            ((CASE WHEN rolcanlogin THEN '-- User: ' ELSE '-- Role: ' END || quote_ident(rolname)) || e'\\n-- DROP ')
                            || CASE WHEN rolcanlogin THEN 'USER ' ELSE 'ROLE ' END
                        )
                        || quote_ident(rolname)
                    )
                    || e';\\n\\nCREATE '
                )
                || CASE WHEN rolcanlogin THEN 'USER ' ELSE 'ROLE ' END
            )
            || quote_ident(rolname)
        )
        || e' WITH\\n  '
    )
```

To after:

```
SELECT
    CASE WHEN rolcanlogin THEN '-- User: ' ELSE '-- Role: ' END || quote_ident(rolname) || e'\\n-- DROP ' || CASE WHEN rolcanlogin THEN 'USER ' ELSE 'ROLE ' END
    || quote_ident(rolname)
    || e';\\n\\nCREATE '
    || CASE WHEN rolcanlogin THEN 'USER ' ELSE 'ROLE ' END
    || quote_ident(rolname)
    || e' WITH\\n  '
```

Release note: None
Before this patch, under width constraints:

```
SELECT
    10
    +
    (
        count
        *
        60
        *
        24
    )
```

After:

```
SELECT
    10
    + (
        count
        * 60
        * 24
    )
```

i.e. yank the right operand on the same line as the operator, for
readability.

Release note: None
Prior to this patch the following statement was surprisingly (=
incorrectly) formatted:

```
-------- width ----------|
SELECT *               --|
        FROM kv	       --|
        WHERE v = 3    --|
    UNION	       --|
        SELECT *       --|
        FROM kv	       --|
        WHERE k = 2    --|
UNION		       --|
    SELECT *	       --|
    FROM kv	       --|
    WHERE (k + v) < 2  --|
```

Note how the code was meant to indent both the left and right operand
to UNION relative to the operator, but yet the first SELECT line is
yanked to the left.

The root of the problem is that the formatting algorithm does not
support indenting a nested document if there is nothing before
it. This is the key difference between P. Wadler's idea (used here)
and the original algorithm by J. Hugues, see section 5 of Wadler's
note at
https://homepages.inf.ed.ac.uk/wadler/papers/prettier/prettier.pdf.

This patch fixes the issue by de-indenting the left operand. This adds
the benefit that an expression of the form (a union b) union c has its
b and c operands properly aligned.

After:

```
-------- width ----------|
SELECT *               --|
FROM kv		       --|
WHERE v = 3	       --|
UNION		       --|
    SELECT *	       --|
    FROM kv	       --|
    WHERE k = 2	       --|
UNION		       --|
    SELECT *	       --|
    FROM kv	       --|
    WHERE (k + v) < 2  --|
```

Release note: None
On long lines, before:

```
--------------------------------- width ---------------------------------------|
SELECT *                                                                     --|
FROM                                                                         --|
    kv INNER JOIN blah ON kv.x = blah.y INNER JOIN foo ON blah.x = foo.z     --|
        INNER JOIN bzz ON bzz.z = foo.z,                                     --|
    blah INNER JOIN blih ON blah.x = blih.y                                  --|
```

After:

```
--------------------------------- width ---------------------------------------|
SELECT *                                                                     --|
FROM                                                                         --|
    kv                                                                       --|
    INNER JOIN blah ON kv.x = blah.y                                         --|
    INNER JOIN foo ON blah.x = foo.z                                         --|
    INNER JOIN bzz ON bzz.z = foo.z,                                         --|
    blah                                                                     --|
    INNER JOIN blih ON blah.x = blih.y                                       --|
```

i.e. each join operand on its own line.

Width constrained, before:

```
----------- width ------------|
SELECT *                    --|
FROM                        --|
    kv                      --|
        INNER JOIN blah     --|
        ON kv.x = blah.y    --|
        INNER JOIN foo      --|
        ON blah.x = foo.z   --|
        INNER JOIN bzz      --|
        ON bzz.z = foo.z    --|
```

After:

```
----------- width ------------|
SELECT *                    --|
FROM                        --|
    kv                      --|
    INNER JOIN blah         --|
    ON kv.x = blah.y        --|
    INNER JOIN foo          --|
    ON blah.x = foo.z       --|
    INNER JOIN bzz          --|
    ON bzz.z = foo.z        --|
```

i.e. the various join operands are vertically aligned; there is no
special alignment emphasis on the first operand of a join.

Release note: None
If there is just one operand in ROWS FROM, and it is a function
application, then it can be elided.

Release note: None
When simplifying, this strips superfluous parentheses in FROM clauses.

Before:

```
SELECT *
FROM
    (a CROSS JOIN b),
    (
        a
            CROSS JOIN
                (
                    c
                        CROSS JOIN
                            d
                )
    ),
    d,
    (TABLE e ORDER BY f)
```

After:

```
SELECT *
FROM
    a
    CROSS JOIN b,
    a
    CROSS JOIN
        (c CROSS JOIN d),
    d,
    (TABLE e ORDER BY f)
```

Release note: None
@knz knz force-pushed the 20180706-pp-improvs branch 2 times, most recently from 17f5a3e to 8f059b6 Compare July 6, 2018 21:06
knz added 2 commits July 6, 2018 23:18
This indents the right operand if it needs to be wrapped, and
conditionally removes some extraneous parentheses.

Before:

```
------ width -------|
SELECT            --|
    1 = 1,	  --|
    (1 + 2)	  --|
    = (3 * foo(x)),
    ((x))	  --|
    = (y::INT),	  --|
    (y[123])	  --|
    = (min(z))    --|
```

After:

```
------ width -------|
SELECT            --|
    1 = 1,	  --|
    (1 + 2)       --|
    = (3 * foo(x)),
    x = y::INT,   --|
    y[123] = min(z)
```

Release note: None
Before:

```
UPDATE foo
SET
    x
    =
    (3 + 2),
    (y, z)
    =
    (
        4
        +
        min(1 - 2),
        123
    )
```

After:

```
UPDATE foo
SET
    x = 3 + 2,
    (y, z)
        = (
            4
            + min(
                    1
                    - 2
                ),
            123
        )
```

i.e. the assignments are shifted to the left and, if possible, the LHS
and RHS of each assignment are grouped on the same line.

Release note: None
@knz knz force-pushed the 20180706-pp-improvs branch from 8f059b6 to 8bec96f Compare July 6, 2018 21:18
@knz
Copy link
Contributor Author

knz commented Jul 6, 2018

and there we go

bors r+

craig bot pushed a commit that referenced this pull request Jul 6, 2018
27225: pretty: miscellaneous formatting improvements r=knz a=knz

This is not yet in the domain of "bikeshedding" -- more alike to "general clean-up"

Cumulative perf improvements:

```
name           old time/op    new time/op    delta
PrettyData-16     4.48s ±21%     1.77s ±26%  -60.44%  (p=0.008 n=5+5)
PrettyData-16    3.20GB ± 2%    0.62GB ± 1%  -80.56%  (p=0.008 n=5+5)
PrettyData-16     10.5M ± 0%      4.2M ± 0%  -59.67%  (p=0.008 n=5+5)
```


Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Jul 6, 2018

Build succeeded

@craig craig bot merged commit 8bec96f into cockroachdb:master Jul 6, 2018
@knz knz deleted the 20180706-pp-improvs branch July 6, 2018 21:48
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may conflict with the second commit from #27227

case *BinaryExpr:
parenPrio := binaryOpPrio[op]
childPrio := binaryOpPrio[te.Operator]
if te.Operator == op && childPrio <= parenPrio {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If te.Operator == op then won't childPrio always equal parenPrio?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks I had broken this inadvertently. See #27255.

Copy link
Contributor

@madelynnblue madelynnblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approved this PR and then it changed a bunch, so here are the rest of my comments.

// Fill concatenates a list of docs with spaces if possible,
// otherwise with newlines.
// See linked paper pp 14-15.
// func Fill(d ...Doc) Doc {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this debug work? Probably wasn't supposed to be in this commit but whatever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had tried it out but then commented it out to please the linter. I think it's a nice function but we haven't used it. I was happy while writing this to see how easy it was to reproduce the algorithms!

if p.Simplify {
e = StripParens(e)
}
return pretty.Group(p.nestName(d, pretty.ConcatSpace(pretty.Text("="), p.Doc(e))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nestName calls Group on its outer Doc, so the call to Group here is extraneous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you fixed it. Thanks!

craig bot pushed a commit that referenced this pull request Jul 8, 2018
27255: pretty: improve the simplification for binary scalar expressions r=knz a=knz

Noted by @mjibson in #27225 (comment)

This uses the precedence more to simplify binary expressions.

Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants