Skip to content

[ML] Reformat code with Java brace style #10

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

[ML] Reformat code with Java brace style #10

wants to merge 6 commits into from

Conversation

edsavage
Copy link
Contributor

Provide scripts to reformat C++ source code to be consistent with the
Java style of brace formatting (attached braces) and to verify that
style after future code changes.

These scripts rely on astyle - available from https://sourceforge.net/projects/astyle/files/astyle/

Some outstanding questions:

  • Should astyle be sought in a fixed location, rather than searching the path?
  • Should we instead be building astyle ourselves from source?
  • Should astyle options be moved to a configuration file?

Provide scripts to reformat C++ source code to be consistent with the
Java style of brace formatting (attached braces) and to verify that
style after future code changes.

These scripts rely on astyle - available from https://sourceforge.net/projects/astyle/files/astyle/

Reformatted code will feature in a separate commit.
Source code reformatted to be more consistent with Java brace style.
Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hair's breadth away from being as good as could reasonably be expected for an automated code reformat.

The one remaining problem appears to be the lambdas, which others have complained about too:

How hard would it be for us to contribute a fix to Astyle to make it format lambdas correctly?

The problem with lambdas also partly answers the question about whether we should build Astyle from source. Since we're not redistributing it I don't think it's essential that we build it from source, but we should mandate a specific version of it for each branch, and the easiest way to get a specific version on certain platforms may be to build it from source.

return std::make_pair(maths::CBasicStatistics::mean(meanError),
maths::CBasicStatistics::mean(meanErrorAt95));
};
core_t::TTime end) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like Astyle does not understand lambdas, and has formatted this inappropriately as a result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is remarkable ugly!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trick will be identify a lambda in the first place.

From the looks of it there is a fair amount of demand for the feature, if we did develop a solution then I'm sure our contribution would be extremely welcome.

The option of extreme last resort would be to disable formatting of the blocks containing lambdas by way of comment tags.

@@ -279,14 +252,12 @@ bool CPriorTestInterface::marginalLikelihoodMeanForTest(double &result) const

double a, b;
if ( !this->marginalLikelihoodQuantileForTest(0.001, eps, a)
|| !this->marginalLikelihoodQuantileForTest(99.999, eps, b))
{
|| !this->marginalLikelihoodQuantileForTest(99.999, eps, b)) {
Copy link
Contributor

@droberts195 droberts195 Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like Astyle doesn't understand that the extra gap on the line with the if was to allow the || to be shifted back a bit. We're going to have to accept slight differences like this with any automated formatting tool. One solution would be to put the || at the ends of lines rather than the beginning. All the places where the "|| at beginning with extra whitespace on if line" format was previously used are going to look a bit weird until somebody manually changes them.

This isn't necessarily something that needs changing as part of this PR - I'm just flagging it for others to consider.

Does Astyle have an option to remove whitespace between if ( and what follows? If so that might make things a bit neater in the immediate aftermath of the reformat.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a raft of options relating to padding. --unpad-paren looks a likely candidate but I'll need to experiment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an option to set the minimal number of indents for continuation lines. Setting it to 0 gives a result much closer to our existing style so I think that should be added anyway. In the case discussed above it would cause the boolean or to be aligned with the negation operator, closer to what we want but not exactly.

The solution would be as you say to always break after || and && operator. At the moment we do this inconsistently anyway.

Wrapped a perl script around astyle to provide a few tweaks to the
formatting.

In particular, I've added consistency in where we break conditional
lines. The convention now is to ensure that '||' and "&&' always
appear at the end of line, never the beginning. This allows for better
alignment by astyle.

Modified source files will follow in a separate commit
Source files pre-formatted with a script to ensure consistency with how
continuation lines are aligned.
@@ -276,8 +258,8 @@ void CTrendComponentTest::testDecayRate()
level.add(value);

controller.multiplier({prediction},
{{values[(time - start) / bucketLength] - prediction}},
bucketLength, 1.0, 0.012);
{{values[(time - start) / bucketLength] - prediction}},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does astyle get confused because of '{' ? It seems fine if initializer lists are not used.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be this: steinwurf/astyle@fab103c

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be this: https://sourceforge.net/p/astyle/bugs/441

It's claimed to be fixed in subversion, but presumably not in a released build yet.

The modern C++ syntax is incredibly hard to parse, as similar punctuation is used for multiple purposes and is highly context dependent. Think square brackets for both array indices and lambdas, braces for both initializer lists and code blocks. This is why I thought clang-format might be the best choice of formatter - it's built off the same codebase as the compiler, so by definition will understand the code as well as the compiler.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that for lambdas clang-format works. Not sure about initializer lists.

'fixed in subversion' is strange, the bug has been updated 01/25, latest release is 02/26. Might be due to minor/major releases but I doubt that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll experiment with the SVN head revision as it looks like it might go someway to resolving the issues we're seeing.

It might also be worth re-visiting clang-format in light of what we now know of the limitations of astyle.

BOOST_MULTI_INDEX_CONST_TYPE_CONST_MEM_FUN(CFieldOptions, std::string, partitionFieldName)
>
>
>
Copy link
Contributor

@tveasey tveasey Mar 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a shame we lose indenting in these sorts of typedef. The result is much harder to read. There isn't the possibility of issuing a preserve formatting for code block by any chance? This may be controversial, but it might get us out of difficulty without spending a lot of time getting the tooling to deal with edge cases.

true, // Ignore date words
false, // Ignore field names
2, // Min dictionary word length
core::CWordDictionary::TWeightVerbs5Other2>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, formatting of typedefs generally seems a little off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tom.

Yes there is a mechanism for turning formatting off for particular regions of code. If we did go down that route then there's a few other areas where it might be useful as well - for example we use macros in switch statements in say lib/model/ModelTypes.cc that fool the formatter.

I've been experimenting also with clang_format and Uncrustify. The former has far more intimate knowledge of the language constructs as it is using the clang compiler library. Unfortunately the configuration for it is quite inflexible and doesn't give us what we want in a number of cases (such as the indentation levels in class definitions).

Uncrustify is actually looking much more promising than either astyle or clang-format. It seems to have an option for everything. It does for example handle the typedef indentation as desired, along with being able to cope with lambdas correctly. It is still of course duped by macros in switch statements though (again we'd have to locally turn off indentation formatting to cope)

I'll commit the reformatted code from each tool to its own branch for easy comparison..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good plan!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

such as the indentation levels in class definitions

I wouldn't stress too much about this in particular. If we're reformatting the whole codebase I don't think it really matters if we change this aspect of the formatting. Last time someone looked at clang-format I think we decided that this was the best option supported by clang-format for laying out class definitions:

class foo {
public:
    int bar();
private:
    int x;
};

Not sure what the other differences were though - maybe some of those are more problematic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the formatting of all the multi-line typedefs significantly less readable. It was just something that stood out on a quick skim. I think it would be nice to evaluate all the options, so we can make a sensible choice. We'll be stuck with this going forwards after all. The alternative is we do the bare minimum at the moment and literally just update braces.

@droberts195
Copy link
Contributor

Having seen #14 and #15 I think we should give up on Astyle. It's not worth spending time contributing fixes to Astyle when uncrustify can do everything Astyle can do and doesn't have the problems understanding the modern C++ language constructs. (Despite the flexibility of uncrustify I think there's still an argument to be made for clang-format, but that discussion can move to the other PRs.)

So unless somebody else objects I think this PR can be closed.

@edsavage
Copy link
Contributor Author

Closing the PR as per @droberts195 comments above.

Further discussion on #14 and #15 is welcome

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants