-
Notifications
You must be signed in to change notification settings - Fork 65
[ML] Reformat code with Java brace style #10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Provide scripts to reformat C++ source code to be consistent with the Java style of brace formatting (attached braces) and to verify that style after future code changes. These scripts rely on astyle - available from https://sourceforge.net/projects/astyle/files/astyle/ Reformatted code will feature in a separate commit.
Source code reformatted to be more consistent with Java brace style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a hair's breadth away from being as good as could reasonably be expected for an automated code reformat.
The one remaining problem appears to be the lambdas, which others have complained about too:
How hard would it be for us to contribute a fix to Astyle to make it format lambdas correctly?
The problem with lambdas also partly answers the question about whether we should build Astyle from source. Since we're not redistributing it I don't think it's essential that we build it from source, but we should mandate a specific version of it for each branch, and the easiest way to get a specific version on certain platforms may be to build it from source.
return std::make_pair(maths::CBasicStatistics::mean(meanError), | ||
maths::CBasicStatistics::mean(meanErrorAt95)); | ||
}; | ||
core_t::TTime end) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like Astyle does not understand lambdas, and has formatted this inappropriately as a result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is remarkable ugly!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trick will be identify a lambda in the first place.
From the looks of it there is a fair amount of demand for the feature, if we did develop a solution then I'm sure our contribution would be extremely welcome.
The option of extreme last resort would be to disable formatting of the blocks containing lambdas by way of comment tags.
lib/maths/unittest/TestUtils.cc
Outdated
@@ -279,14 +252,12 @@ bool CPriorTestInterface::marginalLikelihoodMeanForTest(double &result) const | |||
|
|||
double a, b; | |||
if ( !this->marginalLikelihoodQuantileForTest(0.001, eps, a) | |||
|| !this->marginalLikelihoodQuantileForTest(99.999, eps, b)) | |||
{ | |||
|| !this->marginalLikelihoodQuantileForTest(99.999, eps, b)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like Astyle doesn't understand that the extra gap on the line with the if
was to allow the ||
to be shifted back a bit. We're going to have to accept slight differences like this with any automated formatting tool. One solution would be to put the ||
at the ends of lines rather than the beginning. All the places where the "||
at beginning with extra whitespace on if
line" format was previously used are going to look a bit weird until somebody manually changes them.
This isn't necessarily something that needs changing as part of this PR - I'm just flagging it for others to consider.
Does Astyle have an option to remove whitespace between if (
and what follows? If so that might make things a bit neater in the immediate aftermath of the reformat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a raft of options relating to padding. --unpad-paren
looks a likely candidate but I'll need to experiment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an option to set the minimal number of indents for continuation lines. Setting it to 0 gives a result much closer to our existing style so I think that should be added anyway. In the case discussed above it would cause the boolean or to be aligned with the negation operator, closer to what we want but not exactly.
The solution would be as you say to always break after ||
and &&
operator. At the moment we do this inconsistently anyway.
Wrapped a perl script around astyle to provide a few tweaks to the formatting. In particular, I've added consistency in where we break conditional lines. The convention now is to ensure that '||' and "&&' always appear at the end of line, never the beginning. This allows for better alignment by astyle. Modified source files will follow in a separate commit
Source files pre-formatted with a script to ensure consistency with how continuation lines are aligned.
@@ -276,8 +258,8 @@ void CTrendComponentTest::testDecayRate() | |||
level.add(value); | |||
|
|||
controller.multiplier({prediction}, | |||
{{values[(time - start) / bucketLength] - prediction}}, | |||
bucketLength, 1.0, 0.012); | |||
{{values[(time - start) / bucketLength] - prediction}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does astyle get confused because of '{' ? It seems fine if initializer lists are not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be this: steinwurf/astyle@fab103c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be this: https://sourceforge.net/p/astyle/bugs/441
It's claimed to be fixed in subversion, but presumably not in a released build yet.
The modern C++ syntax is incredibly hard to parse, as similar punctuation is used for multiple purposes and is highly context dependent. Think square brackets for both array indices and lambdas, braces for both initializer lists and code blocks. This is why I thought clang-format might be the best choice of formatter - it's built off the same codebase as the compiler, so by definition will understand the code as well as the compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that for lambdas clang-format works. Not sure about initializer lists.
'fixed in subversion' is strange, the bug has been updated 01/25, latest release is 02/26. Might be due to minor/major releases but I doubt that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll experiment with the SVN head revision as it looks like it might go someway to resolving the issues we're seeing.
It might also be worth re-visiting clang-format in light of what we now know of the limitations of astyle.
BOOST_MULTI_INDEX_CONST_TYPE_CONST_MEM_FUN(CFieldOptions, std::string, partitionFieldName) | ||
> | ||
> | ||
> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a shame we lose indenting in these sorts of typedef. The result is much harder to read. There isn't the possibility of issuing a preserve formatting for code block by any chance? This may be controversial, but it might get us out of difficulty without spending a lot of time getting the tooling to deal with edge cases.
true, // Ignore date words | ||
false, // Ignore field names | ||
2, // Min dictionary word length | ||
core::CWordDictionary::TWeightVerbs5Other2> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, formatting of typedefs generally seems a little off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Tom.
Yes there is a mechanism for turning formatting off for particular regions of code. If we did go down that route then there's a few other areas where it might be useful as well - for example we use macros in switch statements in say lib/model/ModelTypes.cc that fool the formatter.
I've been experimenting also with clang_format and Uncrustify. The former has far more intimate knowledge of the language constructs as it is using the clang compiler library. Unfortunately the configuration for it is quite inflexible and doesn't give us what we want in a number of cases (such as the indentation levels in class definitions).
Uncrustify is actually looking much more promising than either astyle or clang-format. It seems to have an option for everything. It does for example handle the typedef indentation as desired, along with being able to cope with lambdas correctly. It is still of course duped by macros in switch statements though (again we'd have to locally turn off indentation formatting to cope)
I'll commit the reformatted code from each tool to its own branch for easy comparison..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a good plan!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
such as the indentation levels in class definitions
I wouldn't stress too much about this in particular. If we're reformatting the whole codebase I don't think it really matters if we change this aspect of the formatting. Last time someone looked at clang-format I think we decided that this was the best option supported by clang-format for laying out class definitions:
class foo {
public:
int bar();
private:
int x;
};
Not sure what the other differences were though - maybe some of those are more problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the formatting of all the multi-line typedefs significantly less readable. It was just something that stood out on a quick skim. I think it would be nice to evaluate all the options, so we can make a sensible choice. We'll be stuck with this going forwards after all. The alternative is we do the bare minimum at the moment and literally just update braces.
Having seen #14 and #15 I think we should give up on Astyle. It's not worth spending time contributing fixes to Astyle when uncrustify can do everything Astyle can do and doesn't have the problems understanding the modern C++ language constructs. (Despite the flexibility of uncrustify I think there's still an argument to be made for clang-format, but that discussion can move to the other PRs.) So unless somebody else objects I think this PR can be closed. |
Closing the PR as per @droberts195 comments above. |
Provide scripts to reformat C++ source code to be consistent with the
Java style of brace formatting (attached braces) and to verify that
style after future code changes.
These scripts rely on astyle - available from https://sourceforge.net/projects/astyle/files/astyle/
Some outstanding questions: