-
Notifications
You must be signed in to change notification settings - Fork 65
[ML] Reformat native code using clang-format #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Trial code reformatting using clang-format. Pros: * has intimate knowledge of modern C++ constructs e.g lambdas, braced initializers etc. Cons: * Inflexible config, most notably. o no way of opting in for a subset of features to format. It is all-or-nothing. o no way of indenting entire class headers. I've chosen to not indent access modifiers here. o Does horrible things to boost::program_options description lists c.f. bin/autoconfig/CCmdLineParser.cc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compared to #10 this has got lambdas, multi-line template typedefs and initializer lists correct.
I only looked at a small subset of the files, but overall I think it looks quite promising providing we accept that consistency is more important than minimising the amount of change from what we have now.
It's true that the formatting of boost::program_options
's long "function" chains is awful, but those are limited to one file per program, and apparently there's a simple way to tell clang-format to leave code alone so it wouldn't be too onerous to use this approach for those few sections: https://clang.llvm.org/docs/ClangFormatStyleOptions.html#disabling-formatting-on-a-piece-of-code
I think it would be nice to experiment with the "penalty" settings a little, and see if it is possible to improve the formatting of some of our longer LOG
statements. I noted a couple of examples of why this should be possible in my inline comments.
lib/core/COsFileFuncs_Windows.cc
Outdated
return res; | ||
} | ||
|
||
// To set st_ino, we have to briefly open the file | ||
HANDLE handle = CreateFile(path, | ||
0, // Open for neither read nor write | ||
0,// Open for neither read nor write |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe SpacesBeforeTrailingComments: 1
would look a bit nicer?
build-setup/strptime/private.h
Outdated
#define STD_INSPIRED 1 | ||
#define TM_GMTOFF tm_gmtoff | ||
#define TM_ZONE tm_zone | ||
#define STD_INSPIRED 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The build-setup
sub-directory should be excluded from formatting like 3rd_party
is. It also contains 3rd party code.
.clang-format
Outdated
Priority: 3 | ||
- Regex: '.*' | ||
Priority: 1 | ||
IncludeIsMainRegex: '(Test)?$' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this? For our files called XyzTest.cc
the main header is also XyzTest.h
, not Xyz.h
.
.clang-format
Outdated
IncludeCategories: | ||
- Regex: '^"(llvm|llvm-c|clang|clang-c)/' | ||
Priority: 2 | ||
- Regex: '^(<|"(gtest|gmock|isl|json)/)' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure exactly what these do, but the patterns look inappropriate for our codebase.
.clang-format
Outdated
PenaltyBreakAssignment: 2 | ||
PenaltyBreakBeforeFirstCallParameter: 19 | ||
PenaltyBreakComment: 300 | ||
PenaltyBreakFirstLessLess: 120 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be worth experimenting with this to see how it changes the way large LOG
messages are formatted.
lib/core/COsFileFuncs.cc
Outdated
{ | ||
return ::dup2(fildes, fildes2); | ||
} | ||
int COsFileFuncs::dup2(int fildes, int fildes2) { return ::dup2(fildes, fildes2); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit weird to me in an implementation file.
Maybe we should use AllowShortFunctionsOnASingleLine: InlineOnly
?
lib/api/CAnomalyJob.cc
Outdated
template<typename T> | ||
inline std::string pairDebug(const T &t) | ||
{ | ||
template <typename T> inline std::string pairDebug(const T &t) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AlwaysBreakTemplateDeclarations: true
would preserve the old formatting here. Does it make other parts of the code look much worse though?
lib/api/CAnomalyJob.cc
Outdated
this->debugPrintRecord(dataRowFields)); | ||
LOG_ERROR("Cannot interpret " << m_TimeFieldName | ||
<< " field in record:" << core_t::LINE_ENDING | ||
<< this->debugPrintRecord(dataRowFields)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of why I think it would be good to experiment with the PenaltyBreakFirstLessLess
setting. I don't think it looks too bad though formatted like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that inconsistent breaking in logging statements is one of the worst formatting decisions (although it also messes up readability of some maths expressions). I would say we should set this penalty as small as possible.
Also, in general some of the line breaking choices are off to my taste. It might be worth experimenting a bit with all the break penalties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed Tom - I'll experiment a little more with a few of the settings to see if I can make things a little more readable.
Were there any particular instances of 'bad' line breaking that you'd like improved if possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I think the issue is that it works from the AST, so effectively eliminates all white space in the original code then has to reinsert. Although, as Dave says, this means it fully understands all language constructs it also tends to mess up any logical line breaking that was done in the original code. It is a shame there aren't more fine grained formatting controls short of having to insert don't format this guards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it does this even on comments too, which renders some carefully formatted, illustrative comments unreadable, see e.g. devbin/domain_name_entropy/CIpAddressTest.cc
where I've had to disable formatting around the 'TEST' comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the comment problem you could try ReflowComments: false
. I imagine we'll get away with this because the other formatting changes are moving stuff to the left rather than to the right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that does do the trick of maintaining internal spacing in the comments. Thanks Dave.
I'll keep formatting disabled in this instance though to prevent line breaking.
lib/api/CAnomalyJob.cc
Outdated
controlMessage[0] << '\''); | ||
LOG_WARN("Ignoring unknown control message of length " | ||
<< controlMessage.length() << " beginning with '" << controlMessage[0] | ||
<< '\''); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of why I think changing the PenaltyBreakFirstLessLess
setting might cause other long LOG
messages to be formatted more sensibly.
lib/api/CAnomalyJob.cc
Outdated
if (!remainder.empty()) | ||
{ | ||
core::CStringUtils::tokenise( | ||
" ", controlMessage.substr(1, std::string::npos), tokens, remainder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is an argument for increasing PenaltyBreakBeforeFirstCallParameter
? Is there a particular reason for the current value of 19?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestions Dave.
I should have said that I used the output from clang-format -style=llvm -dump-config
as a starting point for the configuration so its probably got a good whack of options in there that we don't need - I'll now clean them up if appropriate.
It definitely looks like its worth experimenting with the PenaltyBreakFirstLessLess
setting as well.
There's a handful of place where disabling formatting entirely looks to be the only option - the program_options for one plus a few comments where internal spacing is important. I'll use it sparingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative would be to use:
BasedOnStyle:llvm
and then overwrite what we like to change.
What was your reason to instead dump the whole config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was simply to easily see all the settings available and their default values for the llvm style..
As we're only tweaking a few settings though it would make sense to use the BasedOnStyle
approach
Certain files have key sections that are not formatting correctly. Disable formatting using comment blocks for those.
Addressed code review comments. Disabled formatting for a few critical sections.
Reformatted with AllowShortFunctionsOnASingleLine = InlineOnly in order that empty bodies have broken braces in implemantation files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple more suggestions, but I think this is looking pretty good now for such an all-or-nothing tool.
.clang-format
Outdated
ColumnLimit: 100 | ||
ConstructorInitializerAllOnOneLineOrOnePerLine: true | ||
FixNamespaceComments: false | ||
IndentCaseLabels: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tom prefers false
here, and has written more switch
statements than anyone else. Also, I think using false
here solves the problem of the multi-case
macros (such as CASE_INDIVIDUAL_COUNT
).
.clang-format
Outdated
IndentWidth: 4 | ||
PenaltyBreakBeforeFirstCallParameter: 1000 | ||
PenaltyBreakFirstLessLess: 1000 | ||
PointerAlignment: Left |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think SpaceAfterTemplateKeyword: false
would probably be more consistent with the majority of our existing template declarations (although we did have a few with a space).
Examining the behaviour of... IndentCaseLabels: false SpaceAfterTemplateKeyword: false
Increased the maximum line length to 120 characters. Adjusted the settings... PenaltyBreakBeforeFirstCallParameter: 4 PenaltyBreakFirstLessLess: 2 to obtain a consistent and reasonably pleasing layout of long log messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the whole I think this is pretty good and better than #14 on balance. I think the biggest issues I now have with this, aside from some point cases where we can disable formatting, are:
- Formatting of log statements. Where did you get to experimenting with
PenaltyBreakFirstLessLess
Ed? Another option would be to add"" <<
at the start of all log statements so it at least gets the alignment sort of ok. (We should be able to script this change at the same time, unless people have strong objections.) - Its reflowing of long lines. There are lots of cases where you want to break up long lines based on semantics, but it has no way of knowing how to do this so tends to insert line breaks in odd places. This is the one big disadvantage of this versus uncrustify IMO and there is no way of controlling this behaviour.
- This configuration has a tendency to produce some very long lines, is ColumnLimit 120 a bit high?
Aside from some small tweaks for 1 and 3 I think this is probably about as good as we can get clang format. I think it is worth spending a little more time to try and get uncrustify as good as possible, as per my last comments on that PR. I think we should draw this work to a close then.
My vote is that this is good enough that the advantages of enforcing standard formatting outweigh any poor formatting decisions. I would like to postpone the final decision on tooling until we've finished the work on uncrustify because of point 2.
include/core/CMaskIterator.h
Outdated
boost::addable2< | ||
CMaskIterator<ITR>, | ||
typename std::iterator_traits<ITR>::difference_type, | ||
boost::subtractable2<CMaskIterator<ITR>, typename std::iterator_traits<ITR>::difference_type>>>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with uncrustify I think this isn't great. We should probably search for uses of boost operators and disable formatting of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Tom.
Experimenting with the penalty breaks was not that rewarding. I could not find values that would influence the alignment of long log messages in a manner sufficiently similar to our existing style. The only parameter of significant influence in these cases seems to be the maximum line length, hence why I bumped that to 120.
A line length of 120 is quite long I agree, but it is in accordance with that used for the elasticsearch Java coding style. I'll experiment with reducing this length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI the Elasticsearch Java code uses 140 characters, which is what is visible in the GitHub PR review page without having to scroll left or right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these just shouldn't be indented but aligned. I think the only way to fix this is to disable formatting of the class Foo : boost::addable2<...
, etc. There won't be many of these, searching for header files which include boost operators should track most of them down I'd expect.
I think regarding line length, we should go for consistency in that case and leave it to match the Java code.
.addAlternative(testForPeriod) | ||
.addNested(testForWeekly) | ||
.finishedNested() | ||
.addAlternative(testForWeekly); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we preserve indentations like this in this file to make this code easier to visually parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but only by adding comment blocks to explicitly disable formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I mean, I think this is worthwhile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. I'll make the change here and scan for other such instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For an automated tool that insists on reformatting everything that's going to be impossible to apply reliably.
To keep this sort of formatting we'd have to use a tool that just changes certain specific things and leaves all other code alone. Of the tools we've looked at that would imply Astyle.
I said we should give up on Astyle in #10 (comment) because it messes up lambdas and initializer lists, which clang-format and uncrustify don't. But Astyle's advantage was that it can just apply a subset of formatting changes and leave other code untouched. So another option could be to reopen #10 and investigate whether we can get Astyle to move braces, *
s and &
s without changing any indentation (which would avoid the problems with lambdas and initializer lists), or alternatively whether we can submit a fix to Astyle to indent lambdas and initializer lists correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can be pragmatic though. In 99.99% of the cases I'm willing to live with clang-format's choices. This is one case, as with boost option lists and boost operators, where I think it is worthwhile having an exception. We're only talking about a handful of cases and IMO that is okay.
My main qualm with clang-format is what it does to long lines, which we can't possibly fix. That would be my reason for choosing another tool. I think uncrustify minus oddities (of which there are too many to fix by manually disabling formatting) would be pretty good, which is why I thought it was worthwhile investigating if we can work around those. If we can't then I'd probably still vote for clang-format on balance rather than have occasional very botched formatting.
So would you be against using any directives to preserve semantically meaningful formatting or to avoid really bad formatting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Note I was only referring to this specific file: I don't think there are cases like this elsewhere in the code base.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So would you be against using any directives to preserve semantically meaningful formatting or to avoid really bad formatting?
I wouldn't mind it if it was in just a few places, say 10-20. I think if we had to use it in 100s of places though then the directives themselves hinder readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I think that should be enough to cover the cases I highlighted above, and I think in general formatting is good enough that should be sufficient.
Forced manual formatting for certain sections to aid visual parsing
Setting the maximum line length to be 140 characters in order to be in agreement with the elasticsearch Java development style guide.
A consensus has been reached in favour of clang-format as our automated code formatting tool. The treatment of long LOG statements by clang-format does still leave something to be desired though. Several potential workarounds for this have been discussed but the matter will be addressed in a later PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
As you said, we can follow up on the LOG_xxx
formatting in a separate PR.
Provided a script - dev-tools/clang-format.sh to reformat source code according to the agreed configuration in $CPP_SRC_HOME/.clang-format Key points to note: Java style braces Setting the maximum line length to be 140 characters in order to be in agreement with the elasticsearch Java development style guide.
Provided a script - dev-tools/clang-format.sh to reformat source code according to the agreed configuration in $CPP_SRC_HOME/.clang-format Key points to note: Java style braces Setting the maximum line length to be 140 characters in order to be in agreement with the elasticsearch Java development style guide.
[ML] Reformat native code using clang-format Provided a script - dev-tools/clang-format.sh to reformat source code according to the agreed configuration in $CPP_SRC_HOME/.clang-format Key points to note: Java style braces Setting the maximum line length to be 140 characters in order to be in agreement with the elasticsearch Java development style guide.
This changes brace placement for functions in shell scripts to match the style that was introduced for the C++ code in elastic#15.
This changes brace placement for functions in shell scripts to match the style that was introduced for the C++ code in #15.
This changes brace placement for functions in shell scripts to match the style that was introduced for the C++ code in #15.
This changes brace placement for functions in shell scripts to match the style that was introduced for the C++ code in #15.
Trial code reformatting using clang-format.
Pros:
initializers etc.
Cons:
o no way of opting in for a subset of features to format. It is
all-or-nothing.
o no way of indenting entire class headers. I've chosen to not
indent access modifiers here.
o Does horrible things to boost::program_options description lists
c.f. bin/autoconfig/CCmdLineParser.cc