Skip to content

[ML] Reformat native code using clang-format #15

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 4, 2018
Merged

[ML] Reformat native code using clang-format #15

merged 9 commits into from
Apr 4, 2018

Conversation

edsavage
Copy link
Contributor

Trial code reformatting using clang-format.

Pros:

  • has intimate knowledge of modern C++ constructs e.g lambdas, braced
    initializers etc.

Cons:

  • Inflexible config, most notably.
    o no way of opting in for a subset of features to format. It is
    all-or-nothing.
    o no way of indenting entire class headers. I've chosen to not
    indent access modifiers here.
    o Does horrible things to boost::program_options description lists
    c.f. bin/autoconfig/CCmdLineParser.cc

Trial code reformatting using clang-format.

Pros:
  * has intimate knowledge of modern C++ constructs e.g lambdas, braced
    initializers etc.

Cons:
  * Inflexible config, most notably.
    o no way of opting in for a subset of features to format. It is
      all-or-nothing.
    o no way of indenting entire class headers. I've chosen to not
      indent access modifiers here.
    o Does horrible things to boost::program_options description lists
      c.f. bin/autoconfig/CCmdLineParser.cc
Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compared to #10 this has got lambdas, multi-line template typedefs and initializer lists correct.

I only looked at a small subset of the files, but overall I think it looks quite promising providing we accept that consistency is more important than minimising the amount of change from what we have now.

It's true that the formatting of boost::program_options's long "function" chains is awful, but those are limited to one file per program, and apparently there's a simple way to tell clang-format to leave code alone so it wouldn't be too onerous to use this approach for those few sections: https://clang.llvm.org/docs/ClangFormatStyleOptions.html#disabling-formatting-on-a-piece-of-code

I think it would be nice to experiment with the "penalty" settings a little, and see if it is possible to improve the formatting of some of our longer LOG statements. I noted a couple of examples of why this should be possible in my inline comments.

return res;
}

// To set st_ino, we have to briefly open the file
HANDLE handle = CreateFile(path,
0, // Open for neither read nor write
0,// Open for neither read nor write
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe SpacesBeforeTrailingComments: 1 would look a bit nicer?

#define STD_INSPIRED 1
#define TM_GMTOFF tm_gmtoff
#define TM_ZONE tm_zone
#define STD_INSPIRED 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build-setup sub-directory should be excluded from formatting like 3rd_party is. It also contains 3rd party code.

.clang-format Outdated
Priority: 3
- Regex: '.*'
Priority: 1
IncludeIsMainRegex: '(Test)?$'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this? For our files called XyzTest.cc the main header is also XyzTest.h, not Xyz.h.

.clang-format Outdated
IncludeCategories:
- Regex: '^"(llvm|llvm-c|clang|clang-c)/'
Priority: 2
- Regex: '^(<|"(gtest|gmock|isl|json)/)'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure exactly what these do, but the patterns look inappropriate for our codebase.

.clang-format Outdated
PenaltyBreakAssignment: 2
PenaltyBreakBeforeFirstCallParameter: 19
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be worth experimenting with this to see how it changes the way large LOG messages are formatted.

{
return ::dup2(fildes, fildes2);
}
int COsFileFuncs::dup2(int fildes, int fildes2) { return ::dup2(fildes, fildes2); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit weird to me in an implementation file.

Maybe we should use AllowShortFunctionsOnASingleLine: InlineOnly?

template<typename T>
inline std::string pairDebug(const T &t)
{
template <typename T> inline std::string pairDebug(const T &t) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AlwaysBreakTemplateDeclarations: true would preserve the old formatting here. Does it make other parts of the code look much worse though?

this->debugPrintRecord(dataRowFields));
LOG_ERROR("Cannot interpret " << m_TimeFieldName
<< " field in record:" << core_t::LINE_ENDING
<< this->debugPrintRecord(dataRowFields));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example of why I think it would be good to experiment with the PenaltyBreakFirstLessLess setting. I don't think it looks too bad though formatted like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that inconsistent breaking in logging statements is one of the worst formatting decisions (although it also messes up readability of some maths expressions). I would say we should set this penalty as small as possible.

Also, in general some of the line breaking choices are off to my taste. It might be worth experimenting a bit with all the break penalties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed Tom - I'll experiment a little more with a few of the settings to see if I can make things a little more readable.

Were there any particular instances of 'bad' line breaking that you'd like improved if possible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I think the issue is that it works from the AST, so effectively eliminates all white space in the original code then has to reinsert. Although, as Dave says, this means it fully understands all language constructs it also tends to mess up any logical line breaking that was done in the original code. It is a shame there aren't more fine grained formatting controls short of having to insert don't format this guards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does this even on comments too, which renders some carefully formatted, illustrative comments unreadable, see e.g. devbin/domain_name_entropy/CIpAddressTest.cc where I've had to disable formatting around the 'TEST' comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the comment problem you could try ReflowComments: false. I imagine we'll get away with this because the other formatting changes are moving stuff to the left rather than to the right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that does do the trick of maintaining internal spacing in the comments. Thanks Dave.
I'll keep formatting disabled in this instance though to prevent line breaking.

controlMessage[0] << '\'');
LOG_WARN("Ignoring unknown control message of length "
<< controlMessage.length() << " beginning with '" << controlMessage[0]
<< '\'');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example of why I think changing the PenaltyBreakFirstLessLess setting might cause other long LOG messages to be formatted more sensibly.

if (!remainder.empty())
{
core::CStringUtils::tokenise(
" ", controlMessage.substr(1, std::string::npos), tokens, remainder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is an argument for increasing PenaltyBreakBeforeFirstCallParameter? Is there a particular reason for the current value of 19?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestions Dave.

I should have said that I used the output from clang-format -style=llvm -dump-config as a starting point for the configuration so its probably got a good whack of options in there that we don't need - I'll now clean them up if appropriate.

It definitely looks like its worth experimenting with the PenaltyBreakFirstLessLess setting as well.

There's a handful of place where disabling formatting entirely looks to be the only option - the program_options for one plus a few comments where internal spacing is important. I'll use it sparingly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative would be to use:

BasedOnStyle:llvm

and then overwrite what we like to change.

What was your reason to instead dump the whole config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was simply to easily see all the settings available and their default values for the llvm style..

As we're only tweaking a few settings though it would make sense to use the BasedOnStyle approach

Certain files have key sections that are not formatting correctly.
Disable formatting using comment blocks for those.
Addressed code review comments.

Disabled formatting for a few critical sections.
Reformatted with AllowShortFunctionsOnASingleLine = InlineOnly in order
that empty bodies have broken braces in implemantation files.
Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple more suggestions, but I think this is looking pretty good now for such an all-or-nothing tool.

.clang-format Outdated
ColumnLimit: 100
ConstructorInitializerAllOnOneLineOrOnePerLine: true
FixNamespaceComments: false
IndentCaseLabels: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tom prefers false here, and has written more switch statements than anyone else. Also, I think using false here solves the problem of the multi-case macros (such as CASE_INDIVIDUAL_COUNT).

.clang-format Outdated
IndentWidth: 4
PenaltyBreakBeforeFirstCallParameter: 1000
PenaltyBreakFirstLessLess: 1000
PointerAlignment: Left
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think SpaceAfterTemplateKeyword: false would probably be more consistent with the majority of our existing template declarations (although we did have a few with a space).

Examining the behaviour of...

IndentCaseLabels: false
SpaceAfterTemplateKeyword: false
@sophiec20 sophiec20 changed the title Reformat native code using clang-format [ML] Reformat native code using clang-format Mar 15, 2018
Increased the maximum line length to 120 characters.

Adjusted the settings...

PenaltyBreakBeforeFirstCallParameter: 4
PenaltyBreakFirstLessLess: 2

to obtain a consistent and reasonably pleasing layout of long log
messages.
@sophiec20 sophiec20 added :ml and removed :ml labels Mar 28, 2018
Copy link
Contributor

@tveasey tveasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the whole I think this is pretty good and better than #14 on balance. I think the biggest issues I now have with this, aside from some point cases where we can disable formatting, are:

  1. Formatting of log statements. Where did you get to experimenting with PenaltyBreakFirstLessLess Ed? Another option would be to add "" << at the start of all log statements so it at least gets the alignment sort of ok. (We should be able to script this change at the same time, unless people have strong objections.)
  2. Its reflowing of long lines. There are lots of cases where you want to break up long lines based on semantics, but it has no way of knowing how to do this so tends to insert line breaks in odd places. This is the one big disadvantage of this versus uncrustify IMO and there is no way of controlling this behaviour.
  3. This configuration has a tendency to produce some very long lines, is ColumnLimit 120 a bit high?

Aside from some small tweaks for 1 and 3 I think this is probably about as good as we can get clang format. I think it is worth spending a little more time to try and get uncrustify as good as possible, as per my last comments on that PR. I think we should draw this work to a close then.

My vote is that this is good enough that the advantages of enforcing standard formatting outweigh any poor formatting decisions. I would like to postpone the final decision on tooling until we've finished the work on uncrustify because of point 2.

boost::addable2<
CMaskIterator<ITR>,
typename std::iterator_traits<ITR>::difference_type,
boost::subtractable2<CMaskIterator<ITR>, typename std::iterator_traits<ITR>::difference_type>>>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with uncrustify I think this isn't great. We should probably search for uses of boost operators and disable formatting of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Tom.

Experimenting with the penalty breaks was not that rewarding. I could not find values that would influence the alignment of long log messages in a manner sufficiently similar to our existing style. The only parameter of significant influence in these cases seems to be the maximum line length, hence why I bumped that to 120.

A line length of 120 is quite long I agree, but it is in accordance with that used for the elasticsearch Java coding style. I'll experiment with reducing this length.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI the Elasticsearch Java code uses 140 characters, which is what is visible in the GitHub PR review page without having to scroll left or right.

Copy link
Contributor

@tveasey tveasey Mar 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these just shouldn't be indented but aligned. I think the only way to fix this is to disable formatting of the class Foo : boost::addable2<..., etc. There won't be many of these, searching for header files which include boost operators should track most of them down I'd expect.

I think regarding line length, we should go for consistency in that case and leave it to match the Java code.

.addAlternative(testForPeriod)
.addNested(testForWeekly)
.finishedNested()
.addAlternative(testForWeekly);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we preserve indentations like this in this file to make this code easier to visually parse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but only by adding comment blocks to explicitly disable formatting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I mean, I think this is worthwhile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. I'll make the change here and scan for other such instances.

Copy link
Contributor

@droberts195 droberts195 Mar 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an automated tool that insists on reformatting everything that's going to be impossible to apply reliably.

To keep this sort of formatting we'd have to use a tool that just changes certain specific things and leaves all other code alone. Of the tools we've looked at that would imply Astyle.

I said we should give up on Astyle in #10 (comment) because it messes up lambdas and initializer lists, which clang-format and uncrustify don't. But Astyle's advantage was that it can just apply a subset of formatting changes and leave other code untouched. So another option could be to reopen #10 and investigate whether we can get Astyle to move braces, *s and &s without changing any indentation (which would avoid the problems with lambdas and initializer lists), or alternatively whether we can submit a fix to Astyle to indent lambdas and initializer lists correctly.

Copy link
Contributor

@tveasey tveasey Mar 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can be pragmatic though. In 99.99% of the cases I'm willing to live with clang-format's choices. This is one case, as with boost option lists and boost operators, where I think it is worthwhile having an exception. We're only talking about a handful of cases and IMO that is okay.

My main qualm with clang-format is what it does to long lines, which we can't possibly fix. That would be my reason for choosing another tool. I think uncrustify minus oddities (of which there are too many to fix by manually disabling formatting) would be pretty good, which is why I thought it was worthwhile investigating if we can work around those. If we can't then I'd probably still vote for clang-format on balance rather than have occasional very botched formatting.

So would you be against using any directives to preserve semantically meaningful formatting or to avoid really bad formatting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note I was only referring to this specific file: I don't think there are cases like this elsewhere in the code base.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So would you be against using any directives to preserve semantically meaningful formatting or to avoid really bad formatting?

I wouldn't mind it if it was in just a few places, say 10-20. I think if we had to use it in 100s of places though then the directives themselves hinder readability.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think that should be enough to cover the cases I highlighted above, and I think in general formatting is good enough that should be sufficient.

edsavage added 2 commits April 3, 2018 12:01
Forced manual formatting for certain sections to aid visual parsing
Setting the maximum line length to be 140 characters in order to be in
agreement with the elasticsearch Java development style guide.
@edsavage
Copy link
Contributor Author

edsavage commented Apr 4, 2018

A consensus has been reached in favour of clang-format as our automated code formatting tool.

The treatment of long LOG statements by clang-format does still leave something to be desired though. Several potential workarounds for this have been discussed but the matter will be addressed in a later PR.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

As you said, we can follow up on the LOG_xxx formatting in a separate PR.

@edsavage edsavage merged commit 2580b4f into elastic:master Apr 4, 2018
edsavage added a commit that referenced this pull request Apr 4, 2018
Provided a script - dev-tools/clang-format.sh to reformat source code
according to the agreed configuration in $CPP_SRC_HOME/.clang-format

Key points to note:

Java style braces

Setting the maximum line length to be 140 characters in order to be in
agreement with the elasticsearch Java development style guide.
droberts195 pushed a commit that referenced this pull request Apr 23, 2018
Provided a script - dev-tools/clang-format.sh to reformat source code
according to the agreed configuration in $CPP_SRC_HOME/.clang-format

Key points to note:

Java style braces

Setting the maximum line length to be 140 characters in order to be in
agreement with the elasticsearch Java development style guide.
droberts195 pushed a commit that referenced this pull request Apr 23, 2018
[ML] Reformat native code using clang-format

Provided a script - dev-tools/clang-format.sh to reformat source code according to the agreed configuration in $CPP_SRC_HOME/.clang-format

Key points to note:

Java style braces

Setting the maximum line length to be 140 characters in order to be in
agreement with the elasticsearch Java development style guide.
droberts195 added a commit to droberts195/ml-cpp that referenced this pull request May 2, 2018
This changes brace placement for functions in shell scripts to
match the style that was introduced for the C++ code in elastic#15.
droberts195 added a commit that referenced this pull request May 2, 2018
This changes brace placement for functions in shell scripts to
match the style that was introduced for the C++ code in #15.
droberts195 added a commit that referenced this pull request May 2, 2018
This changes brace placement for functions in shell scripts to
match the style that was introduced for the C++ code in #15.
droberts195 added a commit that referenced this pull request May 2, 2018
This changes brace placement for functions in shell scripts to
match the style that was introduced for the C++ code in #15.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants