Skip to content

[ML] Trial Uncrustify for reformatting source code #14

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed

[ML] Trial Uncrustify for reformatting source code #14

wants to merge 8 commits into from

Conversation

edsavage
Copy link
Contributor

A trial of the use of uncrustify to perform minimal reformatting of
source code, primarily to change the brace style to be consistent with
that used in Java.

An initial assessment looks promising.

  • Good recognition of modern C++ features e.g. lambdas.
  • Extremely flexible configuration

There are a still a few shortcomings though. In particular the treatment of macros in switch statements. This can be alleviated by use of the INDENT-OFF, INDENT-ON comment blocks - see e.g. lib/model/ModelTypes.cc

A trial of the use of uncrustify to perform minimal reformatting of
source code, primarily to change the brace style to be consistent with
that used in Java.

This first commit contains the configuration for uncrustify and the
wrapper scripts to perform reformatting and style validation.

Reformatted source files will follow in a separate commit.
A trial of the use of uncrustify to perform minimal reformatting of
source code, primarily to change the brace style to be consistent with
that used in Java.

This commit contains the reformatted source files.
Attempt to restore alignment of variable declarations in classes.
Attempt to restore alignment of typedef declarations
Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can see uncrustify is better than Astyle.

It seems to have understood all the C++ language constructs we use and although some of the precise formatting is not perfect there are enough options that these can almost certainly be fixed.

So for me it now boils down to a choice between uncrustify and clang-format.


//! Bucket data waiting to be written. The map is keyed on bucket time.
//! The documents in this map will reference memory owned by
//! m_JsonPoolAllocator. (Hence this is declared after the memory pool
//! so that it's destroyed first when the destructor runs.)
TTimeBucketDataMap m_BucketDataByTime;
TTimeBucketDataMap m_BucketDataByTime;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a multi-line comment resets the alignment of member variable names. In this class the top 4 are aligned and the bottom 3 are, but not the one in between.

Given the number of options uncrustify has there is probably a fix for this.

maths::CBasicStatistics::mean(meanErrorAt95));
};
core_t::TTime end) {
//std::ofstream file;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indenting of the lambda here is much more extreme than clang-format.

At least uncrustify understands it's a lambda, which is more than Astyle does, and there is probably an option to indent less.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is caused by the indent_align_assign - true setting. Setting it to false gives a standard indentation of one indent for the lambda body but spoils the careful alignment seen elsewhere.

There are a few other settings that may be relevant. I'll experiment with those...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to resort to a workaround here - adding a newline after the = operator results in the lambda body being indented one 'tab'

const std::string OUTPUT_FILE("slogan1.txt");
#ifdef Windows
// Unlike Windows NT system calls, copy's command line cannot cope with
// forward slash path separators
const std::string INPUT_FILE1("testfiles\\slogan1.txt");
const std::string INPUT_FILE2("testfiles\\slogan2.txt");
const char *winDir(::getenv("windir"));
const char * winDir(::getenv("windir"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spacing here is very strange. Maybe it would be solved if we switched to putting * and & adjacent to the type. Or even if people don't want to do that, we could probably avoid it by simply using = or {} initialization rather than () initialization.

@@ -83,7 +83,7 @@ static char privatehid[] = "@(#)private.h 8.6";
#endif /* !defined HAVE_UTMPX_H */

#ifdef LOCALE_HOME
#undef LOCALE_HOME /* not to be handled by tzcode itself */
#undef LOCALE_HOME /* not to be handled by tzcode itself */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should exclude the build-setup directory from automatic formatting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments Dave. I think most of these concerns can be addressed by one of uncrustify's many, many options...

Increased the span size for better variable alignment.

Corrected instances where extra space had been introduced around '*'

Ignoring build-setup directory for scan.
Workaround for an instance of over zealous indentation for a lambda
body.

Collapsed empty blocks.
Workaround for lambda indentation concerns (missed from previous
commit).
@sophiec20 sophiec20 added :ml and removed :ml labels Mar 28, 2018
Copy link
Contributor

@tveasey tveasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now been through this in more detail. Although still only skimmed most of it. I'd say on the whole it is pretty good. My high level comment is as configured it is adding a lot of whitespace. Also, there are some definite oddities. Not sure if this is it misunderstanding syntax; also if these could be worked around. I'd be interested in seeing the results without the option to align variable and type names on different lines enabled, since I wonder if this is partly responsible for oddities. There are probably some tweaks to make to the configuration as well, for example it is removing just the space after an operator.

.uncrustifyrc Outdated
# Whether to alter newlines in '#define' macros.
nl_define_macro = false # false/true

# Whether to alter newlines between consecutive paren closes,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing whitespace on this line.

.uncrustifyrc Outdated
# Default=True.
indent_oc_msg_prioritize_first_colon = true # false/true

# If indent_oc_block_msg and this option are on, blocks will be indented the way that Xcode does by default (from keyword if the parameter is on its own line; otherwise, from the previous indentation level).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some pretty long lines in this file. Should be tidied up before commit.

boost::program_options::options_description desc(DESCRIPTION);
desc.add_options()
("help", "Display this information and exit")
("version", "Display version information and exit")
("logProperties", boost::program_options::value<std::string>(),
"Optional logger properties file")
"Optional logger properties file")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are much better than clang, which will need formatting disabling for option lists I think.

exit 1;
fi

function format()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing whitespace

uncrustify -c $CPP_SRC_HOME/.uncrustifyrc -q -f ${1}
}

function format_in_place()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing whitespace

LOG_ERROR("Unexpected number of elements " << n
<< ", expected " << N);
<< ", expected " << N);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tripped up by LOG_ statements. Is there anything we can do to keep all lines aligned with open brackets? What about disabling alignment of operator<< (we pretty much only use these in logging).

size_t *currentCol(data.get());
size_t *prevCol(currentCol + (secondLen + 1));
size_t * currentCol(data.get());
size_t * prevCol(currentCol + (secondLen + 1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to other odd pointer definition. Perhaps if we switch brackets to braces this would indent this correctly, but it would be nice to understand what is going wrong here.

Copy link
Contributor

@tveasey tveasey Mar 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all honesty, I'd probably prefer not to align variable names on different lines and just always have fixed space anyway. I think the general feeling was that modern style is type* name;, type& name;, etc and should be adopted. If trying to align names across lines is also causing this sort of messed up formatting I'd double vote just to have fixed spacing.

@@ -300,15 +275,14 @@ class MATHS_EXPORT CBasicStatistics
// because then we'd have to compute epsilon and it is very
// unlikely the count will get big enough.
TCoordinate alpha{rhs.s_Count / s_Count};
TCoordinate beta{TCoordinate{1} - alpha};
TCoordinate beta{TCoordinate{1} -alpha};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing rule to add space after operator?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per Dave's comment below, I think this is down to misinterpreting as a unary minus. I was thinking it was a configuration issue for operators in general, because it is widespread in this file, but it clearly often gets it right as well.

CVectorNx1<U, N> ci{points[i]};
bn += pow2(((ci - m).outer() - s).frobenius()) / d / z;
}
bn = std::min(bn, dn);
LOG_TRACE("m = " << mn << ", d = " << dn << ", b = " << bn);

covariances.s_Covariances = CVectorNx1<U, N>{bn / dn * mn}.diagonal()
+ (U{1} - bn / dn) * covariances.s_Covariances;
+ (U{1} -bn / dn) * covariances.s_Covariances;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not aligning to equals (everywhere else this seems to work). Is this just a bug?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line also highlights that uncrustify doesn't understand braces as an alternative to round brackets for constructor arguments. It's changed U{1} - bn to U{1} -bn, which means it doesn't understand that U{1} is a number that's part of the formula and then it thinks that the - is a unary minus that should be applied to bn.

{
if (weights[j] > 0.0)
{
const TDoublePoint & sample_ = CBasicStatistics::mean(sample);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What the heck!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments Tom.

A few more tweaks of the configuration may address the bulk of your points but there's clearly also a few instances where Uncrustify is incorrectly understanding the syntax.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder if this is exacerbated by trying to align consecutive names. This seems to introduce more scope for getting things wrong in terms of indenting. Since for clang-format we disabled this option I think it is worth seeing the results with this option disabled for uncrustify as well.

An attempt to mitigate some of the worst examples of Uncrustify
misunderstanding the existing code format.

The primary change was to turn of alignment of variable declarations.
@edsavage
Copy link
Contributor Author

edsavage commented Apr 3, 2018

I've had one final roll of the dice with Uncrustify. While it does address the bulk of the concerns discussed above there are still a number of instances where it simply does not have enough knowledge of the language structure to do the right thing.

It also shares the same problem as does clang-format with respect to formatting of boost operators in e..g. include/core/CMaskIterator.h

@@ -556,7 +557,7 @@ void CMultivariateMultimodalPriorTest::testSplitAndMerge(void) {
TMatrix2 mlc(modes[j].s_Prior->marginalLikelihoodCovariance());
TMatrix2 tcm = maths::CBasicStatistics::covariances(totalCovariances);
covError.add((mlc - tcm).frobenius() / tcm.frobenius());
} else {
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many cases where two extra spaces have been added before opening braces (although strangely this hasn't happened everywhere).

I can't see which config change would have caused this.

@edsavage
Copy link
Contributor Author

edsavage commented Apr 4, 2018

Closing this PR as it has been decided, after much deliberation, that clang-format has a slight edge over Uncrustify.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants