Description
Background information
After @jsquyres comment on my Pull Request I have looked into the Coding Style and noticed the first comment:
NEVER use actual tab characters; always use spaces. Both emacs and vim have secret mojo that can automatically use spaces when you hit the key. This makes the code look the same in every browser, regardless of individual tab display settings.
That seemed a little bit odd to me as I was working on some files which definitely contained a funny mix of tabs and spaces.
At the time I didn't say anything about it and thought it was no big deal. Later I had a boring day and decided to grep for tabs and noticed that there are quite a few in the current code base. After mentioning this to @hkuno she mentioned it in a side note in an email to @jsquyres. This resulted in a small email thread and some accurate numbers (I was counting files not maintained in this repo) for c files from @jsquyres:
# Total number of .c/.h non-imported files
$ find . | grep -v .git/ | egrep '\.[ch]$' | egrep -v 'hwloc/|libevent/|pmix4x/|treematch/' | wc -l
3919
# How many of those have tabs
$ ack '\t' --ignore-dir=hwloc --ignore-dir=treematch --ignore-dir=libevent --ignore-dir=pmix4x --type=cc -l | wc -l
925
So ~23% of the code base has mixed indentation files (Thus violating the Coding Style). As @jsquyres agrees that this is a issue he asked me to open this issue so that the community can discuss how to handle the problem.
Solution approach
The best solution for this would be to make a big Pull request (Or four one for (ompi, orte, opal, oshmen) as requested by @jsquyres) that fixes this issue and then have a CI check running that flags PRs introducing <tab>
characters.
@jsquyres has asked me to open these PRs but I found two completely different approaches of fixing the <tab>
situation. Therefor I would like to hold off before doing that until this discussion has yielded a decision.
Possibilities
As described above, this started with just checking for the tabs in the code base, but while looking into CI tools that can check for tabs and the best way of replacing the tabs in the existing code base I noticed that it would also be possible to decide on a complete coding style. Therefore in the below possibilities I also go into clang-format
.
Only tabs
Replacing them in the existing files
Possibly the best solution would be to just replace them with a small command like:
find . \( -name '.git' -o -name 'hwloc' -o -name 'libevent' -o -name 'pmix4x' -o -name 'treematch' \) -prune -o \( \( -name '*.h' -o -name '*.c' \) -type f -exec bash -c 'expand -t 8 "$0" | sponge "$0"' \{\} \; \)
I am assuming (based on the small number of files I saw) that tabs are used instead of 8 spaces.
Checking further PRs
No for the more interesting list of possibilities on how to prevent further <Tabs>
.
@jsquyres has proposed to use Probotfor that. As I had never heard of Probot I searched for a existing bot doing what we want and found eslint-disable-probot which could be easily changed into checking for tabs.
As I only did a short search on this topic I would be glad if anyone could point out an existing tools which checks for PRs introducing <Tabs>
. This would be nice so there is no need in maintaining a custom solution to this problem.
Complete Coding Style with clang-format
Because I think that there is the possibility that there are some files using tabs instead of 4 spaces I also looked into the possibility of having a tool that can replace the existing <Tabs>
whilst being aware of the surrounding code and thus the correct indentation.
I found clang-format
:
A tool to format C/C++/Java/JavaScript/Objective-C/Protobuf/C# code.
And some wrappers around it that can be used in a CI solution and development work.
git clang-format
A clang-format integration for git.
A wrapper script around clang-format, suitable for linting multiple files and to use for continuous integration
The "problem" with clang-format
is that it seems to be not possible to just correct indentation in files as it will also enforce other coding styles aspects even when they are not specified in the style file (At least I didn't find a option for really only doing the config). This means when using clang-format
it is probably best to decide on a coding style and then to make the PRs to change the current code base to reflect the decision.
Discussion
In general it has to be decided whether to only enforce the small portion of guidelines that are pointed out in the existing Coding Style or to decide on a complete Coding Style and enforce it by tools like clang-format
. As this decision can be influenced by the tools that are used to enforce it I would be glad if you would point out other tools that can help with the problem of coding style and tabs.