Improve compile time #194

Open
ncihnegn opened this issue Jan 17, 2019 · 22 comments

@ncihnegn
Contributor

For the same options, CLI11 triples my compile time compared with boost::program_options.

@henryiii
Collaborator

boost::program_options is not a header-only library - it is pre-compiled. The benefit of not having to pre-compile the library comes at a compile-time cost, as with all header-only libraries. However, you can mitigate that by making sure the CLI11 code is in only one cpp file and keeping your components separate. There's no need to include CLI11 in any other file - you can just use plain C++ types! That's part of the design. PyBind11 has some excellent examples of how to keep headers from leaking into other files, for maximum compile-time performance.

However, CLI11 still takes only a couple of seconds to compile, so do you have a very simple application? That cost should not grow as your application becomes more complex.
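
The "one cpp file, plain C++ types" advice above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Options` and `parse_options` are hypothetical, the two files are shown in one listing, and a small hand-written loop stands in for the actual CLI11 calls so that the sketch compiles without the library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// ---- would be "options.hpp": plain C++ types only, no parser header ----
struct Options {  // hypothetical name, not part of CLI11
    std::string input;
    int threads = 1;
    bool verbose = false;
};

// The rest of the application only ever sees this declaration.
Options parse_options(const std::vector<std::string>& args);

// ---- would be "options.cpp": the only file that includes the parser ----
// Assumption: this loop is a stand-in for the CLI11 setup and parse calls.
Options parse_options(const std::vector<std::string>& args) {
    Options opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg == "--verbose") {
            opts.verbose = true;
        } else if (arg == "--threads" && i + 1 < args.size()) {
            opts.threads = std::stoi(args[++i]);
        } else if (arg == "--input" && i + 1 < args.size()) {
            opts.input = args[++i];
        } else {
            throw std::invalid_argument("unknown or incomplete argument: " + arg);
        }
    }
    return opts;
}
```

With this layout, only the file that defines `parse_options` pays for the parser header, and it needs recompiling only when the options themselves change.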

@henryiii
Collaborator

Technically, CLI11's current design would allow it to have an optional precompiled mode - I think a (very) few other libraries support a dual usage mode. However, it may become more heavily templated in 2.0, with Option<T> instead of the current Option. If that happens, it will need to remain header only (since templates cannot be precompiled).

@henryiii
Collaborator

To support precompilation, every header would need to be split into two parts, a declaration and an implementation; there would then be a way to optionally precompile the implementation files.

Personally, though, if I have to do it, I'll wait for both the redesign and for C++ Modules to be finalized, probably in C++20, hopefully not C++23.
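
For readers unfamiliar with the declaration/implementation split described above, here is a minimal sketch of one common way to do it. The macro names `MYLIB_COMPILED` and `MYLIB_INLINE`, the namespace, and the file names in the comments are all hypothetical; this is not CLI11's layout, just an illustration of the idea.

```cpp
#include <string>

// Header-only users leave MYLIB_COMPILED undefined, so every definition is
// marked inline and may safely appear in each translation unit.
// A precompiled build defines MYLIB_COMPILED and compiles the implementation
// part once into a library, so the definitions are ordinary functions.
#ifdef MYLIB_COMPILED
#define MYLIB_INLINE
#else
#define MYLIB_INLINE inline
#endif

// ---- declaration part (would be "thing.hpp") ----
namespace mylib {
std::string describe(int count);
}  // namespace mylib

// ---- implementation part (would be "thing_impl.hpp") ----
// Header-only mode: included at the end of thing.hpp.
// Precompiled mode: included by a single thing.cpp instead.
namespace mylib {
MYLIB_INLINE std::string describe(int count) {
    return "options: " + std::to_string(count);
}
}  // namespace mylib
```

Existing header-only users would see no difference, while a build system could define the macro and link against the compiled implementation.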

@ncihnegn
Contributor Author

I agree. But it would be nice if there were a macro like FAST_COMPILE to exclude some code.

@henryiii
Collaborator

I wonder if something like this would help: https://quuxplusone.github.io/blog/2019/01/06/hyper-function/ - CLI11 uses std::function everywhere. It would have to be re-implemented, or permission to use it in BSD-licensed software would be needed, but it would be interesting to test.

There really isn't much optional code in CLI11. The only optional part is Timer, and you don't get it unless you include it. In general, you should only include CLI11 in a limited set of files, probably only your main file.

@henryiii
Collaborator

I'm planning to move the extra transforms and any other less heavily used optional portions to a separate file; that will close this issue when done.

@henryiii henryiii added this to the v1.8 milestone Mar 12, 2019
@jlisee

jlisee commented Apr 12, 2019

Dual mode is definitely the way to go here. Take a look at fmt for a library so good that it basically went into the standard library. It has an expressive API but the core is not templates so users still have a very productive compile time cycle. For example do you really need to support every type or could Option<T> really just have a variant? And maybe you have an optional template API that allows more user flexibility.

The slow compile time of this library is a big reason we are considering dropping it from our project or doing a re-write/fork so we can become productive again. It just makes no sense to wait 10+ seconds for a simple command line application to compile.
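
The variant suggestion above could look something like the following sketch. It is purely illustrative: `OptionValue`, `Option`, and `assign_from_text` are hypothetical names, the set of supported types is an assumption, and `std::variant` requires C++17, which CLI11 itself does not.

```cpp
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical closed set of supported option types (C++17).
using OptionValue = std::variant<bool, long, double, std::string>;

// Non-template option: the alternative held by the default value
// decides how command line text is converted.
struct Option {
    std::string name;
    OptionValue value;
};

// Non-template conversion: this could be compiled once into a library
// instead of being instantiated in every file that adds an option.
void assign_from_text(Option& opt, const std::string& text) {
    if (std::holds_alternative<bool>(opt.value)) {
        if (text != "true" && text != "false") {
            throw std::invalid_argument("expected true/false for " + opt.name);
        }
        opt.value = (text == "true");
    } else if (std::holds_alternative<long>(opt.value)) {
        opt.value = std::stol(text);
    } else if (std::holds_alternative<double>(opt.value)) {
        opt.value = std::stod(text);
    } else {
        opt.value = text;
    }
}
```

The trade-off is that user-defined types are no longer supported directly, which is where an optional templated layer on top would come in.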

@henryiii
Collaborator

henryiii commented Apr 12, 2019

10 seconds doesn't seem that bad for a single command line program; you should be including and using CLI11 from only one file. The entire test suite and examples, with 38 executables, takes 2.8 minutes to compile, or 4.4 seconds each (on a dual-core machine, however).

The library actually avoids templates for the most part. Only option additions are templates; all types are non-templated (CLI::App instead of CLI::App<>, CLI::Option, etc.). It would be fairly trivial to separate the files into ".hpp" and ".inl" files, and then optionally compile the ".inl" files - almost everything could be precompiled. It would have to be done in such a way, however, that normal users of the header-only library could continue to use it. I believe 90% of the library could be pre-compiled with the current API, maybe more.

If you are serious about rewriting/forking, I'd happily help with a PR to do that in the way I described above for 1.9 or 2.0. A CMake user should get the precompiling for free, and users of header only without CMake should not even notice.

@henryiii henryiii modified the milestones: v1.8, v1.9 Apr 28, 2019
@henryiii
Collaborator

Note: Including boost::optional (I think) seems to change a ~3-4 minute total compile into a 20 minute compile on CI! This should be investigated, and made non-automatic if it can't be fixed. I think Boost::Optional pulls in a lot of legacy pre-C++11 headers from Boost. @jlisee, could you retry with CLI11_BOOST_OPTIONAL explicitly defined to 0? You might see 4x or more compile time improvement, if I'm right.

@henryiii
Collaborator

I'm looking at the Azure build times, for reference. The GCC 9 docker build takes 2.5 minutes. The GCC 4.7 docker build takes 1.5 minutes. The native builds also do the single header mode, but on macOS and Linux it just doubles the build time (since it builds twice, in effect). However, the native Linux build (only change I can think of is Boost and Boost::Optional) takes 10-20 minutes!

@henryiii henryiii mentioned this issue May 20, 2019
3 tasks
@oschonrock

+1 on this being an obstacle. It's really nice, but it adds 8s of fixed overhead. Dual mode/precompile would be fab.

@oschonrock

oh, and +200k on binary size?? Only using 2 options.

@oschonrock

PCH gains about 1s

@oschonrock

oschonrock commented Dec 25, 2019

I don't want to offend your library, which is really good, but these practicalities matter. I just tried:
https://github.com/p-ranav/argparse

It adds only 1.3s of compile time overhead and 35k to the binary. That is 5-6 times less on both factors than CLI11.

That's on -O2 or -O3 (similar really) on clang8

@phlptp
Collaborator

phlptp commented Dec 25, 2019

I have wondered in the past if there might be design space for a cli11_lite header-only option that really stripped down the files and only included what was necessary to handle a few simple option types, but used the same syntax and calls. Then you could use that if you just had a few simple options, and use the full version if you needed the type flexibility, subcommands, callbacks, validators, and the other things that make the library really powerful but do add to the compile time.

@oschonrock

oschonrock commented Dec 25, 2019

Yes. I have no need for those complex options. Others will, which is why CLI11 is so good. I have not checked what exact subset argparse supports.

Perhaps that's why it is important to find ways to make CLI11 both

  • compile quickly
  • and link to a small binary.

Pre compiled lib (dual mode?) seems the obvious choice (unless you wait for c++20 modules which is not reasonable for a C++11 lib).

And carefully chosen link modules (I am not an expert here) can solve binary size. "Just" need to ensure that the lib compiles down to a "cli11_core.o" and a "cli11_advanced.o" so that the linker can eliminate any bloat for simple use cases?

If you can manage both of those, then perhaps a "branched model" (which is always harder to maintain) is unnecessary?

PS: I actually found that if you compile CLI11 with -O0 then it is much faster than the above figures. But of course the binary is even bigger. It seems clang8 finds it "hard work" to optimise. There is just quite a lot of code at the end of the day, I think.

@henryiii
Collaborator

For those interested in what the compiler is actually doing, here's a flame graph:

simple.cpp.json.zip

Download and extract, then open https://www.speedscope.app and click browse, then load that file.

Preview (not very useful without being able to click on the items to see what is being instantiated or parsed):

[Image: flame graph screenshot, "Screen Shot 2019-12-30 at 4 27 17 PM"]

@henryiii
Collaborator

henryiii commented Dec 30, 2019

I do believe a large portion of the instantiation, and a few of the headers, could be compiled into a library and removed from the standard build time in pre-compiled mode. Dropping our deprecated features should help - IsMember is being triggered by them, I believe. We have a rather large number of stdlib headers that can probably be trimmed a bit:

// Standard combined includes:

#include <algorithm>
#include <cmath>
#include <deque>
#include <exception>
#include <fstream>
#include <functional>
#include <iomanip>
#include <iostream>
#include <istream>
#include <iterator>
#include <locale>
#include <map>
#include <memory>
#include <numeric>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

We could pull a few things out of the header-only single-file version and require an include for them. For example, validators take a large portion of the time - IsMember alone takes about 0.5 seconds, even for an example that doesn't use it.

@oschonrock

The 80:20 rule is likely to apply.

Make the 20% of changes that are "low hanging fruit" to get 80% of the available compile time improvement.

@henryiii
Collaborator

I don't know how to get rid of AsNumberWithUnit being instantiated, but dropping the deprecated features in non-templated methods gave back the 0.5 seconds used on IsMember. Much of the time (maybe not 80%, but quite a bit) is in setting up all the std::functions, which I think could be precompiled away.
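
As a rough illustration of what "precompiling away" the std::function setup could mean: the lambda and the std::function wrapper are built inside a non-template function, so in a precompiled mode that work would be done once in a library rather than in every file that adds an option. The names `callback_t` and `make_int_callback` below are hypothetical and are not taken from CLI11.

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <string>

// Hypothetical callback signature: receives the raw text, reports success.
using callback_t = std::function<bool(const std::string&)>;

// Declaration: all that a user-facing header would need to expose.
callback_t make_int_callback(int& target);

// Definition: in a precompiled mode this would live in a single .cpp file,
// so the lambda and its std::function wrapper are compiled only once.
callback_t make_int_callback(int& target) {
    return [&target](const std::string& text) -> bool {
        try {
            std::size_t used = 0;
            int value = std::stoi(text, &used);
            if (used != text.size()) {
                return false;  // trailing characters, e.g. "4x"
            }
            target = value;
            return true;
        } catch (const std::exception&) {
            return false;  // not a number, or out of range
        }
    };
}
```

Only conversions for a fixed set of built-in types can be handled this way; callbacks for user-defined types would still need to be generated in the user's own code.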

@henryiii henryiii modified the milestones: v1.9, v2.0 Dec 31, 2019
@henryiii henryiii mentioned this issue Dec 31, 2019
5 tasks
@brian-finisher

While I love the API of CLI11, the compile time hit is remarkably bad (gcc 7.5.0 in an arm64 Docker Container). Even with ninja, the build stage appears to hang for nearly a minute - or sometimes even two. If I remove the CLI11 include(s), my binary takes mere seconds to compile.

This is probably going to force me to choose a different arg parsing library, unfortunately.

@henryiii
Collaborator

I think we could solve much of it by allowing a pre-compilation mode; it just hasn't happened yet. I think most of the cost is in compiling the lambda functions, which could be done in advance.

Depending on how you structure your app, you can split out the parsing into a separate file that doesn't need to recompile often.


6 participants