Extreme #include discipline for C++ code
Apr 12 2022
C++ takes a long time to compile
There is more than one reason for it, but one of them is excessive re-parsing of .h header files.
In SumatraPDF I’m using an extreme #include discipline to keep compilation times in check.
The rule is simple: a .h file cannot #include other .h files.
I didn’t come up with this idea, I got it from Rob Pike: http://doc.cat-v.org/bell_labs/pikestyle
I’ve been following this rule for several years in SumatraPDF, a medium-sized C++ project of over 100k lines of code. It works.
“It works” is more important than it seems. Many ideas seem great on paper but fail in practice. Name an economically successful communist country.
It’s not all roses: the price of minimizing compilation times is eternal vigilance.
Writing C++ while following that rule is annoying.
In code, things depend on other things. If a struct in foo.h depends on a struct in bar.h, a quick fix is to #include "bar.h" in foo.h.
You do it once and it works in every .c file: you include foo.h and it brings in bar.h.
That convenience comes with a hidden price. Imagine you have foo2.h that also depends on bar.h so you also #include "bar.h" in foo2.h.
You then #include "foo2.h" in foo.c, next to foo.h, and bang! You just included and parsed bar.h twice.
In real C++ code bases the same headers are unnecessarily re-included and re-parsed hundreds of times.
It’s a known problem. We try to mitigate it with #ifndef include guards, #pragma once etc. but in my experience those band-aids don’t solve the problem.
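To make the scenario concrete, here is roughly what the compiler sees for foo.c once the preprocessor has pasted in foo.h and foo2.h, each of which includes a guarded bar.h. This is only a single-file sketch: the structs Bar, Foo and Foo2, the guard name BAR_H and the function total() are made up for illustration, and the banner comments stand in for the real files.

```cpp
// foo.c after preprocessing, flattened into one file (hypothetical contents).

// ---- foo.h: its #include "bar.h" expands to ----
#ifndef BAR_H
#define BAR_H
struct Bar {
    int value;
};
#endif
// ---- rest of foo.h ----
struct Foo {
    Bar bar;
};

// ---- foo2.h: its #include "bar.h" expands to the same text again ----
#ifndef BAR_H  // BAR_H is already defined, so this copy is skipped
#define BAR_H
struct Bar {
    int value;
};
#endif
// ---- rest of foo2.h ----
struct Foo2 {
    Bar bar;
};

// ---- foo.c itself ----
int total(const Foo& a, const Foo2& b) {
    return a.bar.value + b.bar.value;
}
```

The guard keeps Bar from being defined twice, but the second #include directive still had to be processed. How much that costs depends on the compiler and on how many headers are involved.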
Following Rob Pike’s rule, we must #include "bar.h", "foo.h" and "foo2.h" in foo.c, in the correct order.
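A minimal single-file sketch of what that looks like, assuming made-up structs Bar, Foo and Foo2 and a made-up function sumValues(). The banner comments mark what would come from each header, none of which includes another header:

```cpp
// foo.c (hypothetical) would start with:
//   #include "bar.h"
//   #include "foo.h"
//   #include "foo2.h"
// Below is what those three lines expand to.

// ---- bar.h: must come first, Foo and Foo2 need the full definition of Bar ----
struct Bar {
    int value;
};

// ---- foo.h: no #include, relies on bar.h having been included already ----
struct Foo {
    Bar bar;
};

// ---- foo2.h: same assumption ----
struct Foo2 {
    Bar bar;
};

// ---- foo.c itself ----
int sumValues(const Foo& a, const Foo2& b) {
    return a.bar.value + b.bar.value;
}
```

Swap the first two sections and the compiler rejects Foo because Bar is not defined yet. Each header is seen exactly once, with no guards needed.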
The “correct order” part is what makes it annoying.
Let’s face it: a month after writing foo.h I no longer remember that it depends on bar.h.
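So the way it goes is: I #include "foo.h", the compiler complains about a type it doesn’t know, I track down the header that defines it, add it above foo.h and compile again.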
What used to be a simple #include "foo.h" can end up a lengthy game of #include whack-a-mole.
So beware: following this extreme rule will occasionally be painful.
I wasn’t following this rule from the beginning. A refactor of SumatraPDF code to follow it was painful.
I find this price is worth paying and not just because of shorter compilation times.
It also forces me to design better, simpler dependencies.
Entropy is real. Complexity grows but our heads remain small.
In large programs you have hundreds of structs, classes, functions, enums and they form a complex web of dependencies.
It’s way too much to fully understand at once so we get sloppy, we take shortcuts just to get that damn thing to compile.
Over time the sloppiness accumulates and we might end up with an inter-dependent, circular mess. You just want to #include "Button.h" and somehow it ends up bringing in NuclearPowerPlant.h.
I did that in my own code. Once things get tangled, it’s really hard to untangle them.
The chaos wins.
Don’t let chaos win. Be in control.
I don’t think I’ve ever seen a C++ code base that follows this rule.
This makes me either a madman or a genius.
A better-known idea for reducing compilation times is the pimpl (pointer to implementation) idiom.
I’m not using it because it requires writing more code. That is not a price I’m willing to pay.
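For comparison, here is a minimal sketch of that idiom, flattened into one file. The Counter class and its members are made up for illustration, and the code assumes C++14 for std::make_unique. The header part only forward-declares Impl, so code that uses Counter doesn’t need the headers the implementation needs (here <vector>).

```cpp
#include <memory>

// ---- counter.h (hypothetical): the public header exposes no implementation details ----
class Counter {
  public:
    Counter();
    ~Counter();  // defined below, where Impl is a complete type
    void add(int n);
    int total() const;

  private:
    struct Impl;                 // forward declaration only
    std::unique_ptr<Impl> impl;  // pointer to the hidden implementation
};

// ---- counter.cpp (hypothetical): the only file that sees Impl and <vector> ----
#include <vector>

struct Counter::Impl {
    std::vector<int> values;
};

Counter::Counter() : impl(std::make_unique<Impl>()) {}
Counter::~Counter() = default;

// Each public method forwards to the hidden implementation.
void Counter::add(int n) {
    impl->values.push_back(n);
}

int Counter::total() const {
    int sum = 0;
    for (int v : impl->values) {
        sum += v;
    }
    return sum;
}
```

Every public method needs a forwarding body like add() and total(), plus the out-of-line constructor and destructor. That is the extra code.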
programming c++ SumatraPDF
