C++ takes a long time to compile. There is more than one reason for it, but one of them is excessive re-parsing of .h header files.
In SumatraPDF I’m using an extreme #include discipline to keep compilation times in check.
The rule, which comes from Rob Pike’s Notes on Programming in C, is simple: a .h file cannot #include other .h files.
I’ve been following this rule for several years in SumatraPDF, a medium-sized C++ project of over 100k lines of code. It works.
“It works” is more important than it seems. Many ideas seem great on paper but fail in practice. Name an economically successful communist country.
It’s not all roses: the price of minimizing compilation times is eternal vigilance.
Writing C++ while following that rule is annoying.
In code, things depend on other things. If a struct in foo.h depends on a struct in bar.h, a quick fix is to #include "bar.h" in foo.h.
You do it once and it works in every .c file: you include foo.h and it brings in bar.h.
That convenience comes with a hidden price. Imagine you have foo2.h that also depends on bar.h, so you also #include "bar.h" in foo2.h. You then #include "foo2.h" in foo.c, which already includes foo.h, and bang! You just included and parsed bar.h twice.
In real C++ code bases the same headers are unnecessarily re-included and re-parsed hundreds of times.
It’s a known problem. We try to mitigate it with #ifndef include guards, #pragma once, etc., but in my experience those band-aids don’t solve the problem.
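A rough sketch of what foo.c turns into once the preprocessor pastes in foo.h and foo2.h, assuming both headers #include "bar.h" and bar.h uses a classic #ifndef guard. It is flattened into one file, with banners marking where each header’s text lands; the member names and the sumBars() function are made up for illustration. The guard keeps Bar from being defined twice in this one translation unit, but it does nothing across translation units: every other .cpp file that reaches bar.h, directly or through foo.h or foo2.h, parses it again from scratch.

```cpp
// foo.c after preprocessing, written out as one file for illustration.
// Each "---- x.h ----" banner marks text that came from a hypothetical header.

// ---- bar.h, pulled in by foo.h ----
#ifndef BAR_H
#define BAR_H
struct Bar {
    int x;
};
#endif

// ---- foo.h ----
struct Foo {
    Bar bar;  // needs the complete definition of Bar
};

// ---- bar.h again, pulled in by foo2.h ----
#ifndef BAR_H  // BAR_H is already defined above, so the preprocessor drops this copy
#define BAR_H
struct Bar {
    int x;
};
#endif

// ---- foo2.h ----
struct Foo2 {
    Bar bar;
};

// ---- foo.c proper ----
int sumBars(const Foo& foo, const Foo2& foo2) {
    return foo.bar.x + foo2.bar.x;
}
```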
Following Rob Pike’s rule, we must instead #include "bar.h", "foo.h" and "foo2.h" in foo.c, in the correct order.
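The same hypothetical files laid out under the rule, again flattened into one file with banners standing in for the headers. The headers carry no #include lines, foo.c lists every dependency itself, and the text of bar.h appears exactly once. The order is forced because foo.h and foo2.h both use Bar, so bar.h has to come first. As before, the members and sumBars() are illustrative, not SumatraPDF code.

```cpp
// Layout under the "headers don't include headers" rule, flattened into one file.
// In a real project foo.c would start with:
//   #include "bar.h"
//   #include "foo.h"   // uses Bar, so it must come after bar.h
//   #include "foo2.h"  // uses Bar, so it must come after bar.h
// and each banner below marks the text one of those lines pastes in.

// ---- bar.h (no #include lines) ----
struct Bar {
    int x;
};

// ---- foo.h (no #include lines; expects bar.h to be included first) ----
struct Foo {
    Bar bar;
};

// ---- foo2.h (no #include lines; expects bar.h to be included first) ----
struct Foo2 {
    Bar bar;
};

// ---- foo.c proper ----
int sumBars(const Foo& foo, const Foo2& foo2) {
    return foo.bar.x + foo2.bar.x;
}
```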
The “correct order” part is what makes it annoying.
Let’s face it: a month after writing foo.h I no longer remember that it depends on bar.h.
So the way it goes is:
- I #include "foo.h" in a brand_new.cpp file
- I get a compilation error: what is this Bar you're referring to?
- I dig around and figure out that Bar is a struct defined in bar.h, so I #include "bar.h" before foo.h
- I get another compilation error: what is that Bar2 you speak of? This could be an unmet dependency of foo.h or of the newly included bar.h
- I keep adding 10 more #include lines to satisfy their cascading dependencies
What used to be a simple #include "foo.h" can end up as a lengthy game of #include whack-a-mole.
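One way to make the whack-a-mole less painful, which is my assumption and not something described here as part of SumatraPDF, is to have each header check its own prerequisites and fail with a clear #error. The sketch assumes bar.h defines a marker macro (the name BAR_H_INCLUDED is made up) and foo.h tests for it, so forgetting bar.h produces a message naming the missing header instead of an unknown-type error deep inside foo.h. It is flattened into one file like the earlier sketches; fooValue() is hypothetical.

```cpp
// Hypothetical prerequisite check, flattened into one file for illustration.

// ---- bar.h ----
#define BAR_H_INCLUDED  // marker macro: lets other headers verify bar.h came first
struct Bar {
    int x;
};

// ---- foo.h ----
#ifndef BAR_H_INCLUDED
#error "foo.h requires bar.h: #include \"bar.h\" before \"foo.h\""
#endif
struct Foo {
    Bar bar;
};

// ---- brand_new.cpp proper ----
int fooValue(const Foo& foo) {
    return foo.bar.x;
}
```

If brand_new.cpp included foo.h without bar.h, the build would stop at the #error line with that message. It still doesn’t tell you about Bar2 and the rest of the cascade until you fix the first one.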
So beware: following this extreme rule will be occasionally painful.
I wasn’t following this rule from the beginning. A refactor of SumatraPDF code to follow it was painful.
I find the price worth paying, and not just because of shorter compilation times.
It also forces me to design better, simpler dependencies.
Entropy is real. Complexity grows but our heads remain small.
In large programs you have hundreds of structs, classes, functions, enums and they form a complex web of dependencies.
It’s way too much to fully understand at once, so we get sloppy and take shortcuts just to get that damn thing to compile.
Over time the sloppiness accumulates and we might end up with an inter-dependent, circular mess. You just want to #include "Button.h" and somehow it ends up bringing in NuclearPowerPlant.h.
I did that in my own code. Once things get tangled, it’s really hard to untangle them.
The chaos wins.
Don’t let chaos win. Be in control.
I don’t think I’ve ever seen a C++ code base that follows this rule.
This makes me either a madman or a genius.
A better-known idea for reducing compilation times is the pimpl (pointer to implementation) idiom.
I’m not using it because it requires writing more code. That is not a price I’m willing to pay.
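For comparison, a minimal sketch of the pimpl idiom using a hypothetical Button class (the name echoes the Button.h example above; this is not SumatraPDF code). The header exposes only a forward-declared Impl behind a std::unique_ptr, so files that include Button.h never see what Impl depends on. The extra code is visible: an out-of-line destructor (needed where Impl is a complete type) and every method forwarding through impl_.

```cpp
// Pimpl sketch, flattened into one file. Button is a hypothetical class.

// ---- Button.h ----
#include <memory>

class Button {
public:
    explicit Button(const char* label);
    ~Button();  // defined in Button.cpp, where Impl is a complete type

    void click();
    int clickCount() const;
    const char* label() const;

private:
    struct Impl;                  // declared here, defined only in Button.cpp
    std::unique_ptr<Impl> impl_;  // opaque pointer to the real state
};

// ---- Button.cpp ----
#include <string>  // needed by Impl, but invisible to anyone including Button.h

struct Button::Impl {
    std::string label;
    int clicks = 0;
};

Button::Button(const char* label) : impl_(std::make_unique<Impl>()) {
    impl_->label = label;
}

Button::~Button() = default;

void Button::click() { ++impl_->clicks; }
int Button::clickCount() const { return impl_->clicks; }
const char* Button::label() const { return impl_->label.c_str(); }
```

Changing Impl (say, adding a field) only forces Button.cpp to recompile. The cost is the forwarding boilerplate in every method, which is the extra code the paragraph above refers to.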