Nobody Understands C++: Part 5: Template Code Bloat

On occasion you will read or hear someone talking about C++ templates causing code bloat. I was thinking about it the other day and thought to myself, "self, if the code does exactly the same thing then the compiled code cannot really be any bigger, can it?"

Here are the two test cases: first without templates, then with them. Both have exactly the same functionality and produce exactly the same output:

#include <iostream>
#include <string>

void print(int i)
{
  std::cout << i << std::endl;
}

void print(const std::string &s)
{
  std::cout << s << std::endl;
}

void print(double d)
{
  std::cout << d << std::endl;
}

void print(bool b)
{
  std::cout << b << std::endl;
}

int main()
{
  print(1);
  //Note: the literal has to be wrapped in std::string(); otherwise it decays to a const char *,
  //which converts to bool and selects print(bool) instead
  print(std::string("hello world"));
  print(4.5);
  print(false);
}

And with the use of templates:

#include <iostream>
#include <string>

template<typename T>
void print(const T &t)
{
  std::cout << t << std::endl;
}

int main()
{
  print(1);
  print(std::string("hello world"));
  print(4.5);
  print(false);
}

There is no question that the templated version is smaller, easier to maintain and easier to grok than the first version (assuming a basic understanding of templates).

They both produce exactly the same output:

1
hello world
4.5
0

And what about compiled code size? Each was compiled with the command g++ <filename>.cpp -O3. Non-template version: 8140 bytes, template version: 8028 bytes! The compiled size of the templated version was smaller. In the interest of full disclosure, at any optimization level below -O3, the template version is 20 bytes larger than the non-template version.

Also, build times do not vary between the two versions. Each takes approximately 0.623 seconds to compile.

So, what are we seeing here, really? Templates do not "cause code bloat" or long compile times. The fact is, if one were to write exactly the same code both with and without templates, the compile times and resulting code size would probably be very close. However, the source for the non-template version would grow so large that it would likely be unmaintainable.

For example, if one were to use a std::vector<std::string>, std::vector<int> and std::vector<std::vector<float> > in his code, the resulting compiled code would be large indeed, as the compiler would generate no fewer than 4 versions of vector for him (the fourth being std::vector<float>). However, if he were to hand write string_vector, int_vector, float_vector and float_vector_vector, the amount of code to maintain would be huge. Bugs found in string_vector would more than likely not get fixed in int_vector, and the compiled code would almost assuredly be the same size as the standard template versions.

Comments

Careful, people normally mean that if you use std::vector in two separate class files that get compiled separately and then linked together, you end up compiling all of the functions for std::vector twice. In a large project with hundreds of separate class files, this can really add up to a lot of bloat and kill you when you try to link the whole thing together.

That's not necessarily true. With g++ at least, the compiler is able to merge the linked template instantiations.

This comment is true in that the code is compiled once for each instantiation in each compilation unit that uses it, thus increasing compile time. On the other hand, Jason is right to point out that the linker is responsible for eliminating redundant code. If you list the symbols in all compilation units, you will see that all template-generated code is marked as a weak symbol, meaning the linker will discard it, without errors, if an identical normally defined (or weak) symbol already exists. (This is what makes the One Definition Rule so important: when the linker eliminates symbols, it depends on the definitions being exactly the same.)

Going back to compilation time, the C++ standard explicitly states that member functions of a class template are instantiated only when they are used, so it might be the case that less code is compiled in the templated version (if some of the methods of the class are never called), which can reduce both compilation time and final binary size. If you only use std::vector::push_back, std::vector::begin and std::vector::end you will never compile std::vector::operator[], std::vector::insert...

While what you've said is correct, there's another factor involved. People who know and love templates tend to avoid old C-style generic programming, such as used in the qsort() routine commonly found on UNIX systems. Such old code worked on arbitrary arguments by taking arguments from the caller specifying void*s to values, sizes in bytes if necessary, and function pointers to operate meaningfully on the memory content. It was not type safe, but it did mean the one block of compiled code could be reused on arbitrary types. Compared to this, instantiating templates for myriad types does generate a lot of executable code. On the other hand, that code might run faster as inlining and other optimisations are possible. Similarly, doing things like "template " where A and B might be the sizeof a couple strings being passed to the constructor can very quickly lead to hundreds of copies of the template as A and B vary. Often, any optimisations that allows aren't worth the bloat. Sometimes, clever programmers will actually use a lightweight template to provide type safety, for example: template class X { typedef map Map; } where E is expected to be an enum, but it's not too painful to convert the values to ints for private Maps.... Cheers, Tony

HTML crap chopped a bit of my post:

  template <int A, int B> struct S { S(const char a[A], const char b[B]); };
  // is it worth the bloat?

  template <typename Enum> class X { typedef map<int, string> Map; ... }

Templates have a lot of positives, but I don't think fast compile times are really one of them. Sure, it may compile faster the first time (as you compile only the members you need), but all template code has to live in header files (yes, there is export, but it never seems to do the job), so any change to a template forces every file that includes the header to recompile. When code is being tweaked/developed/enhanced this can lead to a lot of compilation pain.

When people mention template bloat they sometimes mean to compare the extra initiative put forth into using templates versus some other lightweight type-unsafe method (C anonymous functions vs templates). In which case templates are most definitely bloat, but safe code comes at a cost.

The template version is small because you're passing a reference (sort of a pointer). What happens if you do the following?

template<typename T>
void print(const T t) // Passes the value not its address
{
  std::cout << t << std::endl;
}

I should also say that blocks of executable code are often aligned to 4KB or 512-byte boundaries, meaning the difference in size between these two programs might not be enough to push the actual size of the output over one of these boundaries. You might have to make 100 versions of the same code (e.g. printed from another program) to see the real effect.

I believe extern templates introduced by C++11 will eventually solve the problem of compilation time, or template re-compilation.

But before that happens:

  • template source files must be split into:
    • template declarations
    • template definitions
  • code that uses the templates must be split into:
    • template instantiations files
    • template usage (references to template functions/methods)

In this approach the template definitions are still header files in the practical sense, but they are only meant to be included in the template instantiations source files. The template instantiations sources will likely include other headers from throughout the project, in order to bring in types used as template arguments through the project.

That might include splitting the standard template library headers in this way, too, although much of the code there is inline with or without the template aspect.

This approach was known (and used?) before C++11 as the "explicit template instantiation" model, but it required specific compiler options to make it work.

Unfortunately, this approach does not work with local types used as template arguments, because the local types would need to be made available in the separate source file that deals with template instantiations only.