It has been discovered that C++ provides a remarkable facility for concealing the trivial details of a program -- such as where its bugs are.
-- David Keppel

Introduction

I thought I would ramble on a little about the class constructor in C++ as well as start a new topic in the blog, C++. Here is the first installment. A little random, but fun facts. Keep your copy of the standard close by.

Initialization order of members

Class members are initialized in the order in which they are declared in the class, regardless of how the member initialization list is ordered. This can bring woes if you get into the practice of using members to initialize other members (a practice best avoided, though sometimes you can't). Most modern compilers issue a warning if the order of the initialization list doesn't match the declaration order [1]. Look at the code snippet in listing 1. It reads orange before orange has been initialized, so the value that ends up in the member variable apple is undefined.

 
struct Basket
{
	Basket( int count )
		: orange(count)
		, apple(orange*2)
	{
	}
	
	int apple;
	int orange;
};
Listing 1: Member initializing list

apple is declared first, so it is initialized first, but its initializer reads orange, which at that point still holds garbage. Strictly speaking, reading the uninitialized orange is undefined behaviour, so apple may end up holding anything. This can be catastrophic in programs, especially since looking at the code can be misleading. It looks perfectly innocent, especially if you have separated the definition of the constructor into a .cpp file and the declaration of the class itself is in a header file somewhere else.
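One way out, sketched here under the assumption that every member can be computed from the constructor argument alone, is to initialize each member from the parameter instead of from another member, and to write the initializer list in declaration order. SafeBasket and check_safe_basket are hypothetical names made up for this illustration. GCC and Clang flag the mismatch in listing 1 with -Wreorder, which -Wall turns on.

```cpp
#include <cassert>

// Hypothetical rework of listing 1; SafeBasket is not part of the original post.
struct SafeBasket
{
	explicit SafeBasket( int count )
		: apple(count*2)   // depends only on the parameter, never on another member
		, orange(count)    // written in the same order as the declarations below
	{
	}

	int apple;    // declared first, so initialized first
	int orange;   // declared second, so initialized second
};

// Small self-check: both members hold well-defined values.
inline void check_safe_basket()
{
	SafeBasket basket( 3 );
	assert( basket.apple == 6 );
	assert( basket.orange == 3 );
}
```

Since neither initializer reads a member, swapping the two declarations later cannot change the result.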

Scope lookup rules for member initialization

Charles showed me this one day and I couldn't decide if it was the most horrible thing that snuck into the language or if it's da shit. Have you ever run into the problem of variable naming? You have several variables, all basically referencing the same thing but they have slightly different scopes. Some techniques to solve this are to postfix duplicates with an underscore or to explicitly reference things through the this pointer. But what do you do in the member initialization list? Turns out that the standards committee already covered this and the following code does work.

 
struct Animal
{
	int bones;
	int health;
	
	Animal( int bones, int health )
		: bones(bones)
		, health(health)
	{
	}
};
Listing 2: Member initializer scope lookup

Look closely: we're initializing the member variable health with the formal parameter health. In the member initialization list, the name in front of the parentheses is looked up in the scope of the class, so it finds the member. The expression inside the parentheses is evaluated in the scope of the constructor, so the formal parameters are searched first for matches, and then the normal rules kick in. Cute, right? Maybe it's too cute for comfort. For more information, or more correct information, look at 12.6.2.7 in the standard.
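The constructor body gets no such help: a plain health in there always names the parameter. A minimal sketch of the difference follows; Pet, ShadowedPet and check_pets are names invented for this example, and some compilers will warn about the self-assignment in the second struct.

```cpp
#include <cassert>

// Hypothetical types, for illustration only.
struct Pet
{
	int health;

	Pet( int health )
		: health(0)
	{
		this->health = health;   // member on the left, parameter on the right
	}
};

struct ShadowedPet
{
	int health;

	ShadowedPet( int health )
		: health(0)
	{
		health = health;   // the parameter is assigned to itself; the member stays 0
	}
};

// Small self-check of both behaviours.
inline void check_pets()
{
	assert( Pet( 100 ).health == 100 );
	assert( ShadowedPet( 100 ).health == 0 );
}
```

So the trick from listing 2 is safe in the initializer list, but the moment the assignment moves into the body you are back to this-> or a naming convention.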

Initialization of the vtable

Look at the following code snippet. What do you think will be printed if you instantiate a Node object?

 
struct Base
{
	Base()
	{
		foobar();
	}
	
	virtual void foobar() = 0
	{
		printf( "Base::foobar()\n" );
	}
};

struct Node : public Base
{
	Node()
	{
		foobar();
	}
	
	virtual void foobar()
	{
		printf( "Node::foobar()\n" );
	}
};
Listing 3: Virtual table during construction.

This really prints:

 
Base::foobar()
Node::foobar()

Surprising? It really should not be. An object is constructed by first constructing the inherited classes, then the member variables, and finally by running the body of the constructor. This is done recursively up the hierarchy. That means that while Base is constructed, the Node part of the object is not ready yet. That includes the vtable! So while we're inside the Base constructor, the vtable pointer points to the vtable for Base. In reality the compiler needs to generate two vtables: before entering the Base constructor the instance vtable pointer is set to point at vtable(base), and before entering the Node constructor it is set to point at vtable(node).
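The construction order can be watched directly. The following is a rough sketch rather than a copy of listing 3: foobar is an ordinary (non-pure) virtual function here so that calling it from the constructor is well defined, and the calls are recorded in a vector instead of printed. All names (trace, Part, LoggedBase, LoggedNode, check_construction_order) are made up for the illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical logging variant of listing 3; records calls instead of printing.
static std::vector<std::string> trace;

struct Part
{
	Part() { trace.push_back( "Part()" ); }
};

struct LoggedBase
{
	LoggedBase() { foobar(); }   // the object is still only a LoggedBase here
	virtual ~LoggedBase() {}
	virtual void foobar() { trace.push_back( "LoggedBase::foobar()" ); }
};

struct LoggedNode : public LoggedBase
{
	Part part;   // constructed after LoggedBase, before the LoggedNode body runs

	LoggedNode() { foobar(); }   // by now the object is a LoggedNode
	virtual void foobar() { trace.push_back( "LoggedNode::foobar()" ); }
};

// Small self-check: base first, then members, then the derived constructor body.
inline void check_construction_order()
{
	trace.clear();
	LoggedNode node;
	assert( trace.size() == 3 );
	assert( trace[0] == "LoggedBase::foobar()" );
	assert( trace[1] == "Part()" );
	assert( trace[2] == "LoggedNode::foobar()" );
}
```

The same call, foobar(), lands in a different function depending on how far construction has progressed, which is the vtable switch described above seen from the outside.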

Note also that while there is a misconception that declaring a pure virtual function inserts a zero into the vtable entry, this is not really true [2]. For example, even if you don't provide a definition for a pure virtual function, the compiler can insert a stub function that vectors to an assert handler or such, since calling a pure virtual function without a definition is ... well, really bad and the behaviour is undefined (which usually means a core dump somewhere down the line).

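Actually, the above code is not really good either. It should not compile at all, even though Visual Studio compiles it fine, even with /Za. Providing a pure specifier and a definition in the same declaration is not legal [3]; a pure virtual function may have a definition, but it has to be written outside the class. On top of that, making a virtual call to a pure virtual function from the constructor of its own class is undefined behaviour (10.4.6), so the first line of the output is not guaranteed either. Oh, well. For the sake of clarity I'll keep this version, but hey, the compiler might generate any old code.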
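For comparison, here is a minimal sketch of the form the standard does allow: the = 0 stays inside the class and the body moves to an out-of-class definition. The class stays abstract, but a derived class can still reach the body with a qualified call. Shape, Circle, describe and check_pure_with_definition are invented for this sketch and have nothing to do with listing 3.

```cpp
#include <cassert>
#include <string>

// Hypothetical example; none of these names appear in the listings above.
struct Shape
{
	virtual ~Shape() {}
	virtual std::string describe() const = 0;   // pure: Shape stays abstract
};

// The definition lives outside the class, separate from the pure specifier.
std::string Shape::describe() const
{
	return "Shape";
}

struct Circle : public Shape
{
	virtual std::string describe() const
	{
		// A qualified call skips virtual dispatch and reaches the pure function's body.
		return Shape::describe() + "/Circle";
	}
};

// Small self-check: the virtual call goes to Circle, which calls into Shape's body.
inline void check_pure_with_definition()
{
	Circle circle;
	const Shape& shape = circle;
	assert( shape.describe() == "Shape/Circle" );
}
```

Note that nothing here calls describe() from a constructor, so the undefined behaviour discussed above does not come into play.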

Take this code snippet and crank the craziness up to 11, and you get multiple (virtual) inheritance, passing the object that's being constructed as a parameter to a constructor, and calling stuff on it... If you see code like that in the wild, then that person should be smacked around with a very large trout. Or just confiscate their keyboard.

In closing

Now, these are the things you can do with C++. That doesn't mean it's a good idea. The concepts discussed in this article are pretty simple, if maybe somewhat esoteric. The problem is that I've only presented small, trivial toy examples, and things quickly snowball when you throw multiple inheritance into the mix. Some argue that multiple inheritance is just an easy way to solve particular problems: much like you can simulate virtual tables in C with function pointers, you can solve those problems without MI, but sometimes it's more convenient with it. Fine. Every tool has its application, but before you start applying this nice new shiny hammer, think a little bit about the implications. Then think some more. And then if you're still not scared, look up 12.7.3. Don't tell me that I didn't warn you...

Footnotes

[1] Visual Studio 8 does not, however.
[2] We've defined an inline pure virtual function. If that doesn't mess with your head... it messes with mine at least.
[3] See paragraph 10.4.2 in the standard. Maybe someone will fix this at Microsoft? Maybe it's related to 10.3.8: "A virtual function declared in a class shall be defined, or declared pure (10.4) in that class, or both; but no diagnostic is required (3.2)." WTF? Does that mean that even if you mess up, the compiler is not obliged to tell you? That's really strange...

Resources

1. An online copy of the draft of the C++ standard (note that it's not the final one, changes were made).
