Class instance initialization
It has been discovered that C++ provides a remarkable facility for concealing the trivial details of a program -- such as where its bugs are.
-- David Keppel
Introduction
I thought I would ramble on a little about the class constructor in C++ as well as start a new topic in the blog, C++. Here is the first installment. A little random, but fun facts. Keep your copy of the standard close by.
Initialization order of members
Class members are initialized in the order in which they are declared in the class, regardless of how the member initialization list is ordered. This can bring woes if you get into the practice of using members to initialize other members (a practice that should be avoided where possible, though sometimes it can't be). Most modern compilers issue a warning if the order of the initialization list doesn't match the declaration order [1]. Look at the code snippet in listing 1. It leaves the member variable apple with an indeterminate value.
struct Basket
{
    Basket( int count )
        : orange(count)
        , apple(orange*2)
    {
    }
    int apple;
    int orange;
};
The member apple ends up holding garbage. It is declared first, so it is initialized first, and at that point orange has not been initialized yet: orange*2 reads an indeterminate value, which is formally undefined behaviour. This can be catastrophic in programs, especially since looking at the code can be misleading. It looks perfectly innocent, especially if you have separated the definition of the constructor into a .cpp file and the declaration of the class itself is in a header file somewhere else.
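One way to sidestep the problem is to declare the members in the order they depend on each other and, better still, to initialize from the constructor parameter instead of from another member. A minimal sketch of that idea follows; SafeBasket and checkSafeBasket are hypothetical names made up for illustration, not part of the original listing.

```cpp
#include <cassert>

// Hypothetical fixed variant of Basket: the declaration order matches the
// initialization list, and apple depends only on the parameter.
struct SafeBasket
{
    explicit SafeBasket( int count )
        : orange(count)     // declared first, so initialized first
        , apple(count * 2)  // uses the parameter, not another member
    {
    }

    int orange;
    int apple;
};

// Small self-check: both members hold well-defined values.
void checkSafeBasket()
{
    SafeBasket basket( 3 );
    assert( basket.orange == 3 );
    assert( basket.apple == 6 );
}
```

With this layout, reordering the member declarations later cannot silently change the result, because neither initializer reads another member.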
Scope lookup rules for member initialization
Charles showed me this one day and I couldn't decide if it was the most horrible thing that snuck into the language or if it's da shit. Have you ever run into the problem of variable naming? You have several variables, all basically referencing the same thing but they have slightly different scopes. Some techniques to solve this are to postfix duplicates with an underscore or to explicitly reference things through the this pointer. But what do you do in the member initialization list? Turns out that the standards committee already covered this and the following code does work.
struct Animal
{
    int bones;
    int health;
    Animal( int bones, int health )
        : bones(bones)
        , health(health)
    {
    }
};
Look closely: we're initializing the member variable health with the formal parameter health. It turns out that the expressions inside the parentheses of the member initialization list are evaluated in the scope of the constructor, so the formal parameters are found first and hide the members of the same name. The name in front of the parentheses, on the other hand, is looked up in the class, so it always refers to the member. Cute, right? Maybe it's too cute for comfort. For more information, or more correct information, look at 12.6.2.7 in the standard.
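The same hiding applies inside the constructor body, which is where it tends to bite. The sketch below is an illustration only (Creature is a hypothetical type, and the clamping rule is invented for the example): in the body, a bare health is the parameter, and the member has to be reached through this->.

```cpp
#include <cassert>

// Hypothetical example: parameter names deliberately shadow member names.
struct Creature
{
    int bones;
    int health;

    Creature( int bones, int health )
        : bones(bones)      // member bones <- parameter bones
        , health(health)    // member health <- parameter health
    {
        // In the body the parameter still hides the member, so a bare
        // "health" here is the parameter. The member needs this->.
        if ( health < 0 )
            this->health = 0;   // clamp the member, not the parameter
    }
};
```

Writing health = 0 in the body instead would compile just as happily and only change the parameter, which is a good argument for keeping such constructors short.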
Initialization of the vtable
Look at the following code snippet. What do you think will be printed if you instantiate a Node object?
#include <cstdio>

struct Base
{
    Base()
    {
        foobar();
    }
    // Pure specifier plus an inline body: see the discussion further down.
    virtual void foobar() = 0
    {
        printf( "Base::foobar()\n" );
    }
};
struct Node : public Base
{
    Node()
    {
        foobar();
    }
    virtual void foobar()
    {
        printf( "Node::foobar()\n" );
    }
};
This really prints:
Base::foobar()
Node::foobar()
Surprising? It really should not be. An object is constructed by first constructing its base classes, then its member variables, and finally by running the constructor body. This is done recursively up the hierarchy. That means that while Base is being constructed, the Node part of the object is not ready yet. That includes the vtable! So while we're inside the Base constructor, the vtable pointer points to the vtable of a Base object. In practice the compiler needs to generate two vtables: before entering the Base constructor the instance's vtable pointer is set to point at vtable(Base), and before entering the Node constructor it is set to point at vtable(Node).
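The same dispatch behaviour can be observed with an ordinary (non-pure) virtual function, which keeps the example well-defined on any conforming compiler. This is a minimal sketch with hypothetical names (Shape, Circle, g_callLog, constructCircle); it records the calls in a string instead of printing them so the order can be inspected.

```cpp
#include <cassert>
#include <string>

// Hypothetical call log, global so both constructors can append to it.
std::string g_callLog;

struct Shape
{
    Shape() { describe(); }     // object is still "a Shape" here
    virtual ~Shape() {}
    virtual void describe() { g_callLog += "Shape;"; }
};

struct Circle : public Shape
{
    Circle() { describe(); }    // now the Circle override is active
    virtual void describe() { g_callLog += "Circle;"; }
};

// Constructs one Circle and returns the sequence of describe() calls.
std::string constructCircle()
{
    g_callLog.clear();
    Circle circle;
    return g_callLog;
}
```

Calling constructCircle() yields "Shape;Circle;": the call made from the Shape constructor never reaches the Circle override, because the Circle part does not exist yet at that point.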
Note also that, while there is a misconception that declaring a pure virtual function inserts a zero into the vtable entry, this is not really true [2]. For example, even if you don't provide a definition for a pure virtual function, the compiler can insert a stub that jumps to an assert handler or similar, since calling a pure virtual function without a definition is ... well, really bad and the behaviour is undefined (which usually means a core dump somewhere down the line).
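Actually, the above code is not really good either. It should not compile at all, even though Visual Studio compiles it fine, even with /Za. Providing a pure specifier and a definition at the same time should not be legal [3]. (Giving the function a separate definition outside the class is fine.) On top of that, making a virtual call to a pure virtual function from a constructor is undefined behaviour even when a definition exists (10.4.6). Oh, well. For the sake of clarity I'll keep this version, but hey, the compiler might generate any old code.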
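For comparison, here is a sketch of the form that the standard does accept: the pure specifier stays in the class, and the body is supplied separately outside it. The names (Logger, FileLogger, prefix) are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical abstract class: prefix() is pure, so Logger cannot be
// instantiated and every concrete subclass has to override it.
struct Logger
{
    virtual ~Logger() {}
    virtual std::string prefix() const = 0;
};

// The definition lives outside the class, apart from the pure specifier.
std::string Logger::prefix() const
{
    return "[log]";
}

struct FileLogger : public Logger
{
    // The override may reuse the base body through an explicit,
    // qualified (and therefore non-virtual) call.
    virtual std::string prefix() const
    {
        return Logger::prefix() + "[file]";
    }
};
```

The base body is only ever reached through the qualified call Logger::prefix(), so the class stays abstract while still offering a default piece of behaviour to build on.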
Take this code snippet and crank the craziness up to 11, and you get multiple (virtual) inheritance, passing the object that's being constructed as a parameter to a constructor, and calling stuff on it... If you see code like that in the wild, then that person should be smacked around with a very large trout. Or just confiscate their keyboard.
In closing
Now, these are the things you can do with C++. That doesn't mean it's a good idea. The concepts discussed in this article are pretty simple, if maybe somewhat esoteric. The problem is that I've only presented small, trivial toy examples. This quickly snowballs when you throw multiple inheritance into the mix. Some argue that multiple inheritance is just a convenient way to solve particular problems: much like you can simulate virtual tables in C with function pointers, you can solve those problems without MI, but sometimes it's more convenient with it. Fine. Every tool has its application, but before you start applying this nice new shiny hammer, think a little bit about the implications. Then think some more. And then if you're still not scared, look up 12.7.3. Don't tell me that I didn't warn you...
Footnotes
[1] Visual Studio 8 does not however.
[2] We've defined an inline pure virtual function. If that doesn't mess with your head... it messes with mine at least.
[3] See paragraph 10.4.2 in the standard. Maybe someone will fix this at Microsoft? Maybe it's related to 10.3.8 - A virtual function declared in a class shall be defined, or declared pure (10.4) in that class, or both; but no diagnostic is required (3.2). WTF? Does that mean that even if you mess up, the compiler is not obliged to tell you? That's really strange...
Resources
1. An online copy of the draft of the C++ standard (note that it's not the final one, changes were made).
Sorry, this isn't directly related, but from your articles you seem to be a pretty strict coding-standards type of guy, and yet your examples show fields with no prefixes.
Do you have a reasoning for that? Maybe you were just making examples? Maybe you think it's a waste of time? Maybe you just don't like the esthetics? Just asking.
I've been trying to discuss that topic recently.
My current position is
struct Basket
{
    int m_apples;
    int m_oranges;
};
is objectively better than
struct Basket
{
    int apples;
    int oranges;
};
because, in the middle of some member function of Basket, if I see a line like this
apples -= numApplesEaten;
if I don't follow the prefix convention I have no idea if "apples" or "numApplesEaten" are members. Instead I have to look at the rest of the code which wastes my time. If instead I follow the convention then a line like this
m_apples -= numApplesEaten;
is very clear. I don't need to reference other lines or the type definition to figure out that m_apples is a member and numApplesEaten is not and I can be more efficient.
In other words, it doesn't *seem* like a style issue, it seems like an efficiency issue similar to your not ignoring warnings post. Thoughts?
Hi Greggman,
Yes, you are correct -- I am kind of a code standards type (little laugh). First let me confirm that the code is really very much written to get the point across as tersely as possible and should really be considered example code. Now with that out of the way:
I usually prefix all my private/protected members with "m_", just as you described. Contrary to a lot of dogma out there, I am not afraid of exposing public variables and these I *do not* prefix. The classes that have public variables tend to be "transport" classes anyway, with few (if any) member functions. Example:
class LogEvent
{
public:
    LogEvent();
    const char* file;
    int line;
    const char* message;
    const char* formattedMessage;
    const char* indent;
    loglevel::Type type;
};
The constructor is strictly there to ensure that the members are always initialized to something sane. When you use this in the code, it will always be apparent what is happening:
void DebugLogHandler::handleEvent( const LogEvent& event )
{
    OutputDebugString( event.formattedMessage );
}
On the other hand, for regular classes I do tend to agree: all private variables are prefixed with "m_", partly to make the scoping apparent (damn you, implicit this pointer, I sometimes miss Python's "self" convention) and partly because I've gotten used to the notion that it looks prettier. :)
On the whole I try to keep it fairly lightweight with the baggage you need to carry around for naming. I don't want to fall into the trap of Hungarian notation for my variables. I very much agree with rule 0 in Sutter & Alexandrescu's "C++ Coding Standards" (great book) -- "Don't sweat the small stuff".
Funny, I was looking at your GGS debug library earlier today since I saw it was in use by a friend of mine...
Cheers,
/j
Greggman wrote (it was lost in the migration between hosts):
I can see your reasoning with the refactoring and I guess I agree. I usually don't run into this myself with the public structs since I tend to write non-member functions to do stuff on them instead of writing member functions. It's a style choice I guess, but by writing a non-member function you are required to dereference the struct through a variable (hey, we can call it "self" :) and it's easy to see what's going on.
The whole free function thing I started a while back after reading Scott Meyers' article How Non-Member Functions Improve Encapsulation.
Hi,
I worked on a sci-fi title for a video game console a few years back. We did NOT pay attention to any of this stuff that you talk about. Coding standards? Ha. I wanna make games not worry about coding standards and check for null pointers.
-jk
Funny, I also worked on a similar title a while back and some of the guys on the team didn't pay attention to stuff like this, language etc., and wrote code that only worked on the Microsoft compiler, by chance :) I had so much fun.
Hi Jim,
There is a technical niggle in the standard about defining the pure virtual function in the place you have (the standard parser won't understand it), but defining a pure virtual function per se is fine, and makes good semantic sense. You are essentially overriding the default "pure virtual function called" handler. This facility can be particularly handy when debugging a bug caused by calling virtual functions in a constructor!
It can also be used to specify both a base-class behaviour *and* a requirement that the behaviour must be made more specific. The base-class implementation is then available as explicit BaseClass::Foo(), as usual in a virtual.
BTW, isn't prefixing every member variable with m_ a form of Hungarian notation? :p I never understood the rationale for m_. I can see the names of all my locals and parameters. Any other symbol is therefore a member variable, or a global. Guess which I use more of? Therefore, I use g_ for globals; job done, and no RSI.
Eddie,
Yeah, I guess I did that in a hurry. Moving the definition out to be not inline in the declaration should be fine even on a nitpicky compiler :) I guess my quest for really compact code for the web backfired.
And m_ *is* a kind of Hungarian, for sure, but the right kind of Hungarian that specifies usage/context and not *type*. I like the m_ and g_ for my variables since they kind of broadcast that you are doing something potentially alias unsafe; if you spot any of them in your inner loops it's easy to hoist them out to a local variable. If instead you have to scan through all your locals inside the function, it's not trivial anymore. But yeah, the RSI will probably come in a couple of years... I'm more worried about my Maya use than the coding though ...
>> Providing a pure specifier and a definition at the same time should not be legal .
It's perfectly legal C++ code.