Wednesday, September 3, 2014

Order of Construction in C++ for Polymorphic Classes

I came across a bit of a bug (or at least I thought it was) recently while programming. I thought I'd share the experience because it's a bit of a nuance of the C++ language and doesn't seem to fit what we normally expect C++ to do.

The issue was with polymorphism. I was creating a new class that derived from a base class. Part of the construction of the base class is to give it a pointer to its owner. Upon construction, the base class would then call into the owner class, passing its this pointer along. What I expected was to have all the vtable information of the derived class available to me. This is a key element of polymorphism that makes it such an incredible feature. However, when Owner began using the Derived object I had created (as a Base*), I found that it was calling only Base functions, not the overridden Derived functions.

Surely, I thought, I've discovered a bug in the compiler or something.

And then I decided to dig a bit deeper and try it out in a very controlled environment. This is the test I came up with.

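#include <iostream>
using namespace std;

class Base
{
public:
    Base()
    {
        // While Base's constructor runs, the object's dynamic type is Base,
        // so this virtual call always resolves to Base::GetIsDerived().
        bool isDerived = GetIsDerived();
        if (isDerived)
        {
            cout << "Class is derived." << endl;
        }
        else
        {
            cout << "Class is base." << endl;
        }
    }
    virtual bool GetIsDerived() const { return false; }
};

class Derived : public Base
{
public:
    Derived() : Base()
    {
        // Base is fully constructed by now, so the same call
        // resolves to Derived::GetIsDerived().
        bool isDerived = GetIsDerived();
        if (isDerived)
        {
            cout << "Class is derived." << endl;
        }
        else
        {
            cout << "Class is base." << endl;
        }
    }
    virtual bool GetIsDerived() const { return true; }
};

int main()
{
    Base base;        // runs Base() only

    Derived derived;  // runs Base() first, then Derived()
}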


I have a Base class and a Derived class that extends Base. Base has a virtual function called GetIsDerived() that always returns false, because for a Base that's the right answer. In the Derived class, however, I override it to always return true. In my entry point function, I create a new instance of Base on the stack. Sure enough, as you would expect, the first line printed is "Class is base." When I create the new Derived object on the stack, you'd think that the overridden GetIsDerived() would run, but while the Base constructor is executing, the object is still treated as a Base. The Derived part hasn't been constructed yet, so virtual calls resolve to the Base versions and no Derived overrides get called. The total output is then this:

Class is base.
Class is base.
Class is derived.

So you see, when putting functionality in the constructor, rather than mere initialization, be careful what your functions are doing, because until the base constructor has finished, anything that uses the object will see it as the base class and not as the derived class. The problem is even more difficult to see when you hand your this pointer to someone else and they begin making virtual calls through it while you are still inside the base constructor.
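To make the owner scenario concrete, here is a minimal sketch of it. The names Owner, Register(), GetName(), and RunExample() are made up for illustration and are not the code I was actually working on. It assumes a C++11 compiler.

```cpp
#include <cassert>
#include <string>
#include <vector>

class Base;

// Hypothetical owner that records what each registered object reports.
class Owner
{
public:
    void Register(const Base* object);
    std::vector<std::string> names;
};

class Base
{
public:
    // Hands 'this' to the owner while construction is still in progress.
    explicit Base(Owner& owner) { owner.Register(this); }
    virtual ~Base() {}
    virtual std::string GetName() const { return "Base"; }
};

class Derived : public Base
{
public:
    explicit Derived(Owner& owner) : Base(owner) {}
    std::string GetName() const override { return "Derived"; }
};

void Owner::Register(const Base* object)
{
    // A virtual call through a Base*. Which version runs depends on
    // how far construction of the object has progressed.
    names.push_back(object->GetName());
}

std::vector<std::string> RunExample()
{
    Owner owner;
    Derived derived(owner);    // registered from inside Base(): records "Base"
    owner.Register(&derived);  // registered after construction: records "Derived"
    assert(owner.names.size() == 2);
    return owner.names;
}
```

Calling RunExample() returns "Base" followed by "Derived". One possible way around the problem, then, is to register with the owner only after the object has been fully constructed, for example from a factory function, rather than from the base constructor.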

Wednesday, August 27, 2014

Programming as a communication to future programmers (including yourself)

I was recently in a discussion with one of my colleagues about some of the finer points of developing in C++. To be frank, we talk about this nearly every day, but this conversation in particular stuck out to me.

What we were talking about was how the code you write can clue future developers in to how something ought to be used or how something works. In C++, this is pretty important because C++ allows developers to accomplish tasks in a variety of ways. The particular example we were discussing was when to pass by reference and when to pass by pointer.

Surely, there have been thousands of discussions on whether to pass by reference or pointer. Just google it and you'll see the evidence of that. However, instead, we took the approach of "When should I accept by reference or by pointer?" Moreover, what does this say about how I want to use the references or pointers you are giving me?

I phrase the question this way because the caller of a function doesn't get to choose the function's signature -- the function being called chooses that. By creating your function in a particular way, you are communicating how it will be used by user code. As an example, when thinking about passing by reference or by pointer, the two have rules in the language that permit certain things and restrict others. Therefore, depending on how you accept an object, you're communicating to the user how you expect to be able to use it.

If you are accepting by pointer, you are saying that
  1. It is okay for the object to not exist, i.e., it's optional. Pointers can be null, so only accept a pointer if you have a way to handle the case where the object doesn't exist. In fact, it's probably best to put all pointer arguments at the end of the function signature with default " = nullptr" values on each one, making the function more convenient to use.
  2. The object, if not null, will live at least as long as the function's scope. Once you leave the function's scope, the object could be deleted at any time. This means that the function shouldn't store the pointer off somewhere else for later use. Of course, this much isn't even guaranteed because...
  3. The function can delete the pointed-to object. This is often my biggest fear when using other people's code and passing objects to them by pointer. Does the function expect to take over ownership? Will the function delete my object? Of course, some functions are explicitly intended to do that and even say so in their name.

As opposed to accepting a reference where...
  1. The object is guaranteed to exist.[1] This is great because then there's no null-checking, the user knows that the object must exist, and if they dereference a null pointer in order to pass it by reference, the null dereference happens on their turf, not mine -- it's their mistake, not a bug in my API. However, like a pointer, there is no guarantee that the object will exist longer than the scope of the function, so don't store the reference or even the object's address off anywhere.
  2. The object will live for the entire scope of the function and you are guaranteed to have access to it for the entire duration of the function. Whereas I could re-assign a pointer to point to a different object, I can't re-seat a reference. Moreover, this means that...
  3. A reference cannot be deleted (with exceptions). Try calling delete on a reference. It doesn't work. Of course, you could call "delete &reference;", but let's be honest, if you're doing that, you have no business being a developer at all.
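As a rough illustration of the two lists above, here is a minimal sketch of a signature that says "required" with a reference and "optional" with a trailing pointer. Player, Logger, ApplyDamage(), and Example() are hypothetical names invented for this example, and it assumes a C++11 compiler.

```cpp
#include <cassert>
#include <string>

// Hypothetical types, used only for illustration.
struct Logger
{
    std::string text;
    void Write(const std::string& line) { text += line + "\n"; }
};

struct Player
{
    int health = 100;
};

// player is taken by reference: it is required and must exist.
// logger is taken by pointer: it is optional, so it goes last and
// defaults to nullptr. The function neither stores nor deletes either one.
void ApplyDamage(Player& player, int amount, Logger* logger = nullptr)
{
    player.health -= amount;
    if (logger != nullptr)
    {
        logger->Write("damage applied: " + std::to_string(amount));
    }
}

int Example()
{
    Player player;
    Logger logger;
    ApplyDamage(player, 30);           // no logger passed: nothing is written
    assert(logger.text.empty());
    ApplyDamage(player, 20, &logger);  // logger supplied: one line is written
    return player.health;              // 100 - 30 - 20
}
```

A caller reading only the signature can tell that the player must exist and that the logger may be left out, without opening the function body.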

And there's more to communicating to your users how you will use the objects they pass to you. Using the const keyword is a great way to promise to user code that you won't modify the object passed to you (unless you cast it to a non-const, at which point somebody needs to cut you... deep... and in a main artery).
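A small sketch of that promise, using two made-up functions (Sum() and AppendTotal() are hypothetical, as is Example()). The const on the parameter is what tells the caller their vector comes back untouched.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// const reference: the signature promises that Sum only reads the vector.
int Sum(const std::vector<int>& values)
{
    // values.push_back(0);  // would not compile: values is const
    return std::accumulate(values.begin(), values.end(), 0);
}

// Non-const reference: the signature warns the caller that the vector may change.
void AppendTotal(std::vector<int>& values)
{
    values.push_back(Sum(values));
}

int Example()
{
    std::vector<int> values{1, 2, 3};
    int total = Sum(values);     // values is left untouched
    assert(values.size() == 3);
    AppendTotal(values);         // values grows by one element
    assert(values.size() == 4);
    return total;
}
```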

Part of me feels dumb writing this. Above, I said that this was a finer point of developing in C++, but honestly it's a pretty blunt topic. This isn't an obscure practice. C++ developers have known these points since the dawn of the language. Heck, compilers even auto-generate copy constructors that take a const reference. Why? Because they never want to copy null and they want to guarantee that the copy constructor won't modify the original object.

Yet, I see time after time after time, code written that doesn't follow these simple rules, and it's often by developers who've been at it for well over a decade. Instead of guaranteeing the existence of an object by asking for it by reference, they'll ask for it by pointer and put an assert at the very top of the function. That will prevent shit from hitting the fan in debug, but when the shit hits the fan in release, you're shit outa luck and will likely have a harder time finding the bug because release dumps never give you enough information (optimizations that make the callstack not match your code, only getting stack memory, etc).

In my mind, it all comes down to subtle communication and usability. I'm a tools programmer, so I think a lot about usability. What many programmers don't think about is that the first tool is the code itself. The usability of the code also needs to be considered in detail and that includes what it communicates to users of the code.

A comparison might explain it best. If you have a property of an object in a tool you're writing and that property can be one of a list of values that do not relate to each other in scale, the UI representation you'd likely create for that is a combobox. Why? Because there are a few discrete options and you can only pick one. But what if you create a slider? Well, with four options a slider can snap to 0, 33, 66, and 100% to represent the four values, and only one can be selected. But a slider communicates a relationship between the possible values. A slider is, therefore, confusing.

Good uses of sliders: Graphic quality (scaling from low to high), view distance (scaling from near to far), field of view (scaling from narrow to wide).

Good uses of comboboxes: Physics Interpolation (none, interpolate, extrapolate), a character's profession (warrior, mage, ranger, priest), light type (point, spot, directional, area).

If you were to use a slider for the fields that lend themselves to comboboxes, the user could get confused. Wait, is a priest the best profession because it's highest on the slider? Is a point light not as good as a directional light because it's lower on the slider? Of course not, but when you present a slider for something that fits a combobox, the user will be unsure of how to use the tool.

This is what it's like when seeing a function call that is asking for an object in the wrong way. Possibly worst of all, it slows down future development time as confused programmers stumble their way through learning your API.

Write so as to communicate. When you write, think, "what am I telling future developers about my code?" and use what is most appropriate.


[1] While a reference is not absolutely guaranteed to exist, it is at least guaranteed to never be assigned to null statically. Because we're talking about programmer's static typing here, for all intents and purposes, a reference says "I am expecting this object to exist".

Wednesday, April 23, 2014

Code Dump: C++ delegate

When I started to learn to program, a lot of stuff seemed like magic. If you're a programmer, you probably know the feeling. It's just like when you are told how to do something, rather than getting an explanation of how it works under the hood.

For me, that happened all the time using C# delegates. See, I really liked C# delegates. I thought they were the shit (and what all the pros used, don't ya know). Turns out they aren't good in every situation (most situations call for a simpler event pattern that is sufficient), but regardless, they can be very useful at certain times.

Dumb name is dumb

I've done some researching about how to start blogs. One thing that comes up a lot is to make sure you nail the blog name.

So I sat in front of the "Create a blog" page for a few minutes, realizing that I felt like I was naming a character in an MMO (you MMO gamers know what I'm talking about).

Then I realized that I can change the name at any time. So here's my blog, with a dumb (yet accurate, which the programmer in me is a fan of) name.

It's not much now, but hopefully that will change. You can expect to see posts about stuff I'm interested in (like the choreography of 1's and 0's) and from time to time, even my own original thoughts (coming from an American, this is pretty impressive), so come back again and follow me and all that stuff. You never know, I may actually entertain some of you.

Note: While I am a programmer for Sony Online Entertainment, you won't find any information regarding their games here. My thoughts and everything written on this blog are my own.