Sunday, August 28, 2016

The Data You Care About

I recently watched a video about why Object Oriented Programming is Bad and later tweeted a bit about it, and quickly realized Twitter wasn't the right platform to adequately describe my thoughts about a particular part of the video.

So here are my thoughts, better laid out. Keep in mind, this is only in reference to what he says at 33:30 into the video -- not the entire video.

Basically, the point he's making (and that I want to expand on) is that methods, and the idea of encapsulation that they support, are not always bad. He says that when a method is tightly related to the data of the class, it's appropriate. Common examples are ADTs -- Arrays, Lists, HashLists and other generic containers and constructs.

The questions I want to elaborate on are "Why do methods work out okay for ADTs but not other classes?" and "How can we use that to inform how we write other stuff?"

Here's the answer up front: Bookkeeping.

When you think about data (literally, the variables you declare), always keep in mind which ones are Data You Care About and which ones are data that does bookkeeping for the Data You Care About. Let's open up an array to see what I mean.

template<typename T>
class Array
{
    T* m_pArray;     // The actual array of data.
    int m_Count;     // How many elements are in the array.
    int m_Space;     // How much space has been allocated for the array.
public:
    // Constructors, accessors, etc.
};

If it's not obvious, only one of the members of this sample Array class is the actual data we care about -- the other two are just bookkeeping. The key thing here is that within the class, the various methods manipulate m_Space and m_Count in order to keep track of how much memory has been allocated and how much of it has been initialized. If these were publicly exposed, anybody could write to these members and screw up the class's accounting of its data. In reality, if you know the internal structure of the array class, you could do some casting and pointer arithmetic to manipulate these values anyway. But that's not a big deal, because something like that obviously looks like a very unsafe thing to do and you have to go out of your way to do it. Whereas array.m_Count = 10; looks like a perfectly normal line of code, and you'd have to evaluate the surrounding code before realizing it was bad.
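To make the bookkeeping concrete, here is a minimal sketch of what one of those methods might look like. The names Add, Grow, Count, and Space are my own placeholders, not anything from the video or a real library, and the sketch assumes T is default-constructible so it can lean on new[] (which also means every allocated slot is initialized, a simplification compared to a real container).

```cpp
#include <cassert>
#include <utility>

// Hypothetical minimal Array. Add(), Grow(), Count() and Space() are
// illustrative names; assumes T is default-constructible and assignable.
template<typename T>
class Array
{
    T* m_pArray = nullptr;  // The Data You Care About.
    int m_Count = 0;        // Bookkeeping: how many elements are in use.
    int m_Space = 0;        // Bookkeeping: how many elements are allocated.

public:
    Array() = default;
    Array(const Array&) = delete;             // Copying left out to keep the sketch short.
    Array& operator=(const Array&) = delete;
    ~Array() { delete[] m_pArray; }

    // The only way to append: the method keeps both counters honest.
    void Add(const T& value)
    {
        if (m_Count == m_Space)
        {
            Grow();
        }
        m_pArray[m_Count] = value;
        ++m_Count;
    }

    const T& operator[](int index) const
    {
        assert(index >= 0 && index < m_Count);
        return m_pArray[index];
    }

    int Count() const { return m_Count; }
    int Space() const { return m_Space; }

private:
    // Doubles the allocation (starting at 4) and moves the old elements over.
    void Grow()
    {
        const int newSpace = (m_Space == 0) ? 4 : m_Space * 2;
        T* pNewArray = new T[newSpace];
        for (int i = 0; i < m_Count; ++i)
        {
            pNewArray[i] = std::move(m_pArray[i]);
        }
        delete[] m_pArray;
        m_pArray = pNewArray;
        m_Space = newSpace;
    }
};
```

Callers can read Count() and Space() all they like, but the only code that ever writes m_Count or m_Space lives inside the class, so the two values can't drift away from what m_pArray actually holds.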

As I mentioned in my tweets about this, the pattern shows up in far more than just ADTs. In fact, when you start separating data in your head into "real data" and "bookkeeping", you'll design your classes with much more clarity. Take a typical 3D transform class.

class Transform
{
    Vector3 m_Position;                            // Local Position
    Quaternion m_Orientation;                      // Local Orientation
    Vector3 m_Scale;                               // Local Scale
    mutable Matrix4x4 m_WorldMatrix;               // Cached world matrix
    mutable bool m_WorldMatrixDirty = false;       // True when the cache is stale
public:
    // Constructors, accessors, etc.
};

Let's assume that m_Position is initialized to (0,0,0), m_Orientation to an unrotated quaternion, and m_Scale to (1,1,1). The world matrix is initialized as an identity matrix. Maybe Vector3 should be padded to 4 floats instead of 3, and there are any number of other decisions that would improve on how this class is laid out. The point is, there is Data You Care About and there is bookkeeping (AKA overhead).

This example is a bit deceptive because in the end, we probably only care about m_WorldMatrix. The local Position, Orientation, and Scale (POS) are likely kept just so that when this transform's parent changes, we still have our data relative to the parent and can easily reconstruct our world matrix, which is used for rendering, physics, and many other systems. Note that this class doesn't currently describe who the parent is -- it could be bound by pointer or ID or something. It doesn't matter for the example being shown.

The obvious bookkeeper is m_WorldMatrixDirty. It's especially notable because it's got the mutable keyword. That means I can modify it within a const method. It makes this possible:

const Matrix4x4& Transform::GetWorldMatrix() const
{
    if(m_WorldMatrixDirty)
    {
        //recalculate world matrix.
        m_WorldMatrixDirty = false;
    }
    return m_WorldMatrix;
}

Now we only calculate the world matrix when it actually needs to be calculated, and the outside world has no idea we're doing this trick. Just as importantly, though, we don't want people directly writing to our local POS, because when the local POS changes it invalidates our world matrix. So we write something like this:

void Transform::SetLocalPosition(const Vector3& newPosition)
{
    m_Position = newPosition;
    m_WorldMatrixDirty = true;
}
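Putting the getter and setter together, the whole trick can be sketched in a form that compiles on its own. Everything here beyond the member names above is an assumption made for illustration: Vector3 and Matrix4x4 are bare-bones stand-ins, orientation and the parent are left out, translation is stored in the last column, and m_RecalcCount exists only so you can see how often the matrix is really rebuilt.

```cpp
#include <cassert>

// Hypothetical stand-in types so the sketch is self-contained.
struct Vector3 { float x, y, z; };

struct Matrix4x4
{
    // Defaults to the identity matrix.
    float m[4][4] = {
        {1.0f, 0.0f, 0.0f, 0.0f},
        {0.0f, 1.0f, 0.0f, 0.0f},
        {0.0f, 0.0f, 1.0f, 0.0f},
        {0.0f, 0.0f, 0.0f, 1.0f},
    };
};

class Transform
{
    Vector3 m_Position{0.0f, 0.0f, 0.0f};     // Local Position
    Vector3 m_Scale{1.0f, 1.0f, 1.0f};        // Local Scale
    mutable Matrix4x4 m_WorldMatrix;          // Cached result
    mutable bool m_WorldMatrixDirty = false;  // Bookkeeping: is the cache stale?
    mutable int m_RecalcCount = 0;            // Demo only: counts real rebuilds

public:
    void SetLocalPosition(const Vector3& newPosition)
    {
        m_Position = newPosition;
        m_WorldMatrixDirty = true;
    }

    void SetLocalScale(const Vector3& newScale)
    {
        m_Scale = newScale;
        m_WorldMatrixDirty = true;
    }

    const Matrix4x4& GetWorldMatrix() const
    {
        if (m_WorldMatrixDirty)
        {
            // Simplification: no parent and no rotation, so the matrix is
            // just scale on the diagonal and translation in the last column.
            m_WorldMatrix = Matrix4x4();
            m_WorldMatrix.m[0][0] = m_Scale.x;
            m_WorldMatrix.m[1][1] = m_Scale.y;
            m_WorldMatrix.m[2][2] = m_Scale.z;
            m_WorldMatrix.m[0][3] = m_Position.x;
            m_WorldMatrix.m[1][3] = m_Position.y;
            m_WorldMatrix.m[2][3] = m_Position.z;
            m_WorldMatrixDirty = false;
            ++m_RecalcCount;
        }
        return m_WorldMatrix;
    }

    int RecalcCount() const { return m_RecalcCount; }
};
```

Setting the position and then the scale before asking for the matrix costs one rebuild, not two, and asking again costs nothing. The important part is still the setter: every write to the local POS goes through a method that also flips the flag.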

This is almost like Event-Driven Programming [1]. We've guarded access to our data member because we want to make sure we do something when that value changes. You can even imagine delegates and events being used to notify other parts of the code that want to react when data has changed. Here are some easy examples:

  • In a game, when something is added or removed from your inventory:
    • The UI wants to know so it can update your inventory window.
    • The chat/info box wants to know so it can show a message (e.g., "Removed Steel Sword").
    • If it is a networked or online game, a system that replicates data may want to notify your client that the item was added or removed.
  • In a game, when your health changes:
    • Enemy AI may want to prioritize their targets. If you have low enough HP, maybe they just want to finish you off.
    • Friendly AI may want to prioritize their healing or protective abilities.
    • The UI will want to show the health change.
    • The game may want to make your character grunt from the hit if it was large enough.
    • In a networked game, a replication system needs to notify nearby clients of the change.
  • In a level editor, when you make a change to the heightfield:
    • The terrain mesh will need to be rebaked for rendering, collision, pathfinding, etc.
    • Placed objects may want to move with the terrain as it is being deformed.
    • A terrain texturing system may want to change the texture based on height, slope, or any other number of properties of the new mesh.
    • Flora may want to regenerate -- maybe the grass only grows on flat terrain and not hills. It wants to know if you just made a steep hill.


Or the UI reacts to a change in stats.


I could go on and on with examples.
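The health example might be sketched roughly like this. It's only an illustration: Health, Listener, AddListener, and Set are names I made up, and std::function in a vector is just the simplest stand-in for whatever delegate or event system your engine actually has (a real one would also need a way to remove listeners).

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch: a guarded value that notifies listeners on change.
class Health
{
public:
    // Called with the old and new values whenever the health changes.
    using Listener = std::function<void(int oldValue, int newValue)>;

    explicit Health(int initialValue) : m_Value(initialValue) {}

    int Get() const { return m_Value; }

    void AddListener(Listener listener)
    {
        m_Listeners.push_back(std::move(listener));
    }

    // The only way to write the value, so no change goes unannounced.
    void Set(int newValue)
    {
        if (newValue == m_Value)
        {
            return;  // Nothing changed, nothing to announce.
        }
        const int oldValue = m_Value;
        m_Value = newValue;
        for (const Listener& listener : m_Listeners)
        {
            listener(oldValue, newValue);
        }
    }

private:
    int m_Value;                        // The Data You Care About.
    std::vector<Listener> m_Listeners;  // Bookkeeping: who wants to know.
};
```

The UI, the AI, and a replication system would each call AddListener once with a lambda and then never poll the value again. If m_Value were public, a stray health.m_Value = 40; would compile fine and none of them would ever hear about it.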

The big takeaway is that none of this automatic bookkeeping could be done without methods (or at least some way of indicating which functions are allowed access to data members). Methods certainly are useful, and maybe even more commonly useful than the author of the video lets on. I'm not saying he's wrong -- just that it's slightly more nuanced than the video describes. And maybe that's just a result of only having so much time in a video to explain things. Only he'd really be able to comment on that.

Next post I'll talk about who might want to be listening to these events and what type of data those objects are likely to have (hint: it's not Data You Care About, and that's okay).




[1] Full disclaimer -- event-driven programming has its faults too -- it certainly shouldn't be used everywhere.