Stack vs Heap, Pimpl, performance

This post is mostly about C++, but because it involves practices often used in Qt programming too, I tagged it with Qt. It should also be interesting for C++ gurus ;)

So, you know that Qt uses Pimpl (Private Implementation, or opaque pointer), which is a very effective mechanism for keeping binary compatibility between Qt versions. I use it very often too, mostly not for binary compatibility but because it lets me hide internals and implementation specifics in cpp files.

And there is something even more important: this practice lets you avoid including 3rd-party headers from your own headers, that is, from your API. It often happens that you need some “utility” API, but including its header drags in a bunch of unexpected includes that come with this “utility”. Even worse, these 3rd-party headers, which you probably cannot even control, bring unexpected defines, pragmas etc. that can leave you puzzled for hours trying to understand why, how and in which order they conflict with your code.

Using a d-pointer (Private Implementation) really does let you avoid including these unneeded headers, but recently, in a huge project, it made me think about performance issues. Usually it looks like this:

// HEADER *****************
class DynObj {
public:
    DynObj();
    virtual ~DynObj();

private:
    class Private;   // forward declaration; defined in the source file only
    Private * d;
};

// SOURCE *****************
class DynObj::Private {
public:
    //internals
};

DynObj::DynObj() { d = new Private; }
DynObj::~DynObj() { delete d; }

It delivers all the advantages I described above, with one exception: if this is expected to be a lightweight object, then every time you instantiate one you actually get two memory allocations, one for the object itself and a second one for the Private object. And while you control the first allocation (stack or heap), you have no control over the second one: it is always on the heap.
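To make the hidden allocation visible, here is a minimal sketch that counts how often Private is requested from the heap while the objects themselves are plain locals. The counter g_private_allocs and the helper heap_allocs_for_locals are hypothetical names used only for this illustration, and Private is kept in the same file for brevity rather than hidden in a cpp file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter for this sketch: heap requests made for Private.
static std::size_t g_private_allocs = 0;

class DynObj {
public:
    DynObj() : d(new Private) {}
    ~DynObj() { delete d; }
    DynObj(const DynObj&) = delete;
    DynObj& operator=(const DynObj&) = delete;

private:
    struct Private {
        int i = 0;
        int j = 0;

        // Class-specific allocation functions, so every "new Private"
        // goes through here and is counted.
        static void* operator new(std::size_t size) {
            ++g_private_allocs;
            if (void* mem = std::malloc(size)) return mem;
            throw std::bad_alloc();
        }
        static void operator delete(void* mem) noexcept { std::free(mem); }
    };
    Private* d;
};

// Creates `count` local DynObj objects one after another and returns
// how many heap allocations that caused.
std::size_t heap_allocs_for_locals(int count) {
    const std::size_t before = g_private_allocs;
    for (int n = 0; n < count; ++n) {
        DynObj obj;   // the object itself lives on the stack
        (void)obj;
    }
    return g_private_allocs - before;
}
```

Calling heap_allocs_for_locals(10) reports one heap allocation per object, even though no DynObj was ever created with new.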

In my case I used billions of these objects as method parameters, return values, local variables etc. The problem is that even if you expect the object to be allocated on the stack, the Private sub-object still calls “new” and “delete”, so the heap is involved in any case, and using just the stack should surely be faster. How much? Let’s do a lab to measure it.

Lab definition:

  1. Define two classes: one is a standard class using a d-pointer, the second should be fully allocated on the stack when needed
  2. Both should still use the idea of Private Implementation, so the second class needs some tricks
  3. Measure the time spent on allocating objects of the two types

So, we would like to keep Private Implementation and still use a kind of d-pointer, but when memory is allocated for the object it should already include enough reserved bytes to fit the Private object later. Of course we can just define quint8 d_bytes[d_size];, but the compiler cannot calculate d_size without knowing the implementation details of Private, and exposing them would break the idea of keeping internals in the cpp file only.

We can use a trick: we declare a magic number, static const int d_size;, and in the cpp file we use a static assert to compare this magic number with the real size of Private.

The code for such a static assert is:

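// static_assert is a keyword since C++11, so the helper namespace needs another name
namespace static_check {
template <bool> struct is_fail;                              // no definition for <false>
template <> struct is_fail<true> { enum { value = 1 }; };
}

// Compiles only when def == real; otherwise the base class is incomplete
template <int def,int real>
struct check_d_size : ::static_check::is_fail<(bool)(def == real)> {};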

Then in the constructor we can place this code:

check_d_size<d_size,sizeof(Private)>();

And if the magic number is wrong, compilation fails with an error like this:
../cpp-dyn-vs-stack/main.cpp: In instantiation of 'check_d_size<31, 32>':

Now it is easy to find the right magic number.
BTW: this is surely a totally unportable trick! But it is fine for our lab and measurements.
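With a C++11 compiler the same size check can be written with the built-in static_assert instead of the template trick. The sketch below assumes a hypothetical class SizedObj with a two-field Private, and uses std::int32_t so that the magic number 8 holds on common platforms. Both the "header" and the "source" part are shown in one file for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// ---- header part: only the magic number is visible here ----
class SizedObj {
public:
    static std::size_t reservedBytes();

private:
    struct Private;                        // defined in the source part only
    static const std::size_t d_size = 8;   // magic number
    unsigned char d_bytes[d_size];         // space reserved for Private
};

// ---- source part: the real layout of Private ----
struct SizedObj::Private {
    std::int32_t i;
    std::int32_t j;
};

std::size_t SizedObj::reservedBytes() {
    // Fails to compile if the magic number gets out of sync with Private.
    static_assert(sizeof(Private) == d_size,
                  "d_size must match sizeof(Private)");
    return d_size;
}
```

Whether the compiler prints the actual sizeof(Private) in the diagnostic depends on the compiler, so the template version above can still be handy for discovering the right number.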

Another issue: we have enough memory reserved in the object, but we need to initialize it properly, that is, run the constructor on memory that is already there. Like this:

class StackObj {
public:
    StackObj() {
        check_d_size<d_size,sizeof(Private)>();
        d = new(d_bytes) Private;
    }
    virtual ~StackObj() { d->~Private(); }
private:
    class Private {
    public:
        inline void * operator new(size_t, quint8 * mem) { return mem; }
        ...
    };
private:
    Private * d;
    static const int d_size = 32;
    quint8 d_bytes[d_size];
};

Now if our StackObj is placed on the stack, then 1. the appropriate memory is reserved, and 2. the Private sub-object is initialized properly, with all constructors of its internals etc. And we can still keep the implementation with its internals in the cpp file and leave only the API in the header, without 3rd-party includes.
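Two details of StackObj deserve care outside a lab. A plain byte array carries no alignment guarantee for Private, and the compiler-generated copy operations would copy d so that it points into the other object's buffer. The sketch below is one possible way to address both, under assumptions that differ from the lab: a hypothetical class InPlaceObj, C++11, the standard placement new from <new>, a buffer that is merely large enough (64 bytes is an assumed upper bound) instead of exactly sized, and copying simply disabled.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// ---- header part: only size and alignment of Private leak out ----
class InPlaceObj {
public:
    InPlaceObj();
    ~InPlaceObj();
    // A member-wise copy would leave d pointing into the source object.
    InPlaceObj(const InPlaceObj&) = delete;
    InPlaceObj& operator=(const InPlaceObj&) = delete;

    void setName(const std::string& name);
    std::size_t nameLength() const;

private:
    struct Private;                          // defined in the source part only
    Private* d;
    static const std::size_t d_size = 64;    // assumed upper bound, not exact
    alignas(std::max_align_t) unsigned char d_bytes[d_size];
};

// ---- source part ----
struct InPlaceObj::Private {
    int i = 0;
    std::string str;
};

InPlaceObj::InPlaceObj() {
    static_assert(sizeof(Private) <= d_size,
                  "d_bytes is too small for Private");
    static_assert(alignof(Private) <= alignof(std::max_align_t),
                  "d_bytes is not aligned strictly enough for Private");
    d = new (d_bytes) Private;   // construct Private inside the reserved bytes
}

InPlaceObj::~InPlaceObj() { d->~Private(); }   // destroy it, nothing to free

void InPlaceObj::setName(const std::string& name) { d->str = name; }
std::size_t InPlaceObj::nameLength() const { return d->str.size(); }
```

Used as a local variable, InPlaceObj obj; obj.setName("pimpl"); keeps the Private object itself inside obj, although members such as std::string may still allocate on their own.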

Time to measure:

  • allocate 100,000 objects on the stack, using the heap for Private: 24 msecs
  • allocate 100,000 objects on the stack, using the stack for Private: 8 msecs

Wow! A 3x difference.
So, keep track of what you put on the stack and what on the heap if you care about computing time!
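For readers without Qt at hand, a comparable measurement can be sketched with std::chrono. Everything here is an assumption for illustration: the class names HeapPimpl and InPlacePimpl, the helper construct_destroy_msecs, and a tiny Private that is kept in the same file for brevity. Absolute numbers will differ from the ones above, and an optimizing compiler may remove part of the work, so treat the results as rough.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <new>

// Pimpl with Private on the heap.
class HeapPimpl {
public:
    HeapPimpl() : d(new Private) {}
    ~HeapPimpl() { delete d; }
    HeapPimpl(const HeapPimpl&) = delete;
    HeapPimpl& operator=(const HeapPimpl&) = delete;
private:
    struct Private { int i = 0; int j = 0; };
    Private* d;
};

// Pimpl with Private constructed inside the object's own bytes.
class InPlacePimpl {
public:
    InPlacePimpl() : d(new (d_bytes) Private) {}
    ~InPlacePimpl() { d->~Private(); }
    InPlacePimpl(const InPlacePimpl&) = delete;
    InPlacePimpl& operator=(const InPlacePimpl&) = delete;
private:
    struct Private { int i = 0; int j = 0; };
    alignas(Private) unsigned char d_bytes[sizeof(Private)];
    Private* d;
};

// Constructs and destroys `count` local objects of type T and
// returns the elapsed wall-clock time in milliseconds.
template <typename T>
double construct_destroy_msecs(std::size_t count) {
    typedef std::chrono::steady_clock steady;
    const steady::time_point start = steady::now();
    for (std::size_t n = 0; n < count; ++n) {
        T obj;
        (void)obj;
    }
    const std::chrono::duration<double, std::milli> elapsed =
        steady::now() - start;
    return elapsed.count();
}
```

Comparing construct_destroy_msecs<HeapPimpl>(100000) with construct_destroy_msecs<InPlacePimpl>(100000) on your own machine gives a feel for the difference without relying on a large array on the stack.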

Full listing:

#include <iostream>
#include <string>

#include <QTime>
#include <QDebug>

using namespace std;

class DynObj {
public:
    DynObj() { d = new Private; }
    virtual ~DynObj() { delete d; }

private:
    class Private {
    public:
        int i;
        int j;
        DynObj * p;
        std::string str;
        class Check {
        public: Check() { static bool b=true; if (b) { qDebug() << "ok new dyn"; b = !b; } }
               ~Check() { static bool b=true; if (b) { qDebug() << "ok del dyn"; b = !b; } }
        } chk;
    };
    Private * d;
};

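// static_assert is a keyword since C++11, so the helper namespace needs another name
namespace static_check {
template <bool> struct is_fail;                              // no definition for <false>
template <> struct is_fail<true> { enum { value = 1 }; };
}

// Compiles only when def == real; otherwise the base class is incomplete
template <int def,int real>
struct check_d_size : ::static_check::is_fail<(bool)(def == real)> {};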

class StackObj {
public:
    StackObj() {
        check_d_size<d_size,sizeof(Private)>();
        d = new(d_bytes) Private;
    }
    virtual ~StackObj() { d->~Private(); }

private:
    class Private {
    public:
        inline void * operator new(size_t, quint8 * mem) { return mem; }

        int i;
        int j;
        DynObj * p;
        std::string str;

        class Check {
        public: Check() { static bool b=true; if (b) { qDebug() << "ok new stack"; b = !b; } }
               ~Check() { static bool b=true; if (b) { qDebug() << "ok del stack"; b = !b; } }
        } chk;
    };
    Private * d;
    static const int d_size = 32;
    quint8 d_bytes[d_size];
};

int main()
{
    QTime t = QTime::currentTime();

    { DynObj ar_dyn[100000]; }

    qDebug() << "use dyn, call time, msecs" << t.msecsTo(QTime::currentTime());

    t = QTime::currentTime();

    { StackObj ar_stack[100000]; }

    qDebug() << "use stack, call time, msecs" << t.msecsTo(QTime::currentTime());

    return 0;
}

  • Helmut Muelner

    See GotW #28: The Fast Pimpl Idiom

  • jstaniek

    Nice idea :)

    The trick is as BIC (binary incompatible) as using inline attributes, right?

    BTW, I was wondering how to get the constant value of sizeof(Private) automatically in a portable way. I am thinking about an automated test run at build time, e.g. by cmake in my case.

  • yshurik

    jstaniek: that’s possible of course, you can add some kind of pre-processing step to the build to get it in a portable way. Not sure the approach is worth it, maybe for something heavy that requires a lot of tweaks. In this lab I just tried to make exactly the same class with a d-pointer/d-array but with full control of memory usage, and to show developers the danger of the heap being used implicitly.

  • Kuba Ober

    Yeah, and ADMIN’s approach is precisely the Attempt #3, and it is deplorable. It’s not guaranteed to reliably work even on your own platform, forget about portability. For example:

    The author of Y has to be inordinately careful with otherwise-ordinary Y functions. For example, Y must not use the default assignment operator, but must either suppress assignment or supply its own. Writing a safe Y::operator=() isn’t hard, but I’ll leave it as an exercise for the reader. Remember to account for exception safety in that and in Y::~Y(). Once you’re finished, I think you’ll agree that this is a lot more trouble than it’s worth.