About smart pointers in C++

An application uses many kinds of resources: memory, handles, network connections, and anything else that should be released after use. C++ is a flexible language that lets a developer do a lot, but it also makes the developer responsible for many things, including freeing resources. This article is dedicated to smart pointers.

Well, what can you do with resources?

Allocate but never free

You may simply leave resources alone: as soon as a process exits, the OS reclaims them itself. This may look wrong, but the approach works for resources that are needed for the entire lifetime of the process.

This often happens with memory: static variables and singletons are common. It is a convenient arrangement, because the program can access a singleton at any time and the singleton is always there.
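As a minimal sketch of this approach (the Settings type and the instance() function are hypothetical names used only for illustration), a singleton can be allocated on first use and deliberately never freed:

```cpp
#include <cassert>
#include <string>

// Hypothetical example type; not part of any library.
struct Settings {
    std::string appName = "demo";
};

// Allocated on the first call and intentionally never deleted:
// the OS reclaims the memory when the process exits.
Settings& instance() {
    static Settings* s = new Settings();  // initialized once (thread-safe since C++11)
    return *s;
}

// Every call returns the same object.
bool sameInstance() {
    return &instance() == &instance();
}
```

The destructor of Settings never runs here, which is acceptable only when nothing important happens in it.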

Freeing manually

As in the good old days: if you opened a file, you had to close it. Object lifetimes can be long and non-trivial because of multithreading and exceptions, so mistakes are easy to make.

Reference counting

The idea is to count the references to an object and free the object once the count drops to zero. C++ provides the standard class std::shared_ptr for this; we examine it below.
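To make the mechanism concrete, here is a deliberately tiny, non-thread-safe sketch of a reference-counted handle. CountedPtr and countedPtrDemo are hypothetical names invented for this illustration; real code should use std::shared_ptr.

```cpp
#include <cassert>
#include <utility>

// Hypothetical illustration of reference counting. Not thread-safe.
template <typename T>
class CountedPtr {
    T* m_obj;
    int* m_count;  // shared by all handles that refer to the same object

public:
    explicit CountedPtr(T* obj) : m_obj(obj), m_count(new int(1)) {}

    CountedPtr(const CountedPtr& other)
        : m_obj(other.m_obj), m_count(other.m_count) {
        ++*m_count;  // one more reference to the same object
    }

    CountedPtr& operator=(CountedPtr other) {  // copy-and-swap
        std::swap(m_obj, other.m_obj);
        std::swap(m_count, other.m_count);
        return *this;
    }

    ~CountedPtr() {
        if (--*m_count == 0) {  // last reference is gone: free the object
            delete m_obj;
            delete m_count;
        }
    }

    int count() const { return *m_count; }
    T& operator*() const { return *m_obj; }
};

// Records the count before, during and after a second handle exists.
int countedPtrDemo() {
    CountedPtr<int> a(new int(42));
    int seen = a.count();              // 1
    {
        CountedPtr<int> b = a;
        seen = seen * 10 + a.count();  // 2 while b is alive
    }
    seen = seen * 10 + a.count();      // back to 1
    return seen;                       // 121
}
```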

Any problems? Yes, some. If object A holds a reference to object B and vice versa (a circular reference), the reference counts of A and B never drop to zero, so neither object is ever freed.

Despite its simplicity and limitations, this approach is widely used: CPython, for example, combines reference counting with a garbage collector that handles cycles. (Frameworks like .NET rely on a tracing garbage collector instead.)

In C++, the standard class std::weak_ptr helps avoid the circular reference problem. We will come back to std::weak_ptr a bit later.

Garbage collection

On the one hand, the program doesn’t free objects explicitly, which makes the code simpler, particularly when dealing with tree structures, lists, and, of course, lock-free structures.

On the other hand, a developer can rely on garbage collection only for freeing memory. If a file is open, for example, it is much better to close the handle explicitly than to wait until the garbage collector decides to do so.

C++: std::unique_ptr

In C++ it’s easy to create an object on the stack:

#include <iostream>
#include <memory>

struct DebugObject {
    int m_i;

    explicit DebugObject(int i) : m_i(i) {
        std::cout << "DebugObject(" << m_i << ") created\n";
    }

    virtual ~DebugObject() {
        std::cout << "DebugObject(" << m_i << ") deleted\n";
    }
};

void f() {
    DebugObject p(1);
}

Once f() returns, p goes out of scope and is destroyed.

Unfortunately, there are limitations. The object is always destroyed when f() exits, and this can’t be changed. If you want to return the object from a function, or store it somewhere, a copy of the object is created.

That’s fine for lightweight objects such as a pair of numbers, but imagine an array with a billion items: you can’t handle it this way. The stack is limited, and copying large objects again and again takes a lot of time.

Nevertheless, destroying a stack object when it goes out of scope is a really powerful feature of C++. Even if an exception is thrown in f(), the object p is still destroyed and its destructor is called. Now imagine a wrapper class that lives on the stack, stores a pointer to an object allocated on the heap, and destroys that object when the wrapper goes out of scope. This wrapper class is std::unique_ptr.
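The wrapper idea can be sketched by hand in a few lines. ScopedPtr, Tracked and the helper functions below are hypothetical names for this illustration only; std::unique_ptr does the same job with far more care.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical tracker type used only to observe destruction.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// A tiny stand-in for std::unique_ptr: owns a heap object and
// deletes it when the wrapper itself goes out of scope.
template <typename T>
class ScopedPtr {
    T* m_p;
public:
    explicit ScopedPtr(T* p) : m_p(p) {}
    ~ScopedPtr() { delete m_p; }
    ScopedPtr(const ScopedPtr&) = delete;             // single owner: no copies
    ScopedPtr& operator=(const ScopedPtr&) = delete;
    T* operator->() const { return m_p; }
};

void mayThrow(bool doThrow) {
    ScopedPtr<Tracked> guard(new Tracked());
    if (doThrow)
        throw std::runtime_error("something went wrong");
}

// Returns the number of Tracked objects still alive after an exception.
int aliveAfterException() {
    try {
        mayThrow(true);
    } catch (const std::runtime_error&) {
        // stack unwinding has already run ~ScopedPtr, which deleted the object
    }
    return Tracked::alive;
}
```

Even though mayThrow() exits through an exception, the heap object is released, because the wrapper lives on the stack.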

With the help of std::unique_ptr you can write:

void f() {
    // Works in C++11
    std::unique_ptr<DebugObject> p1(new DebugObject(1));
    // make_unique is available since C++14
    auto p2 = std::make_unique<DebugObject>(2);
}

In this code all objects are freed correctly, no matter what happens during execution.

Custom deleter

It would be strange if there were no way to specify a custom function to delete an object:

void customDelete(DebugObject *p) {
     std::cout << "CustomDelete()\n";
     delete p;
}

void f() {
    std::unique_ptr<DebugObject, void(*)(DebugObject*) > p(new DebugObject(3), customDelete);

    p.reset(nullptr);
    p.reset(new DebugObject(4));

    std::cout << "raw pointer size:" << sizeof (DebugObject*) << "\n";
    std::cout << "unique_ptr with deleter size:" << sizeof (p) << "\n";
}

The reset method deserves a mention: it replaces the old pointer with a new one. The deleter is called for the old pointer if it is not null.
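The behaviour of reset can be checked with a deleter that counts its own invocations. CountingDeleter and resetDemo are hypothetical names introduced for this sketch:

```cpp
#include <cassert>
#include <memory>

// Hypothetical deleter that counts how many times it has run.
struct CountingDeleter {
    int* calls;
    void operator()(int* p) const {
        ++*calls;
        delete p;
    }
};

int resetDemo() {
    int calls = 0;
    {
        std::unique_ptr<int, CountingDeleter> p(new int(1), CountingDeleter{&calls});
        p.reset(new int(2));  // old pointer is non-null: the deleter runs (1 call)
        p.reset();            // frees the int(2): the deleter runs again (2 calls)
        p.reset();            // already null: the deleter is not called
    }                         // destructor sees nullptr: the deleter is not called
    return calls;             // 2
}
```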

A custom deleter lets std::unique_ptr free not only objects but any kind of resource, e.g. handles.

An example that closes a file handle:

#include <cstdio>

struct FILEDeleter {
    void operator()(FILE *pFile) const {
        if (pFile)
            fclose(pFile);
    }
};

using FILE_unique_ptr = std::unique_ptr<FILE, FILEDeleter>;

FILE_unique_ptr make_fopen(const char* fname, const char* mode) {
    FILE *fileHandle = fopen(fname, mode);
    // fileHandle will be nullptr if error
    return FILE_unique_ptr(fileHandle);
}

void f() {
    std::cout << "unique_ptr with stateless struct as deleter: " << sizeof (FILE_unique_ptr) << "\n";
}

Notice that FILEDeleter is stateless, so FILE_unique_ptr typically stores just a single pointer to the file.

Returning from a function

std::unique_ptr can be returned from a function. Thanks to move semantics, ownership of the object is transferred to the caller and the source pointer is set to null, so the object itself is not destroyed on exit from makeObj(). It is destroyed on exit from f():

std::unique_ptr<DebugObject> makeObj(int n) {
    return std::make_unique<DebugObject>(n);
}

void f() {
    // std::move is redundant here: makeObj() already returns an rvalue
    std::unique_ptr<DebugObject> p = std::move(makeObj(5));
    std::unique_ptr<DebugObject> p2 = makeObj(6);
    std::unique_ptr<DebugObject> p3(makeObj(7));

    auto p4 = std::move(p);
    // auto p5 = p; //compilation error!
}
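A small check, using hypothetical helper names (makeInt, moveLeavesSourceNull), that a move transfers the very same object and leaves the source pointer null:

```cpp
#include <cassert>
#include <memory>
#include <utility>

std::unique_ptr<int> makeInt(int n) {
    return std::make_unique<int>(n);  // ownership moves out to the caller
}

bool moveLeavesSourceNull() {
    std::unique_ptr<int> a = makeInt(5);
    int* raw = a.get();

    std::unique_ptr<int> b = std::move(a);  // transfer ownership from a to b

    // a is now null; b owns the same object (nothing was copied)
    return a == nullptr && b.get() == raw && *b == 5;
}
```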

Get raw pointer

void f() {
    auto p = std::make_unique<DebugObject>(8);

    std::cout << "DebugObject.m_i == " << p->m_i << "\n";
    DebugObject *inner_p = p.get();
}

This is simple. Note that p still holds the pointer, so the DebugObject is deleted on exit. As with raw pointers, fields and methods are accessed via operator ->. If you don’t want the object to be destroyed, use the release method:

void f() {
    DebugObject *raw_ptr = nullptr;
    {
        auto p = std::make_unique<DebugObject>(9);
        raw_ptr = p.release();
    }
    std::cout << "Object alive yet\n";
    delete raw_ptr;
}

Casting to bool

Conveniently, std::unique_ptr behaves like a raw pointer here: it can be converted to bool (false if it holds nullptr, true otherwise):

auto p = std::make_unique<DebugObject>(10);

if (p) {
    std::cout << "p isn't null\n";
}

Object owning

To make things simple, it’s a good idea to decide who owns an object, for example:

class Owner {
private:
    DebugObject *m_p;
    // imagine DebugObject is an abstract base class: the concrete size
    // is unknown, so the object is held by pointer
public:

    Owner(DebugObject *p) : m_p(p) {
    };

    ~Owner() {
        delete m_p;
    }
};

Well, let’s rewrite the code using std::unique_ptr:

class Owner {
private:
    std::unique_ptr<DebugObject> m_p;
public:

    Owner(DebugObject *p) : m_p(p) {
    };
};

The code is much simpler now. Because std::unique_ptr is stored by value rather than by pointer, the implicitly generated destructor Owner::~Owner() calls the destructor of m_p, which deletes the object.
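A side effect worth noting: a class with a std::unique_ptr member becomes move-only without any extra code, so it can no longer be copied by accident. The sketch below checks this with type traits; Widget and ownerIsMoveOnly are hypothetical names, and Owner is re-declared here only to keep the example self-contained.

```cpp
#include <memory>
#include <type_traits>

struct Widget {  // hypothetical payload type
    virtual ~Widget() = default;
};

class Owner {
    std::unique_ptr<Widget> m_p;
public:
    explicit Owner(Widget* p) : m_p(p) {}
    // no hand-written destructor, copy or move members
};

// The compiler derives these properties from the unique_ptr member.
static_assert(!std::is_copy_constructible<Owner>::value, "Owner cannot be copied");
static_assert(!std::is_copy_assignable<Owner>::value, "Owner cannot be copy-assigned");
static_assert(std::is_move_constructible<Owner>::value, "Owner can be moved");

bool ownerIsMoveOnly() {
    return !std::is_copy_constructible<Owner>::value
        && std::is_move_constructible<Owner>::value;
}
```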

A few words about object owning

Rust’s ownership model is related to so-called linear types: https://en.wikipedia.org/wiki/Substructural_type_system. Values of such types can be used only once. In terms of ownership: whoever owns an object can transfer that ownership to another owner only once.

Unfortunately, C++ has no such feature. One can create two unique pointers to the same object:

void f() {
    auto* p = new DebugObject(11);

    Owner o1(p);
    Owner o2(p);
    // bad way
}

In this case one object is destroyed twice, which is of course wrong. To avoid this, wrap the pointer in std::unique_ptr right where the object is created rather than at some arbitrary later point.

class Owner2 {
private:
    std::unique_ptr<DebugObject> m_p;
public:

    Owner2(std::unique_ptr<DebugObject> &&p) : m_p(std::move(p)) {
        std::cout << "I'm owner of " << (m_p ? "object" : "nullptr") << "\n";
    };
};

void f() {
    auto p = std::make_unique<DebugObject>(12);

    Owner2 realOwner(std::move(p));
    Owner2 fakeOwner(std::move(p));
    // at least it doesn't crash
}

C++11 introduced rvalue references and move semantics. Consider an ordinary copy first:

std::vector<int> a{1, 2, 3};
std::vector<int> b = a;
a.push_back(4);

assert((b == std::vector<int>{1, 2, 3}));
assert((a == std::vector<int>{1, 2, 3, 4}));

For example, a typical std::vector stores pointers to the beginning and end of its array, plus its capacity.

The contents of vector a (the whole array of data inside it) are copied, and b ends up with different pointers. But if a is not going to be used afterwards, we can treat it as an rvalue and, instead of making a deep copy, just copy the pointers. More precisely, the pointers are not merely copied: b takes them over and a is left empty.

To convert an object to an rvalue reference explicitly, you can write:

std::vector<int> c{1, 2, 3};
std::vector<int> d = std::move(c);

// avoid relying on a moved-from object; it is reused here only for demonstration
c.push_back(4);

assert((d == std::vector<int>{1, 2, 3}));
assert((c == std::vector<int>{4}));

So on the second line the newly created vector d takes over the contents of c, and c is left empty. c was treated as an rvalue, and the move modified it.
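What "taking over the pointers" means can be sketched with a hand-written move constructor. Buffer and moveStealsStorage are hypothetical names for this illustration; std::vector does considerably more.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical minimal buffer with a move constructor.
class Buffer {
    int* m_data;
    std::size_t m_size;
public:
    explicit Buffer(std::size_t n) : m_data(new int[n]()), m_size(n) {}
    ~Buffer() { delete[] m_data; }

    Buffer(const Buffer&) = delete;             // copying omitted to keep the sketch short
    Buffer& operator=(const Buffer&) = delete;

    // Move constructor: take the other buffer's pointer and leave it empty.
    Buffer(Buffer&& other) noexcept
        : m_data(other.m_data), m_size(other.m_size) {
        other.m_data = nullptr;
        other.m_size = 0;
    }

    const int* data() const { return m_data; }
    std::size_t size() const { return m_size; }
};

bool moveStealsStorage() {
    Buffer a(1000);
    const int* storage = a.data();

    Buffer b = std::move(a);  // no elements are copied

    return b.data() == storage && b.size() == 1000
        && a.data() == nullptr && a.size() == 0;
}
```

The cost of the move does not depend on the number of elements: only a pointer and a size change hands.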

Why does this matter? In the same way, std::unique_ptr transfers ownership when it is moved, so we can’t make two identical std::unique_ptr instances from a single one: one of them will hold nullptr.

In the sample above fakeOwner will contain nullptr and realOwner will be the only owner.

Of course, it would be great if C++ prohibited the use of p after the first std::move(p), but, unfortunately, C++ doesn’t support that (yet?).

Well, at least the object can’t be deleted twice, and a developer who sees std::move(p) has a hint that something unusual is going on.

A small digression on Rust: the idea of ownership helps to implement state machines. For example, one can have a class “closed file” whose only method is “open”, and a class “opened file” with methods for reading and closing. The “open” method takes ownership of the closed file and returns an opened file, and vice versa. This approach makes it possible to model resources with more states than the usual “allocated” and “freed”.
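The same idea can be approximated in C++ with std::unique_ptr. The sketch below uses hypothetical types and functions (ClosedFile, OpenedFile, openFile, closeFile) and touches no real file. Unlike Rust, the C++ compiler will not reject a later use of a moved-from pointer; it only guarantees that the pointer is null.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical state types: a closed file cannot be read at all.
struct ClosedFile {
    std::string name;
};

struct OpenedFile {
    std::string name;
    std::string read() const { return "contents of " + name; }
};

// Consumes the closed file and returns an opened one.
std::unique_ptr<OpenedFile> openFile(std::unique_ptr<ClosedFile> f) {
    return std::make_unique<OpenedFile>(OpenedFile{std::move(f->name)});
}

// Consumes the opened file and returns a closed one.
std::unique_ptr<ClosedFile> closeFile(std::unique_ptr<OpenedFile> f) {
    return std::make_unique<ClosedFile>(ClosedFile{std::move(f->name)});
}

std::string stateMachineDemo() {
    auto closed = std::make_unique<ClosedFile>(ClosedFile{"log.txt"});
    auto opened = openFile(std::move(closed));  // closed is null from here on
    std::string data = opened->read();
    closed = closeFile(std::move(opened));      // opened is null from here on
    return data + "; closed=" + closed->name;
}
```

Passing std::unique_ptr by value makes the transfer of ownership visible at every call site through std::move.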

C++: std::shared_ptr

As mentioned above, std::shared_ptr is a wrapper around a pointer with a reference count.

void f() {
    std::shared_ptr<DebugObject> ptr1(new DebugObject(0));

    // shared_ptr::unique() was deprecated in C++17 and removed in C++20,
    // so the check is spelled use_count() == 1 here
    std::cout << std::boolalpha;
    std::cout << ptr1.use_count() << "\n";
    // 1
    std::cout << (ptr1.use_count() == 1) << "\n";
    // true

    {
        auto ptr2 = ptr1;
        std::cout << ptr1.use_count() << "," << ptr2.use_count() << "\n";
        // 2,2
        std::cout << (ptr1.use_count() == 1) << "," << (ptr2.use_count() == 1) << "\n";
        // false,false
    }

    std::cout << ptr1.use_count() << "\n";
    // 1
    std::cout << (ptr1.use_count() == 1) << "\n";
    // true
}

The difference between std::shared_ptr and std::unique_ptr is that std::shared_ptr is heavier: it also stores a pointer to an additional object, the control block, which contains the reference count (the number of std::shared_ptr instances that point to the same object).

When the reference count becomes zero, the object is destroyed. The reference count is updated in a thread-safe manner, which is why working with std::shared_ptr is slower than with raw pointers or std::unique_ptr.

std::make_shared

std::shared_ptr<DebugObject> p1(new DebugObject(0));
auto p2 = std::make_shared<DebugObject>(1);

The second approach is more efficient. Why? In the first case memory is allocated twice: once for the DebugObject and once for the control block.

std::make_shared allocates a single block of memory that holds both the object and the control block. The disadvantage is that this memory can only be freed as a whole: the object’s storage is not released until the control block can be released too.

But why would the control block need to outlive the object? Because the control block also contains the count of weak references.
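The difference in the number of allocations can be observed by replacing the global operator new with a counting version. This is a demo-only sketch: Payload, g_allocations and the two helper functions are hypothetical names, and the exact counts depend on the standard library implementation, although common implementations typically behave as the comments describe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Counts every call to the global operator new (demo only).
static int g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size == 0 ? 1 : size))
        return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

struct Payload { int value = 0; };  // hypothetical example type

// Typically 2: one allocation for the object, one for the control block.
int allocationsWithNew() {
    int before = g_allocations;
    std::shared_ptr<Payload> p(new Payload());
    return g_allocations - before;
}

// Typically 1: the object and the control block share one allocation.
int allocationsWithMakeShared() {
    int before = g_allocations;
    auto p = std::make_shared<Payload>();
    return g_allocations - before;
}
```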

std::weak_ptr

void f() {
    std::weak_ptr<DebugObject> w;
    {
        std::shared_ptr<DebugObject> p(new DebugObject(2));
        w = p;
        std::cout << w.use_count() << "," << p.use_count() << "\n";
        // 1, 1
        std::cout << (w.expired() ? "true" : "false") << "\n";
        // false
    }
    std::cout << (w.expired() ? "true" : "false") << "\n";
    // true
    std::cout << w.use_count() << "\n";
    // 0
}

In the sample above, the reference count drops to zero and the object is destroyed. But the weak reference count is not zero, so the control block still exists (it is deleted only when the weak reference count also becomes zero).

The object pointer is hidden inside std::weak_ptr and can’t be accessed directly. To get it, call the lock method: it returns a std::shared_ptr, so the object can’t be destroyed unexpectedly while you use it (if the object has already been destroyed, an empty std::shared_ptr holding nullptr is returned).
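A short sketch of the usual lock pattern; readThroughWeak and lockDemo are hypothetical helper names, and -1 is an arbitrary marker for "object is gone":

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Returns the value behind w, or -1 if the object has already been destroyed.
int readThroughWeak(const std::weak_ptr<int>& w) {
    if (std::shared_ptr<int> sp = w.lock()) {  // keeps the object alive in this scope
        return *sp;
    }
    return -1;  // lock() returned an empty shared_ptr
}

// Returns what was read while the owner existed and after it was gone.
std::pair<int, int> lockDemo() {
    std::weak_ptr<int> w;
    int whileAlive = 0;
    {
        auto p = std::make_shared<int>(7);
        w = p;
        whileAlive = readThroughWeak(w);   // 7: the owner p still exists
    }
    int afterExpiry = readThroughWeak(w);  // -1: the last owner is gone
    return std::make_pair(whileAlive, afterExpiry);
}
```

Checking expired() first and then calling lock() is unnecessary: lock() performs the check and the acquisition in one step.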

std::weak_ptr can be used to avoid circular references: for example, A holds a std::shared_ptr to B, while B holds a std::weak_ptr to A.
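A minimal sketch of that arrangement, with instance counters added (a hypothetical detail) so that destruction can be verified:

```cpp
#include <cassert>
#include <memory>

struct B;

struct A {
    static int alive;
    std::shared_ptr<B> b;  // A owns B
    A() { ++alive; }
    ~A() { --alive; }
};

struct B {
    static int alive;
    std::weak_ptr<A> a;    // B only observes A: no ownership, no cycle
    B() { ++alive; }
    ~B() { --alive; }
};

int A::alive = 0;
int B::alive = 0;

// Returns how many A and B objects survive once the local owners are gone.
int survivorsAfterScope() {
    {
        auto a = std::make_shared<A>();
        auto b = std::make_shared<B>();
        a->b = b;
        b->a = a;  // with std::shared_ptr<A> here, both objects would leak
    }
    return A::alive + B::alive;  // 0
}
```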

Thread safety

  • std::unique_ptr is not thread-safe. If, for example, two threads call std::move on the same instance at the same time, the result is a data race, and both may end up with a pointer to the same object. On the other hand, std::unique_ptr is fast.
  • std::shared_ptr: the reference count in the control block is incremented and decremented in a thread-safe way. There is a catch, though: a std::shared_ptr instance contains two pointers (to the object and to the control block), so reading one instance in one thread while another thread writes to it is not a good idea. On the other hand, several threads can read the same std::shared_ptr at the same time. std::shared_ptr guarantees that the object is destroyed when all of its owners are destroyed.

To read and write the same std::shared_ptr from different threads, use a synchronization mechanism such as a mutex. There are also standard helpers like std::atomic_load, std::atomic_store and std::atomic_exchange (C++20 deprecates these overloads in favor of std::atomic<std::shared_ptr<T>>).

These helpers may use locks internally, and std::atomic_exchange is atomic only with respect to the other “atomic_*” functions. If you use one of them, it’s better to use them everywhere.

About possible leaks

In complex expressions the order of evaluation is not fully specified:

func(std::shared_ptr<Lhs>(new Lhs("foo")), std::shared_ptr<Rhs>(new Rhs("bar")));

Before C++17 it was possible for Lhs("foo") to be created and then for the construction of Rhs("bar") to throw an exception before shared_ptr(...) was called for Lhs. In that case Lhs("foo") is never deleted, because no smart pointer owns it yet. (Since C++17 the evaluation of one argument can no longer be interleaved with another, which closes this particular leak.)

With std::make_shared everything is fine:

func(std::make_shared<Lhs>("foo"), std::make_shared<Rhs>("bar"));

Another option is to split the long expression into three statements: create the pointer to Lhs, create the pointer to Rhs, and then call func(...). In this case the smart pointer for Lhs is created before the constructor of Rhs is called. The same applies to std::unique_ptr.
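The three-statement variant can be sketched as follows. Lhs, Rhs and func here are hypothetical stand-ins for the names used above; Rhs always throws from its constructor so that the failure path can be observed.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in: counts live instances.
struct Lhs {
    static int alive;
    explicit Lhs(const std::string&) { ++alive; }
    ~Lhs() { --alive; }
};
int Lhs::alive = 0;

// Hypothetical stand-in: construction always fails.
struct Rhs {
    explicit Rhs(const std::string&) { throw std::runtime_error("Rhs failed"); }
};

void func(std::shared_ptr<Lhs>, std::shared_ptr<Rhs>) {}

// Returns the number of Lhs objects leaked after Rhs's constructor throws.
int leakedAfterFailure() {
    try {
        std::shared_ptr<Lhs> lhs(new Lhs("foo"));  // owned before anything else runs
        std::shared_ptr<Rhs> rhs(new Rhs("bar"));  // throws; lhs is released during unwinding
        func(lhs, rhs);
    } catch (const std::runtime_error&) {
        // nothing to clean up by hand
    }
    return Lhs::alive;  // 0: nothing leaked
}
```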

Read more about that here: https://stackoverflow.com/questions/20895648/difference-in-make-shared-and-normal-shared-ptr-in-c

Conclusion

Using std::unique_ptr simplifies code and reduces the risk of memory leaks. This kind of pointer is very lightweight: its size is typically equal to that of a raw pointer (with a function-pointer deleter it is typically the size of two raw pointers, while a stateless deleter adds nothing). Use it wherever an object is supposed to have a single owner.

If reference counting is required, use std::shared_ptr together with std::weak_ptr to avoid circular reference problems.