Dumb Pointers and Smart Pointers

pointers
 
c++
 
gcc
 
assembly
 
boost
 

Gone are the days when you need match every malloc(...) with free(ptr), match every new with delete and check whether we created an array or a single element. Those practices are unfriendly to maintainers and readers, and also handle exceptions poorly. Come are the days of smart pointers.

Raw Pointer

Before the dawn of smart pointers, we had to rely on raw pointers to dynamic manage memory resource. Inherited from C, raw pointers could use [m/c/re]alloc(...) library functions to dynamically create a valid memory for a raw pointer. These memory needs to be manually returned to the OS with another library function free(ptr):

// custom type for testing
struct MyInt{
  MyInt() : value(8) {std::cout << "ctor\n";}
  int value,value2,value3;
};

MyInt *ptr = malloc(8 * sizeof *ptr); // allocate an int array of size 8
if (!ptr) return;                     // 'malloc' returns NULL on failure
memset(ptr, 0, 5 * sizeof *ptr);      // 'malloc' does not itialize
/*
 * do some work with ptr
 */
free(ptr);

With the prevalence of OOP in C++, it was helpful to initialize the objects after dynamic memory allocation. new-expression (some places refer it as new operator, not to be confused with operator new) does exactly that.

// overloading operator new. new-expression calls operator-new and then the constructor.
// We can define operator-new in global name space to create custom allocators, or speci-
// fically for a class. Implementation of operator new is platform dependant, but it is
// not uncommon to see it use malloc (and throw std::bad_alloc when malloc fails).
void* operator new(std::size_t count) {
  std::cout << "op new\n";
  return malloc(sizeof(MyInt));
}
try {
  // below prints 'op new', 'ctor'
  MyInt *ptr2 = new MyInt();
  // below prints 'op new', 'ctor'*5
  MyInt *ptr3 = new MyInt[5]/*{optional ctor arg list}*/;
} catch (std::bad_alloc& e) {
  return;
}
/*
 * do some work with ptrs
 */
delete ptr2;
delete [] ptr3;

So what’s bad about them?

  1. It requires the developer to correctly pair the allocations and returns of the memory. It looks simple on the surface, but it could get complicated very quickly with the increase number of logic branches and complexity of the code;
  2. They are not exception safe.

Smart Pointers

Smart pointers to the rescue. We have a terrific tool to solve these two problems: destructor. Destructor gets called when the object goes out of the scope. By returning the memory owned by the object to the OS in destructor, we can get away with manually returning them. It also plays well with exception as destructor are called when exceptions are thrown.

Smart pointers are defined in header <memory>, so it is required to have #include<memory> to use smart pointers. On extreme cases, e.g. when std::vector size is impactful and considered too wasteful, we can have smart pointers to arrays too:

auto p = std::make_unique<int[]>(5);
// x86-64 gcc 10.1
std::cout << sizeof(std::unique_ptr<int[]>) << std::endl; // 24
std::cout << sizeof(std::vector<std::unique_ptr<int>>);   //  8

Smart pointers have some common interfaces:

  • get(): get the raw pointer to the object. Raw pointers might be required for old interfaces. It is also worth mentioning that using raw pointers are fine - as long as we don’t use raw pointers as resource-owning pointers.
  • reset(ptr): replace resource with ptr;
  • swap(smart_ptr): swap ownership with smart_ptr;
  • release(): release resource (std::unique_ptr and std::auto_ptr).

Smart pointers also support some commonly used raw pointer semantics, e.g. * and -> for dereferencing and [] for smart pointers of arrays (in the example above, we can do p[2]).

Auto Pointer

Before anything, std::auto_ptr is deprecated in C++11 and removed from C++17. The ‘auto’ in its name suggests automatic return of resource upon destruction of object. Auto pointer was introduced to solve the above mentioned problems with raw pointers.

// backward/auto_ptr.h
template<typename _Tp> class auto_ptr
{
 /* other implementation details */
private:
  _Tp* _M_ptr;
public:
  /** @brief  An %auto_ptr is usually constructed from a raw pointer.
   *  @param  __p  A pointer (defaults to NULL).
   *  This object now @e owns the object pointed to by @a __p. */
  explicit
  auto_ptr(element_type* __p = 0) throw() : _M_ptr(__p) {}
  /** @brief  An %auto_ptr can be constructed from another %auto_ptr.
   *  @param  __a  Another %auto_ptr of the same type.
   *  This object now @e owns the object previously owned by @a __a,
   *  which has given up ownership. */
  auto_ptr(auto_ptr& __a) throw() : // auto_ptr == auto_ptr<_Tp> in this scope
      _M_ptr(__a.release()) { }     // copy constructor 'strips' resource from its source
  /** @brief  %auto_ptr assignment operator.
   *  @param  __a  Another %auto_ptr of a different but related type.
   *  A pointer-to-Tp1 must be convertible to a pointer-to-Tp/element_type.
   *  This object now @e owns the object previously owned by @a __a,
   *  which has given up ownership.  The object that this one @e
   *  used to own and track has been deleted. */
  template<typename _Tp1>
  auto_ptr& operator=(auto_ptr<_Tp1>& __a) throw() {
    reset(__a.release()); // similarly for '=', resource is 'stripped' from its source
    return *this;
  }
  /** When the %auto_ptr goes out of scope, the object it owns is
   *  deleted.  If it no longer owns anything (i.e., @c get() is
   *  @c NULL), then this has no effect. */
  ~auto_ptr() { delete _M_ptr; }

};

Being in backward directory is quite telling of its current status. We can see the issue of std::auto_ptr straight from its source code: it (silently) releases the resource of its source on copy constructor or assignment operation. In standards’ language, it is not std::is_copy_constructible, also not std::is_assignable (both have strict requirement of leaving their source unchanged).

These two violations make std::auto_ptr non-conformant to STL container requirements. However, common implementations work relatively well with std::auto_ptr:

std::vector<std::auto_ptr<int>> autoV1;
/* initialize autoV1 */
std::sort(autoV1.begin(), autoV1.end(),
    [](const std::auto_ptr<int> &lhs, const std::auto_ptr<int> &rhs){
      return *lhs < *rhs;
    });

The code above does not pose risk of memory leaking (in gcc 10.1 at the very least). After all, the only case that does cause issues with std::auto_ptr is when itself is copied, but most of the standard library container operations are done on iterators. When elements in the containers are copied, they are also tend to work with std::auto_ptr. For example:

_Tp __tmp = _GLIBCXX_MOVE(__a);
__a = _GLIBCXX_MOVE(__b);
__b = _GLIBCXX_MOVE(__tmp);

Either tmp objects are created (example above) or the source object is no longer required after the operation (therefore stealing its content is fine). In the cases where the object itself is copied and required to be valid after operation, compilation is forbidden by enforcing constness on the source object (therefore avoiding compilation when the object is of std::auto_ptr type). For example:

// compilation error:
// no matching function for call to 'std::auto_ptr<int>::auto_ptr(const std::auto_ptr<int>&)'
std::merge(autoV1.begin(), autoV1.end(), autoV2.begin(), autoV2.end(),
    std::back_inserter(autoRes),
    [](const std::auto_ptr<int> &lhs, const std::auto_ptr<int> &rhs){
      return *lhs < *rhs;
    });

Compilers are relatively lenient on std::auto_ptr. For example, gcc 7.5 still allows std::auto_ptr with -std=c++17 where in Standards it is removed.

Compilers do an amazing job on allowing std::auto_ptr when no errors are expected and giving compilation errors when std::auto_ptr leads to erroneous behavior, even though Standards dictates that it containers can outright reject std::auto_ptr. But we should not put ourselves in these cases by avoiding std::auto_ptr altogether. A Standard-conformant can very well result into unexpected behaviors: item 8 of Effective STL gives an example of this.

Unique Pointer

std::unique_ptr was introduced in C++11 to solve this problem with std::auto_ptr. It might be more accurate to say that the introduction of move operations in C++11 provided a solution to fix std::auto_ptr, and this solution is named std::unique_ptr.

Prior to that, Boost provided boost::scoped_ptr (also not compatible with STL containers) to avoid resource getting implicitly/secretly ‘stolen’. boost::scoped_ptr had copy constructor and assignment operator removed from its implementation by inheriting boost::noncopyable.

Implementation

Let’s have a peek at std::unique_ptr’s implementation:

// bits/unique_ptr.h
template <typename _Tp, typename _Dp = default_delete<_Tp>>
class unique_ptr
{
  /* other implementation details */
public:
  /** @param __p  A pointer to an object of @c element_type
   *  Takes ownership of a pointer. The deleter will be value-initialized. */
  template <typename _Up = _Dp, typename = _DeleterConstraint<_Up>>
  explicit
  unique_ptr(pointer __p) noexcept : _M_t(__p) {} // similar to auto_ptr
  /* Move constructor. */
  unique_ptr(unique_ptr&& __u) noexcept
  : _M_t(__u.release(), std::forward<deleter_type>(__u.get_deleter())) { }
  /* Destructor, invokes the deleter if the stored pointer is not null. */
  ~unique_ptr() noexcept {
    auto& __ptr = _M_t._M_ptr();
    if (__ptr != nullptr)
      get_deleter()(__ptr);
    __ptr = pointer();
  }
  /** @brief Move assignment operator.
  * @param __u  The object to transfer ownership from.
  * Invokes the deleter first if this object owns a pointer. */
  unique_ptr& operator=(unique_ptr&& __u) noexcept {
    reset(__u.release());
    get_deleter() = std::forward<deleter_type>(__u.get_deleter());
    return *this;
  }
  /* Disable copy from lvalue. */
  unique_ptr(const unique_ptr&) = delete;
  unique_ptr& operator=(const unique_ptr&) = delete;
};

Comparing to std::auto_ptr, std::unique_ptr:

  1. Disabled copy semantics. Similar to boost::scoped_ptr, this change eliminates of accidental memory leak due to copying smart pointers;
  2. Added move constructor and move assignment. This change, together with added support of move operations in standard containers, makes std::unique_ptr usable with STL containers;
  3. Added custom deleter.

Let’s use the same example of std::merge(...) to see how std::unique_ptr solves this problem:

std::vector<std::unique_ptr<int>> uniqueV1, uniqueV2, uniqueRes;
/* initialize uniqueV1 and uniqueV2 */
std::merge(std::make_move_iterator(uniqueV1.begin()), std::make_move_iterator(uniqueV1.end()),
    std::make_move_iterator(uniqueV2.begin()), std::make_move_iterator(uniqueV2.end()),
    std::back_inserter(uniqueRes),
    [](const std::unique_ptr<int> &lhs, const std::unique_ptr<int> &rhs){
      return *lhs < *rhs;
    });

With std::make_move_interator(iter), we explicitly request the resource to be ‘moved’ from its source. This illustrates the key difference between std::auto_ptr and std::unique_ptr: both respects the uniqueness of ownership of resource, but std::unique_ptr avoid the implicitness of resource transportation with move semantics.

Performance

What is the cost of having memory safety with std::unique_ptr compared to raw pointer?

  • Dereference: No overhead.

  • Construction: Almost no overhead without custom deleter.

  • Destruction: Almost no overhead without custom deleter. Small overhead on calling destructor of std::unique_ptr vs simply calling delete() on pointer.

  • Space: None without custom deleter, otherwise overhead of deleter (one word for function pointers, 0-N for function objects, depending on how much state it captures).

Let’s use an example and analyze its assembly code to have an idea how much overhead we are talking about (gcc 10.1 , -O3 -std=c++14). ‘Red’: raw pointer. ‘Green’: unique pointer.

 int main() {
-  auto *p = new MyInt(6);
+  auto p = std::make_unique<MyInt>(6);
   std::cout << p->val << std::endl;
   int a = p->val;
   std::cout << a << std::endl;
-  delete p;
   return 0;
 }
 main:
-	push    rbp
+	push    r12
 	mov     edi, 4
+	push    rbp
+	sub     rsp, 8
 	call    operator new(unsigned long)
 	mov     esi, 6
 	mov     edi, OFFSET FLAT:_ZSt4cout
 	mov     DWORD PTR [rax], 6
 	mov     rbp, rax
 	call    std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
 	mov     rdi, rax
 	call    std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&) [clone .isra.0]
 	mov     esi, DWORD PTR [rbp+0]
 	mov     edi, OFFSET FLAT:_ZSt4cout
 	call    std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
 	mov     rdi, rax
 	call    std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&) [clone .isra.0]
 	mov     rdi, rbp
 	mov     esi, 4
 	call    operator delete(void*, unsigned long)
+	add     rsp, 8
 	xor     eax, eax
 	pop     rbp
+	pop     r12
 	ret
+	mov     r12, rax
+	jmp     .L9
+main.cold:
+.L9:
+	mov     rdi, rbp
+	mov     esi, 4
+	call    operator delete(void*, unsigned long)
+	mov     rdi, r12
+	call    _Unwind_Resume

Comparing the assembly result, we can see:

  • Close to no overhead on constructor - additional 8 bytes on stack is the only overhead;
  • No overhead on dereferencing;
  • Small overhead on destruction process - main.cold illustrate the additional process. Note that r12 being pushed to stack was a result of it being used in main.cold.

Different compilers and code sample might yield slightly different results, but it is safe to say that, with a competent compiler, there is very little overhead of using std::unique_ptr over raw pointer.

Shared Pointer

std::unique_ptr provides a solution for safely managing memory of exclusive ownership, where the memory is expected to be owned by a single owner. Sometimes we need a different ownership model, a shared ownership, where multiple owners collectively own the memory resource. std::shared_ptr was designed for this purpose. std::share_ptr is significantly more complex than std::unique_ptr.

Memory Layout

Recall that std::unique_ptr’s object memory layout consists of a pointer to the managed object and the memory of the custom deleter (if there is one). For std::shared_ptr, its object memory consists of a pointer to the managed object and a pointer to a special object (__shared_count).

// bits/shared_ptr_base.h 
/**
 * __shared_ptr is base class of std::shared_ptr where the implementations are defined.
 * It's size is fixed to sizeof() two pointers, one for a pointer to the object, ano-
 * ther one is a pointer to the control block (residing inside _M_refcount).
*/
template<typename _Tp, _Lock_policy _Lp>
class __shared_ptr : public __shared_ptr_access<_Tp, _Lp>
{
  /* ... */
private:
  element_type*	   _M_ptr;         // Contained pointer.
  __shared_count<_Lp>  _M_refcount;    // Reference counter. 
};
/**
 * __shared_count contains a pointer of _Sp_counted_base. The object it points 
 * to might be of variable length depending on the existance of a custom deleter.
*/
template<_Lock_policy _Lp> class __shared_count
{
  /* ... */
private:
  /**
   * Depending on the existance of a custom deleter, a derived class from base 
   * _Sp_counted_base' of '_Sp_counted_deleter' (with custom deleter) or
   * '_Sp_counted_ptr' (without customer deleter) will be created. The former
   * contains the customer deleter.
  */
  _Sp_counted_base<_Lp>*  _M_pi; 
};
/**
 * Counter base class, containing 2 atomic count objects to track number of
 * shared owners and weak owners.
*/
template<_Lock_policy _Lp = __default_lock_policy>
class _Sp_counted_base : public _Mutex_base<_Lp>
{
  /* ... */
private:
  _Atomic_word  _M_use_count;     // #shared
  _Atomic_word  _M_weak_count;    // #weak + (#shared != 0)
};

The snippet above illustrates the key logics of the memory layout of a std::shared_ptr. Comparing to std::unique_ptr, one difference stands out: the custom deleter.

  • The location of the custom deleter is different. std::shared_ptr conveniently places it in its counter object, which is dynamically allocated. std::unique_ptr doesn’t have the luxury of an additional pointer, so it opts to have it part of itself, hoping that most use cases don’t utilize the custom deleter, and therefore saving the space of a pointer.
  • The custom deleter is part of the class template parameter. A direct impact of their different custom deleter placement strategy is that std::unique_ptr needs the typename of its custom deleter in its class template parameter. std::shared_ptr doesn’t have this restrictions and can have containers of instances with different custom deleter:
std::shared_ptr<int> p = std::make_shared<int>(1);
std::shared_ptr<int> pWithCustomDeleter (new int(1),
    [&var](int *i){
      delete i;
    });
std::vector<std::shared_ptr<int>> v {p, pWithCustomDeleter};

Atomic-ness

std::shared_ptr involves atomic operations. It is important to distinguish that the atomic operation is not on the object managed, but on the reference counter. N1431 explains the reasoning behind the design - std::shared_ptr is designed to be interoperable between libraries, and supporting multi-thread environments greatly increase its interoperability. For different purposes:

  • Non-atomic shared pointers: boost::local_shared_ptr provides a single-threaded solution for shared pointers, and boost::instrusive_ptr is another option where user is required to specify the reference counter and provide functions to maintain it;
  • Atomic operations on the object: std::experimental::atomic_shared_ptr refines the API for atomic operation on the managed resource.

Con/De-struction

template<typename _Tp, _Lock_policy _Lp>
class __shared_ptr : public __shared_ptr_access<_Tp, _Lp>
{
  /* ... */
public:
  /*
   * Constructing shared pointer without ownership does not create ref-counter.
  */
  constexpr __shared_ptr() noexcept : _M_ptr(0), _M_refcount() { } 
  /* 
   * Constructing shared pointer from raw pointer. An __shared_count object is 
   * created, where a copy of the object is created in _Sp_counted_ptr, and bo-
   * th shared count and weak is initialized to 1 in _Sp_counted_base.
  */
  template<typename _Yp, typename = _SafeConv<_Yp>>
  explicit __shared_ptr(_Yp* __p)
	: _M_ptr(__p), _M_refcount(__p, typename is_array<_Tp>::type()) {
    _M_enable_shared_from_this_with(__p);
  }
  /*
   * Constructing shared pointer from lvalue reference to a shared_ptr. The poin-
   * ters (object and ref counter object) are copied, the latter one done in 
   * __shared_count where an atomic increase is also performed on the shared coun-
   * ter.
  */
  template<typename _Yp, typename = _Compatible<_Yp>>
  __shared_ptr(const __shared_ptr<_Yp, _Lp>& __r) noexcept
      : _M_ptr(__r._M_ptr), _M_refcount(__r._M_refcount) { }
  /*
   * Constructing shared pointer from rvalue reference to a shared_ptr. No creation
   * of object or atomic operations are incurred. The source object's pointers are
   * copied and cleared.
  */
  template<typename _Yp, typename = _Compatible<_Yp>>
  __shared_ptr(__shared_ptr<_Yp, _Lp>&& __r) noexcept
      : _M_ptr(__r._M_ptr), _M_refcount() {
    _M_refcount._M_swap(__r._M_refcount);
    __r._M_ptr = 0;
  }
  /*
   * Constructing shared pointer from rvalue reference to a unique_ptr. The object
   * pointer is copied, a shared pointer object is created (in _Sp_counted_base) and
   * unique_ptr is cleared (in _Sp_counted_ptr).
  */
  template<typename _Yp, typename _Del, typename = _UniqCompatible<_Yp, _Del>>
  __shared_ptr(unique_ptr<_Yp, _Del>&& __r)
      : _M_ptr(__r.get()), _M_refcount() {
    auto __raw = _S_raw_ptr(__r.get());
    _M_refcount = __shared_count<_Lp>(std::move(__r));
    _M_enable_shared_from_this_with(__raw);
  }
  /*
   * Destruction is delegated to _Sp_counted_base, where the counter are decrease and
   * checked for 0. If 0 is reached for the counter, the counter object is delete and
   * object deletion is delegated to _Sp_counted_ptr/_Sp_counted_deleter.
  */
 ~__shared_ptr() = default;
};

In summary,

  • When constructed from a raw pointer, std::shared_ptr creates a reference counted object;
  • When constructed moving from a std::unique_ptr , std::shared_ptr creates a reference counted object and clears the std::unique_ptr. Passing the result of std::make_unique(...) falls in this category;
  • When constructed copying from a std::shared_ptr, std::shared_ptr performs atomic increases on the shared reference counted object;
  • When constructed moving from a std::shared_ptr, std::shared_ptr simply copies the pointers and perform no creation of objects or atomic increases, making it cheaper than its copying counterparts. Passing the result of std::make_shared(...) falls in this category;
  • Destructor involves decreasing the reference counted object and deletion of object if counter falls to 0.

Performance

  • Dereference: No overhead just like std::unique_ptr.

  • Construction: Cheap when moving from a std::shared_ptr; less cheap when copying from a std::shared_ptr as it involves atomic increase operation; expensive when creating from raw pointers or std::unique_ptr as it involves creation of reference counted object.

  • Destruction: Noticeably more expensive than std::unique_ptr, given that atomic operations. Even more expensive when it involves destruction of reference counted object.

  • Space: the smart pointer std::shared_ptr itself is double of the size of a raw pointer. However, there are additional memory overhead for reference counter for each managed object.

Weak Pointer

Weak pointer is not a standalone smart pointer like its two siblings. It is an augmentation of shared pointer to address a specific set of issues: non-owning observers, cycle-ownership, etc. For example:

// cyclic_shared_ptr.cpp 
struct A { std::shared_ptr<B> p; };
struct B { std::shared_ptr<A> p; };

int main() {
    auto a = std::make_shared<A>(); // a points to obj1[count:=1]
    auto b = std::make_shared<B>(); // b points to obj2[count:=1]
    a->p = b;                       // a->p also points to obj2[count:=2]
    b->p = a;                       // b->p also points to obj1[count:=2]
}                            // a and b goes out of scope, obj1[count:=1], obj2[count:=1]

Both objects pointed by a and b has a shared reference count of 1 after program exits main()’s scope, and memory leak happens:

$ valgrind --leak-check=full -v cyclic_shared_ptr
...
total heap usage: 3 allocs, 1 frees

By changing one member pointer to a weak pointer, we break this cyclic loop:

struct B { std::weak_ptr<A> p; };

std::weak_ptr and std::shared_ptr are deeply integrated together: even their source code are defined in the same files. Weak pointer cannot reference the object directly. In fact, its usage requires conversion to std::shared_ptr before referencing the object:

auto sp = wp.lock(); // wp is a std::weak_ptr and sp is std::shared_ptr of same object type 
if (sp)  do_something(*sp);

It is important to note that, the existence of a weak pointer does not prevent the object being deleted, but it keeps the reference counter object alive. The shared counter dictates the existence of the object, while weak counter (=#weak + #shared) decides the existence of the counter object.